* [PATCH RFC tip/core/rcu 0/14] Rework expedited grace periods
@ 2015-06-30 22:25 Paul E. McKenney
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

Hello!

This RFC series reworks the handling of expedited grace periods, eliminating
the previous use of try_stop_cpus(), improving scalability, and adding
an RCU CPU stall-warning capability.  The individual patches are as follows
(a brief sketch of the sequence-counter scheme that several of the patches
build on appears after this list):

1.	Replace try_stop_cpus() with stop_one_cpu(), courtesy of
	Peter Zijlstra.

2.	Improve counter handling to more efficiently detect that some
	other task has already done the required work.

3.	Replace synchronize_sched_expedited()'s polling loop with
	funnel locking.

4.	Replace stop_one_cpu() with stop_one_cpu_nowait() to avoid the
	latency hit from #1 above, also courtesy of Peter Zijlstra.

5.	Abstract the counter handling.

6.	Apply the newly abstracted counter handling to
	synchronize_rcu_expedited().

7.	Abstract the funnel locking.

8.	Fix a stupid type error in synchronize_sched_expedited().

9.	Replace synchronize_rcu_expedited()'s polling with the newly
	abstracted funnel locking.

10.	Apply the abstracted counter handling to _rcu_barrier().

11.	Consolidate the last open-coded expedited memory barrier into the
	counter handling.

12.	Extend the funnel locking to the rcu_data structure in order to
	further increase scalability.

13.	Add stall warnings to synchronize_sched_expedited().

14.	Document the newly added stall warnings.
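
For reference, the work-already-done detection that patches 2, 5, 6, and 10
build on boils down to an even/odd sequence counter.  The following is a
simplified, standalone sketch of the idea (memory barriers are omitted and
the names are illustrative, not the exact kernel code; READ_ONCE() and
ULONG_CMP_GE() are the existing kernel helpers):

	/* Even value: no expedited GP in flight.  Odd value: one in progress. */
	static unsigned long exp_seq;

	static unsigned long exp_seq_snap(void)
	{
		/* First even value guaranteeing a full GP after this call. */
		return (READ_ONCE(exp_seq) + 3) & ~0x1UL;
	}

	static bool exp_seq_done(unsigned long s)
	{
		/* True once a grace period covering the caller has completed. */
		return ULONG_CMP_GE(READ_ONCE(exp_seq), s);
	}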

							Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/RCU/stallwarn.txt |   17 +
 b/include/trace/events/rcu.h      |    1 
 b/kernel/rcu/tree.c               |  581 +++++++++++++++++++-------------------
 b/kernel/rcu/tree.h               |   31 +-
 b/kernel/rcu/tree_plugin.h        |   53 +--
 b/kernel/rcu/tree_trace.c         |   25 -
 6 files changed, 369 insertions(+), 339 deletions(-)



* [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu()
  2015-06-30 22:25 [PATCH RFC tip/core/rcu 0/14] Rework expedited grace periods Paul E. McKenney
@ 2015-06-30 22:25 ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 02/14] rcu: Rework synchronize_rcu_expedited() counter handling Paul E. McKenney
                     ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: Peter Zijlstra <peterz@infradead.org>

The synchronize_sched_expedited() function currently invokes try_stop_cpus(),
which schedules the stopper kthreads on each online non-idle CPU,
and waits until all those kthreads are running before letting any
of them stop.  This is disastrous for real-time workloads, which
get hit with a preemption that is as long as the longest scheduling
latency on any CPU, including any non-realtime housekeeping CPUs.
This commit therefore switches to using stop_one_cpu() on each CPU
in turn.  This avoids inflicting the worst-case CPU's scheduling latency
on all of the other CPUs, and also simplifies the code a little bit.

Follow-up commits will simplify the counter-snapshotting algorithm and
convert a number of the counters now protected by the new ->expedited_mutex
to non-atomic variables.
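
In outline, and heavily simplified relative to the diff below (the real loop
also skips the current CPU and any idle CPUs), the change is:

	/* Before: a global rendezvous; no stopper may stop until all are running. */
	try_stop_cpus(cpu_online_mask,
		      synchronize_sched_expedited_cpu_stop, NULL);

	/* After: stop one CPU at a time, serialized by the new ->expedited_mutex. */
	for_each_online_cpu(cpu)
		stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);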

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 41 ++++++++++++++---------------------------
 kernel/rcu/tree.h |  1 +
 2 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78d0a87ff354..a30971474134 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -103,6 +103,7 @@ struct rcu_state sname##_state = { \
 	.orphan_nxttail = &sname##_state.orphan_nxtlist, \
 	.orphan_donetail = &sname##_state.orphan_donelist, \
 	.barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
+	.expedited_mutex = __MUTEX_INITIALIZER(sname##_state.expedited_mutex), \
 	.name = RCU_STATE_NAME(sname), \
 	.abbr = sabbr, \
 }
@@ -3357,8 +3358,6 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  */
 void synchronize_sched_expedited(void)
 {
-	cpumask_var_t cm;
-	bool cma = false;
 	int cpu;
 	long firstsnap, s, snap;
 	int trycount = 0;
@@ -3394,28 +3393,11 @@ void synchronize_sched_expedited(void)
 	}
 	WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
 
-	/* Offline CPUs, idle CPUs, and any CPU we run on are quiescent. */
-	cma = zalloc_cpumask_var(&cm, GFP_KERNEL);
-	if (cma) {
-		cpumask_copy(cm, cpu_online_mask);
-		cpumask_clear_cpu(raw_smp_processor_id(), cm);
-		for_each_cpu(cpu, cm) {
-			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
-
-			if (!(atomic_add_return(0, &rdtp->dynticks) & 0x1))
-				cpumask_clear_cpu(cpu, cm);
-		}
-		if (cpumask_weight(cm) == 0)
-			goto all_cpus_idle;
-	}
-
 	/*
 	 * Each pass through the following loop attempts to force a
 	 * context switch on each CPU.
 	 */
-	while (try_stop_cpus(cma ? cm : cpu_online_mask,
-			     synchronize_sched_expedited_cpu_stop,
-			     NULL) == -EAGAIN) {
+	while (!mutex_trylock(&rsp->expedited_mutex)) {
 		put_online_cpus();
 		atomic_long_inc(&rsp->expedited_tryfail);
 
@@ -3425,7 +3407,6 @@ void synchronize_sched_expedited(void)
 			/* ensure test happens before caller kfree */
 			smp_mb__before_atomic(); /* ^^^ */
 			atomic_long_inc(&rsp->expedited_workdone1);
-			free_cpumask_var(cm);
 			return;
 		}
 
@@ -3435,7 +3416,6 @@ void synchronize_sched_expedited(void)
 		} else {
 			wait_rcu_gp(call_rcu_sched);
 			atomic_long_inc(&rsp->expedited_normal);
-			free_cpumask_var(cm);
 			return;
 		}
 
@@ -3445,7 +3425,6 @@ void synchronize_sched_expedited(void)
 			/* ensure test happens before caller kfree */
 			smp_mb__before_atomic(); /* ^^^ */
 			atomic_long_inc(&rsp->expedited_workdone2);
-			free_cpumask_var(cm);
 			return;
 		}
 
@@ -3460,16 +3439,23 @@ void synchronize_sched_expedited(void)
 			/* CPU hotplug operation in flight, use normal GP. */
 			wait_rcu_gp(call_rcu_sched);
 			atomic_long_inc(&rsp->expedited_normal);
-			free_cpumask_var(cm);
 			return;
 		}
 		snap = atomic_long_read(&rsp->expedited_start);
 		smp_mb(); /* ensure read is before try_stop_cpus(). */
 	}
-	atomic_long_inc(&rsp->expedited_stoppedcpus);
 
-all_cpus_idle:
-	free_cpumask_var(cm);
+	/* Stop each CPU that is online, non-idle, and not us. */
+	for_each_online_cpu(cpu) {
+		struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+		/* Skip our CPU and any idle CPUs. */
+		if (raw_smp_processor_id() == cpu ||
+		    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
+			continue;
+		stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);
+	}
+	atomic_long_inc(&rsp->expedited_stoppedcpus);
 
 	/*
 	 * Everyone up to our most recent fetch is covered by our grace
@@ -3488,6 +3474,7 @@ all_cpus_idle:
 		}
 	} while (atomic_long_cmpxchg(&rsp->expedited_done, s, snap) != s);
 	atomic_long_inc(&rsp->expedited_done_exit);
+	mutex_unlock(&rsp->expedited_mutex);
 
 	put_online_cpus();
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index de22d6d06bf9..b04ffa0dea58 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -478,6 +478,7 @@ struct rcu_state {
 						/*  _rcu_barrier(). */
 	/* End of fields guarded by barrier_mutex. */
 
+	struct mutex  expedited_mutex;		/* Serializes expediting. */
 	atomic_long_t expedited_start;		/* Starting ticket. */
 	atomic_long_t expedited_done;		/* Done ticket. */
 	atomic_long_t expedited_wrap;		/* # near-wrap incidents. */
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 02/14] rcu: Rework synchronize_rcu_expedited() counter handling
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 03/14] rcu: Get rid of synchronize_sched_expedited()'s polling loop Paul E. McKenney
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Now that synchronize_rcu_expedited() has a mutex, it can use a simpler
work-already-done detection scheme.  This commit implements that scheme
using a counter similar to the one used by sequence locking.
A counter is incremented before and after each grace period, so that
the counter is odd in the midst of the grace period and even otherwise.
So if the counter has advanced to the second even number that is
greater than or equal to the snapshot, the required grace period has
already happened.
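
To make the snapshot rule concrete (illustrative numbers only):

	If the counter reads 4 (even, no expedited GP in flight), the snapshot
	is (4 + 3) & ~0x1 = 6, which is reached after exactly one subsequent
	grace period (4 -> 5 at its start, 5 -> 6 at its end).  If the counter
	reads 5 (odd, a GP already in flight), the snapshot is
	(5 + 3) & ~0x1 = 8: the in-flight GP (which ends at 6) may predate the
	caller's updates, so a second, entirely later GP (6 -> 7 -> 8) is
	required.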

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c       | 98 +++++++++++++++----------------------------------
 kernel/rcu/tree.h       |  9 +----
 kernel/rcu/tree_trace.c | 12 ++----
 3 files changed, 36 insertions(+), 83 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a30971474134..68212d1bfaef 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3335,56 +3335,24 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  * restructure your code to batch your updates, and then use a single
  * synchronize_sched() instead.
  *
- * This implementation can be thought of as an application of ticket
- * locking to RCU, with sync_sched_expedited_started and
- * sync_sched_expedited_done taking on the roles of the halves
- * of the ticket-lock word.  Each task atomically increments
- * sync_sched_expedited_started upon entry, snapshotting the old value,
- * then attempts to stop all the CPUs.  If this succeeds, then each
- * CPU will have executed a context switch, resulting in an RCU-sched
- * grace period.  We are then done, so we use atomic_cmpxchg() to
- * update sync_sched_expedited_done to match our snapshot -- but
- * only if someone else has not already advanced past our snapshot.
- *
- * On the other hand, if try_stop_cpus() fails, we check the value
- * of sync_sched_expedited_done.  If it has advanced past our
- * initial snapshot, then someone else must have forced a grace period
- * some time after we took our snapshot.  In this case, our work is
- * done for us, and we can simply return.  Otherwise, we try again,
- * but keep our initial snapshot for purposes of checking for someone
- * doing our work for us.
- *
- * If we fail too many times in a row, we fall back to synchronize_sched().
+ * This implementation can be thought of as an application of sequence
+ * locking to expedited grace periods, but using the sequence counter to
+ * determine when someone else has already done the work instead of for
+ * retrying readers.  We do a mutex_trylock() polling loop, but if we fail
+ * too many times in a row, we fall back to synchronize_sched().
  */
 void synchronize_sched_expedited(void)
 {
 	int cpu;
-	long firstsnap, s, snap;
+	long s;
 	int trycount = 0;
 	struct rcu_state *rsp = &rcu_sched_state;
 
-	/*
-	 * If we are in danger of counter wrap, just do synchronize_sched().
-	 * By allowing sync_sched_expedited_started to advance no more than
-	 * ULONG_MAX/8 ahead of sync_sched_expedited_done, we are ensuring
-	 * that more than 3.5 billion CPUs would be required to force a
-	 * counter wrap on a 32-bit system.  Quite a few more CPUs would of
-	 * course be required on a 64-bit system.
-	 */
-	if (ULONG_CMP_GE((ulong)atomic_long_read(&rsp->expedited_start),
-			 (ulong)atomic_long_read(&rsp->expedited_done) +
-			 ULONG_MAX / 8)) {
-		wait_rcu_gp(call_rcu_sched);
-		atomic_long_inc(&rsp->expedited_wrap);
-		return;
-	}
+	/* Take a snapshot of the sequence number.  */
+	smp_mb(); /* Caller's modifications seen first by other CPUs. */
+	s = (READ_ONCE(rsp->expedited_sequence) + 3) & ~0x1;
+	smp_mb(); /* Above access must not bleed into critical section. */
 
-	/*
-	 * Take a ticket.  Note that atomic_inc_return() implies a
-	 * full memory barrier.
-	 */
-	snap = atomic_long_inc_return(&rsp->expedited_start);
-	firstsnap = snap;
 	if (!try_get_online_cpus()) {
 		/* CPU hotplug operation in flight, fall back to normal GP. */
 		wait_rcu_gp(call_rcu_sched);
@@ -3394,16 +3362,15 @@ void synchronize_sched_expedited(void)
 	WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
 
 	/*
-	 * Each pass through the following loop attempts to force a
-	 * context switch on each CPU.
+	 * Each pass through the following loop attempts to acquire
+	 * ->expedited_mutex, checking for others doing our work each time.
 	 */
 	while (!mutex_trylock(&rsp->expedited_mutex)) {
 		put_online_cpus();
 		atomic_long_inc(&rsp->expedited_tryfail);
 
 		/* Check to see if someone else did our work for us. */
-		s = atomic_long_read(&rsp->expedited_done);
-		if (ULONG_CMP_GE((ulong)s, (ulong)firstsnap)) {
+		if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
 			/* ensure test happens before caller kfree */
 			smp_mb__before_atomic(); /* ^^^ */
 			atomic_long_inc(&rsp->expedited_workdone1);
@@ -3420,8 +3387,7 @@ void synchronize_sched_expedited(void)
 		}
 
 		/* Recheck to see if someone else did our work for us. */
-		s = atomic_long_read(&rsp->expedited_done);
-		if (ULONG_CMP_GE((ulong)s, (ulong)firstsnap)) {
+		if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
 			/* ensure test happens before caller kfree */
 			smp_mb__before_atomic(); /* ^^^ */
 			atomic_long_inc(&rsp->expedited_workdone2);
@@ -3441,10 +3407,20 @@ void synchronize_sched_expedited(void)
 			atomic_long_inc(&rsp->expedited_normal);
 			return;
 		}
-		snap = atomic_long_read(&rsp->expedited_start);
-		smp_mb(); /* ensure read is before try_stop_cpus(). */
 	}
 
+	/* Recheck yet again to see if someone else did our work for us. */
+	if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
+		rsp->expedited_workdone3++;
+		mutex_unlock(&rsp->expedited_mutex);
+		smp_mb(); /* ensure test happens before caller kfree */
+		return;
+	}
+
+	WRITE_ONCE(rsp->expedited_sequence, rsp->expedited_sequence + 1);
+	smp_mb(); /* Ensure expedited GP seen after counter increment. */
+	WARN_ON_ONCE(!(rsp->expedited_sequence & 0x1));
+
 	/* Stop each CPU that is online, non-idle, and not us. */
 	for_each_online_cpu(cpu) {
 		struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
@@ -3455,26 +3431,12 @@ void synchronize_sched_expedited(void)
 			continue;
 		stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);
 	}
-	atomic_long_inc(&rsp->expedited_stoppedcpus);
 
-	/*
-	 * Everyone up to our most recent fetch is covered by our grace
-	 * period.  Update the counter, but only if our work is still
-	 * relevant -- which it won't be if someone who started later
-	 * than we did already did their update.
-	 */
-	do {
-		atomic_long_inc(&rsp->expedited_done_tries);
-		s = atomic_long_read(&rsp->expedited_done);
-		if (ULONG_CMP_GE((ulong)s, (ulong)snap)) {
-			/* ensure test happens before caller kfree */
-			smp_mb__before_atomic(); /* ^^^ */
-			atomic_long_inc(&rsp->expedited_done_lost);
-			break;
-		}
-	} while (atomic_long_cmpxchg(&rsp->expedited_done, s, snap) != s);
-	atomic_long_inc(&rsp->expedited_done_exit);
+	smp_mb(); /* Ensure expedited GP seen before counter increment. */
+	WRITE_ONCE(rsp->expedited_sequence, rsp->expedited_sequence + 1);
+	WARN_ON_ONCE(rsp->expedited_sequence & 0x1);
 	mutex_unlock(&rsp->expedited_mutex);
+	smp_mb(); /* ensure subsequent action seen after grace period. */
 
 	put_online_cpus();
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index b04ffa0dea58..f15dd1e08ddd 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -479,17 +479,12 @@ struct rcu_state {
 	/* End of fields guarded by barrier_mutex. */
 
 	struct mutex  expedited_mutex;		/* Serializes expediting. */
-	atomic_long_t expedited_start;		/* Starting ticket. */
-	atomic_long_t expedited_done;		/* Done ticket. */
-	atomic_long_t expedited_wrap;		/* # near-wrap incidents. */
+	unsigned long expedited_sequence;	/* Take a ticket. */
 	atomic_long_t expedited_tryfail;	/* # acquisition failures. */
 	atomic_long_t expedited_workdone1;	/* # done by others #1. */
 	atomic_long_t expedited_workdone2;	/* # done by others #2. */
+	unsigned long expedited_workdone3;	/* # done by others #3. */
 	atomic_long_t expedited_normal;		/* # fallbacks to normal. */
-	atomic_long_t expedited_stoppedcpus;	/* # successful stop_cpus. */
-	atomic_long_t expedited_done_tries;	/* # tries to update _done. */
-	atomic_long_t expedited_done_lost;	/* # times beaten to _done. */
-	atomic_long_t expedited_done_exit;	/* # times exited _done loop. */
 
 	unsigned long jiffies_force_qs;		/* Time at which to invoke */
 						/*  force_quiescent_state(). */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 3ea7ffc7d5c4..a1ab3a5f6290 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -185,18 +185,14 @@ static int show_rcuexp(struct seq_file *m, void *v)
 {
 	struct rcu_state *rsp = (struct rcu_state *)m->private;
 
-	seq_printf(m, "s=%lu d=%lu w=%lu tf=%lu wd1=%lu wd2=%lu n=%lu sc=%lu dt=%lu dl=%lu dx=%lu\n",
-		   atomic_long_read(&rsp->expedited_start),
-		   atomic_long_read(&rsp->expedited_done),
-		   atomic_long_read(&rsp->expedited_wrap),
+	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu sc=%lu\n",
+		   rsp->expedited_sequence,
 		   atomic_long_read(&rsp->expedited_tryfail),
 		   atomic_long_read(&rsp->expedited_workdone1),
 		   atomic_long_read(&rsp->expedited_workdone2),
+		   rsp->expedited_workdone3,
 		   atomic_long_read(&rsp->expedited_normal),
-		   atomic_long_read(&rsp->expedited_stoppedcpus),
-		   atomic_long_read(&rsp->expedited_done_tries),
-		   atomic_long_read(&rsp->expedited_done_lost),
-		   atomic_long_read(&rsp->expedited_done_exit));
+		   rsp->expedited_sequence / 2);
 	return 0;
 }
 
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 03/14] rcu: Get rid of synchronize_sched_expedited()'s polling loop
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 02/14] rcu: Rework synchronize_rcu_expedited() counter handling Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 04/14] rcu: Make expedited GP CPU stoppage asynchronous Paul E. McKenney
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit gets rid of synchronize_sched_expedited()'s mutex_trylock()
polling loop in favor of a funnel-locking scheme based on the rcu_node
tree.  The work-done check is done at each level of the tree, allowing
high-contention situations to be resolved quickly with reasonable levels
of mutex contention.
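
Schematically, the funnel walk added below looks like the following sketch,
in which statistics, put_online_cpus(), and the exact helper are elided and
work_already_done() stands in for the new sync_sched_exp_wd():

	struct rcu_node *rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
	struct rcu_node *prev = NULL;

	for (; rnp != NULL; rnp = rnp->parent) {
		if (work_already_done(rsp, prev, s))
			return;		/* The real helper also releases prev. */
		mutex_lock(&rnp->exp_funnel_mutex);	/* Contend only locally. */
		if (prev)
			mutex_unlock(&prev->exp_funnel_mutex);
		prev = rnp;
	}
	/* Now holding the root's ->exp_funnel_mutex: do the GP ourselves. */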

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c       | 95 +++++++++++++++++++++----------------------------
 kernel/rcu/tree.h       |  8 +++--
 kernel/rcu/tree_trace.c |  3 +-
 3 files changed, 47 insertions(+), 59 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 68212d1bfaef..887370b7e52a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -70,6 +70,7 @@ MODULE_ALIAS("rcutree");
 
 static struct lock_class_key rcu_node_class[RCU_NUM_LVLS];
 static struct lock_class_key rcu_fqs_class[RCU_NUM_LVLS];
+static struct lock_class_key rcu_exp_class[RCU_NUM_LVLS];
 
 /*
  * In order to export the rcu_state name to the tracing tools, it
@@ -103,7 +104,6 @@ struct rcu_state sname##_state = { \
 	.orphan_nxttail = &sname##_state.orphan_nxtlist, \
 	.orphan_donetail = &sname##_state.orphan_donelist, \
 	.barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
-	.expedited_mutex = __MUTEX_INITIALIZER(sname##_state.expedited_mutex), \
 	.name = RCU_STATE_NAME(sname), \
 	.abbr = sabbr, \
 }
@@ -3324,6 +3324,22 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
 	return 0;
 }
 
+/* Common code for synchronize_sched_expedited() work-done checking. */
+static bool sync_sched_exp_wd(struct rcu_state *rsp, struct rcu_node *rnp,
+			      atomic_long_t *stat, unsigned long s)
+{
+	if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
+		if (rnp)
+			mutex_unlock(&rnp->exp_funnel_mutex);
+		/* Ensure test happens before caller kfree(). */
+		smp_mb__before_atomic(); /* ^^^ */
+		atomic_long_inc(stat);
+		put_online_cpus();
+		return true;
+	}
+	return false;
+}
+
 /**
  * synchronize_sched_expedited - Brute-force RCU-sched grace period
  *
@@ -3338,15 +3354,15 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  * This implementation can be thought of as an application of sequence
  * locking to expedited grace periods, but using the sequence counter to
  * determine when someone else has already done the work instead of for
- * retrying readers.  We do a mutex_trylock() polling loop, but if we fail
- * too many times in a row, we fall back to synchronize_sched().
+ * retrying readers.
  */
 void synchronize_sched_expedited(void)
 {
 	int cpu;
 	long s;
-	int trycount = 0;
 	struct rcu_state *rsp = &rcu_sched_state;
+	struct rcu_node *rnp0;
+	struct rcu_node *rnp1 = NULL;
 
 	/* Take a snapshot of the sequence number.  */
 	smp_mb(); /* Caller's modifications seen first by other CPUs. */
@@ -3362,60 +3378,25 @@ void synchronize_sched_expedited(void)
 	WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
 
 	/*
-	 * Each pass through the following loop attempts to acquire
-	 * ->expedited_mutex, checking for others doing our work each time.
+	 * Each pass through the following loop works its way
+	 * up the rcu_node tree, returning if others have done the
+	 * work or otherwise falls through holding the root rnp's
+	 * ->exp_funnel_mutex.  The mapping from CPU to rcu_node structure
+	 * can be inexact, as it is just promoting locality and is not
+	 * strictly needed for correctness.
 	 */
-	while (!mutex_trylock(&rsp->expedited_mutex)) {
-		put_online_cpus();
-		atomic_long_inc(&rsp->expedited_tryfail);
-
-		/* Check to see if someone else did our work for us. */
-		if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
-			/* ensure test happens before caller kfree */
-			smp_mb__before_atomic(); /* ^^^ */
-			atomic_long_inc(&rsp->expedited_workdone1);
-			return;
-		}
-
-		/* No joy, try again later.  Or just synchronize_sched(). */
-		if (trycount++ < 10) {
-			udelay(trycount * num_online_cpus());
-		} else {
-			wait_rcu_gp(call_rcu_sched);
-			atomic_long_inc(&rsp->expedited_normal);
+	rnp0 = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
+	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
+		if (sync_sched_exp_wd(rsp, rnp1, &rsp->expedited_workdone1, s))
 			return;
-		}
-
-		/* Recheck to see if someone else did our work for us. */
-		if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
-			/* ensure test happens before caller kfree */
-			smp_mb__before_atomic(); /* ^^^ */
-			atomic_long_inc(&rsp->expedited_workdone2);
-			return;
-		}
-
-		/*
-		 * Refetching sync_sched_expedited_started allows later
-		 * callers to piggyback on our grace period.  We retry
-		 * after they started, so our grace period works for them,
-		 * and they started after our first try, so their grace
-		 * period works for us.
-		 */
-		if (!try_get_online_cpus()) {
-			/* CPU hotplug operation in flight, use normal GP. */
-			wait_rcu_gp(call_rcu_sched);
-			atomic_long_inc(&rsp->expedited_normal);
-			return;
-		}
+		mutex_lock(&rnp0->exp_funnel_mutex);
+		if (rnp1)
+			mutex_unlock(&rnp1->exp_funnel_mutex);
+		rnp1 = rnp0;
 	}
-
-	/* Recheck yet again to see if someone else did our work for us. */
-	if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
-		rsp->expedited_workdone3++;
-		mutex_unlock(&rsp->expedited_mutex);
-		smp_mb(); /* ensure test happens before caller kfree */
+	rnp0 = rnp1;  /* rcu_get_root(rsp), AKA root rcu_node structure. */
+	if (sync_sched_exp_wd(rsp, rnp0, &rsp->expedited_workdone2, s))
 		return;
-	}
 
 	WRITE_ONCE(rsp->expedited_sequence, rsp->expedited_sequence + 1);
 	smp_mb(); /* Ensure expedited GP seen after counter increment. */
@@ -3435,7 +3416,7 @@ void synchronize_sched_expedited(void)
 	smp_mb(); /* Ensure expedited GP seen before counter increment. */
 	WRITE_ONCE(rsp->expedited_sequence, rsp->expedited_sequence + 1);
 	WARN_ON_ONCE(rsp->expedited_sequence & 0x1);
-	mutex_unlock(&rsp->expedited_mutex);
+	mutex_unlock(&rnp0->exp_funnel_mutex);
 	smp_mb(); /* ensure subsequent action seen after grace period. */
 
 	put_online_cpus();
@@ -3992,6 +3973,7 @@ static void __init rcu_init_one(struct rcu_state *rsp,
 {
 	static const char * const buf[] = RCU_NODE_NAME_INIT;
 	static const char * const fqs[] = RCU_FQS_NAME_INIT;
+	static const char * const exp[] = RCU_EXP_NAME_INIT;
 	static u8 fl_mask = 0x1;
 
 	int levelcnt[RCU_NUM_LVLS];		/* # nodes in each level. */
@@ -4050,6 +4032,9 @@ static void __init rcu_init_one(struct rcu_state *rsp,
 			rnp->level = i;
 			INIT_LIST_HEAD(&rnp->blkd_tasks);
 			rcu_init_one_nocb(rnp);
+			mutex_init(&rnp->exp_funnel_mutex);
+			lockdep_set_class_and_name(&rnp->exp_funnel_mutex,
+						   &rcu_exp_class[i], exp[i]);
 		}
 	}
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index f15dd1e08ddd..f0f4dd96dd73 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -68,6 +68,7 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0" }
+#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0" }
 #elif NR_CPUS <= RCU_FANOUT_2
 #  define RCU_NUM_LVLS	      2
 #  define NUM_RCU_LVL_0	      1
@@ -76,6 +77,7 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1" }
+#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1" }
 #elif NR_CPUS <= RCU_FANOUT_3
 #  define RCU_NUM_LVLS	      3
 #  define NUM_RCU_LVL_0	      1
@@ -85,6 +87,7 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
+#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2" }
 #elif NR_CPUS <= RCU_FANOUT_4
 #  define RCU_NUM_LVLS	      4
 #  define NUM_RCU_LVL_0	      1
@@ -95,6 +98,7 @@
 #  define NUM_RCU_LVL_INIT    { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 }
 #  define RCU_NODE_NAME_INIT  { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
 #  define RCU_FQS_NAME_INIT   { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
+#  define RCU_EXP_NAME_INIT   { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2", "rcu_node_exp_3" }
 #else
 # error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
 #endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
@@ -237,6 +241,8 @@ struct rcu_node {
 	int need_future_gp[2];
 				/* Counts of upcoming no-CB GP requests. */
 	raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp;
+
+	struct mutex exp_funnel_mutex ____cacheline_internodealigned_in_smp;
 } ____cacheline_internodealigned_in_smp;
 
 /*
@@ -478,12 +484,10 @@ struct rcu_state {
 						/*  _rcu_barrier(). */
 	/* End of fields guarded by barrier_mutex. */
 
-	struct mutex  expedited_mutex;		/* Serializes expediting. */
 	unsigned long expedited_sequence;	/* Take a ticket. */
 	atomic_long_t expedited_tryfail;	/* # acquisition failures. */
 	atomic_long_t expedited_workdone1;	/* # done by others #1. */
 	atomic_long_t expedited_workdone2;	/* # done by others #2. */
-	unsigned long expedited_workdone3;	/* # done by others #3. */
 	atomic_long_t expedited_normal;		/* # fallbacks to normal. */
 
 	unsigned long jiffies_force_qs;		/* Time at which to invoke */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index a1ab3a5f6290..d2aab8dcd58e 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -185,12 +185,11 @@ static int show_rcuexp(struct seq_file *m, void *v)
 {
 	struct rcu_state *rsp = (struct rcu_state *)m->private;
 
-	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu sc=%lu\n",
+	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu n=%lu sc=%lu\n",
 		   rsp->expedited_sequence,
 		   atomic_long_read(&rsp->expedited_tryfail),
 		   atomic_long_read(&rsp->expedited_workdone1),
 		   atomic_long_read(&rsp->expedited_workdone2),
-		   rsp->expedited_workdone3,
 		   atomic_long_read(&rsp->expedited_normal),
 		   rsp->expedited_sequence / 2);
 	return 0;
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 04/14] rcu: Make expedited GP CPU stoppage asynchronous
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 02/14] rcu: Rework synchronize_rcu_expedited() counter handling Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 03/14] rcu: Get rid of synchronize_sched_expedited()'s polling loop Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited() Paul E. McKenney
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: Peter Zijlstra <peterz@infradead.org>

Sequentially stopping the CPUs slows down expedited grace periods by
at least a factor of two, based on rcutorture's grace-period-per-second
rate.  This is a conservative estimate because rcutorture uses unusually
long RCU read-side critical sections and because rcutorture periodically
quiesces the system in order to test RCU's ability to ramp down to and
up from the idle state.  This commit therefore replaces the stop_one_cpu()
with stop_one_cpu_nowait(), using an atomic-counter scheme to determine
when all CPUs have passed through the stopped state.
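
The atomic-counter scheme is essentially a counted rendezvous.  Simplified
from the diff below (with the self-CPU and idle-CPU filtering omitted), the
requester side is:

	atomic_set(&rsp->expedited_need_qs, 1);	/* Extra count avoids race. */
	for_each_online_cpu(cpu) {
		atomic_inc(&rsp->expedited_need_qs);
		stop_one_cpu_nowait(cpu, synchronize_sched_expedited_cpu_stop,
				    rsp, &per_cpu_ptr(rsp->rda, cpu)->exp_stop_work);
	}
	/* Drop the extra count and wait for the last stopper to check in. */
	if (!atomic_dec_and_test(&rsp->expedited_need_qs))
		wait_event(rsp->expedited_wq,
			   !atomic_read(&rsp->expedited_need_qs));

with each stopper doing an atomic_dec_and_test() of its own and waking
->expedited_wq when it is the last to check in.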

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c       | 31 +++++++++++++++++--------------
 kernel/rcu/tree.h       |  6 ++++++
 kernel/rcu/tree_trace.c |  3 ++-
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 887370b7e52a..c58fd27b4a22 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3309,18 +3309,11 @@ EXPORT_SYMBOL_GPL(cond_synchronize_sched);
 
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
-	/*
-	 * There must be a full memory barrier on each affected CPU
-	 * between the time that try_stop_cpus() is called and the
-	 * time that it returns.
-	 *
-	 * In the current initial implementation of cpu_stop, the
-	 * above condition is already met when the control reaches
-	 * this point and the following smp_mb() is not strictly
-	 * necessary.  Do smp_mb() anyway for documentation and
-	 * robustness against future implementation changes.
-	 */
-	smp_mb(); /* See above comment block. */
+	struct rcu_state *rsp = data;
+
+	/* We are here: If we are last, do the wakeup. */
+	if (atomic_dec_and_test(&rsp->expedited_need_qs))
+		wake_up(&rsp->expedited_wq);
 	return 0;
 }
 
@@ -3360,9 +3353,9 @@ void synchronize_sched_expedited(void)
 {
 	int cpu;
 	long s;
-	struct rcu_state *rsp = &rcu_sched_state;
 	struct rcu_node *rnp0;
 	struct rcu_node *rnp1 = NULL;
+	struct rcu_state *rsp = &rcu_sched_state;
 
 	/* Take a snapshot of the sequence number.  */
 	smp_mb(); /* Caller's modifications seen first by other CPUs. */
@@ -3403,16 +3396,26 @@ void synchronize_sched_expedited(void)
 	WARN_ON_ONCE(!(rsp->expedited_sequence & 0x1));
 
 	/* Stop each CPU that is online, non-idle, and not us. */
+	init_waitqueue_head(&rsp->expedited_wq);
+	atomic_set(&rsp->expedited_need_qs, 1); /* Extra count avoids race. */
 	for_each_online_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 		struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
 		/* Skip our CPU and any idle CPUs. */
 		if (raw_smp_processor_id() == cpu ||
 		    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
 			continue;
-		stop_one_cpu(cpu, synchronize_sched_expedited_cpu_stop, NULL);
+		atomic_inc(&rsp->expedited_need_qs);
+		stop_one_cpu_nowait(cpu, synchronize_sched_expedited_cpu_stop,
+				    rsp, &rdp->exp_stop_work);
 	}
 
+	/* Remove extra count and, if necessary, wait for CPUs to stop. */
+	if (!atomic_dec_and_test(&rsp->expedited_need_qs))
+		wait_event(rsp->expedited_wq,
+			   !atomic_read(&rsp->expedited_need_qs));
+
 	smp_mb(); /* Ensure expedited GP seen before counter increment. */
 	WRITE_ONCE(rsp->expedited_sequence, rsp->expedited_sequence + 1);
 	WARN_ON_ONCE(rsp->expedited_sequence & 0x1);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index f0f4dd96dd73..c15cba1b97ab 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -27,6 +27,7 @@
 #include <linux/threads.h>
 #include <linux/cpumask.h>
 #include <linux/seqlock.h>
+#include <linux/stop_machine.h>
 
 /*
  * Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
@@ -298,6 +299,9 @@ struct rcu_data {
 					/*  ticks this CPU has handled */
 					/*  during and after the last grace */
 					/* period it is aware of. */
+	struct cpu_stop_work exp_stop_work;
+					/* Expedited grace-period control */
+					/*  for CPU stopping. */
 
 	/* 2) batch handling */
 	/*
@@ -489,6 +493,8 @@ struct rcu_state {
 	atomic_long_t expedited_workdone1;	/* # done by others #1. */
 	atomic_long_t expedited_workdone2;	/* # done by others #2. */
 	atomic_long_t expedited_normal;		/* # fallbacks to normal. */
+	atomic_t expedited_need_qs;		/* # CPUs left to check in. */
+	wait_queue_head_t expedited_wq;		/* Wait for check-ins. */
 
 	unsigned long jiffies_force_qs;		/* Time at which to invoke */
 						/*  force_quiescent_state(). */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index d2aab8dcd58e..36c04b46d3b8 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -185,12 +185,13 @@ static int show_rcuexp(struct seq_file *m, void *v)
 {
 	struct rcu_state *rsp = (struct rcu_state *)m->private;
 
-	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu n=%lu sc=%lu\n",
+	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu n=%lu enq=%d sc=%lu\n",
 		   rsp->expedited_sequence,
 		   atomic_long_read(&rsp->expedited_tryfail),
 		   atomic_long_read(&rsp->expedited_workdone1),
 		   atomic_long_read(&rsp->expedited_workdone2),
 		   atomic_long_read(&rsp->expedited_normal),
+		   atomic_read(&rsp->expedited_need_qs),
 		   rsp->expedited_sequence / 2);
 	return 0;
 }
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (2 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 04/14] rcu: Make expedited GP CPU stoppage asynchronous Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-07-01 10:27     ` Peter Zijlstra
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 06/14] rcu: Make synchronize_rcu_expedited() use sequence-counter scheme Paul E. McKenney
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit creates rcu_exp_gp_seq_start() and rcu_exp_gp_seq_end() to
bracket an expedited grace period, rcu_exp_gp_seq_snap() to snapshot the
sequence counter, and rcu_exp_gp_seq_done() to check to see if a full
expedited grace period has elapsed since the snapshot.  These wrappers will
later be applied to synchronize_rcu_expedited().  They are defined in terms
of the underlying rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), and
rcu_seq_done() primitives, which will in turn be applied to _rcu_barrier().
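
Callers are expected to use the wrappers in a snapshot/recheck pattern along
the following lines (simplified; the funnel locking used elsewhere in this
series is elided):

	unsigned long s = rcu_exp_gp_seq_snap(rsp);	/* Before any waiting. */

	if (rcu_exp_gp_seq_done(rsp, s))
		return;			/* A covering GP already completed. */

	rcu_exp_gp_seq_start(rsp);	/* Counter goes odd: GP in progress. */
	/* ... force a quiescent state on each relevant CPU ... */
	rcu_exp_gp_seq_end(rsp);	/* Counter goes even: GP complete. */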

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 58 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c58fd27b4a22..f96500e462fd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3307,6 +3307,60 @@ void cond_synchronize_sched(unsigned long oldstate)
 }
 EXPORT_SYMBOL_GPL(cond_synchronize_sched);
 
+/* Adjust sequence number for start of update-side operation. */
+static void rcu_seq_start(unsigned long *sp)
+{
+	WRITE_ONCE(*sp, *sp + 1);
+	smp_mb(); /* Ensure update-side operation after counter increment. */
+	WARN_ON_ONCE(!(*sp & 0x1));
+}
+
+/* Adjust sequence number for end of update-side operation. */
+static void rcu_seq_end(unsigned long *sp)
+{
+	smp_mb(); /* Ensure update-side operation before counter increment. */
+	WRITE_ONCE(*sp, *sp + 1);
+	WARN_ON_ONCE(*sp & 0x1);
+}
+
+/* Take a snapshot of the update side's sequence number. */
+static unsigned long rcu_seq_snap(unsigned long *sp)
+{
+	unsigned long s;
+
+	smp_mb(); /* Caller's modifications seen first by other CPUs. */
+	s = (READ_ONCE(*sp) + 3) & ~0x1;
+	smp_mb(); /* Above access must not bleed into critical section. */
+	return s;
+}
+
+/*
+ * Given a snapshot from rcu_seq_snap(), determine whether or not a
+ * full update-side operation has occurred.
+ */
+static bool rcu_seq_done(unsigned long *sp, unsigned long s)
+{
+	return ULONG_CMP_GE(READ_ONCE(*sp), s);
+}
+
+/* Wrapper functions for expedited grace periods.  */
+static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
+{
+	rcu_seq_start(&rsp->expedited_sequence);
+}
+static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
+{
+	rcu_seq_end(&rsp->expedited_sequence);
+}
+static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
+{
+	return rcu_seq_snap(&rsp->expedited_sequence);
+}
+static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
+{
+	return rcu_seq_done(&rsp->expedited_sequence, s);
+}
+
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
 	struct rcu_state *rsp = data;
@@ -3321,7 +3375,7 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
 static bool sync_sched_exp_wd(struct rcu_state *rsp, struct rcu_node *rnp,
 			      atomic_long_t *stat, unsigned long s)
 {
-	if (ULONG_CMP_GE(READ_ONCE(rsp->expedited_sequence), s)) {
+	if (rcu_exp_gp_seq_done(rsp, s)) {
 		if (rnp)
 			mutex_unlock(&rnp->exp_funnel_mutex);
 		/* Ensure test happens before caller kfree(). */
@@ -3358,9 +3412,7 @@ void synchronize_sched_expedited(void)
 	struct rcu_state *rsp = &rcu_sched_state;
 
 	/* Take a snapshot of the sequence number.  */
-	smp_mb(); /* Caller's modifications seen first by other CPUs. */
-	s = (READ_ONCE(rsp->expedited_sequence) + 3) & ~0x1;
-	smp_mb(); /* Above access must not bleed into critical section. */
+	s = rcu_exp_gp_seq_snap(rsp);
 
 	if (!try_get_online_cpus()) {
 		/* CPU hotplug operation in flight, fall back to normal GP. */
@@ -3391,9 +3443,7 @@ void synchronize_sched_expedited(void)
 	if (sync_sched_exp_wd(rsp, rnp0, &rsp->expedited_workdone2, s))
 		return;
 
-	WRITE_ONCE(rsp->expedited_sequence, rsp->expedited_sequence + 1);
-	smp_mb(); /* Ensure expedited GP seen after counter increment. */
-	WARN_ON_ONCE(!(rsp->expedited_sequence & 0x1));
+	rcu_exp_gp_seq_start(rsp);
 
 	/* Stop each CPU that is online, non-idle, and not us. */
 	init_waitqueue_head(&rsp->expedited_wq);
@@ -3416,9 +3466,7 @@ void synchronize_sched_expedited(void)
 		wait_event(rsp->expedited_wq,
 			   !atomic_read(&rsp->expedited_need_qs));
 
-	smp_mb(); /* Ensure expedited GP seen before counter increment. */
-	WRITE_ONCE(rsp->expedited_sequence, rsp->expedited_sequence + 1);
-	WARN_ON_ONCE(rsp->expedited_sequence & 0x1);
+	rcu_exp_gp_seq_end(rsp);
 	mutex_unlock(&rnp0->exp_funnel_mutex);
 	smp_mb(); /* ensure subsequent action seen after grace period. */
 
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 06/14] rcu: Make synchronize_rcu_expedited() use sequence-counter scheme
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (3 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited() Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 07/14] rcu: Abstract funnel locking from synchronize_sched_expedited() Paul E. McKenney
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Although synchronize_rcu_expedited() uses a sequence-counter scheme, it
is based on a single increment per grace period, which means that tasks
piggybacking off of concurrent grace periods may be forced to wait longer
than necessary.  This commit therefore applies the new sequence-count
functions developed for synchronize_sched_expedited() to speed things
up a bit and to consolidate the sequence-counter implementation.
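
Roughly speaking, the old scheme took snap = count + 1 and considered the
work done only once the counter had advanced past that snapshot, which in
effect always requires two grace-period completions because the counter
cannot say whether a grace period was already in flight at snapshot time.
With the shared even/odd counter, an even value at snapshot time shows that
no expedited grace period is in flight, so piggybacking callers need wait
for only one.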

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_plugin.h | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 07bd9fb393b4..190213ffad19 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -536,7 +536,6 @@ void synchronize_rcu(void)
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
 static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
-static unsigned long sync_rcu_preempt_exp_count;
 static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
 
 /*
@@ -704,12 +703,10 @@ void synchronize_rcu_expedited(void)
 {
 	struct rcu_node *rnp;
 	struct rcu_state *rsp = rcu_state_p;
-	unsigned long snap;
+	unsigned long s;
 	int trycount = 0;
 
-	smp_mb(); /* Caller's modifications seen first by other CPUs. */
-	snap = READ_ONCE(sync_rcu_preempt_exp_count) + 1;
-	smp_mb(); /* Above access cannot bleed into critical section. */
+	s = rcu_exp_gp_seq_snap(rsp);
 
 	/*
 	 * Acquire lock, falling back to synchronize_rcu() if too many
@@ -717,8 +714,7 @@ void synchronize_rcu_expedited(void)
 	 * expedited grace period for us, just leave.
 	 */
 	while (!mutex_trylock(&sync_rcu_preempt_exp_mutex)) {
-		if (ULONG_CMP_LT(snap,
-		    READ_ONCE(sync_rcu_preempt_exp_count)))
+		if (rcu_exp_gp_seq_done(rsp, s))
 			goto mb_ret; /* Others did our work for us. */
 		if (trycount++ < 10) {
 			udelay(trycount * num_online_cpus());
@@ -727,8 +723,9 @@ void synchronize_rcu_expedited(void)
 			return;
 		}
 	}
-	if (ULONG_CMP_LT(snap, READ_ONCE(sync_rcu_preempt_exp_count)))
+	if (rcu_exp_gp_seq_done(rsp, s))
 		goto unlock_mb_ret; /* Others did our work for us. */
+	rcu_exp_gp_seq_start(rsp);
 
 	/* force all RCU readers onto ->blkd_tasks lists. */
 	synchronize_sched_expedited();
@@ -750,8 +747,7 @@ void synchronize_rcu_expedited(void)
 		   sync_rcu_preempt_exp_done(rnp));
 
 	/* Clean up and exit. */
-	smp_mb(); /* ensure expedited GP seen before counter increment. */
-	WRITE_ONCE(sync_rcu_preempt_exp_count, sync_rcu_preempt_exp_count + 1);
+	rcu_exp_gp_seq_end(rsp);
 unlock_mb_ret:
 	mutex_unlock(&sync_rcu_preempt_exp_mutex);
 mb_ret:
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 07/14] rcu: Abstract funnel locking from synchronize_sched_expedited()
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (4 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 06/14] rcu: Make synchronize_rcu_expedited() use sequence-counter scheme Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 08/14] rcu: Fix synchronize_sched_expedited() type error for "s" Paul E. McKenney
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit abstracts funnel locking from synchronize_sched_expedited()
so that it may be used by synchronize_rcu_expedited().
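
After this change, synchronize_sched_expedited() (and, once a later patch in
this series converts it, synchronize_rcu_expedited()) follows roughly this
pattern (simplified):

	s = rcu_exp_gp_seq_snap(rsp);
	rnp = exp_funnel_lock(rsp, s);
	if (rnp == NULL)
		return;		/* Someone else did our work for us. */
	rcu_exp_gp_seq_start(rsp);
	/* ... drive the expedited grace period ... */
	rcu_exp_gp_seq_end(rsp);
	mutex_unlock(&rnp->exp_funnel_mutex);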

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 80 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f96500e462fd..ad6980af1c99 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3361,16 +3361,6 @@ static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
 	return rcu_seq_done(&rsp->expedited_sequence, s);
 }
 
-static int synchronize_sched_expedited_cpu_stop(void *data)
-{
-	struct rcu_state *rsp = data;
-
-	/* We are here: If we are last, do the wakeup. */
-	if (atomic_dec_and_test(&rsp->expedited_need_qs))
-		wake_up(&rsp->expedited_wq);
-	return 0;
-}
-
 /* Common code for synchronize_sched_expedited() work-done checking. */
 static bool sync_sched_exp_wd(struct rcu_state *rsp, struct rcu_node *rnp,
 			      atomic_long_t *stat, unsigned long s)
@@ -3387,6 +3377,48 @@ static bool sync_sched_exp_wd(struct rcu_state *rsp, struct rcu_node *rnp,
 	return false;
 }
 
+/*
+ * Funnel-lock acquisition for expedited grace periods.  Returns a
+ * pointer to the root rcu_node structure, or NULL if some other
+ * task did the expedited grace period for us.
+ */
+static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
+{
+	struct rcu_node *rnp0;
+	struct rcu_node *rnp1 = NULL;
+
+	/*
+	 * Each pass through the following loop works its way
+	 * up the rcu_node tree, returning if others have done the
+	 * work or otherwise falls through holding the root rnp's
+	 * ->exp_funnel_mutex.  The mapping from CPU to rcu_node structure
+	 * can be inexact, as it is just promoting locality and is not
+	 * strictly needed for correctness.
+	 */
+	rnp0 = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
+	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
+		if (sync_sched_exp_wd(rsp, rnp1, &rsp->expedited_workdone1, s))
+			return NULL;
+		mutex_lock(&rnp0->exp_funnel_mutex);
+		if (rnp1)
+			mutex_unlock(&rnp1->exp_funnel_mutex);
+		rnp1 = rnp0;
+	}
+	if (sync_sched_exp_wd(rsp, rnp1, &rsp->expedited_workdone2, s))
+		return NULL;
+	return rnp1;
+}
+
+static int synchronize_sched_expedited_cpu_stop(void *data)
+{
+	struct rcu_state *rsp = data;
+
+	/* We are here: If we are last, do the wakeup. */
+	if (atomic_dec_and_test(&rsp->expedited_need_qs))
+		wake_up(&rsp->expedited_wq);
+	return 0;
+}
+
 /**
  * synchronize_sched_expedited - Brute-force RCU-sched grace period
  *
@@ -3407,8 +3439,7 @@ void synchronize_sched_expedited(void)
 {
 	int cpu;
 	long s;
-	struct rcu_node *rnp0;
-	struct rcu_node *rnp1 = NULL;
+	struct rcu_node *rnp;
 	struct rcu_state *rsp = &rcu_sched_state;
 
 	/* Take a snapshot of the sequence number.  */
@@ -3422,26 +3453,9 @@ void synchronize_sched_expedited(void)
 	}
 	WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
 
-	/*
-	 * Each pass through the following loop works its way
-	 * up the rcu_node tree, returning if others have done the
-	 * work or otherwise falls through holding the root rnp's
-	 * ->exp_funnel_mutex.  The mapping from CPU to rcu_node structure
-	 * can be inexact, as it is just promoting locality and is not
-	 * strictly needed for correctness.
-	 */
-	rnp0 = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
-	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
-		if (sync_sched_exp_wd(rsp, rnp1, &rsp->expedited_workdone1, s))
-			return;
-		mutex_lock(&rnp0->exp_funnel_mutex);
-		if (rnp1)
-			mutex_unlock(&rnp1->exp_funnel_mutex);
-		rnp1 = rnp0;
-	}
-	rnp0 = rnp1;  /* rcu_get_root(rsp), AKA root rcu_node structure. */
-	if (sync_sched_exp_wd(rsp, rnp0, &rsp->expedited_workdone2, s))
-		return;
+	rnp = exp_funnel_lock(rsp, s);
+	if (rnp == NULL)
+		return;  /* Someone else did our work for us. */
 
 	rcu_exp_gp_seq_start(rsp);
 
@@ -3467,7 +3481,7 @@ void synchronize_sched_expedited(void)
 			   !atomic_read(&rsp->expedited_need_qs));
 
 	rcu_exp_gp_seq_end(rsp);
-	mutex_unlock(&rnp0->exp_funnel_mutex);
+	mutex_unlock(&rnp->exp_funnel_mutex);
 	smp_mb(); /* ensure subsequent action seen after grace period. */
 
 	put_online_cpus();
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 08/14] rcu: Fix synchronize_sched_expedited() type error for "s"
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (5 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 07/14] rcu: Abstract funnel locking from synchronize_sched_expedited() Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 09/14] rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop Paul E. McKenney
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The type of "s" has been "long" rather than the correct "unsigned long"
for quite some time.  This commit fixes this type error.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ad6980af1c99..65ba4977b200 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3438,7 +3438,7 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
 void synchronize_sched_expedited(void)
 {
 	int cpu;
-	long s;
+	unsigned long s;
 	struct rcu_node *rnp;
 	struct rcu_state *rsp = &rcu_sched_state;
 
-- 
1.8.1.5



* [PATCH RFC tip/core/rcu 09/14] rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (6 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 08/14] rcu: Fix synchronize_sched_expedited() type error for "s" Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 10/14] rcu: Apply rcu_seq operations to _rcu_barrier() Paul E. McKenney
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock()
polling loop in favor of the funnel-locking scheme that was abstracted
from synchronize_sched_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 15 ++++++++-------
 kernel/rcu/tree_plugin.h | 36 ++++++++++--------------------------
 2 files changed, 18 insertions(+), 33 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65ba4977b200..2ade4592aa70 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3361,9 +3361,9 @@ static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
 	return rcu_seq_done(&rsp->expedited_sequence, s);
 }
 
-/* Common code for synchronize_sched_expedited() work-done checking. */
-static bool sync_sched_exp_wd(struct rcu_state *rsp, struct rcu_node *rnp,
-			      atomic_long_t *stat, unsigned long s)
+/* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
+static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
+			       atomic_long_t *stat, unsigned long s)
 {
 	if (rcu_exp_gp_seq_done(rsp, s)) {
 		if (rnp)
@@ -3371,7 +3371,6 @@ static bool sync_sched_exp_wd(struct rcu_state *rsp, struct rcu_node *rnp,
 		/* Ensure test happens before caller kfree(). */
 		smp_mb__before_atomic(); /* ^^^ */
 		atomic_long_inc(stat);
-		put_online_cpus();
 		return true;
 	}
 	return false;
@@ -3397,14 +3396,14 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	 */
 	rnp0 = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
 	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
-		if (sync_sched_exp_wd(rsp, rnp1, &rsp->expedited_workdone1, s))
+		if (sync_exp_work_done(rsp, rnp1, &rsp->expedited_workdone1, s))
 			return NULL;
 		mutex_lock(&rnp0->exp_funnel_mutex);
 		if (rnp1)
 			mutex_unlock(&rnp1->exp_funnel_mutex);
 		rnp1 = rnp0;
 	}
-	if (sync_sched_exp_wd(rsp, rnp1, &rsp->expedited_workdone2, s))
+	if (sync_exp_work_done(rsp, rnp1, &rsp->expedited_workdone2, s))
 		return NULL;
 	return rnp1;
 }
@@ -3454,8 +3453,10 @@ void synchronize_sched_expedited(void)
 	WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));
 
 	rnp = exp_funnel_lock(rsp, s);
-	if (rnp == NULL)
+	if (rnp == NULL) {
+		put_online_cpus();
 		return;  /* Someone else did our work for us. */
+	}
 
 	rcu_exp_gp_seq_start(rsp);
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 190213ffad19..f5055b6b08a0 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -536,7 +536,6 @@ void synchronize_rcu(void)
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
 static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
-static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
 
 /*
  * Return non-zero if there are any tasks in RCU read-side critical
@@ -556,7 +555,7 @@ static int rcu_preempted_readers_exp(struct rcu_node *rnp)
  * for the current expedited grace period.  Works only for preemptible
  * RCU -- other RCU implementation use other means.
  *
- * Caller must hold sync_rcu_preempt_exp_mutex.
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
  */
 static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
 {
@@ -572,7 +571,7 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
  * recursively up the tree.  (Calm down, calm down, we do the recursion
  * iteratively!)
  *
- * Caller must hold sync_rcu_preempt_exp_mutex.
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
  */
 static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 			       bool wake)
@@ -611,7 +610,7 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
  * set the ->expmask bits on the leaf rcu_node structures to tell phase 2
  * that work is needed here.
  *
- * Caller must hold sync_rcu_preempt_exp_mutex.
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
  */
 static void
 sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp)
@@ -654,7 +653,7 @@ sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp)
  * invoke rcu_report_exp_rnp() to clear out the upper-level ->expmask bits,
  * enabling rcu_read_unlock_special() to do the bit-clearing.
  *
- * Caller must hold sync_rcu_preempt_exp_mutex.
+ * Caller must hold the root rcu_node's exp_funnel_mutex.
  */
 static void
 sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp)
@@ -702,29 +701,16 @@ sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp)
 void synchronize_rcu_expedited(void)
 {
 	struct rcu_node *rnp;
+	struct rcu_node *rnp_unlock;
 	struct rcu_state *rsp = rcu_state_p;
 	unsigned long s;
-	int trycount = 0;
 
 	s = rcu_exp_gp_seq_snap(rsp);
 
-	/*
-	 * Acquire lock, falling back to synchronize_rcu() if too many
-	 * lock-acquisition failures.  Of course, if someone does the
-	 * expedited grace period for us, just leave.
-	 */
-	while (!mutex_trylock(&sync_rcu_preempt_exp_mutex)) {
-		if (rcu_exp_gp_seq_done(rsp, s))
-			goto mb_ret; /* Others did our work for us. */
-		if (trycount++ < 10) {
-			udelay(trycount * num_online_cpus());
-		} else {
-			wait_rcu_gp(call_rcu);
-			return;
-		}
-	}
-	if (rcu_exp_gp_seq_done(rsp, s))
-		goto unlock_mb_ret; /* Others did our work for us. */
+	rnp_unlock = exp_funnel_lock(rsp, s);
+	if (rnp_unlock == NULL)
+		return;  /* Someone else did our work for us. */
+
 	rcu_exp_gp_seq_start(rsp);
 
 	/* force all RCU readers onto ->blkd_tasks lists. */
@@ -748,9 +734,7 @@ void synchronize_rcu_expedited(void)
 
 	/* Clean up and exit. */
 	rcu_exp_gp_seq_end(rsp);
-unlock_mb_ret:
-	mutex_unlock(&sync_rcu_preempt_exp_mutex);
-mb_ret:
+	mutex_unlock(&rnp_unlock->exp_funnel_mutex);
 	smp_mb(); /* ensure subsequent action seen after grace period. */
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFC tip/core/rcu 10/14] rcu: Apply rcu_seq operations to _rcu_barrier()
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (7 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 09/14] rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 11/14] rcu: Consolidate last open-coded expedited memory barrier Paul E. McKenney
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The rcu_seq operations were open-coded in _rcu_barrier(), so this commit
replaces the open-coding with the shiny new rcu_seq operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/trace/events/rcu.h |  1 -
 kernel/rcu/tree.c          | 72 ++++++++++++----------------------------------
 kernel/rcu/tree.h          |  2 +-
 kernel/rcu/tree_trace.c    |  4 +--
 4 files changed, 22 insertions(+), 57 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index c78e88ce5ea3..ef72c4aada56 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -661,7 +661,6 @@ TRACE_EVENT(rcu_torture_read,
  * Tracepoint for _rcu_barrier() execution.  The string "s" describes
  * the _rcu_barrier phase:
  *	"Begin": _rcu_barrier() started.
- *	"Check": _rcu_barrier() checking for piggybacking.
  *	"EarlyExit": _rcu_barrier() piggybacked, thus early exit.
  *	"Inc1": _rcu_barrier() piggyback check counter incremented.
  *	"OfflineNoCB": _rcu_barrier() found callback on never-online CPU
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2ade4592aa70..041968e3b7ce 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3620,10 +3620,10 @@ static void rcu_barrier_callback(struct rcu_head *rhp)
 	struct rcu_state *rsp = rdp->rsp;
 
 	if (atomic_dec_and_test(&rsp->barrier_cpu_count)) {
-		_rcu_barrier_trace(rsp, "LastCB", -1, rsp->n_barrier_done);
+		_rcu_barrier_trace(rsp, "LastCB", -1, rsp->barrier_sequence);
 		complete(&rsp->barrier_completion);
 	} else {
-		_rcu_barrier_trace(rsp, "CB", -1, rsp->n_barrier_done);
+		_rcu_barrier_trace(rsp, "CB", -1, rsp->barrier_sequence);
 	}
 }
 
@@ -3635,7 +3635,7 @@ static void rcu_barrier_func(void *type)
 	struct rcu_state *rsp = type;
 	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
 
-	_rcu_barrier_trace(rsp, "IRQ", -1, rsp->n_barrier_done);
+	_rcu_barrier_trace(rsp, "IRQ", -1, rsp->barrier_sequence);
 	atomic_inc(&rsp->barrier_cpu_count);
 	rsp->call(&rdp->barrier_head, rcu_barrier_callback);
 }
@@ -3648,55 +3648,24 @@ static void _rcu_barrier(struct rcu_state *rsp)
 {
 	int cpu;
 	struct rcu_data *rdp;
-	unsigned long snap = READ_ONCE(rsp->n_barrier_done);
-	unsigned long snap_done;
+	unsigned long s = rcu_seq_snap(&rsp->barrier_sequence);
 
-	_rcu_barrier_trace(rsp, "Begin", -1, snap);
+	_rcu_barrier_trace(rsp, "Begin", -1, s);
 
 	/* Take mutex to serialize concurrent rcu_barrier() requests. */
 	mutex_lock(&rsp->barrier_mutex);
 
-	/*
-	 * Ensure that all prior references, including to ->n_barrier_done,
-	 * are ordered before the _rcu_barrier() machinery.
-	 */
-	smp_mb();  /* See above block comment. */
-
-	/*
-	 * Recheck ->n_barrier_done to see if others did our work for us.
-	 * This means checking ->n_barrier_done for an even-to-odd-to-even
-	 * transition.  The "if" expression below therefore rounds the old
-	 * value up to the next even number and adds two before comparing.
-	 */
-	snap_done = rsp->n_barrier_done;
-	_rcu_barrier_trace(rsp, "Check", -1, snap_done);
-
-	/*
-	 * If the value in snap is odd, we needed to wait for the current
-	 * rcu_barrier() to complete, then wait for the next one, in other
-	 * words, we need the value of snap_done to be three larger than
-	 * the value of snap.  On the other hand, if the value in snap is
-	 * even, we only had to wait for the next rcu_barrier() to complete,
-	 * in other words, we need the value of snap_done to be only two
-	 * greater than the value of snap.  The "(snap + 3) & ~0x1" computes
-	 * this for us (thank you, Linus!).
-	 */
-	if (ULONG_CMP_GE(snap_done, (snap + 3) & ~0x1)) {
-		_rcu_barrier_trace(rsp, "EarlyExit", -1, snap_done);
+	/* Did someone else do our work for us? */
+	if (rcu_seq_done(&rsp->barrier_sequence, s)) {
+		_rcu_barrier_trace(rsp, "EarlyExit", -1, rsp->barrier_sequence);
 		smp_mb(); /* caller's subsequent code after above check. */
 		mutex_unlock(&rsp->barrier_mutex);
 		return;
 	}
 
-	/*
-	 * Increment ->n_barrier_done to avoid duplicate work.  Use
-	 * WRITE_ONCE() to prevent the compiler from speculating
-	 * the increment to precede the early-exit check.
-	 */
-	WRITE_ONCE(rsp->n_barrier_done, rsp->n_barrier_done + 1);
-	WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 1);
-	_rcu_barrier_trace(rsp, "Inc1", -1, rsp->n_barrier_done);
-	smp_mb(); /* Order ->n_barrier_done increment with below mechanism. */
+	/* Mark the start of the barrier operation. */
+	rcu_seq_start(&rsp->barrier_sequence);
+	_rcu_barrier_trace(rsp, "Inc1", -1, rsp->barrier_sequence);
 
 	/*
 	 * Initialize the count to one rather than to zero in order to
@@ -3720,10 +3689,10 @@ static void _rcu_barrier(struct rcu_state *rsp)
 		if (rcu_is_nocb_cpu(cpu)) {
 			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
 				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
-						   rsp->n_barrier_done);
+						   rsp->barrier_sequence);
 			} else {
 				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
-						   rsp->n_barrier_done);
+						   rsp->barrier_sequence);
 				smp_mb__before_atomic();
 				atomic_inc(&rsp->barrier_cpu_count);
 				__call_rcu(&rdp->barrier_head,
@@ -3731,11 +3700,11 @@ static void _rcu_barrier(struct rcu_state *rsp)
 			}
 		} else if (READ_ONCE(rdp->qlen)) {
 			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
-					   rsp->n_barrier_done);
+					   rsp->barrier_sequence);
 			smp_call_function_single(cpu, rcu_barrier_func, rsp, 1);
 		} else {
 			_rcu_barrier_trace(rsp, "OnlineNQ", cpu,
-					   rsp->n_barrier_done);
+					   rsp->barrier_sequence);
 		}
 	}
 	put_online_cpus();
@@ -3747,16 +3716,13 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	if (atomic_dec_and_test(&rsp->barrier_cpu_count))
 		complete(&rsp->barrier_completion);
 
-	/* Increment ->n_barrier_done to prevent duplicate work. */
-	smp_mb(); /* Keep increment after above mechanism. */
-	WRITE_ONCE(rsp->n_barrier_done, rsp->n_barrier_done + 1);
-	WARN_ON_ONCE((rsp->n_barrier_done & 0x1) != 0);
-	_rcu_barrier_trace(rsp, "Inc2", -1, rsp->n_barrier_done);
-	smp_mb(); /* Keep increment before caller's subsequent code. */
-
 	/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
 	wait_for_completion(&rsp->barrier_completion);
 
+	/* Mark the end of the barrier operation. */
+	_rcu_barrier_trace(rsp, "Inc2", -1, rsp->barrier_sequence);
+	rcu_seq_end(&rsp->barrier_sequence);
+
 	/* Other rcu_barrier() invocations can now safely proceed. */
 	mutex_unlock(&rsp->barrier_mutex);
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index c15cba1b97ab..e40b65d45495 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -484,7 +484,7 @@ struct rcu_state {
 	struct mutex barrier_mutex;		/* Guards barrier fields. */
 	atomic_t barrier_cpu_count;		/* # CPUs waiting on. */
 	struct completion barrier_completion;	/* Wake at barrier end. */
-	unsigned long n_barrier_done;		/* ++ at start and end of */
+	unsigned long barrier_sequence;		/* ++ at start and end of */
 						/*  _rcu_barrier(). */
 	/* End of fields guarded by barrier_mutex. */
 
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index 36c04b46d3b8..d9982a2ce305 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -81,9 +81,9 @@ static void r_stop(struct seq_file *m, void *v)
 static int show_rcubarrier(struct seq_file *m, void *v)
 {
 	struct rcu_state *rsp = (struct rcu_state *)m->private;
-	seq_printf(m, "bcc: %d nbd: %lu\n",
+	seq_printf(m, "bcc: %d bseq: %lu\n",
 		   atomic_read(&rsp->barrier_cpu_count),
-		   rsp->n_barrier_done);
+		   rsp->barrier_sequence);
 	return 0;
 }
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFC tip/core/rcu 11/14] rcu: Consolidate last open-coded expedited memory barrier
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (8 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 10/14] rcu: Apply rcu_seq operations to _rcu_barrier() Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 12/14] rcu: Extend expedited funnel locking to rcu_data structure Paul E. McKenney
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

One of the requirements on RCU grace periods is that if there is a
causal chain of operations that starts after one grace period and
ends before another grace period, then the two grace periods must
be serialized.  There has been (and might still be) code that relies
on this, for example, certain types of reference-counting code that
does a call_rcu() within an RCU callback function.
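
As a hypothetical example of such a chain (struct foo and its fields are
invented purely for illustration), consider a callback that posts a second
callback, so that the second grace period must be ordered after the first:

	struct foo {
		struct rcu_head rh;
		atomic_t refcount;
		/* ... */
	};

	static void foo_free_cb(struct rcu_head *rhp)
	{
		kfree(container_of(rhp, struct foo, rh));
	}

	static void foo_drop_ref_cb(struct rcu_head *rhp)
	{
		struct foo *fp = container_of(rhp, struct foo, rh);

		/* Reusing the rcu_head is OK: its callback has already run. */
		if (atomic_dec_and_test(&fp->refcount))
			call_rcu(&fp->rh, foo_free_cb);
	}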

This requirement is why there is an smp_mb() at the end of both
synchronize_sched_expedited() and synchronize_rcu_expedited().
However, this is the only smp_mb() in these functions, so it would
be nicer to consolidate it into rcu_exp_gp_seq_end().  This commit
does just that.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 2 +-
 kernel/rcu/tree_plugin.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 041968e3b7ce..e64416ad5c45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3351,6 +3351,7 @@ static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
 static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
 {
 	rcu_seq_end(&rsp->expedited_sequence);
+	smp_mb(); /* Ensure that consecutive grace periods serialize. */
 }
 static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
 {
@@ -3483,7 +3484,6 @@ void synchronize_sched_expedited(void)
 
 	rcu_exp_gp_seq_end(rsp);
 	mutex_unlock(&rnp->exp_funnel_mutex);
-	smp_mb(); /* ensure subsequent action seen after grace period. */
 
 	put_online_cpus();
 }
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index f5055b6b08a0..4225efe8086d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -735,7 +735,6 @@ void synchronize_rcu_expedited(void)
 	/* Clean up and exit. */
 	rcu_exp_gp_seq_end(rsp);
 	mutex_unlock(&rnp_unlock->exp_funnel_mutex);
-	smp_mb(); /* ensure subsequent action seen after grace period. */
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFC tip/core/rcu 12/14] rcu: Extend expedited funnel locking to rcu_data structure
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (9 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 11/14] rcu: Consolidate last open-coded expedited memory barrier Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 13/14] rcu: Add stall warnings to synchronize_sched_expedited() Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 14/14] documentation: Describe new expedited stall warnings Paul E. McKenney
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The strictly rcu_node based funnel-locking scheme works well in many
cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get
all that much concurrency.  This commit therefore extends the funnel
locking into the per-CPU rcu_data structure, providing concurrency equal
to the number of CPUs.
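
Condensed from the exp_funnel_lock() hunk below (the work-done checks are
elided), the resulting acquisition order is this CPU's rcu_data mutex
first, then the leaf rcu_node and on up, always dropping the previously
held level:

	rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
	mutex_lock(&rdp->exp_funnel_mutex);
	rnp0 = rdp->mynode;
	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
		mutex_lock(&rnp0->exp_funnel_mutex);
		if (rnp1)
			mutex_unlock(&rnp1->exp_funnel_mutex);
		else
			mutex_unlock(&rdp->exp_funnel_mutex);
		rnp1 = rnp0;
	}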

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c       | 19 ++++++++++++++++---
 kernel/rcu/tree.h       |  4 +++-
 kernel/rcu/tree_trace.c |  3 ++-
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e64416ad5c45..f39ac217916a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3364,11 +3364,14 @@ static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
 
 /* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
 static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
+			       struct rcu_data *rdp,
 			       atomic_long_t *stat, unsigned long s)
 {
 	if (rcu_exp_gp_seq_done(rsp, s)) {
 		if (rnp)
 			mutex_unlock(&rnp->exp_funnel_mutex);
+		else if (rdp)
+			mutex_unlock(&rdp->exp_funnel_mutex);
 		/* Ensure test happens before caller kfree(). */
 		smp_mb__before_atomic(); /* ^^^ */
 		atomic_long_inc(stat);
@@ -3384,6 +3387,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, struct rcu_node *rnp,
  */
 static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 {
+	struct rcu_data *rdp;
 	struct rcu_node *rnp0;
 	struct rcu_node *rnp1 = NULL;
 
@@ -3395,16 +3399,24 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	 * can be inexact, as it is just promoting locality and is not
 	 * strictly needed for correctness.
 	 */
-	rnp0 = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
+	rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
+	if (sync_exp_work_done(rsp, NULL, NULL, &rsp->expedited_workdone1, s))
+		return NULL;
+	mutex_lock(&rdp->exp_funnel_mutex);
+	rnp0 = rdp->mynode;
 	for (; rnp0 != NULL; rnp0 = rnp0->parent) {
-		if (sync_exp_work_done(rsp, rnp1, &rsp->expedited_workdone1, s))
+		if (sync_exp_work_done(rsp, rnp1, rdp,
+				       &rsp->expedited_workdone2, s))
 			return NULL;
 		mutex_lock(&rnp0->exp_funnel_mutex);
 		if (rnp1)
 			mutex_unlock(&rnp1->exp_funnel_mutex);
+		else
+			mutex_unlock(&rdp->exp_funnel_mutex);
 		rnp1 = rnp0;
 	}
-	if (sync_exp_work_done(rsp, rnp1, &rsp->expedited_workdone2, s))
+	if (sync_exp_work_done(rsp, rnp1, rdp,
+			       &rsp->expedited_workdone3, s))
 		return NULL;
 	return rnp1;
 }
@@ -3785,6 +3797,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 	WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
 	rdp->cpu = cpu;
 	rdp->rsp = rsp;
+	mutex_init(&rdp->exp_funnel_mutex);
 	rcu_boot_init_nocb_percpu_data(rdp);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e40b65d45495..1b38f11dba06 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -364,11 +364,12 @@ struct rcu_data {
 	unsigned long n_rp_nocb_defer_wakeup;
 	unsigned long n_rp_need_nothing;
 
-	/* 6) _rcu_barrier() and OOM callbacks. */
+	/* 6) _rcu_barrier(), OOM callbacks, and expediting. */
 	struct rcu_head barrier_head;
 #ifdef CONFIG_RCU_FAST_NO_HZ
 	struct rcu_head oom_head;
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
+	struct mutex exp_funnel_mutex;
 
 	/* 7) Callback offloading. */
 #ifdef CONFIG_RCU_NOCB_CPU
@@ -492,6 +493,7 @@ struct rcu_state {
 	atomic_long_t expedited_tryfail;	/* # acquisition failures. */
 	atomic_long_t expedited_workdone1;	/* # done by others #1. */
 	atomic_long_t expedited_workdone2;	/* # done by others #2. */
+	atomic_long_t expedited_workdone3;	/* # done by others #3. */
 	atomic_long_t expedited_normal;		/* # fallbacks to normal. */
 	atomic_t expedited_need_qs;		/* # CPUs left to check in. */
 	wait_queue_head_t expedited_wq;		/* Wait for check-ins. */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index d9982a2ce305..ec62369f1b02 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -185,11 +185,12 @@ static int show_rcuexp(struct seq_file *m, void *v)
 {
 	struct rcu_state *rsp = (struct rcu_state *)m->private;
 
-	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu n=%lu enq=%d sc=%lu\n",
+	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
 		   rsp->expedited_sequence,
 		   atomic_long_read(&rsp->expedited_tryfail),
 		   atomic_long_read(&rsp->expedited_workdone1),
 		   atomic_long_read(&rsp->expedited_workdone2),
+		   atomic_long_read(&rsp->expedited_workdone3),
 		   atomic_long_read(&rsp->expedited_normal),
 		   atomic_read(&rsp->expedited_need_qs),
 		   rsp->expedited_sequence / 2);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFC tip/core/rcu 13/14] rcu: Add stall warnings to synchronize_sched_expedited()
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (10 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 12/14] rcu: Extend expedited funnel locking to rcu_data structure Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 14/14] documentation: Describe new expedited stall warnings Paul E. McKenney
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Although synchronize_sched_expedited() historically has no RCU CPU stall
warnings, the availability of the rcupdate.rcu_expedited boot parameter
invalidates the old assumption that synchronize_sched()'s stall warnings
would suffice.  This commit therefore adds RCU CPU stall warnings to
synchronize_sched_expedited().
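
The resulting warning takes the form documented in patch 14/14 of this
series, for example:

	INFO: rcu_sched detected expedited stalls on CPUs: { 1 2 6 } 26009 jiffies s: 1043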

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 kernel/rcu/tree.h |  1 +
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f39ac217916a..4992bfd360b6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3421,16 +3421,65 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	return rnp1;
 }
 
+/* Invoked on each online non-idle CPU for expedited quiescent state. */
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
-	struct rcu_state *rsp = data;
+	struct rcu_data *rdp = data;
+	struct rcu_state *rsp = rdp->rsp;
 
 	/* We are here: If we are last, do the wakeup. */
+	rdp->exp_done = true;
 	if (atomic_dec_and_test(&rsp->expedited_need_qs))
 		wake_up(&rsp->expedited_wq);
 	return 0;
 }
 
+static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
+{
+	int cpu;
+	unsigned long jiffies_stall;
+	unsigned long jiffies_start;
+	struct rcu_data *rdp;
+	int ret;
+
+	jiffies_stall = rcu_jiffies_till_stall_check();
+	jiffies_start = jiffies;
+
+	for (;;) {
+		ret = wait_event_interruptible_timeout(
+				rsp->expedited_wq,
+				!atomic_read(&rsp->expedited_need_qs),
+				jiffies_stall);
+		if (ret > 0)
+			return;
+		if (ret < 0) {
+			/* Hit a signal, disable CPU stall warnings. */
+			wait_event(rsp->expedited_wq,
+				   !atomic_read(&rsp->expedited_need_qs));
+			return;
+		}
+		pr_err("INFO: %s detected expedited stalls on CPUs: {",
+		       rsp->name);
+		for_each_online_cpu(cpu) {
+			rdp = per_cpu_ptr(rsp->rda, cpu);
+
+			if (rdp->exp_done)
+				continue;
+			pr_cont(" %d", cpu);
+		}
+		pr_cont(" } %lu jiffies s: %lu\n",
+			jiffies - jiffies_start, rsp->expedited_sequence);
+		for_each_online_cpu(cpu) {
+			rdp = per_cpu_ptr(rsp->rda, cpu);
+
+			if (rdp->exp_done)
+				continue;
+			dump_cpu_task(cpu);
+		}
+		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
+	}
+}
+
 /**
  * synchronize_sched_expedited - Brute-force RCU-sched grace period
  *
@@ -3480,19 +3529,20 @@ void synchronize_sched_expedited(void)
 		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 		struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
+		rdp->exp_done = false;
+
 		/* Skip our CPU and any idle CPUs. */
 		if (raw_smp_processor_id() == cpu ||
 		    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
 			continue;
 		atomic_inc(&rsp->expedited_need_qs);
 		stop_one_cpu_nowait(cpu, synchronize_sched_expedited_cpu_stop,
-				    rsp, &rdp->exp_stop_work);
+				    rdp, &rdp->exp_stop_work);
 	}
 
 	/* Remove extra count and, if necessary, wait for CPUs to stop. */
 	if (!atomic_dec_and_test(&rsp->expedited_need_qs))
-		wait_event(rsp->expedited_wq,
-			   !atomic_read(&rsp->expedited_need_qs));
+		synchronize_sched_expedited_wait(rsp);
 
 	rcu_exp_gp_seq_end(rsp);
 	mutex_unlock(&rnp->exp_funnel_mutex);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 1b38f11dba06..cb402b5c2e71 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -370,6 +370,7 @@ struct rcu_data {
 	struct rcu_head oom_head;
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 	struct mutex exp_funnel_mutex;
+	bool exp_done;			/* Expedited QS for this CPU? */
 
 	/* 7) Callback offloading. */
 #ifdef CONFIG_RCU_NOCB_CPU
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFC tip/core/rcu 14/14] documentation: Describe new expedited stall warnings
  2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
                     ` (11 preceding siblings ...)
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 13/14] rcu: Add stall warnings to synchronize_sched_expedited() Paul E. McKenney
@ 2015-06-30 22:25   ` Paul E. McKenney
  12 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-06-30 22:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, tglx,
	peterz, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani, Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/stallwarn.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 046f32637b95..efb9454875ab 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -163,6 +163,23 @@ message will be about three times the interval between the beginning
 of the stall and the first message.
 
 
+Stall Warnings for Expedited Grace Periods
+
+If an expedited grace period detects a stall, it will place a message
+like the following in dmesg:
+
+	INFO: rcu_sched detected expedited stalls on CPUs: { 1 2 6 } 26009 jiffies s: 1043
+
+This indicates that CPUs 1, 2, and 6 have failed to respond to a
+reschedule IPI, that the expedited grace period has been going on for
+26,009 jiffies, and that the expedited grace-period sequence counter is
+1043.  The fact that this last value is odd indicates that an expedited
+grace period is in flight.
+
+It is entirely possible to see stall warnings from normal and from
+expedited grace periods at about the same time from the same run.
+
+
 What Causes RCU CPU Stall Warnings?
 
 So your kernel printed an RCU CPU stall warning.  The next question is
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited() Paul E. McKenney
@ 2015-07-01 10:27     ` Peter Zijlstra
  2015-07-01 22:18       ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2015-07-01 10:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Tue, Jun 30, 2015 at 03:25:45PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> This commit creates rcu_exp_gp_seq_start() and rcu_exp_gp_seq_end() to
> bracket an expedited grace period, rcu_exp_gp_seq_snap() to snapshot the
> sequence counter, and rcu_exp_gp_seq_done() to check to see if a full
> expedited grace period has elapsed since the snapshot.  These will be
> applied to synchronize_rcu_expedited().  These are defined in terms of
> underlying rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), rcu_seq_done(),
> which will be applied to _rcu_barrier().

It would be good to explain why you cannot use seqcount primitives.
They're >.< close.

> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 58 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c58fd27b4a22..f96500e462fd 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3307,6 +3307,60 @@ void cond_synchronize_sched(unsigned long oldstate)
>  }
>  EXPORT_SYMBOL_GPL(cond_synchronize_sched);
>  
> +/* Adjust sequence number for start of update-side operation. */
> +static void rcu_seq_start(unsigned long *sp)
> +{
> +	WRITE_ONCE(*sp, *sp + 1);
> +	smp_mb(); /* Ensure update-side operation after counter increment. */
> +	WARN_ON_ONCE(!(*sp & 0x1));
> +}

That wants to be an ACQUIRE, right?

> +
> +/* Adjust sequence number for end of update-side operation. */
> +static void rcu_seq_end(unsigned long *sp)
> +{
> +	smp_mb(); /* Ensure update-side operation before counter increment. */

And that wants to be a RELEASE, right?

> +	WRITE_ONCE(*sp, *sp + 1);

	smp_store_release();

even if balanced against a full barrier, might be better here?
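
That is, roughly the following, as a sketch of the suggestion rather than
of what the patch does:

	static void rcu_seq_end(unsigned long *sp)
	{
		smp_store_release(sp, *sp + 1); /* Release orders prior accesses. */
		WARN_ON_ONCE(*sp & 0x1);
	}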

> +	WARN_ON_ONCE(*sp & 0x1);
> +}

And the only difference between these and
raw_write_seqcount_{begin,end}() is the smp_wmb() vs your smp_mb().

Since seqcounts have a distinct read vs writer side, we really only care
about limiting the stores. I suspect you really do care about reads
between these 'sequence points'. A few words to that effect could
explain the existence of these primitives.
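
For comparison, the seqcount write-side primitives referred to here look
roughly like this in include/linux/seqlock.h of this era (simplified,
lockdep variants omitted):

	static inline void raw_write_seqcount_begin(seqcount_t *s)
	{
		s->sequence++;
		smp_wmb();	/* Orders only stores, unlike the smp_mb() above. */
	}

	static inline void raw_write_seqcount_end(seqcount_t *s)
	{
		smp_wmb();
		s->sequence++;
	}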

> +/* Take a snapshot of the update side's sequence number. */
> +static unsigned long rcu_seq_snap(unsigned long *sp)
> +{
> +	unsigned long s;
> +
> +	smp_mb(); /* Caller's modifications seen first by other CPUs. */
> +	s = (READ_ONCE(*sp) + 3) & ~0x1;
> +	smp_mb(); /* Above access must not bleed into critical section. */

	smp_load_acquire() then?

> +	return s;
> +}
> +
> +/*
> + * Given a snapshot from rcu_seq_snap(), determine whether or not a
> + * full update-side operation has occurred.
> + */
> +static bool rcu_seq_done(unsigned long *sp, unsigned long s)
> +{
> +	return ULONG_CMP_GE(READ_ONCE(*sp), s);

I'm always amused you're not wanting to rely on 2s complement for
integer overflow. I _know_ it's undefined behaviour in the C rule book,
but the entire rest of the kernel hard assumes it.
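
For context, the macro in question is defined (in include/linux/rcupdate.h
of this era) purely in terms of unsigned subtraction, roughly:

	#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))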

> +}
> +
> +/* Wrapper functions for expedited grace periods.  */
> +static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
> +{
> +	rcu_seq_start(&rsp->expedited_sequence);
> +}
> +static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
> +{
> +	rcu_seq_end(&rsp->expedited_sequence);
> +}
> +static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
> +{
> +	return rcu_seq_snap(&rsp->expedited_sequence);
> +}
> +static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
> +{
> +	return rcu_seq_done(&rsp->expedited_sequence, s);
> +}

This is wrappers for wrappers sake? Why?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-07-01 10:27     ` Peter Zijlstra
@ 2015-07-01 22:18       ` Paul E. McKenney
  2015-07-02  8:50         ` Peter Zijlstra
  2015-07-09  8:42         ` Dan Carpenter
  0 siblings, 2 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-07-01 22:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Wed, Jul 01, 2015 at 12:27:17PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 30, 2015 at 03:25:45PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > This commit creates rcu_exp_gp_seq_start() and rcu_exp_gp_seq_end() to
> > bracket an expedited grace period, rcu_exp_gp_seq_snap() to snapshot the
> > sequence counter, and rcu_exp_gp_seq_done() to check to see if a full
> > expedited grace period has elapsed since the snapshot.  These will be
> > applied to synchronize_rcu_expedited().  These are defined in terms of
> > underlying rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), rcu_seq_done(),
> > which will be applied to _rcu_barrier().
> 
> It would be good to explain why you cannot use seqcount primitives.
> They're >.< close.

They are indeed!  I gave it some thought, but it would inflict an
unnecessary smp_mb() on seqlocks, as you note below.

> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 58 insertions(+), 10 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c58fd27b4a22..f96500e462fd 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3307,6 +3307,60 @@ void cond_synchronize_sched(unsigned long oldstate)
> >  }
> >  EXPORT_SYMBOL_GPL(cond_synchronize_sched);
> >  
> > +/* Adjust sequence number for start of update-side operation. */
> > +static void rcu_seq_start(unsigned long *sp)
> > +{
> > +	WRITE_ONCE(*sp, *sp + 1);
> > +	smp_mb(); /* Ensure update-side operation after counter increment. */
> > +	WARN_ON_ONCE(!(*sp & 0x1));
> > +}
> 
> That wants to be an ACQUIRE, right?

I cannot put the acquire in the WARN_ON_ONCE() because there
are configurations where WARN_ON_ONCE() is compiled out.  I could
conditionally compile, but given that this is nothing like a fastpath,
I cannot really justify doing that.

We could define an smp_store_acquire(), but that would require a full
barrier against subsequent loads.  The C++ committee hit this one when
trying to implement sequence locking using the C/C++11 atomics.  ;-)

> > +
> > +/* Adjust sequence number for end of update-side operation. */
> > +static void rcu_seq_end(unsigned long *sp)
> > +{
> > +	smp_mb(); /* Ensure update-side operation before counter increment. */
> 
> And that wants to be a RELEASE, right?
> 
> > +	WRITE_ONCE(*sp, *sp + 1);
> 
> 	smp_store_release();
> 
> even if balanced against a full barrier, might be better here?

I -think- it -might- be, and if it was in a fastpath, I might be
more motivated to worry about it.  I am not so sure that pairing an
smp_store_release() with a full memory barrier is in any way an aid to
readability, though.

> > +	WARN_ON_ONCE(*sp & 0x1);
> > +}
> 
> And the only difference between these and
> raw_write_seqcount_{begin,end}() is the smp_wmb() vs your smp_mb().
> 
> Since seqcounts have a distinct read vs writer side, we really only care
> about limiting the stores. I suspect you really do care about reads
> between these 'sequence points'. A few words to that effect could
> explain the existence of these primitives.

Excellent point!  I have updated the commit log accordingly.

> > +/* Take a snapshot of the update side's sequence number. */
> > +static unsigned long rcu_seq_snap(unsigned long *sp)
> > +{
> > +	unsigned long s;
> > +
> > +	smp_mb(); /* Caller's modifications seen first by other CPUs. */
> > +	s = (READ_ONCE(*sp) + 3) & ~0x1;
> > +	smp_mb(); /* Above access must not bleed into critical section. */
> 
> 	smp_load_acquire() then?

I have transitivity concerns.  Which might well be baseless, but again,
this is nowhere near a fastpath.

> > +	return s;
> > +}
> > +
> > +/*
> > + * Given a snapshot from rcu_seq_snap(), determine whether or not a
> > + * full update-side operation has occurred.
> > + */
> > +static bool rcu_seq_done(unsigned long *sp, unsigned long s)
> > +{
> > +	return ULONG_CMP_GE(READ_ONCE(*sp), s);
> 
> I'm always amused you're not wanting to rely on 2s complement for
> integer overflow. I _know_ its undefined behaviour in the C rule book,
> but the entire rest of the kernel hard assumes it.

I take it you have never seen the demonic glow in the eyes of a compiler
implementer when thinking of all the code that can be broken^W^W^W^W^W
optimizations that are enabled by relying on undefined behavior for
signed integer overflow?  ;-)

> > +}
> > +
> > +/* Wrapper functions for expedited grace periods.  */
> > +static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
> > +{
> > +	rcu_seq_start(&rsp->expedited_sequence);
> > +}
> > +static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
> > +{
> > +	rcu_seq_end(&rsp->expedited_sequence);
> > +}
> > +static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
> > +{
> > +	return rcu_seq_snap(&rsp->expedited_sequence);
> > +}
> > +static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
> > +{
> > +	return rcu_seq_done(&rsp->expedited_sequence, s);
> > +}
> 
> This is wrappers for wrappers sake? Why?

For _rcu_barrier(), as noted in the commit log.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-07-01 22:18       ` Paul E. McKenney
@ 2015-07-02  8:50         ` Peter Zijlstra
  2015-07-02 14:13           ` Paul E. McKenney
  2015-07-09  8:42         ` Dan Carpenter
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2015-07-02  8:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Wed, Jul 01, 2015 at 03:18:04PM -0700, Paul E. McKenney wrote:
> On Wed, Jul 01, 2015 at 12:27:17PM +0200, Peter Zijlstra wrote:

> > That wants to be an ACQUIRE, right?
> 
> I cannot put the acquire in the WARN_ON_ONCE() because there
> are configurations where WARN_ON_ONCE() is compiled out.  I could
> conditionally compile, but given that this is nothing like a fastpath,
> I cannot really justify doing that.

Fair enough.

> We could define an smp_store_acquire(), but that would require a full
> barrier against subsequent loads.  The C++ committee hit this one when
> trying to implement sequence locking using the C/C++11 atomics.  ;-)

Yeah, I'm not sure how much sense smp_store_acquire() makes, but I'm
fairly sure this isn't the first time I've wondered about it.

> > > +static bool rcu_seq_done(unsigned long *sp, unsigned long s)
> > > +{
> > > +	return ULONG_CMP_GE(READ_ONCE(*sp), s);
> > 
> > I'm always amused you're not wanting to rely on 2s complement for
> > integer overflow. I _know_ it's undefined behaviour in the C rule book,
> > but the entire rest of the kernel hard assumes it.
> 
> I take it you have never seen the demonic glow in the eyes of a compiler
> implementer when thinking of all the code that can be broken^W^W^W^W^W
> optimizations that are enabled by relying on undefined behavior for
> signed integer overflow?  ;-)

Note that these are unsigned integers, but yes I know, you've said. But
they cannot unilaterally change this 'undefined' behaviour because it's
been defined as 'whatever the hardware does' for such a long time.

Likewise they can dream all they want about breaking our concurrent code
and state we should use the brand spanking new primitives, sod 30 years
of existing code, but that's just not realistic either.

Even if we didn't 'have' to support a wide range of compiler versions,
most of which do not even support these new fangled primitives, who is
going to audit our existing many million lines of code? Not to mention
the many more million lines of code in other projects that rely on these
same things.

It's really time for them to stop wanking and stare reality in the face.

> > > +/* Wrapper functions for expedited grace periods.  */
> > > +static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
> > > +{
> > > +	rcu_seq_start(&rsp->expedited_sequence);
> > > +}
> > > +static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
> > > +{
> > > +	rcu_seq_end(&rsp->expedited_sequence);
> > > +}
> > > +static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
> > > +{
> > > +	return rcu_seq_snap(&rsp->expedited_sequence);
> > > +}
> > > +static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
> > > +{
> > > +	return rcu_seq_done(&rsp->expedited_sequence, s);
> > > +}
> > 
> > This is wrappers for wrappers sake? Why?
> 
> For _rcu_barrier(), as noted in the commit log.

Yes it said; but why? Surely _rcu_barrier() can do the
->expedited_sequence thing itself, that hardly seems worthy of a
wrapper.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-07-02  8:50         ` Peter Zijlstra
@ 2015-07-02 14:13           ` Paul E. McKenney
  2015-07-02 16:50             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2015-07-02 14:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Thu, Jul 02, 2015 at 10:50:41AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 01, 2015 at 03:18:04PM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 01, 2015 at 12:27:17PM +0200, Peter Zijlstra wrote:
> 
> > > That wants to be an ACQUIRE, right?
> > 
> > I cannot put the acquire in the WARN_ON_ONCE() because there
> > are configurations where WARN_ON_ONCE() is compiled out.  I could
> > conditionally compile, but given that this is nothing like a fastpath,
> > I cannot really justify doing that.
> 
> Fair enough.
> 
> > We could define an smp_store_acquire(), but that would require a full
> > barrier against subsequent loads.  The C++ committee hit this one when
> > trying to implement sequence locking using the C/C++11 atomics.  ;-)
> 
> Yeah, I'm not sure how much sense smp_store_acquire() makes, but I'm
> fairly sure this isn't the first time I've wondered about it.
> 
> > > > +static bool rcu_seq_done(unsigned long *sp, unsigned long s)
> > > > +{
> > > > +	return ULONG_CMP_GE(READ_ONCE(*sp), s);
> > > 
> > > I'm always amused you're not wanting to rely on 2s complement for
> > > integer overflow. I _know_ it's undefined behaviour in the C rule book,
> > > but the entire rest of the kernel hard assumes it.
> > 
> > I take it you have never seen the demonic glow in the eyes of a compiler
> > implementer when thinking of all the code that can be broken^W^W^W^W^W
> > optimizations that are enabled by relying on undefined behavior for
> > signed integer overflow?  ;-)
> 
> Note that these are unsigned integers, but yes I know, you've said. But
> they cannot unilaterally change this 'undefined' behaviour because it's
> been defined as 'whatever the hardware does' for such a long time.

For pure unsigned arithmetic, their options are indeed limited.  For a
cast to signed, I am not so sure.  I have been using time_before() and
friends for jiffy comparisons, which does a cast to signed after the
subtraction.  Signed overflow is already unsafe with current compilers,
though the kernel suppresses these optimizations.
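
For reference, the jiffies helpers mentioned here do exactly that,
comparing an unsigned difference as a signed value, roughly:

	#define time_after(a, b)			\
		(typecheck(unsigned long, a) &&		\
		 typecheck(unsigned long, b) &&		\
		 ((long)((b) - (a)) < 0))
	#define time_before(a, b)	time_after(b, a)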

> Likewise they can dream all they want about breaking our concurrent code
> and state we should use the brand spanking new primitives, sod 30 years
> of existing code, but that's just not realistic either.
> 
> Even if we didn't 'have' to support a wide range of compiler versions,
> most of which do not even support these new fangled primitives, who is
> going to audit our existing many million lines of code? Not to mention
> the many more million lines of code in other projects that rely on these
> same things.
> 
> It's really time for them to stop wanking and stare reality in the face.

Indeed, I have been and will be continuing to make myself unpopular with
that topic.  ;-)

> > > > +/* Wrapper functions for expedited grace periods.  */
> > > > +static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
> > > > +{
> > > > +	rcu_seq_start(&rsp->expedited_sequence);
> > > > +}
> > > > +static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
> > > > +{
> > > > +	rcu_seq_end(&rsp->expedited_sequence);
> > > > +}
> > > > +static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
> > > > +{
> > > > +	return rcu_seq_snap(&rsp->expedited_sequence);
> > > > +}
> > > > +static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
> > > > +{
> > > > +	return rcu_seq_done(&rsp->expedited_sequence, s);
> > > > +}
> > > 
> > > This is wrappers for wrappers sake? Why?
> > 
> > For _rcu_barrier(), as noted in the commit log.
> 
> Yes it said; but why? Surely _rcu_barrier() can do the
> ->expedited_sequence thing itself, that hardly seems worthy of a
> wrapper.

Ah, you want synchronize_rcu_expedited() and synchronize_sched_expedited()
to use rcu_seq_start() and friends directly.  I can certainly do that.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-07-02 14:13           ` Paul E. McKenney
@ 2015-07-02 16:50             ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2015-07-02 16:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, tglx, rostedt, dhowells, edumazet, dvhart, fweisbec, oleg,
	bobby.prani

On Thu, Jul 02, 2015 at 07:13:30AM -0700, Paul E. McKenney wrote:

> > Its really time for them to stop wanking and stare reality in the face.
> 
> Indeed, I have been and will be continuing to make myself unpopular with
> that topic.  ;-)

Thanks!!

> > > > > +/* Wrapper functions for expedited grace periods.  */
> > > > > +static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
> > > > > +{
> > > > > +	rcu_seq_start(&rsp->expedited_sequence);
> > > > > +}
> > > > > +static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
> > > > > +{
> > > > > +	rcu_seq_end(&rsp->expedited_sequence);
> > > > > +}
> > > > > +static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
> > > > > +{
> > > > > +	return rcu_seq_snap(&rsp->expedited_sequence);
> > > > > +}
> > > > > +static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
> > > > > +{
> > > > > +	return rcu_seq_done(&rsp->expedited_sequence, s);
> > > > > +}
> > > > 
> > > > This is wrappers for wrappers sake? Why?
> > > 
> > > For _rcu_barrier(), as noted in the commit log.
> > 
> > Yes it said; but why? Surely _rcu_barrier() can do the
> > ->expedited_sequence thing itself, that hardly seems worthy of a
> > wrapper.
> 
> Ah, you want synchronize_rcu_expedited() and synchronize_sched_expedited()
> to use rcu_seq_start() and friends directly.  I can certainly do that.

Well, 'want' is a strong word, I was just questioning the use of these
trivial wrappers.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-07-01 22:18       ` Paul E. McKenney
  2015-07-02  8:50         ` Peter Zijlstra
@ 2015-07-09  8:42         ` Dan Carpenter
  2015-07-09 14:21           ` Paul E. McKenney
  1 sibling, 1 reply; 22+ messages in thread
From: Dan Carpenter @ 2015-07-09  8:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Jul 01, 2015 at 03:18:04PM -0700, Paul E. McKenney wrote:
> > > +/* Adjust sequence number for start of update-side operation. */
> > > +static void rcu_seq_start(unsigned long *sp)
> > > +{
> > > +	WRITE_ONCE(*sp, *sp + 1);
> > > +	smp_mb(); /* Ensure update-side operation after counter increment. */
> > > +	WARN_ON_ONCE(!(*sp & 0x1));
> > > +}
> > 
> > That wants to be an ACQUIRE, right?
> 
> I cannot put the acquire in the WARN_ON_ONCE() because there
> are configurations where WARN_ON_ONCE() is compiled out.

I think WARN_ON_ONCE() always evaluates the condition.  You are maybe
thinking of VM_WARN_ON_ONCE().
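
Simplified (the section attribute on __warned is elided), the generic
definition always evaluates its argument; only the warning itself is
one-shot:

	#define WARN_ON_ONCE(condition)	({			\
		static bool __warned;				\
		int __ret_warn_once = !!(condition);		\
								\
		if (unlikely(__ret_warn_once && !__warned)) {	\
			__warned = true;			\
			WARN_ON(1);				\
		}						\
		unlikely(__ret_warn_once);			\
	})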

I'm on a different thread where we almost introduced a bug by using
VM_WARN_ONCE() instead of WARN_ONCE().  The VM_WARN_ONCE() conditions had
long execution times, so they are weird.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited()
  2015-07-09  8:42         ` Dan Carpenter
@ 2015-07-09 14:21           ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2015-07-09 14:21 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Jul 09, 2015 at 11:42:49AM +0300, Dan Carpenter wrote:
> On Wed, Jul 01, 2015 at 03:18:04PM -0700, Paul E. McKenney wrote:
> > > > +/* Adjust sequence number for start of update-side operation. */
> > > > +static void rcu_seq_start(unsigned long *sp)
> > > > +{
> > > > +	WRITE_ONCE(*sp, *sp + 1);
> > > > +	smp_mb(); /* Ensure update-side operation after counter increment. */
> > > > +	WARN_ON_ONCE(!(*sp & 0x1));
> > > > +}
> > > 
> > > That wants to be an ACQUIRE, right?
> > 
> > I cannot put the acquire in the WARN_ON_ONCE() because there
> > are configurations where WARN_ON_ONCE() is compiled out.
> 
> I think WARN_ON_ONCE() always evaluates the condition.  You are maybe
> thinking of VM_WARN_ON_ONCE().

Even if it happens to now, it is only a matter of time until the
tinification people make it optional.

> I'm on a different thread where we almost introduced a bug by using
> VM_WARN_ONCE() instead of WARN_ONCE().  The VM_WARN_ONCE() conditions had
> long execution times, so they are weird.

New one on me!  At first glance, they look like they map pretty
directly, though.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-07-09 14:21 UTC | newest]

Thread overview: 22+ messages
2015-06-30 22:25 [PATCH RFC tip/core/rcu 0/14] Rework expedited grace periods Paul E. McKenney
2015-06-30 22:25 ` [PATCH RFC tip/core/rcu 01/14] rcu: Switch synchronize_sched_expedited() to stop_one_cpu() Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 02/14] rcu: Rework synchronize_rcu_expedited() counter handling Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 03/14] rcu: Get rid of synchronize_sched_expedited()'s polling loop Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 04/14] rcu: Make expedited GP CPU stoppage asynchronous Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 05/14] rcu: Abstract sequence counting from synchronize_sched_expedited() Paul E. McKenney
2015-07-01 10:27     ` Peter Zijlstra
2015-07-01 22:18       ` Paul E. McKenney
2015-07-02  8:50         ` Peter Zijlstra
2015-07-02 14:13           ` Paul E. McKenney
2015-07-02 16:50             ` Peter Zijlstra
2015-07-09  8:42         ` Dan Carpenter
2015-07-09 14:21           ` Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 06/14] rcu: Make synchronize_rcu_expedited() use sequence-counter scheme Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 07/14] rcu: Abstract funnel locking from synchronize_sched_expedited() Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 08/14] rcu: Fix synchronize_sched_expedited() type error for "s" Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 09/14] rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 10/14] rcu: Apply rcu_seq operations to _rcu_barrier() Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 11/14] rcu: Consolidate last open-coded expedited memory barrier Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 12/14] rcu: Extend expedited funnel locking to rcu_data structure Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 13/14] rcu: Add stall warnings to synchronize_sched_expedited() Paul E. McKenney
2015-06-30 22:25   ` [PATCH RFC tip/core/rcu 14/14] documentation: Describe new expedited stall warnings Paul E. McKenney
