linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8
@ 2016-06-15 21:45 Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 01/12] rcu: Fix outdated rcu_scheduler_active comment Paul E. McKenney
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani

Hello!

This series contains miscellaneous fixes:

1.	Fix outdated rcu_scheduler_active comment.

2.	Fix outdated hotplug-exclusion comment in rcu_gp_init().

3.	Remove some superfluous lines, courtesy of Peter Zijlstra.

4.	Move expedited code from tree.c to tree_exp.h.

5.	Move expedited code from tree_plugin.h to tree_exp.h.

6.	Document RCU_NONIDLE() restrictions in comment header.

7.	No ordering for rcu_assign_pointer() of NULL.

8.	Disable TASKS_RCU for usermode Linux in preparation for
	making call_rcu_tasks() tolerate being invoked with interrupts
	disabled.

9.	Make call_rcu_tasks() tolerate first call with irqs disabled.

10.	Fix a typo in a comment.

11.	Panic on RCU Stall at user's option, courtesy of Daniel Bristot
	de Oliveira.

12.	Correctly handle sparse possible cpus, courtesy of Mark Rutland.

							Thanx, Paul

------------------------------------------------------------------------

 Documentation/sysctl/kernel.txt |   12 
 include/linux/kernel.h          |    1 
 include/linux/rcupdate.h        |   23 +
 init/Kconfig                    |    1 
 kernel/rcu/rcutorture.c         |    2 
 kernel/rcu/tree.c               |  586 +---------------------------------
 kernel/rcu/tree.h               |   15 
 kernel/rcu/tree_exp.h           |  674 +++++++++++++++++++++++++++++++++++++++-
 kernel/rcu/tree_plugin.h        |   93 -----
 kernel/rcu/update.c             |    7 
 kernel/sched/fair.c             |    2 
 kernel/sysctl.c                 |   11 
 12 files changed, 757 insertions(+), 670 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 01/12] rcu: Fix outdated rcu_scheduler_active comment
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 02/12] rcu: Fix outdated hotplug-exclusion comment in rcu_gp_init() Paul E. McKenney
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

The comment header for rcu_scheduler_active states that it is used
to optimize synchronize_sched() at early boot.  This is incorrect.
The synchronize_sched() function instead checks the number of online
CPUs.  This commit therefore replaces the comment's synchronize_sched()
with synchronize_rcu(), which really does use rcu_scheduler_active for
this purpose.

Reported-by: Lihao Liang <lihao.liang@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c7f1bc4f817c..0fa692f1094e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -130,7 +130,7 @@ int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
  * The rcu_scheduler_active variable transitions from zero to one just
  * before the first task is spawned.  So when this variable is zero, RCU
  * can assume that there is but one task, allowing RCU to (for example)
- * optimize synchronize_sched() to a simple barrier().  When this variable
+ * optimize synchronize_rcu() to a simple barrier().  When this variable
  * is one, RCU must actually do all the hard work required to detect real
  * grace periods.  This variable is also used to suppress boot-time false
  * positives from lockdep-RCU error checking.
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 02/12] rcu: Fix outdated hotplug-exclusion comment in rcu_gp_init()
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 01/12] rcu: Fix outdated rcu_scheduler_active comment Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 03/12] rcu: Remove some superfluous lines Paul E. McKenney
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

In the past, RCU grace-period initialization excluded CPU-hotplug
operations, but this is no longer the case.  This commit therefore
removed an outdated comment in rcu_gp_init() claiming that these
are excluded.

Reported-by: Lihao Liang <lihao.liang@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0fa692f1094e..6043c14165d1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1989,8 +1989,7 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 	 * of the tree within the rsp->node[] array.  Note that other CPUs
 	 * will access only the leaves of the hierarchy, thus seeing that no
 	 * grace period is in progress, at least until the corresponding
-	 * leaf node has been initialized.  In addition, we have excluded
-	 * CPU-hotplug operations.
+	 * leaf node has been initialized.
 	 *
 	 * The grace period cannot complete until the initialization
 	 * process finishes, because this kthread handles both.
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 03/12] rcu: Remove some superfluous lines
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 01/12] rcu: Fix outdated rcu_scheduler_active comment Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 02/12] rcu: Fix outdated hotplug-exclusion comment in rcu_gp_init() Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h Paul E. McKenney
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

From: Peter Zijlstra <peterz@infradead.org>

I think you'll find this condition is superfluous, as the whole function
is under #ifdef of that same.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6043c14165d1..4aefeafb9a95 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4363,9 +4363,6 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
-	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
-		return;
-
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 03/12] rcu: Remove some superfluous lines Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 22:05   ` Peter Zijlstra
  2016-06-17 15:48   ` Pranith Kumar
  2016-06-15 21:46 ` [PATCH tip/core/rcu 05/12] rcu: Move expedited code from tree_plugin.h " Paul E. McKenney
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

People have been having some difficulty finding their way around the
RCU code.  This commit therefore pulls some of the expedited grace-period
code from tree.c to a new tree_exp.h file.  This commit is strictly code
movement, with the exception of a forward declaration that was added
for the sync_sched_exp_online_cleanup() function.

A subsequent commit will move the remaining expedited grace-period code
from tree_plugin.h to tree_exp.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c     | 545 +-----------------------------------------------
 kernel/rcu/tree_exp.h | 564 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 566 insertions(+), 543 deletions(-)
 create mode 100644 kernel/rcu/tree_exp.h

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4aefeafb9a95..c844b6142a86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -159,6 +159,7 @@ static void invoke_rcu_core(void);
 static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
 static void rcu_report_exp_rdp(struct rcu_state *rsp,
 			       struct rcu_data *rdp, bool wake);
+static void sync_sched_exp_online_cleanup(int cpu);
 
 /* rcuc/rcub kthread realtime priority */
 #ifdef CONFIG_RCU_KTHREAD_PRIO
@@ -3447,549 +3448,6 @@ static bool rcu_seq_done(unsigned long *sp, unsigned long s)
 	return ULONG_CMP_GE(READ_ONCE(*sp), s);
 }
 
-/* Wrapper functions for expedited grace periods.  */
-static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
-{
-	rcu_seq_start(&rsp->expedited_sequence);
-}
-static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
-{
-	rcu_seq_end(&rsp->expedited_sequence);
-	smp_mb(); /* Ensure that consecutive grace periods serialize. */
-}
-static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
-{
-	unsigned long s;
-
-	smp_mb(); /* Caller's modifications seen first by other CPUs. */
-	s = rcu_seq_snap(&rsp->expedited_sequence);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
-	return s;
-}
-static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
-{
-	return rcu_seq_done(&rsp->expedited_sequence, s);
-}
-
-/*
- * Reset the ->expmaskinit values in the rcu_node tree to reflect any
- * recent CPU-online activity.  Note that these masks are not cleared
- * when CPUs go offline, so they reflect the union of all CPUs that have
- * ever been online.  This means that this function normally takes its
- * no-work-to-do fastpath.
- */
-static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp)
-{
-	bool done;
-	unsigned long flags;
-	unsigned long mask;
-	unsigned long oldmask;
-	int ncpus = READ_ONCE(rsp->ncpus);
-	struct rcu_node *rnp;
-	struct rcu_node *rnp_up;
-
-	/* If no new CPUs onlined since last time, nothing to do. */
-	if (likely(ncpus == rsp->ncpus_snap))
-		return;
-	rsp->ncpus_snap = ncpus;
-
-	/*
-	 * Each pass through the following loop propagates newly onlined
-	 * CPUs for the current rcu_node structure up the rcu_node tree.
-	 */
-	rcu_for_each_leaf_node(rsp, rnp) {
-		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		if (rnp->expmaskinit == rnp->expmaskinitnext) {
-			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-			continue;  /* No new CPUs, nothing to do. */
-		}
-
-		/* Update this node's mask, track old value for propagation. */
-		oldmask = rnp->expmaskinit;
-		rnp->expmaskinit = rnp->expmaskinitnext;
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-
-		/* If was already nonzero, nothing to propagate. */
-		if (oldmask)
-			continue;
-
-		/* Propagate the new CPU up the tree. */
-		mask = rnp->grpmask;
-		rnp_up = rnp->parent;
-		done = false;
-		while (rnp_up) {
-			raw_spin_lock_irqsave_rcu_node(rnp_up, flags);
-			if (rnp_up->expmaskinit)
-				done = true;
-			rnp_up->expmaskinit |= mask;
-			raw_spin_unlock_irqrestore_rcu_node(rnp_up, flags);
-			if (done)
-				break;
-			mask = rnp_up->grpmask;
-			rnp_up = rnp_up->parent;
-		}
-	}
-}
-
-/*
- * Reset the ->expmask values in the rcu_node tree in preparation for
- * a new expedited grace period.
- */
-static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
-{
-	unsigned long flags;
-	struct rcu_node *rnp;
-
-	sync_exp_reset_tree_hotplug(rsp);
-	rcu_for_each_node_breadth_first(rsp, rnp) {
-		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		WARN_ON_ONCE(rnp->expmask);
-		rnp->expmask = rnp->expmaskinit;
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-	}
-}
-
-/*
- * Return non-zero if there is no RCU expedited grace period in progress
- * for the specified rcu_node structure, in other words, if all CPUs and
- * tasks covered by the specified rcu_node structure have done their bit
- * for the current expedited grace period.  Works only for preemptible
- * RCU -- other RCU implementation use other means.
- *
- * Caller must hold the rcu_state's exp_mutex.
- */
-static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
-{
-	return rnp->exp_tasks == NULL &&
-	       READ_ONCE(rnp->expmask) == 0;
-}
-
-/*
- * Report the exit from RCU read-side critical section for the last task
- * that queued itself during or before the current expedited preemptible-RCU
- * grace period.  This event is reported either to the rcu_node structure on
- * which the task was queued or to one of that rcu_node structure's ancestors,
- * recursively up the tree.  (Calm down, calm down, we do the recursion
- * iteratively!)
- *
- * Caller must hold the rcu_state's exp_mutex and the specified rcu_node
- * structure's ->lock.
- */
-static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
-				 bool wake, unsigned long flags)
-	__releases(rnp->lock)
-{
-	unsigned long mask;
-
-	for (;;) {
-		if (!sync_rcu_preempt_exp_done(rnp)) {
-			if (!rnp->expmask)
-				rcu_initiate_boost(rnp, flags);
-			else
-				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-			break;
-		}
-		if (rnp->parent == NULL) {
-			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-			if (wake) {
-				smp_mb(); /* EGP done before wake_up(). */
-				swake_up(&rsp->expedited_wq);
-			}
-			break;
-		}
-		mask = rnp->grpmask;
-		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled */
-		rnp = rnp->parent;
-		raw_spin_lock_rcu_node(rnp); /* irqs already disabled */
-		WARN_ON_ONCE(!(rnp->expmask & mask));
-		rnp->expmask &= ~mask;
-	}
-}
-
-/*
- * Report expedited quiescent state for specified node.  This is a
- * lock-acquisition wrapper function for __rcu_report_exp_rnp().
- *
- * Caller must hold the rcu_state's exp_mutex.
- */
-static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
-					      struct rcu_node *rnp, bool wake)
-{
-	unsigned long flags;
-
-	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	__rcu_report_exp_rnp(rsp, rnp, wake, flags);
-}
-
-/*
- * Report expedited quiescent state for multiple CPUs, all covered by the
- * specified leaf rcu_node structure.  Caller must hold the rcu_state's
- * exp_mutex.
- */
-static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
-				    unsigned long mask, bool wake)
-{
-	unsigned long flags;
-
-	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	if (!(rnp->expmask & mask)) {
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-		return;
-	}
-	rnp->expmask &= ~mask;
-	__rcu_report_exp_rnp(rsp, rnp, wake, flags); /* Releases rnp->lock. */
-}
-
-/*
- * Report expedited quiescent state for specified rcu_data (CPU).
- */
-static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
-			       bool wake)
-{
-	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake);
-}
-
-/* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
-static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat,
-			       unsigned long s)
-{
-	if (rcu_exp_gp_seq_done(rsp, s)) {
-		trace_rcu_exp_grace_period(rsp->name, s, TPS("done"));
-		/* Ensure test happens before caller kfree(). */
-		smp_mb__before_atomic(); /* ^^^ */
-		atomic_long_inc(stat);
-		return true;
-	}
-	return false;
-}
-
-/*
- * Funnel-lock acquisition for expedited grace periods.  Returns true
- * if some other task completed an expedited grace period that this task
- * can piggy-back on, and with no mutex held.  Otherwise, returns false
- * with the mutex held, indicating that the caller must actually do the
- * expedited grace period.
- */
-static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
-{
-	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
-	struct rcu_node *rnp = rdp->mynode;
-	struct rcu_node *rnp_root = rcu_get_root(rsp);
-
-	/* Low-contention fastpath. */
-	if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s) &&
-	    (rnp == rnp_root ||
-	     ULONG_CMP_LT(READ_ONCE(rnp_root->exp_seq_rq), s)) &&
-	    !mutex_is_locked(&rsp->exp_mutex) &&
-	    mutex_trylock(&rsp->exp_mutex))
-		goto fastpath;
-
-	/*
-	 * Each pass through the following loop works its way up
-	 * the rcu_node tree, returning if others have done the work or
-	 * otherwise falls through to acquire rsp->exp_mutex.  The mapping
-	 * from CPU to rcu_node structure can be inexact, as it is just
-	 * promoting locality and is not strictly needed for correctness.
-	 */
-	for (; rnp != NULL; rnp = rnp->parent) {
-		if (sync_exp_work_done(rsp, &rdp->exp_workdone1, s))
-			return true;
-
-		/* Work not done, either wait here or go up. */
-		spin_lock(&rnp->exp_lock);
-		if (ULONG_CMP_GE(rnp->exp_seq_rq, s)) {
-
-			/* Someone else doing GP, so wait for them. */
-			spin_unlock(&rnp->exp_lock);
-			trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
-						  rnp->grplo, rnp->grphi,
-						  TPS("wait"));
-			wait_event(rnp->exp_wq[(s >> 1) & 0x3],
-				   sync_exp_work_done(rsp,
-						      &rdp->exp_workdone2, s));
-			return true;
-		}
-		rnp->exp_seq_rq = s; /* Followers can wait on us. */
-		spin_unlock(&rnp->exp_lock);
-		trace_rcu_exp_funnel_lock(rsp->name, rnp->level, rnp->grplo,
-					  rnp->grphi, TPS("nxtlvl"));
-	}
-	mutex_lock(&rsp->exp_mutex);
-fastpath:
-	if (sync_exp_work_done(rsp, &rdp->exp_workdone3, s)) {
-		mutex_unlock(&rsp->exp_mutex);
-		return true;
-	}
-	rcu_exp_gp_seq_start(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
-	return false;
-}
-
-/* Invoked on each online non-idle CPU for expedited quiescent state. */
-static void sync_sched_exp_handler(void *data)
-{
-	struct rcu_data *rdp;
-	struct rcu_node *rnp;
-	struct rcu_state *rsp = data;
-
-	rdp = this_cpu_ptr(rsp->rda);
-	rnp = rdp->mynode;
-	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
-	    __this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
-		return;
-	if (rcu_is_cpu_rrupt_from_idle()) {
-		rcu_report_exp_rdp(&rcu_sched_state,
-				   this_cpu_ptr(&rcu_sched_data), true);
-		return;
-	}
-	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
-	resched_cpu(smp_processor_id());
-}
-
-/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
-static void sync_sched_exp_online_cleanup(int cpu)
-{
-	struct rcu_data *rdp;
-	int ret;
-	struct rcu_node *rnp;
-	struct rcu_state *rsp = &rcu_sched_state;
-
-	rdp = per_cpu_ptr(rsp->rda, cpu);
-	rnp = rdp->mynode;
-	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
-		return;
-	ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0);
-	WARN_ON_ONCE(ret);
-}
-
-/*
- * Select the nodes that the upcoming expedited grace period needs
- * to wait for.
- */
-static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
-				     smp_call_func_t func)
-{
-	int cpu;
-	unsigned long flags;
-	unsigned long mask;
-	unsigned long mask_ofl_test;
-	unsigned long mask_ofl_ipi;
-	int ret;
-	struct rcu_node *rnp;
-
-	sync_exp_reset_tree(rsp);
-	rcu_for_each_leaf_node(rsp, rnp) {
-		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-
-		/* Each pass checks a CPU for identity, offline, and idle. */
-		mask_ofl_test = 0;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
-			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
-			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
-
-			if (raw_smp_processor_id() == cpu ||
-			    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
-				mask_ofl_test |= rdp->grpmask;
-		}
-		mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
-
-		/*
-		 * Need to wait for any blocked tasks as well.  Note that
-		 * additional blocking tasks will also block the expedited
-		 * GP until such time as the ->expmask bits are cleared.
-		 */
-		if (rcu_preempt_has_tasks(rnp))
-			rnp->exp_tasks = rnp->blkd_tasks.next;
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-
-		/* IPI the remaining CPUs for expedited quiescent state. */
-		mask = 1;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
-			if (!(mask_ofl_ipi & mask))
-				continue;
-retry_ipi:
-			ret = smp_call_function_single(cpu, func, rsp, 0);
-			if (!ret) {
-				mask_ofl_ipi &= ~mask;
-				continue;
-			}
-			/* Failed, raced with offline. */
-			raw_spin_lock_irqsave_rcu_node(rnp, flags);
-			if (cpu_online(cpu) &&
-			    (rnp->expmask & mask)) {
-				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-				schedule_timeout_uninterruptible(1);
-				if (cpu_online(cpu) &&
-				    (rnp->expmask & mask))
-					goto retry_ipi;
-				raw_spin_lock_irqsave_rcu_node(rnp, flags);
-			}
-			if (!(rnp->expmask & mask))
-				mask_ofl_ipi &= ~mask;
-			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-		}
-		/* Report quiescent states for those that went offline. */
-		mask_ofl_test |= mask_ofl_ipi;
-		if (mask_ofl_test)
-			rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false);
-	}
-}
-
-static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
-{
-	int cpu;
-	unsigned long jiffies_stall;
-	unsigned long jiffies_start;
-	unsigned long mask;
-	int ndetected;
-	struct rcu_node *rnp;
-	struct rcu_node *rnp_root = rcu_get_root(rsp);
-	int ret;
-
-	jiffies_stall = rcu_jiffies_till_stall_check();
-	jiffies_start = jiffies;
-
-	for (;;) {
-		ret = swait_event_timeout(
-				rsp->expedited_wq,
-				sync_rcu_preempt_exp_done(rnp_root),
-				jiffies_stall);
-		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
-			return;
-		if (ret < 0) {
-			/* Hit a signal, disable CPU stall warnings. */
-			swait_event(rsp->expedited_wq,
-				   sync_rcu_preempt_exp_done(rnp_root));
-			return;
-		}
-		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
-		       rsp->name);
-		ndetected = 0;
-		rcu_for_each_leaf_node(rsp, rnp) {
-			ndetected += rcu_print_task_exp_stall(rnp);
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
-				struct rcu_data *rdp;
-
-				if (!(rnp->expmask & mask))
-					continue;
-				ndetected++;
-				rdp = per_cpu_ptr(rsp->rda, cpu);
-				pr_cont(" %d-%c%c%c", cpu,
-					"O."[!!cpu_online(cpu)],
-					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
-					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
-			}
-			mask <<= 1;
-		}
-		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
-			jiffies - jiffies_start, rsp->expedited_sequence,
-			rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]);
-		if (ndetected) {
-			pr_err("blocking rcu_node structures:");
-			rcu_for_each_node_breadth_first(rsp, rnp) {
-				if (rnp == rnp_root)
-					continue; /* printed unconditionally */
-				if (sync_rcu_preempt_exp_done(rnp))
-					continue;
-				pr_cont(" l=%u:%d-%d:%#lx/%c",
-					rnp->level, rnp->grplo, rnp->grphi,
-					rnp->expmask,
-					".T"[!!rnp->exp_tasks]);
-			}
-			pr_cont("\n");
-		}
-		rcu_for_each_leaf_node(rsp, rnp) {
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
-				if (!(rnp->expmask & mask))
-					continue;
-				dump_cpu_task(cpu);
-			}
-		}
-		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
-	}
-}
-
-/*
- * Wait for the current expedited grace period to complete, and then
- * wake up everyone who piggybacked on the just-completed expedited
- * grace period.  Also update all the ->exp_seq_rq counters as needed
- * in order to avoid counter-wrap problems.
- */
-static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
-{
-	struct rcu_node *rnp;
-
-	synchronize_sched_expedited_wait(rsp);
-	rcu_exp_gp_seq_end(rsp);
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
-
-	/*
-	 * Switch over to wakeup mode, allowing the next GP, but -only- the
-	 * next GP, to proceed.
-	 */
-	mutex_lock(&rsp->exp_wake_mutex);
-	mutex_unlock(&rsp->exp_mutex);
-
-	rcu_for_each_node_breadth_first(rsp, rnp) {
-		if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
-			spin_lock(&rnp->exp_lock);
-			/* Recheck, avoid hang in case someone just arrived. */
-			if (ULONG_CMP_LT(rnp->exp_seq_rq, s))
-				rnp->exp_seq_rq = s;
-			spin_unlock(&rnp->exp_lock);
-		}
-		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
-	}
-	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
-	mutex_unlock(&rsp->exp_wake_mutex);
-}
-
-/**
- * synchronize_sched_expedited - Brute-force RCU-sched grace period
- *
- * Wait for an RCU-sched grace period to elapse, but use a "big hammer"
- * approach to force the grace period to end quickly.  This consumes
- * significant time on all CPUs and is unfriendly to real-time workloads,
- * so is thus not recommended for any sort of common-case code.  In fact,
- * if you are using synchronize_sched_expedited() in a loop, please
- * restructure your code to batch your updates, and then use a single
- * synchronize_sched() instead.
- *
- * This implementation can be thought of as an application of sequence
- * locking to expedited grace periods, but using the sequence counter to
- * determine when someone else has already done the work instead of for
- * retrying readers.
- */
-void synchronize_sched_expedited(void)
-{
-	unsigned long s;
-	struct rcu_state *rsp = &rcu_sched_state;
-
-	/* If only one CPU, this is automatically a grace period. */
-	if (rcu_blocking_is_gp())
-		return;
-
-	/* If expedited grace periods are prohibited, fall back to normal. */
-	if (rcu_gp_is_normal()) {
-		wait_rcu_gp(call_rcu_sched);
-		return;
-	}
-
-	/* Take a snapshot of the sequence number.  */
-	s = rcu_exp_gp_seq_snap(rsp);
-	if (exp_funnel_lock(rsp, s))
-		return;  /* Someone else did our work for us. */
-
-	/* Initialize the rcu_node tree in preparation for the wait. */
-	sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
-
-	/* Wait and clean up, including waking everyone. */
-	rcu_exp_wait_wake(rsp, s);
-}
-EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
-
 /*
  * Check to see if there is any immediate RCU-related work to be done
  * by the current CPU, for the specified type of RCU, returning 1 if so.
@@ -4747,4 +4205,5 @@ void __init rcu_init(void)
 		rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
 }
 
+#include "tree_exp.h"
 #include "tree_plugin.h"
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
new file mode 100644
index 000000000000..db0909cf7fe1
--- /dev/null
+++ b/kernel/rcu/tree_exp.h
@@ -0,0 +1,564 @@
+/*
+ * RCU expedited grace periods
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, you can access it online at
+ * http://www.gnu.org/licenses/gpl-2.0.html.
+ *
+ * Copyright IBM Corporation, 2016
+ *
+ * Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ */
+
+/* Wrapper functions for expedited grace periods.  */
+static void rcu_exp_gp_seq_start(struct rcu_state *rsp)
+{
+	rcu_seq_start(&rsp->expedited_sequence);
+}
+static void rcu_exp_gp_seq_end(struct rcu_state *rsp)
+{
+	rcu_seq_end(&rsp->expedited_sequence);
+	smp_mb(); /* Ensure that consecutive grace periods serialize. */
+}
+static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
+{
+	unsigned long s;
+
+	smp_mb(); /* Caller's modifications seen first by other CPUs. */
+	s = rcu_seq_snap(&rsp->expedited_sequence);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("snap"));
+	return s;
+}
+static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
+{
+	return rcu_seq_done(&rsp->expedited_sequence, s);
+}
+
+/*
+ * Reset the ->expmaskinit values in the rcu_node tree to reflect any
+ * recent CPU-online activity.  Note that these masks are not cleared
+ * when CPUs go offline, so they reflect the union of all CPUs that have
+ * ever been online.  This means that this function normally takes its
+ * no-work-to-do fastpath.
+ */
+static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp)
+{
+	bool done;
+	unsigned long flags;
+	unsigned long mask;
+	unsigned long oldmask;
+	int ncpus = READ_ONCE(rsp->ncpus);
+	struct rcu_node *rnp;
+	struct rcu_node *rnp_up;
+
+	/* If no new CPUs onlined since last time, nothing to do. */
+	if (likely(ncpus == rsp->ncpus_snap))
+		return;
+	rsp->ncpus_snap = ncpus;
+
+	/*
+	 * Each pass through the following loop propagates newly onlined
+	 * CPUs for the current rcu_node structure up the rcu_node tree.
+	 */
+	rcu_for_each_leaf_node(rsp, rnp) {
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
+		if (rnp->expmaskinit == rnp->expmaskinitnext) {
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+			continue;  /* No new CPUs, nothing to do. */
+		}
+
+		/* Update this node's mask, track old value for propagation. */
+		oldmask = rnp->expmaskinit;
+		rnp->expmaskinit = rnp->expmaskinitnext;
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
+		/* If was already nonzero, nothing to propagate. */
+		if (oldmask)
+			continue;
+
+		/* Propagate the new CPU up the tree. */
+		mask = rnp->grpmask;
+		rnp_up = rnp->parent;
+		done = false;
+		while (rnp_up) {
+			raw_spin_lock_irqsave_rcu_node(rnp_up, flags);
+			if (rnp_up->expmaskinit)
+				done = true;
+			rnp_up->expmaskinit |= mask;
+			raw_spin_unlock_irqrestore_rcu_node(rnp_up, flags);
+			if (done)
+				break;
+			mask = rnp_up->grpmask;
+			rnp_up = rnp_up->parent;
+		}
+	}
+}
+
+/*
+ * Reset the ->expmask values in the rcu_node tree in preparation for
+ * a new expedited grace period.
+ */
+static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
+{
+	unsigned long flags;
+	struct rcu_node *rnp;
+
+	sync_exp_reset_tree_hotplug(rsp);
+	rcu_for_each_node_breadth_first(rsp, rnp) {
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
+		WARN_ON_ONCE(rnp->expmask);
+		rnp->expmask = rnp->expmaskinit;
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	}
+}
+
+/*
+ * Return non-zero if there is no RCU expedited grace period in progress
+ * for the specified rcu_node structure, in other words, if all CPUs and
+ * tasks covered by the specified rcu_node structure have done their bit
+ * for the current expedited grace period.  Works only for preemptible
+ * RCU -- other RCU implementation use other means.
+ *
+ * Caller must hold the rcu_state's exp_mutex.
+ */
+static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
+{
+	return rnp->exp_tasks == NULL &&
+	       READ_ONCE(rnp->expmask) == 0;
+}
+
+/*
+ * Report the exit from RCU read-side critical section for the last task
+ * that queued itself during or before the current expedited preemptible-RCU
+ * grace period.  This event is reported either to the rcu_node structure on
+ * which the task was queued or to one of that rcu_node structure's ancestors,
+ * recursively up the tree.  (Calm down, calm down, we do the recursion
+ * iteratively!)
+ *
+ * Caller must hold the rcu_state's exp_mutex and the specified rcu_node
+ * structure's ->lock.
+ */
+static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
+				 bool wake, unsigned long flags)
+	__releases(rnp->lock)
+{
+	unsigned long mask;
+
+	for (;;) {
+		if (!sync_rcu_preempt_exp_done(rnp)) {
+			if (!rnp->expmask)
+				rcu_initiate_boost(rnp, flags);
+			else
+				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+			break;
+		}
+		if (rnp->parent == NULL) {
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+			if (wake) {
+				smp_mb(); /* EGP done before wake_up(). */
+				swake_up(&rsp->expedited_wq);
+			}
+			break;
+		}
+		mask = rnp->grpmask;
+		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled */
+		rnp = rnp->parent;
+		raw_spin_lock_rcu_node(rnp); /* irqs already disabled */
+		WARN_ON_ONCE(!(rnp->expmask & mask));
+		rnp->expmask &= ~mask;
+	}
+}
+
+/*
+ * Report expedited quiescent state for specified node.  This is a
+ * lock-acquisition wrapper function for __rcu_report_exp_rnp().
+ *
+ * Caller must hold the rcu_state's exp_mutex.
+ */
+static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp,
+					      struct rcu_node *rnp, bool wake)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	__rcu_report_exp_rnp(rsp, rnp, wake, flags);
+}
+
+/*
+ * Report expedited quiescent state for multiple CPUs, all covered by the
+ * specified leaf rcu_node structure.  Caller must hold the rcu_state's
+ * exp_mutex.
+ */
+static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
+				    unsigned long mask, bool wake)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	if (!(rnp->expmask & mask)) {
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		return;
+	}
+	rnp->expmask &= ~mask;
+	__rcu_report_exp_rnp(rsp, rnp, wake, flags); /* Releases rnp->lock. */
+}
+
+/*
+ * Report expedited quiescent state for specified rcu_data (CPU).
+ */
+static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
+			       bool wake)
+{
+	rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake);
+}
+
+/* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */
+static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat,
+			       unsigned long s)
+{
+	if (rcu_exp_gp_seq_done(rsp, s)) {
+		trace_rcu_exp_grace_period(rsp->name, s, TPS("done"));
+		/* Ensure test happens before caller kfree(). */
+		smp_mb__before_atomic(); /* ^^^ */
+		atomic_long_inc(stat);
+		return true;
+	}
+	return false;
+}
+
+/*
+ * Funnel-lock acquisition for expedited grace periods.  Returns true
+ * if some other task completed an expedited grace period that this task
+ * can piggy-back on, and with no mutex held.  Otherwise, returns false
+ * with the mutex held, indicating that the caller must actually do the
+ * expedited grace period.
+ */
+static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
+{
+	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
+	struct rcu_node *rnp = rdp->mynode;
+	struct rcu_node *rnp_root = rcu_get_root(rsp);
+
+	/* Low-contention fastpath. */
+	if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s) &&
+	    (rnp == rnp_root ||
+	     ULONG_CMP_LT(READ_ONCE(rnp_root->exp_seq_rq), s)) &&
+	    !mutex_is_locked(&rsp->exp_mutex) &&
+	    mutex_trylock(&rsp->exp_mutex))
+		goto fastpath;
+
+	/*
+	 * Each pass through the following loop works its way up
+	 * the rcu_node tree, returning if others have done the work or
+	 * otherwise falls through to acquire rsp->exp_mutex.  The mapping
+	 * from CPU to rcu_node structure can be inexact, as it is just
+	 * promoting locality and is not strictly needed for correctness.
+	 */
+	for (; rnp != NULL; rnp = rnp->parent) {
+		if (sync_exp_work_done(rsp, &rdp->exp_workdone1, s))
+			return true;
+
+		/* Work not done, either wait here or go up. */
+		spin_lock(&rnp->exp_lock);
+		if (ULONG_CMP_GE(rnp->exp_seq_rq, s)) {
+
+			/* Someone else doing GP, so wait for them. */
+			spin_unlock(&rnp->exp_lock);
+			trace_rcu_exp_funnel_lock(rsp->name, rnp->level,
+						  rnp->grplo, rnp->grphi,
+						  TPS("wait"));
+			wait_event(rnp->exp_wq[(s >> 1) & 0x3],
+				   sync_exp_work_done(rsp,
+						      &rdp->exp_workdone2, s));
+			return true;
+		}
+		rnp->exp_seq_rq = s; /* Followers can wait on us. */
+		spin_unlock(&rnp->exp_lock);
+		trace_rcu_exp_funnel_lock(rsp->name, rnp->level, rnp->grplo,
+					  rnp->grphi, TPS("nxtlvl"));
+	}
+	mutex_lock(&rsp->exp_mutex);
+fastpath:
+	if (sync_exp_work_done(rsp, &rdp->exp_workdone3, s)) {
+		mutex_unlock(&rsp->exp_mutex);
+		return true;
+	}
+	rcu_exp_gp_seq_start(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("start"));
+	return false;
+}
+
+/* Invoked on each online non-idle CPU for expedited quiescent state. */
+static void sync_sched_exp_handler(void *data)
+{
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+	struct rcu_state *rsp = data;
+
+	rdp = this_cpu_ptr(rsp->rda);
+	rnp = rdp->mynode;
+	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
+	    __this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
+		return;
+	if (rcu_is_cpu_rrupt_from_idle()) {
+		rcu_report_exp_rdp(&rcu_sched_state,
+				   this_cpu_ptr(&rcu_sched_data), true);
+		return;
+	}
+	__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
+	resched_cpu(smp_processor_id());
+}
+
+/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
+static void sync_sched_exp_online_cleanup(int cpu)
+{
+	struct rcu_data *rdp;
+	int ret;
+	struct rcu_node *rnp;
+	struct rcu_state *rsp = &rcu_sched_state;
+
+	rdp = per_cpu_ptr(rsp->rda, cpu);
+	rnp = rdp->mynode;
+	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
+		return;
+	ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0);
+	WARN_ON_ONCE(ret);
+}
+
+/*
+ * Select the nodes that the upcoming expedited grace period needs
+ * to wait for.
+ */
+static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
+				     smp_call_func_t func)
+{
+	int cpu;
+	unsigned long flags;
+	unsigned long mask;
+	unsigned long mask_ofl_test;
+	unsigned long mask_ofl_ipi;
+	int ret;
+	struct rcu_node *rnp;
+
+	sync_exp_reset_tree(rsp);
+	rcu_for_each_leaf_node(rsp, rnp) {
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
+
+		/* Each pass checks a CPU for identity, offline, and idle. */
+		mask_ofl_test = 0;
+		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
+			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
+			if (raw_smp_processor_id() == cpu ||
+			    !(atomic_add_return(0, &rdtp->dynticks) & 0x1))
+				mask_ofl_test |= rdp->grpmask;
+		}
+		mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
+
+		/*
+		 * Need to wait for any blocked tasks as well.  Note that
+		 * additional blocking tasks will also block the expedited
+		 * GP until such time as the ->expmask bits are cleared.
+		 */
+		if (rcu_preempt_has_tasks(rnp))
+			rnp->exp_tasks = rnp->blkd_tasks.next;
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
+		/* IPI the remaining CPUs for expedited quiescent state. */
+		mask = 1;
+		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			if (!(mask_ofl_ipi & mask))
+				continue;
+retry_ipi:
+			ret = smp_call_function_single(cpu, func, rsp, 0);
+			if (!ret) {
+				mask_ofl_ipi &= ~mask;
+				continue;
+			}
+			/* Failed, raced with offline. */
+			raw_spin_lock_irqsave_rcu_node(rnp, flags);
+			if (cpu_online(cpu) &&
+			    (rnp->expmask & mask)) {
+				raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+				schedule_timeout_uninterruptible(1);
+				if (cpu_online(cpu) &&
+				    (rnp->expmask & mask))
+					goto retry_ipi;
+				raw_spin_lock_irqsave_rcu_node(rnp, flags);
+			}
+			if (!(rnp->expmask & mask))
+				mask_ofl_ipi &= ~mask;
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		}
+		/* Report quiescent states for those that went offline. */
+		mask_ofl_test |= mask_ofl_ipi;
+		if (mask_ofl_test)
+			rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false);
+	}
+}
+
+static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
+{
+	int cpu;
+	unsigned long jiffies_stall;
+	unsigned long jiffies_start;
+	unsigned long mask;
+	int ndetected;
+	struct rcu_node *rnp;
+	struct rcu_node *rnp_root = rcu_get_root(rsp);
+	int ret;
+
+	jiffies_stall = rcu_jiffies_till_stall_check();
+	jiffies_start = jiffies;
+
+	for (;;) {
+		ret = swait_event_timeout(
+				rsp->expedited_wq,
+				sync_rcu_preempt_exp_done(rnp_root),
+				jiffies_stall);
+		if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
+			return;
+		if (ret < 0) {
+			/* Hit a signal, disable CPU stall warnings. */
+			swait_event(rsp->expedited_wq,
+				   sync_rcu_preempt_exp_done(rnp_root));
+			return;
+		}
+		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
+		       rsp->name);
+		ndetected = 0;
+		rcu_for_each_leaf_node(rsp, rnp) {
+			ndetected += rcu_print_task_exp_stall(rnp);
+			mask = 1;
+			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+				struct rcu_data *rdp;
+
+				if (!(rnp->expmask & mask))
+					continue;
+				ndetected++;
+				rdp = per_cpu_ptr(rsp->rda, cpu);
+				pr_cont(" %d-%c%c%c", cpu,
+					"O."[!!cpu_online(cpu)],
+					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
+					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
+			}
+			mask <<= 1;
+		}
+		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
+			jiffies - jiffies_start, rsp->expedited_sequence,
+			rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]);
+		if (ndetected) {
+			pr_err("blocking rcu_node structures:");
+			rcu_for_each_node_breadth_first(rsp, rnp) {
+				if (rnp == rnp_root)
+					continue; /* printed unconditionally */
+				if (sync_rcu_preempt_exp_done(rnp))
+					continue;
+				pr_cont(" l=%u:%d-%d:%#lx/%c",
+					rnp->level, rnp->grplo, rnp->grphi,
+					rnp->expmask,
+					".T"[!!rnp->exp_tasks]);
+			}
+			pr_cont("\n");
+		}
+		rcu_for_each_leaf_node(rsp, rnp) {
+			mask = 1;
+			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+				if (!(rnp->expmask & mask))
+					continue;
+				dump_cpu_task(cpu);
+			}
+		}
+		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
+	}
+}
+
+/*
+ * Wait for the current expedited grace period to complete, and then
+ * wake up everyone who piggybacked on the just-completed expedited
+ * grace period.  Also update all the ->exp_seq_rq counters as needed
+ * in order to avoid counter-wrap problems.
+ */
+static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
+{
+	struct rcu_node *rnp;
+
+	synchronize_sched_expedited_wait(rsp);
+	rcu_exp_gp_seq_end(rsp);
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("end"));
+
+	/*
+	 * Switch over to wakeup mode, allowing the next GP, but -only- the
+	 * next GP, to proceed.
+	 */
+	mutex_lock(&rsp->exp_wake_mutex);
+	mutex_unlock(&rsp->exp_mutex);
+
+	rcu_for_each_node_breadth_first(rsp, rnp) {
+		if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
+			spin_lock(&rnp->exp_lock);
+			/* Recheck, avoid hang in case someone just arrived. */
+			if (ULONG_CMP_LT(rnp->exp_seq_rq, s))
+				rnp->exp_seq_rq = s;
+			spin_unlock(&rnp->exp_lock);
+		}
+		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
+	}
+	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
+	mutex_unlock(&rsp->exp_wake_mutex);
+}
+
+/**
+ * synchronize_sched_expedited - Brute-force RCU-sched grace period
+ *
+ * Wait for an RCU-sched grace period to elapse, but use a "big hammer"
+ * approach to force the grace period to end quickly.  This consumes
+ * significant time on all CPUs and is unfriendly to real-time workloads,
+ * so is thus not recommended for any sort of common-case code.  In fact,
+ * if you are using synchronize_sched_expedited() in a loop, please
+ * restructure your code to batch your updates, and then use a single
+ * synchronize_sched() instead.
+ *
+ * This implementation can be thought of as an application of sequence
+ * locking to expedited grace periods, but using the sequence counter to
+ * determine when someone else has already done the work instead of for
+ * retrying readers.
+ */
+void synchronize_sched_expedited(void)
+{
+	unsigned long s;
+	struct rcu_state *rsp = &rcu_sched_state;
+
+	/* If only one CPU, this is automatically a grace period. */
+	if (rcu_blocking_is_gp())
+		return;
+
+	/* If expedited grace periods are prohibited, fall back to normal. */
+	if (rcu_gp_is_normal()) {
+		wait_rcu_gp(call_rcu_sched);
+		return;
+	}
+
+	/* Take a snapshot of the sequence number.  */
+	s = rcu_exp_gp_seq_snap(rsp);
+	if (exp_funnel_lock(rsp, s))
+		return;  /* Someone else did our work for us. */
+
+	/* Initialize the rcu_node tree in preparation for the wait. */
+	sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
+
+	/* Wait and clean up, including waking everyone. */
+	rcu_exp_wait_wake(rsp, s);
+}
+EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 05/12] rcu: Move expedited code from tree_plugin.h to tree_exp.h
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 06/12] rcu: Document RCU_NONIDLE() restrictions in comment header Paul E. McKenney
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

People have been having some difficulty finding their way around the
RCU code.  This commit therefore pulls some of the expedited grace-period
code from tree_plugin.h to a new tree_exp.h file.  This commit is strictly
code movement.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree_exp.h    | 94 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree_plugin.h | 88 ---------------------------------------------
 2 files changed, 94 insertions(+), 88 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index db0909cf7fe1..00a02a231ada 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -562,3 +562,97 @@ void synchronize_sched_expedited(void)
 	rcu_exp_wait_wake(rsp, s);
 }
 EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
+
+#ifdef CONFIG_PREEMPT_RCU
+
+/*
+ * Remote handler for smp_call_function_single().  If there is an
+ * RCU read-side critical section in effect, request that the
+ * next rcu_read_unlock() record the quiescent state up the
+ * ->expmask fields in the rcu_node tree.  Otherwise, immediately
+ * report the quiescent state.
+ */
+static void sync_rcu_exp_handler(void *info)
+{
+	struct rcu_data *rdp;
+	struct rcu_state *rsp = info;
+	struct task_struct *t = current;
+
+	/*
+	 * Within an RCU read-side critical section, request that the next
+	 * rcu_read_unlock() report.  Unless this RCU read-side critical
+	 * section has already blocked, in which case it is already set
+	 * up for the expedited grace period to wait on it.
+	 */
+	if (t->rcu_read_lock_nesting > 0 &&
+	    !t->rcu_read_unlock_special.b.blocked) {
+		t->rcu_read_unlock_special.b.exp_need_qs = true;
+		return;
+	}
+
+	/*
+	 * We are either exiting an RCU read-side critical section (negative
+	 * values of t->rcu_read_lock_nesting) or are not in one at all
+	 * (zero value of t->rcu_read_lock_nesting).  Or we are in an RCU
+	 * read-side critical section that blocked before this expedited
+	 * grace period started.  Either way, we can immediately report
+	 * the quiescent state.
+	 */
+	rdp = this_cpu_ptr(rsp->rda);
+	rcu_report_exp_rdp(rsp, rdp, true);
+}
+
+/**
+ * synchronize_rcu_expedited - Brute-force RCU grace period
+ *
+ * Wait for an RCU-preempt grace period, but expedite it.  The basic
+ * idea is to IPI all non-idle non-nohz online CPUs.  The IPI handler
+ * checks whether the CPU is in an RCU-preempt critical section, and
+ * if so, it sets a flag that causes the outermost rcu_read_unlock()
+ * to report the quiescent state.  On the other hand, if the CPU is
+ * not in an RCU read-side critical section, the IPI handler reports
+ * the quiescent state immediately.
+ *
+ * Although this is a greate improvement over previous expedited
+ * implementations, it is still unfriendly to real-time workloads, so is
+ * thus not recommended for any sort of common-case code.  In fact, if
+ * you are using synchronize_rcu_expedited() in a loop, please restructure
+ * your code to batch your updates, and then Use a single synchronize_rcu()
+ * instead.
+ */
+void synchronize_rcu_expedited(void)
+{
+	struct rcu_state *rsp = rcu_state_p;
+	unsigned long s;
+
+	/* If expedited grace periods are prohibited, fall back to normal. */
+	if (rcu_gp_is_normal()) {
+		wait_rcu_gp(call_rcu);
+		return;
+	}
+
+	s = rcu_exp_gp_seq_snap(rsp);
+	if (exp_funnel_lock(rsp, s))
+		return;  /* Someone else did our work for us. */
+
+	/* Initialize the rcu_node tree in preparation for the wait. */
+	sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
+
+	/* Wait for ->blkd_tasks lists to drain, then wake everyone up. */
+	rcu_exp_wait_wake(rsp, s);
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+#else /* #ifdef CONFIG_PREEMPT_RCU */
+
+/*
+ * Wait for an rcu-preempt grace period, but make it happen quickly.
+ * But because preemptible RCU does not exist, map to rcu-sched.
+ */
+void synchronize_rcu_expedited(void)
+{
+	synchronize_sched_expedited();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index ff1cd4e1188d..695071dd1e9c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -681,84 +681,6 @@ void synchronize_rcu(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu);
 
-/*
- * Remote handler for smp_call_function_single().  If there is an
- * RCU read-side critical section in effect, request that the
- * next rcu_read_unlock() record the quiescent state up the
- * ->expmask fields in the rcu_node tree.  Otherwise, immediately
- * report the quiescent state.
- */
-static void sync_rcu_exp_handler(void *info)
-{
-	struct rcu_data *rdp;
-	struct rcu_state *rsp = info;
-	struct task_struct *t = current;
-
-	/*
-	 * Within an RCU read-side critical section, request that the next
-	 * rcu_read_unlock() report.  Unless this RCU read-side critical
-	 * section has already blocked, in which case it is already set
-	 * up for the expedited grace period to wait on it.
-	 */
-	if (t->rcu_read_lock_nesting > 0 &&
-	    !t->rcu_read_unlock_special.b.blocked) {
-		t->rcu_read_unlock_special.b.exp_need_qs = true;
-		return;
-	}
-
-	/*
-	 * We are either exiting an RCU read-side critical section (negative
-	 * values of t->rcu_read_lock_nesting) or are not in one at all
-	 * (zero value of t->rcu_read_lock_nesting).  Or we are in an RCU
-	 * read-side critical section that blocked before this expedited
-	 * grace period started.  Either way, we can immediately report
-	 * the quiescent state.
-	 */
-	rdp = this_cpu_ptr(rsp->rda);
-	rcu_report_exp_rdp(rsp, rdp, true);
-}
-
-/**
- * synchronize_rcu_expedited - Brute-force RCU grace period
- *
- * Wait for an RCU-preempt grace period, but expedite it.  The basic
- * idea is to IPI all non-idle non-nohz online CPUs.  The IPI handler
- * checks whether the CPU is in an RCU-preempt critical section, and
- * if so, it sets a flag that causes the outermost rcu_read_unlock()
- * to report the quiescent state.  On the other hand, if the CPU is
- * not in an RCU read-side critical section, the IPI handler reports
- * the quiescent state immediately.
- *
- * Although this is a greate improvement over previous expedited
- * implementations, it is still unfriendly to real-time workloads, so is
- * thus not recommended for any sort of common-case code.  In fact, if
- * you are using synchronize_rcu_expedited() in a loop, please restructure
- * your code to batch your updates, and then Use a single synchronize_rcu()
- * instead.
- */
-void synchronize_rcu_expedited(void)
-{
-	struct rcu_state *rsp = rcu_state_p;
-	unsigned long s;
-
-	/* If expedited grace periods are prohibited, fall back to normal. */
-	if (rcu_gp_is_normal()) {
-		wait_rcu_gp(call_rcu);
-		return;
-	}
-
-	s = rcu_exp_gp_seq_snap(rsp);
-	if (exp_funnel_lock(rsp, s))
-		return;  /* Someone else did our work for us. */
-
-	/* Initialize the rcu_node tree in preparation for the wait. */
-	sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
-
-	/* Wait for ->blkd_tasks lists to drain, then wake everyone up. */
-	rcu_exp_wait_wake(rsp, s);
-}
-EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
-
 /**
  * rcu_barrier - Wait until all in-flight call_rcu() callbacks complete.
  *
@@ -883,16 +805,6 @@ static void rcu_preempt_check_callbacks(void)
 }
 
 /*
- * Wait for an rcu-preempt grace period, but make it happen quickly.
- * But because preemptible RCU does not exist, map to rcu-sched.
- */
-void synchronize_rcu_expedited(void)
-{
-	synchronize_sched_expedited();
-}
-EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
-
-/*
  * Because preemptible RCU does not exist, rcu_barrier() is just
  * another name for rcu_barrier_sched().
  */
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 06/12] rcu: Document RCU_NONIDLE() restrictions in comment header
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 05/12] rcu: Move expedited code from tree_plugin.h " Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL Paul E. McKenney
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5f1533e3d032..c61b6b9506e7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -379,12 +379,13 @@ static inline void rcu_init_nohz(void)
  * in the inner idle loop.
  *
  * This macro provides the way out:  RCU_NONIDLE(do_something_with_RCU())
- * will tell RCU that it needs to pay attending, invoke its argument
- * (in this example, a call to the do_something_with_RCU() function),
+ * will tell RCU that it needs to pay attention, invoke its argument
+ * (in this example, calling the do_something_with_RCU() function),
  * and then tell RCU to go back to ignoring this CPU.  It is permissible
- * to nest RCU_NONIDLE() wrappers, but the nesting level is currently
- * quite limited.  If deeper nesting is required, it will be necessary
- * to adjust DYNTICK_TASK_NESTING_VALUE accordingly.
+ * to nest RCU_NONIDLE() wrappers, but not indefinitely (but the limit is
+ * on the order of a million or so, even on 32-bit systems).  It is
+ * not legal to block within RCU_NONIDLE(), nor is it permissible to
+ * transfer control either into or out of RCU_NONIDLE()'s statement.
  */
 #define RCU_NONIDLE(a) \
 	do { \
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 06/12] rcu: Document RCU_NONIDLE() restrictions in comment header Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 22:03   ` Peter Zijlstra
  2016-06-15 21:46 ` [PATCH tip/core/rcu 08/12] rcu: Disable TASKS_RCU for usermode Linux Paul E. McKenney
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

This commit does a compile-time check for rcu_assign_pointer() of NULL,
and uses WRITE_ONCE() rather than smp_store_release() in that case.

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index c61b6b9506e7..9be61e47badc 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -650,7 +650,16 @@ static inline void rcu_preempt_sleep_check(void)
  * please be careful when making changes to rcu_assign_pointer() and the
  * other macros that it invokes.
  */
-#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v))
+#define rcu_assign_pointer(p, v) \
+({ \
+	uintptr_t _r_a_p__v = (uintptr_t)(v); \
+	\
+	if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
+		WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
+	else \
+		smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
+	_r_a_p__v; \
+})
 
 /**
  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 08/12] rcu: Disable TASKS_RCU for usermode Linux
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:47   ` Richard Weinberger
  2016-06-15 21:46 ` [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled Paul E. McKenney
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney, Richard Weinberger,
	Jeff Dike

Usermode Linux currently does not implement arch_irqs_disabled_flags(),
which results in a build failure in TASKS_RCU.  Therefore, this commit
disables the TASKS_RCU Kconfig option in usermode Linux builds.  The
usermode Linux maintainers expect to merge arch_irqs_disabled_flags()
into 4.8, at which point this commit may be reverted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
---
 init/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/init/Kconfig b/init/Kconfig
index f755a602d4a1..a068265fbcaf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -517,6 +517,7 @@ config SRCU
 config TASKS_RCU
 	bool
 	default n
+	depends on !UML
 	select SRCU
 	help
 	  This option enables a task-based RCU implementation that uses
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 08/12] rcu: Disable TASKS_RCU for usermode Linux Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 22:15   ` Peter Zijlstra
  2016-06-15 22:16   ` Peter Zijlstra
  2016-06-15 21:46 ` [PATCH tip/core/rcu 10/12] rcu: Fix a typo in a comment Paul E. McKenney
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

Currently, if the very first call to call_rcu_tasks() has irqs disabled,
it will create the rcu_tasks_kthread with irqs disabled, which will
result in a splat in the memory allocator, which kthread_run() invokes
with the expectation that irqs are enabled.

This commit fixes this problem by deferring kthread creation if called
with irqs disabled.  The first call to call_rcu_tasks() that has irqs
enabled will create the kthread.

This bug was detected by rcutorture changes that were motivated by
Iftekhar Ahmed's mutation-testing efforts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h | 1 +
 kernel/rcu/update.c      | 7 +++++--
 kernel/sched/fair.c      | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9be61e47badc..a225530b2ece 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -45,6 +45,7 @@
 #include <linux/bug.h>
 #include <linux/compiler.h>
 #include <linux/ktime.h>
+#include <linux/irqflags.h>
 
 #include <asm/barrier.h>
 
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 3e888cd5a594..f0d8322bc3ec 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -528,6 +528,7 @@ static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
 module_param(rcu_task_stall_timeout, int, 0644);
 
 static void rcu_spawn_tasks_kthread(void);
+static struct task_struct *rcu_tasks_kthread_ptr;
 
 /*
  * Post an RCU-tasks callback.  First call must be from process context
@@ -537,6 +538,7 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
 {
 	unsigned long flags;
 	bool needwake;
+	bool havetask = READ_ONCE(rcu_tasks_kthread_ptr);
 
 	rhp->next = NULL;
 	rhp->func = func;
@@ -545,7 +547,9 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
 	*rcu_tasks_cbs_tail = rhp;
 	rcu_tasks_cbs_tail = &rhp->next;
 	raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
-	if (needwake) {
+	/* We can't create the thread unless interrupts are enabled. */
+	if ((needwake && havetask) ||
+	    (!havetask && !irqs_disabled_flags(flags))) {
 		rcu_spawn_tasks_kthread();
 		wake_up(&rcu_tasks_cbs_wq);
 	}
@@ -790,7 +794,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 static void rcu_spawn_tasks_kthread(void)
 {
 	static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
-	static struct task_struct *rcu_tasks_kthread_ptr;
 	struct task_struct *t;
 
 	if (READ_ONCE(rcu_tasks_kthread_ptr)) {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..4a3b279beb42 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2175,7 +2175,7 @@ void task_numa_free(struct task_struct *p)
 
 		grp->nr_tasks--;
 		spin_unlock_irqrestore(&grp->lock, flags);
-		RCU_INIT_POINTER(p->numa_group, NULL);
+		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 10/12] rcu: Fix a typo in a comment
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 11/12] rcu: sysctl: Panic on RCU Stall Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 12/12] rcu: Correctly handle sparse possible cpus Paul E. McKenney
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Paul E. McKenney

In the area in ht pursuit of a bug, so might as well clean it up.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/rcutorture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 084a28a732eb..60bd533902f9 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1476,7 +1476,7 @@ static int rcu_torture_barrier_cbs(void *arg)
 			break;
 		/*
 		 * The above smp_load_acquire() ensures barrier_phase load
-		 * is ordered before the folloiwng ->call().
+		 * is ordered before the following ->call().
 		 */
 		local_irq_disable(); /* Just to test no-irq call_rcu(). */
 		cur_ops->call(&rcu, rcu_torture_barrier_cbf);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 11/12] rcu: sysctl: Panic on RCU Stall
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 10/12] rcu: Fix a typo in a comment Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  2016-06-15 21:46 ` [PATCH tip/core/rcu 12/12] rcu: Correctly handle sparse possible cpus Paul E. McKenney
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Daniel Bristot de Oliveira, Jonathan Corbet,
	Paul E. McKenney

From: Daniel Bristot de Oliveira <bristot@redhat.com>

It is not always easy to determine the cause of an RCU stall just by
analysing the RCU stall messages, mainly when the problem is caused
by the indirect starvation of rcu threads. For example, when preempt_rcu
is not awakened due to the starvation of a timer softirq.

We have been hard coding panic() in the RCU stall functions for
some time while testing the kernel-rt. But this is not possible in
some scenarios, like when supporting customers.

This patch implements the sysctl kernel.panic_on_rcu_stall. If
set to 1, the system will panic() when an RCU stall takes place,
enabling the capture of a vmcore. The vmcore provides a way to analyze
all kernel/tasks states, helping out to point to the culprit and the
solution for the stall.

The kernel.panic_on_rcu_stall sysctl is disabled by default.

Changes from v1:
- Fixed a typo in the git log
- The if(sysctl_panic_on_rcu_stall) panic() is in a static function
- Fixed the CONFIG_TINY_RCU compilation issue
- The var sysctl_panic_on_rcu_stall is now __read_mostly

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/sysctl/kernel.txt | 12 ++++++++++++
 include/linux/kernel.h          |  1 +
 kernel/rcu/tree.c               | 12 ++++++++++++
 kernel/sysctl.c                 | 11 +++++++++++
 4 files changed, 36 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index a3683ce2a2f3..33204604de6c 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -58,6 +58,7 @@ show up in /proc/sys/kernel:
 - panic_on_stackoverflow
 - panic_on_unrecovered_nmi
 - panic_on_warn
+- panic_on_rcu_stall
 - perf_cpu_time_max_percent
 - perf_event_paranoid
 - perf_event_max_stack
@@ -618,6 +619,17 @@ a kernel rebuild when attempting to kdump at the location of a WARN().
 
 ==============================================================
 
+panic_on_rcu_stall:
+
+When set to 1, calls panic() after RCU stall detection messages. This
+is useful to define the root cause of RCU stalls using a vmcore.
+
+0: do not panic() when RCU stall takes place, default behavior.
+
+1: panic() after printing RCU stall messages.
+
+==============================================================
+
 perf_cpu_time_max_percent:
 
 Hints to the kernel how much CPU time it should be allowed to
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 94aa10ffe156..c42082112ec8 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -451,6 +451,7 @@ extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
+extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_panic_on_stackoverflow;
 
 extern bool crash_kexec_post_notifiers;
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c844b6142a86..e5ca15a461b9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -125,6 +125,8 @@ int rcu_num_lvls __read_mostly = RCU_NUM_LVLS;
 /* Number of rcu_nodes at specified level. */
 static int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
 int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
+/* panic() on RCU Stall sysctl. */
+int sysctl_panic_on_rcu_stall __read_mostly;
 
 /*
  * The rcu_scheduler_active variable transitions from zero to one just
@@ -1312,6 +1314,12 @@ static void rcu_stall_kick_kthreads(struct rcu_state *rsp)
 	}
 }
 
+static inline void panic_on_rcu_stall(void)
+{
+	if (sysctl_panic_on_rcu_stall)
+		panic("RCU Stall\n");
+}
+
 static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 {
 	int cpu;
@@ -1391,6 +1399,8 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 
 	rcu_check_gp_kthread_starvation(rsp);
 
+	panic_on_rcu_stall();
+
 	force_quiescent_state(rsp);  /* Kick them all. */
 }
 
@@ -1431,6 +1441,8 @@ static void print_cpu_stall(struct rcu_state *rsp)
 			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
+	panic_on_rcu_stall();
+
 	/*
 	 * Attempt to revive the RCU machinery by forcing a context switch.
 	 *
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87b2fc38398b..35f0dcb1cb4f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1205,6 +1205,17 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 #endif
+#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
+	{
+		.procname	= "panic_on_rcu_stall",
+		.data		= &sysctl_panic_on_rcu_stall,
+		.maxlen		= sizeof(sysctl_panic_on_rcu_stall),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 	{ }
 };
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH tip/core/rcu 12/12] rcu: Correctly handle sparse possible cpus
  2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2016-06-15 21:46 ` [PATCH tip/core/rcu 11/12] rcu: sysctl: Panic on RCU Stall Paul E. McKenney
@ 2016-06-15 21:46 ` Paul E. McKenney
  11 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Mark Rutland, Catalin Marinas, Steve Capper,
	Will Deacon, Paul E. McKenney

From: Mark Rutland <mark.rutland@arm.com>

In many cases in the RCU tree code, we iterate over the set of cpus for
a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
per-cpu data for each cpu in this range. However, if the set of possible
cpus is sparse, some cpus described in this range are not possible, and
thus no per-cpu region will have been allocated (or initialised) for
them by the generic percpu code.

Erroneous accesses to a per-cpu area for these !possible cpus may fault
or may hit other data depending on the addressed generated when the
erroneous per cpu offset is applied. In practice, both cases have been
observed on arm64 hardware (the former being silent, but detectable with
additional patches).

To avoid issues resulting from this, we must iterate over the set of
*possible* cpus for a given leaf node. This patch add a new helper,
for_each_leaf_node_possible_cpu, to enable this. As iteration is often
intertwined with rcu_node local bitmask manipulation, a new
leaf_node_cpu_bit helper is added to make this simpler and more
consistent. The RCU tree code is made to use both of these where
appropriate.

Without this patch, running reboot at a shell can result in an oops
like:

[ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
[ 3369.083881] pgd = ffffffc3ecdda000
[ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
[ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 3369.101781] Modules linked in:
[ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G        W       4.6.0+ #3
[ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
[ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
[ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5
[ 3369.146860] sp : ffffffc3eb9435a0
[ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
[ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
[ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
[ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
[ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
[ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
[ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
[ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
[ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
[ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
[ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
[ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
[ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
[ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
[ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
...
[ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8
[ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30
[ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298
[ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0
[ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40
[ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158
[ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60
[ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918
[ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8
[ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220
[ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8
[ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40
[ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238
[ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358
[ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30
[ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280
[ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88
[ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20
[ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Dennis Chen <dennis.chen@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/tree.c        | 21 +++++++++------------
 kernel/rcu/tree.h        | 15 +++++++++++++++
 kernel/rcu/tree_exp.h    | 16 +++++++---------
 kernel/rcu/tree_plugin.h |  5 +++--
 4 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e5ca15a461b9..f433959e9322 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1287,9 +1287,9 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask != 0) {
-			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
-				if (rnp->qsmask & (1UL << cpu))
-					dump_cpu_task(rnp->grplo + cpu);
+			for_each_leaf_node_possible_cpu(rnp, cpu)
+				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
+					dump_cpu_task(cpu);
 		}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
@@ -1360,10 +1360,9 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		ndetected += rcu_print_task_stall(rnp);
 		if (rnp->qsmask != 0) {
-			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
-				if (rnp->qsmask & (1UL << cpu)) {
-					print_cpu_stall_info(rsp,
-							     rnp->grplo + cpu);
+			for_each_leaf_node_possible_cpu(rnp, cpu)
+				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
+					print_cpu_stall_info(rsp, cpu);
 					ndetected++;
 				}
 		}
@@ -2884,7 +2883,6 @@ static void force_qs_rnp(struct rcu_state *rsp,
 				  unsigned long *maxj),
 			 bool *isidle, unsigned long *maxj)
 {
-	unsigned long bit;
 	int cpu;
 	unsigned long flags;
 	unsigned long mask;
@@ -2919,9 +2917,8 @@ static void force_qs_rnp(struct rcu_state *rsp,
 				continue;
 			}
 		}
-		cpu = rnp->grplo;
-		bit = 1;
-		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
+		for_each_leaf_node_possible_cpu(rnp, cpu) {
+			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
 			if ((rnp->qsmask & bit) != 0) {
 				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
 					mask |= bit;
@@ -3750,7 +3747,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 
 	/* Set up local state, ensuring consistent view of global state. */
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	rdp->grpmask = 1UL << (cpu - rdp->mynode->grplo);
+	rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
 	rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
 	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
 	WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e3959f5e6ddf..f714f873bf9d 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -254,6 +254,13 @@ struct rcu_node {
 } ____cacheline_internodealigned_in_smp;
 
 /*
+ * Bitmasks in an rcu_node cover the interval [grplo, grphi] of CPU IDs, and
+ * are indexed relative to this interval rather than the global CPU ID space.
+ * This generates the bit for a CPU in node-local masks.
+ */
+#define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo))
+
+/*
  * Do a full breadth-first scan of the rcu_node structures for the
  * specified rcu_state structure.
  */
@@ -281,6 +288,14 @@ struct rcu_node {
 	     (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
 
 /*
+ * Iterate over all possible CPUs in a leaf RCU node.
+ */
+#define for_each_leaf_node_possible_cpu(rnp, cpu) \
+	for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \
+	     cpu <= rnp->grphi; \
+	     cpu = cpumask_next((cpu), cpu_possible_mask))
+
+/*
  * Union to allow "aggregate OR" operation on the need for a quiescent
  * state by the normal and expedited grace periods.
  */
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 00a02a231ada..d400434af6b2 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -344,7 +344,6 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 {
 	int cpu;
 	unsigned long flags;
-	unsigned long mask;
 	unsigned long mask_ofl_test;
 	unsigned long mask_ofl_ipi;
 	int ret;
@@ -356,7 +355,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 
 		/* Each pass checks a CPU for identity, offline, and idle. */
 		mask_ofl_test = 0;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
+		for_each_leaf_node_possible_cpu(rnp, cpu) {
 			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
@@ -376,8 +375,8 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
 		/* IPI the remaining CPUs for expedited quiescent state. */
-		mask = 1;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+		for_each_leaf_node_possible_cpu(rnp, cpu) {
+			unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
 			if (!(mask_ofl_ipi & mask))
 				continue;
 retry_ipi:
@@ -440,10 +439,10 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		ndetected = 0;
 		rcu_for_each_leaf_node(rsp, rnp) {
 			ndetected += rcu_print_task_exp_stall(rnp);
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			for_each_leaf_node_possible_cpu(rnp, cpu) {
 				struct rcu_data *rdp;
 
+				mask = leaf_node_cpu_bit(rnp, cpu);
 				if (!(rnp->expmask & mask))
 					continue;
 				ndetected++;
@@ -453,7 +452,6 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
 					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
 			}
-			mask <<= 1;
 		}
 		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
 			jiffies - jiffies_start, rsp->expedited_sequence,
@@ -473,8 +471,8 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 			pr_cont("\n");
 		}
 		rcu_for_each_leaf_node(rsp, rnp) {
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			for_each_leaf_node_possible_cpu(rnp, cpu) {
+				mask = leaf_node_cpu_bit(rnp, cpu);
 				if (!(rnp->expmask & mask))
 					continue;
 				dump_cpu_task(cpu);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 695071dd1e9c..534c590e8852 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1166,8 +1166,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 		return;
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
 		return;
-	for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1)
-		if ((mask & 0x1) && cpu != outgoingcpu)
+	for_each_leaf_node_possible_cpu(rnp, cpu)
+		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
+		    cpu != outgoingcpu)
 			cpumask_set_cpu(cpu, cm);
 	if (cpumask_weight(cm) == 0)
 		cpumask_setall(cm);
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 08/12] rcu: Disable TASKS_RCU for usermode Linux
  2016-06-15 21:46 ` [PATCH tip/core/rcu 08/12] rcu: Disable TASKS_RCU for usermode Linux Paul E. McKenney
@ 2016-06-15 21:47   ` Richard Weinberger
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Weinberger @ 2016-06-15 21:47 UTC (permalink / raw)
  To: Paul E. McKenney, linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, dvhart, fweisbec,
	oleg, bobby.prani, Jeff Dike

Am 15.06.2016 um 23:46 schrieb Paul E. McKenney:
> Usermode Linux currently does not implement arch_irqs_disabled_flags(),
> which results in a build failure in TASKS_RCU.  Therefore, this commit
> disables the TASKS_RCU Kconfig option in usermode Linux builds.  The
> usermode Linux maintainers expect to merge arch_irqs_disabled_flags()
> into 4.8, at which point this commit may be reverted.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Jeff Dike <jdike@addtoit.com>
> ---
>  init/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index f755a602d4a1..a068265fbcaf 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -517,6 +517,7 @@ config SRCU
>  config TASKS_RCU
>  	bool
>  	default n
> +	depends on !UML
>  	select SRCU
>  	help
>  	  This option enables a task-based RCU implementation that uses

Acked-by: Richard Weinberger <richard@nod.at>

Thanks,
//richard

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL
  2016-06-15 21:46 ` [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL Paul E. McKenney
@ 2016-06-15 22:03   ` Peter Zijlstra
  2016-06-15 22:12     ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-06-15 22:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Jun 15, 2016 at 02:46:08PM -0700, Paul E. McKenney wrote:
> This commit does a compile-time check for rcu_assign_pointer() of NULL,
> and uses WRITE_ONCE() rather than smp_store_release() in that case.
> 
> Reported-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  include/linux/rcupdate.h | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index c61b6b9506e7..9be61e47badc 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -650,7 +650,16 @@ static inline void rcu_preempt_sleep_check(void)
>   * please be careful when making changes to rcu_assign_pointer() and the
>   * other macros that it invokes.
>   */
> -#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v))
> +#define rcu_assign_pointer(p, v) \
> +({ \
> +	uintptr_t _r_a_p__v = (uintptr_t)(v); \
> +	\
> +	if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
> +		WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
> +	else \
> +		smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
> +	_r_a_p__v; \
> +})

Can we pretty please right align the '\'s ?

Also, didn't we used to do this and then reverted it again for some
obscure reason?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h
  2016-06-15 21:46 ` [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h Paul E. McKenney
@ 2016-06-15 22:05   ` Peter Zijlstra
  2016-06-15 22:16     ` Paul E. McKenney
  2016-06-17 15:48   ` Pranith Kumar
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-06-15 22:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Jun 15, 2016 at 02:46:05PM -0700, Paul E. McKenney wrote:
> People have been having some difficulty finding their way around the
> RCU code.  This commit therefore pulls some of the expedited grace-period
> code from tree.c to a new tree_exp.h file.  This commit is strictly code
> movement, with the exception of a forward declaration that was added
> for the sync_sched_exp_online_cleanup() function.
> 
> A subsequent commit will move the remaining expedited grace-period code
> from tree_plugin.h to tree_exp.h.

Part of the problem for me is always the fact that its so weirdly
divided over these files. Now you're making that worse. How is this
helping?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL
  2016-06-15 22:03   ` Peter Zijlstra
@ 2016-06-15 22:12     ` Peter Zijlstra
  2016-06-15 22:41       ` Paul E. McKenney
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-06-15 22:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Jun 16, 2016 at 12:03:39AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 02:46:08PM -0700, Paul E. McKenney wrote:
> > This commit does a compile-time check for rcu_assign_pointer() of NULL,
> > and uses WRITE_ONCE() rather than smp_store_release() in that case.
> > 
> > Reported-by: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  include/linux/rcupdate.h | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index c61b6b9506e7..9be61e47badc 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -650,7 +650,16 @@ static inline void rcu_preempt_sleep_check(void)
> >   * please be careful when making changes to rcu_assign_pointer() and the
> >   * other macros that it invokes.
> >   */
> > -#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v))
> > +#define rcu_assign_pointer(p, v) \
> > +({ \
> > +	uintptr_t _r_a_p__v = (uintptr_t)(v); \
> > +	\
> > +	if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
> > +		WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
> > +	else \
> > +		smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
> > +	_r_a_p__v; \
> > +})
> 
> Can we pretty please right align the '\'s ?
> 
> Also, didn't we used to do this and then reverted it again for some
> obscure reason?


lkml.kernel.org/r/20140909094235.GD19379@twins.programming.kicks-ass.net

What changed since then? And can we now pretty please get rid of that
RCU_INIT_POINTER() nonsense?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
  2016-06-15 21:46 ` [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled Paul E. McKenney
@ 2016-06-15 22:15   ` Peter Zijlstra
  2016-06-15 22:54     ` Paul E. McKenney
  2016-06-15 22:16   ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-06-15 22:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Jun 15, 2016 at 02:46:10PM -0700, Paul E. McKenney wrote:
> Currently, if the very first call to call_rcu_tasks() has irqs disabled,
> it will create the rcu_tasks_kthread with irqs disabled, which will
> result in a splat in the memory allocator, which kthread_run() invokes
> with the expectation that irqs are enabled.
> 
> This commit fixes this problem by deferring kthread creation if called
> with irqs disabled.  The first call to call_rcu_tasks() that has irqs
> enabled will create the kthread.
> 
> This bug was detected by rcutorture changes that were motivated by
> Iftekhar Ahmed's mutation-testing efforts.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..4a3b279beb42 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2175,7 +2175,7 @@ void task_numa_free(struct task_struct *p)
>  
>  		grp->nr_tasks--;
>  		spin_unlock_irqrestore(&grp->lock, flags);
> -		RCU_INIT_POINTER(p->numa_group, NULL);
> +		rcu_assign_pointer(p->numa_group, NULL);
>  		put_numa_group(grp);
>  	}

This seems entirely unrelated; albeit desired given that other patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
  2016-06-15 21:46 ` [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled Paul E. McKenney
  2016-06-15 22:15   ` Peter Zijlstra
@ 2016-06-15 22:16   ` Peter Zijlstra
  2016-06-15 22:58     ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-06-15 22:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Wed, Jun 15, 2016 at 02:46:10PM -0700, Paul E. McKenney wrote:
> Currently, if the very first call to call_rcu_tasks() has irqs disabled,
> it will create the rcu_tasks_kthread with irqs disabled, which will
> result in a splat in the memory allocator, which kthread_run() invokes
> with the expectation that irqs are enabled.
> 
> This commit fixes this problem by deferring kthread creation if called
> with irqs disabled.  The first call to call_rcu_tasks() that has irqs
> enabled will create the kthread.
> 
> This bug was detected by rcutorture changes that were motivated by
> Iftekhar Ahmed's mutation-testing efforts.

Seems fragile. What if someone manages to only use call_rcu_tasks() with
IRQs disabled?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h
  2016-06-15 22:05   ` Peter Zijlstra
@ 2016-06-15 22:16     ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 22:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Jun 16, 2016 at 12:05:39AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 02:46:05PM -0700, Paul E. McKenney wrote:
> > People have been having some difficulty finding their way around the
> > RCU code.  This commit therefore pulls some of the expedited grace-period
> > code from tree.c to a new tree_exp.h file.  This commit is strictly code
> > movement, with the exception of a forward declaration that was added
> > for the sync_sched_exp_online_cleanup() function.
> > 
> > A subsequent commit will move the remaining expedited grace-period code
> > from tree_plugin.h to tree_exp.h.
> 
> Part of the problem for me is always the fact that its so weirdly
> divided over these files. Now you're making that worse. How is this
> helping?

The expedited stuff is now in one place rather than being split across
two different files.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL
  2016-06-15 22:12     ` Peter Zijlstra
@ 2016-06-15 22:41       ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 22:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Jun 16, 2016 at 12:12:58AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 16, 2016 at 12:03:39AM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 15, 2016 at 02:46:08PM -0700, Paul E. McKenney wrote:
> > > This commit does a compile-time check for rcu_assign_pointer() of NULL,
> > > and uses WRITE_ONCE() rather than smp_store_release() in that case.
> > > 
> > > Reported-by: Christoph Hellwig <hch@infradead.org>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > ---
> > >  include/linux/rcupdate.h | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index c61b6b9506e7..9be61e47badc 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -650,7 +650,16 @@ static inline void rcu_preempt_sleep_check(void)
> > >   * please be careful when making changes to rcu_assign_pointer() and the
> > >   * other macros that it invokes.
> > >   */
> > > -#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v))
> > > +#define rcu_assign_pointer(p, v) \
> > > +({ \
> > > +	uintptr_t _r_a_p__v = (uintptr_t)(v); \
> > > +	\
> > > +	if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
> > > +		WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
> > > +	else \
> > > +		smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
> > > +	_r_a_p__v; \
> > > +})
> > 
> > Can we pretty please right align the '\'s ?

If you insist...  ;-)

Done.

> > Also, didn't we used to do this and then reverted it again for some
> > obscure reason?
> 
> lkml.kernel.org/r/20140909094235.GD19379@twins.programming.kicks-ass.net

There was indeed a compiler bug long ago that could generate spurious
warnings:

https://groups.google.com/forum/#!topic/linux.kernel/y2FIhJ-WVJc

> What changed since then? And can we now pretty please get rid of that
> RCU_INIT_POINTER() nonsense?

Five years has passed, the structure of rcu_assign_pointer() has
completely changed, and someone asked for the old behavior.  Seemed
worth a try, given the very visible nature of the gcc complaint.

No complaints thus far, but then again there probably aren't that
many people running -rcu.  That said, I am encouraged by the lack
of reports from the 0day test robot.

If this goes in and there aren't any problems for some time, then
I agree that shrinking the RCU API would be worthwhile.  My idea of
"some time" is about a year, given that it would be a real pain to
push a bunch of changes throughout the kernel only to have to revert
them if the old compiler bug managed to crop up again.  :-/

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
  2016-06-15 22:15   ` Peter Zijlstra
@ 2016-06-15 22:54     ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 22:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Jun 16, 2016 at 12:15:14AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 02:46:10PM -0700, Paul E. McKenney wrote:
> > Currently, if the very first call to call_rcu_tasks() has irqs disabled,
> > it will create the rcu_tasks_kthread with irqs disabled, which will
> > result in a splat in the memory allocator, which kthread_run() invokes
> > with the expectation that irqs are enabled.
> > 
> > This commit fixes this problem by deferring kthread creation if called
> > with irqs disabled.  The first call to call_rcu_tasks() that has irqs
> > enabled will create the kthread.
> > 
> > This bug was detected by rcutorture changes that were motivated by
> > Iftekhar Ahmed's mutation-testing efforts.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 218f8e83db73..4a3b279beb42 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2175,7 +2175,7 @@ void task_numa_free(struct task_struct *p)
> >  
> >  		grp->nr_tasks--;
> >  		spin_unlock_irqrestore(&grp->lock, flags);
> > -		RCU_INIT_POINTER(p->numa_group, NULL);
> > +		rcu_assign_pointer(p->numa_group, NULL);
> >  		put_numa_group(grp);
> >  	}
> 
> This seems entirely unrelated; albeit desired given that other patch.

Yikes!

As you probably guessed, this was my test case for rcu_assign_pointer(NULL),
and I clearly failed to clean up after myself.  It turns out that more than
30 few rcu_assign_pointer(NULL) instances have been added in the meantime,
several of which look to be in popular core code.  So some testing will
happen.

But if you would like to have your code to also participate in this testing
effort, here is that patch standalone.

							Thanx, Paul

------------------------------------------------------------------------

commit 0f11d148dfdb67b55efdc72a1a959c8e44c5d54c
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Jun 15 15:52:25 2016 -0700

    sched: Switch from RCU_INIT_POINTER() to rcu_assign_pointer()
    
    Given that rcu_assign_pointer() now avoids providing memory ordering
    when the value assigned is the constant NULL, this commit switches
    task_numa_free() from RCU_INIT_POINTER() to rcu_assign_pointer().
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..4a3b279beb42 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2175,7 +2175,7 @@ void task_numa_free(struct task_struct *p)
 
 		grp->nr_tasks--;
 		spin_unlock_irqrestore(&grp->lock, flags);
-		RCU_INIT_POINTER(p->numa_group, NULL);
+		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
  2016-06-15 22:16   ` Peter Zijlstra
@ 2016-06-15 22:58     ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-15 22:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	dvhart, fweisbec, oleg, bobby.prani

On Thu, Jun 16, 2016 at 12:16:04AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 02:46:10PM -0700, Paul E. McKenney wrote:
> > Currently, if the very first call to call_rcu_tasks() has irqs disabled,
> > it will create the rcu_tasks_kthread with irqs disabled, which will
> > result in a splat in the memory allocator, which kthread_run() invokes
> > with the expectation that irqs are enabled.
> > 
> > This commit fixes this problem by deferring kthread creation if called
> > with irqs disabled.  The first call to call_rcu_tasks() that has irqs
> > enabled will create the kthread.
> > 
> > This bug was detected by rcutorture changes that were motivated by
> > Iftekhar Ahmed's mutation-testing efforts.
> 
> Seems fragile. What if someone manages to only use call_rcu_tasks() with
> IRQs disabled?

It would have to have users before that could possibly happen.  :-/
And it would not be hard to remove the fragility if needed by setting
up a workqueue, possibly mediated by a timer or whatever.  But it is
hard to motivate myself to do so in advance of users.  For that matter...

Steven, is call_rcu_tasks() needed, or should I just rip it out?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h
  2016-06-15 21:46 ` [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h Paul E. McKenney
  2016-06-15 22:05   ` Peter Zijlstra
@ 2016-06-17 15:48   ` Pranith Kumar
  2016-06-17 17:46     ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Pranith Kumar @ 2016-06-17 15:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Ingo Molnar, jiangshanlai, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	Darren Hart, Frédéric Weisbecker, Oleg Nesterov

Hi Paul,

On Wed, Jun 15, 2016 at 5:46 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> People have been having some difficulty finding their way around the
> RCU code.  This commit therefore pulls some of the expedited grace-period
> code from tree.c to a new tree_exp.h file.  This commit is strictly code
> movement, with the exception of a forward declaration that was added
> for the sync_sched_exp_online_cleanup() function.
>
> A subsequent commit will move the remaining expedited grace-period code
> from tree_plugin.h to tree_exp.h.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcu/tree.c     | 545 +-----------------------------------------------
>  kernel/rcu/tree_exp.h | 564 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 566 insertions(+), 543 deletions(-)
>  create mode 100644 kernel/rcu/tree_exp.h
>

I've always wondered why you chose to only have a header file instead
of the traditional x.h/x.c split for declarations and
definitions(looking at tree_plugin.h). Is there any particular reason
for this?

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h
  2016-06-17 15:48   ` Pranith Kumar
@ 2016-06-17 17:46     ` Paul E. McKenney
  0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-06-17 17:46 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: LKML, Ingo Molnar, jiangshanlai, Dipankar Sarma, Andrew Morton,
	Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, David Howells, Eric Dumazet,
	Darren Hart, Frédéric Weisbecker, Oleg Nesterov

On Fri, Jun 17, 2016 at 11:48:12AM -0400, Pranith Kumar wrote:
> Hi Paul,
> 
> On Wed, Jun 15, 2016 at 5:46 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > People have been having some difficulty finding their way around the
> > RCU code.  This commit therefore pulls some of the expedited grace-period
> > code from tree.c to a new tree_exp.h file.  This commit is strictly code
> > movement, with the exception of a forward declaration that was added
> > for the sync_sched_exp_online_cleanup() function.
> >
> > A subsequent commit will move the remaining expedited grace-period code
> > from tree_plugin.h to tree_exp.h.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcu/tree.c     | 545 +-----------------------------------------------
> >  kernel/rcu/tree_exp.h | 564 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 566 insertions(+), 543 deletions(-)
> >  create mode 100644 kernel/rcu/tree_exp.h
> >
> 
> I've always wondered why you chose to only have a header file instead
> of the traditional x.h/x.c split for declarations and
> definitions(looking at tree_plugin.h). Is there any particular reason
> for this?

I didn't want to worry about function-call overhead, and doing it this
way allowed inlining.  Perhaps if link-time optimizations end up being
used heavily, I should go to the usual .h/.c split.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-06-17 17:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-15 21:45 [PATCH tip/core/rcu 0/12] Miscellaneous RCU fixes for 4.8 Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 01/12] rcu: Fix outdated rcu_scheduler_active comment Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 02/12] rcu: Fix outdated hotplug-exclusion comment in rcu_gp_init() Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 03/12] rcu: Remove some superfluous lines Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 04/12] rcu: Move expedited code from tree.c to tree_exp.h Paul E. McKenney
2016-06-15 22:05   ` Peter Zijlstra
2016-06-15 22:16     ` Paul E. McKenney
2016-06-17 15:48   ` Pranith Kumar
2016-06-17 17:46     ` Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 05/12] rcu: Move expedited code from tree_plugin.h " Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 06/12] rcu: Document RCU_NONIDLE() restrictions in comment header Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 07/12] rcu: No ordering for rcu_assign_pointer() of NULL Paul E. McKenney
2016-06-15 22:03   ` Peter Zijlstra
2016-06-15 22:12     ` Peter Zijlstra
2016-06-15 22:41       ` Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 08/12] rcu: Disable TASKS_RCU for usermode Linux Paul E. McKenney
2016-06-15 21:47   ` Richard Weinberger
2016-06-15 21:46 ` [PATCH tip/core/rcu 09/12] rcu: Make call_rcu_tasks() tolerate first call with irqs disabled Paul E. McKenney
2016-06-15 22:15   ` Peter Zijlstra
2016-06-15 22:54     ` Paul E. McKenney
2016-06-15 22:16   ` Peter Zijlstra
2016-06-15 22:58     ` Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 10/12] rcu: Fix a typo in a comment Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 11/12] rcu: sysctl: Panic on RCU Stall Paul E. McKenney
2016-06-15 21:46 ` [PATCH tip/core/rcu 12/12] rcu: Correctly handle sparse possible cpus Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).