linux-kernel.vger.kernel.org archive mirror
* [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
@ 2011-07-20  0:17 Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
                   ` (9 more replies)
  0 siblings, 10 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:17 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt

Hello!

This patch set contains fixes for a trainwreck involving RCU, the
scheduler, and threaded interrupts.  This trainwreck involved RCU failing
to properly protect one of its bit fields, use of RCU by the scheduler
from portions of irq_exit() where in_irq() returns false, uses of the
scheduler by RCU colliding with uses of RCU by the scheduler, threaded
interrupts exercising the problematic portions of irq_exit() more heavily,
and so on.  The patches are as follows:

1.	Properly protect current->rcu_read_unlock_special.
	Lack of protection was causing RCU to recurse on itself, which
	in turn resulted in deadlocks involving RCU and the scheduler.
	This affects only RCU_BOOST=y configurations.
2.	Streamline code produced by __rcu_read_unlock().  This one is
	an innocent bystander that is being carried due to conflicts
	with other patches.  (A later version will likely merge it
	with #3 below.)
3.	Make __rcu_read_unlock() delay setting the per-task
	->rcu_read_lock_nesting counter to zero until all cleanup for the
	just-ended RCU read-side critical section has completed.  This
	prevents a number of other cases that could result in deadlock
	due to self recursion.	This affects only TREE_PREEMPT_RCU=y
	configurations.
4.	Make scheduler_ipi() correctly identify itself as being
	in_irq() when it needs to do anything that might involve RCU,
	thus enabling RCU to avoid yet another class of potential
	self-recursions and deadlocks.	This affects PREEMPT_RCU=y
	configurations.
5.	Make irq_exit() inform RCU when it is invoking the scheduler
	in situations where in_irq() would return false, thus
	allowing RCU to correctly avoid self-recursion.  This affects
	TREE_PREEMPT_RCU=y configurations.
6.	Make __lock_task_sighand() execute the entire RCU read-side
	critical section with irqs disabled.  (An experimental patch at
	http://marc.info/?l=linux-kernel&m=131110647222185 might possibly
	make it legal to have an RCU read-side critical section where
	the rcu_read_unlock() is executed with interrupts disabled,
	but where some portion of the RCU read-side critical section
	was preemptible.)  This affects TREE_PREEMPT_RCU=y configurations.

TINY_PREEMPT_RCU will also need a few of these changes, but in the
meantime this patch stack helps organize things better for testing.
These are also available from the following subject-to-rebase git branch:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent

							Thanx, Paul

------------------------------------------------------------------------

 b/include/linux/sched.h   |    3 +++
 b/kernel/rcutree_plugin.h |    3 ++-
 b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++++++------
 b/kernel/signal.c         |   16 +++++++++++-----
 b/kernel/softirq.c        |   12 ++++++++++--
 kernel/rcutree_plugin.h   |   45 ++++++++++++++++++++++++++++++++-------------
 6 files changed, 96 insertions(+), 27 deletions(-)


* [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
@ 2011-07-20  0:18 ` Paul E. McKenney
  2011-07-20  2:40   ` Peter Zijlstra
  2011-07-20  0:18 ` [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special Paul E. McKenney
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:18 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney

PREEMPT_RCU read-side critical sections blocking an expedited grace
period invoke rcu_report_exp_rnp().  When the last such critical section
has completed, rcu_report_exp_rnp() invokes the scheduler to wake up the
task that invoked synchronize_rcu_expedited() -- needlessly holding the
root rcu_node structure's lock while doing so, thus needlessly providing
a way for RCU and the scheduler to deadlock.

This commit therefore releases the root rcu_node structure's lock before
calling wake_up().

Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 75113cb..94a674e 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -696,8 +696,10 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	for (;;) {
 		if (!sync_rcu_preempt_exp_done(rnp))
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			break;
 		if (rnp->parent == NULL) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			wake_up(&sync_rcu_preempt_exp_wq);
 			break;
 		}
@@ -707,7 +709,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 		raw_spin_lock(&rnp->lock); /* irqs already disabled */
 		rnp->expmask &= ~mask;
 	}
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
 
 /*
-- 
1.7.3.2



* [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
@ 2011-07-20  0:18 ` Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock() Paul E. McKenney
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:18 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney,
	Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The RCU_BOOST commits for TREE_PREEMPT_RCU introduced an other-task
write to a new RCU_READ_UNLOCK_BOOSTED bit in the task_struct structure's
->rcu_read_unlock_special field, but, as noted by Steven Rostedt, without
correctly synchronizing all accesses to ->rcu_read_unlock_special.
This could result in bits in ->rcu_read_unlock_special being spuriously
set and cleared due to conflicting accesses, which in turn could result
in deadlocks between the rcu_node structure's ->lock and the scheduler's
rq and pi locks.  These deadlocks would result from RCU incorrectly
believing that the just-ended RCU read-side critical section had been
preempted and/or boosted.  If that RCU read-side critical section was
executed with either rq or pi locks held, RCU's ensuing (incorrect)
calls to the scheduler would cause the scheduler to attempt to once
again acquire the rq and pi locks, resulting in deadlock.  More complex
deadlock cycles are also possible, involving multiple rq and pi locks
as well as locks from multiple rcu_node structures.

This commit fixes the synchronization by creating a ->rcu_boosted field
in task_struct that is accessed and modified only while holding the ->lock
of the rcu_node structure on which the task is queued (on that rcu_node
structure's ->blkd_tasks list).  This results in tasks accessing only
their own current->rcu_read_unlock_special fields, making unsynchronized
access once again legal, and keeping the rcu_read_unlock() fastpath free
of atomic instructions and memory barriers.

The reason that the rcu_read_unlock() fastpath does not need to access
the new current->rcu_boosted field is that this new field cannot
be non-zero unless the RCU_READ_UNLOCK_BLOCKED bit is set in the
current->rcu_read_unlock_special field.  Therefore, rcu_read_unlock()
need only test current->rcu_read_unlock_special: if that is zero, then
current->rcu_boosted must also be zero.

This bug does not affect TINY_PREEMPT_RCU because this implementation
of RCU accesses current->rcu_read_unlock_special with irqs disabled,
thus preventing races on the !SMP systems that TINY_PREEMPT_RCU runs on.
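
To make the fastpath argument concrete, the invariant being relied upon
can be sketched as follows (an illustrative note, not part of the patch;
names are as in the diff below):

	/*
	 * Illustrative invariant: t->rcu_boosted != 0 implies
	 * (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) != 0,
	 * because rcu_boost() boosts only tasks queued on an rcu_node
	 * structure's ->blkd_tasks list, and queuing a task there sets
	 * RCU_READ_UNLOCK_BLOCKED.  The fastpath may therefore test
	 * only ->rcu_read_unlock_special:
	 */
	if (t->rcu_read_lock_nesting == 0 &&
	    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
		rcu_read_unlock_special(t); /* slowpath snapshots ->rcu_boosted */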

Maybe-reported-by: Dave Jones <davej@redhat.com>
Maybe-reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/sched.h   |    3 +++
 kernel/rcutree_plugin.h |    8 ++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..76676a4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1254,6 +1254,9 @@ struct task_struct {
 #ifdef CONFIG_PREEMPT_RCU
 	int rcu_read_lock_nesting;
 	char rcu_read_unlock_special;
+#if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU)
+	int rcu_boosted;
+#endif /* #if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU) */
 	struct list_head rcu_node_entry;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 #ifdef CONFIG_TREE_PREEMPT_RCU
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 94a674e..82b3c58 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -342,6 +342,11 @@ static void rcu_read_unlock_special(struct task_struct *t)
 #ifdef CONFIG_RCU_BOOST
 		if (&t->rcu_node_entry == rnp->boost_tasks)
 			rnp->boost_tasks = np;
+		/* Snapshot and clear ->rcu_boosted with rcu_node lock held. */
+		if (t->rcu_boosted) {
+			special |= RCU_READ_UNLOCK_BOOSTED;
+			t->rcu_boosted = 0;
+		}
 #endif /* #ifdef CONFIG_RCU_BOOST */
 		t->rcu_blocked_node = NULL;
 
@@ -358,7 +363,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 #ifdef CONFIG_RCU_BOOST
 		/* Unboost if we were boosted. */
 		if (special & RCU_READ_UNLOCK_BOOSTED) {
-			t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BOOSTED;
 			rt_mutex_unlock(t->rcu_boost_mutex);
 			t->rcu_boost_mutex = NULL;
 		}
@@ -1175,7 +1179,7 @@ static int rcu_boost(struct rcu_node *rnp)
 	t = container_of(tb, struct task_struct, rcu_node_entry);
 	rt_mutex_init_proxy_locked(&mtx, t);
 	t->rcu_boost_mutex = &mtx;
-	t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BOOSTED;
+	t->rcu_boosted = 1;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
 	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
-- 
1.7.3.2



* [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock()
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special Paul E. McKenney
@ 2011-07-20  0:18 ` Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers Paul E. McKenney
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:18 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney

Given some common flag combinations, particularly -Os, gcc will inline
rcu_read_unlock_special() despite its being in an unlikely() clause.
Use noinline to prohibit this misoptimization.

In addition, move the second barrier() in __rcu_read_unlock() so that
it is not on the common-case code path.  This will allow the compiler to
generate better code for the common-case path through __rcu_read_unlock().

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 kernel/rcutree_plugin.h |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 82b3c58..9439a25 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -284,7 +284,7 @@ static struct list_head *rcu_next_node_entry(struct task_struct *t,
  * notify RCU core processing or task having blocked during the RCU
  * read-side critical section.
  */
-static void rcu_read_unlock_special(struct task_struct *t)
+static noinline void rcu_read_unlock_special(struct task_struct *t)
 {
 	int empty;
 	int empty_exp;
@@ -391,11 +391,11 @@ void __rcu_read_unlock(void)
 	struct task_struct *t = current;
 
 	barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
-	--t->rcu_read_lock_nesting;
-	barrier();  /* decrement before load of ->rcu_read_unlock_special */
-	if (t->rcu_read_lock_nesting == 0 &&
-	    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
-		rcu_read_unlock_special(t);
+	if (--t->rcu_read_lock_nesting == 0) {
+		barrier();  /* decr before ->rcu_read_unlock_special load */
+		if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
+			rcu_read_unlock_special(t);
+	}
 #ifdef CONFIG_PROVE_LOCKING
 	WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
-- 
1.7.3.2



* [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
                   ` (2 preceding siblings ...)
  2011-07-20  0:18 ` [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock() Paul E. McKenney
@ 2011-07-20  0:18 ` Paul E. McKenney
  2011-07-20 12:54   ` Peter Zijlstra
  2011-07-20  0:18 ` [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi() Paul E. McKenney
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:18 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney,
	Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The addition of RCU read-side critical sections within runqueue and
priority-inheritance lock critical sections introduced some deadlock
cycles, for example, involving interrupts from __rcu_read_unlock()
where the interrupt handlers call wake_up().  This situation can cause
the instance of __rcu_read_unlock() invoked from interrupt to do some
of the processing that would otherwise have been carried out by the
task-level instance of __rcu_read_unlock().  When the interrupt-level
instance of __rcu_read_unlock() is called with a scheduler lock held
from interrupt-entry/exit situations where in_irq() returns false,
deadlock can result.

This commit resolves these deadlocks by using negative values of
the per-task ->rcu_read_lock_nesting counter to indicate that an
instance of __rcu_read_unlock() is in flight, which in turn prevents
instances invoked from interrupt handlers from doing any special processing.
This patch is inspired by Steven Rostedt's earlier patch that similarly
made __rcu_read_unlock() guard against interrupt-mediated recursion
(see https://lkml.org/lkml/2011/7/15/326), but this commit refines
Steven's approach to avoid the need for preemption disabling on the
__rcu_read_unlock() fastpath and to also avoid the need for manipulating
a separate per-CPU variable.

This patch avoids the need for preempt_disable() by instead using negative
values of the per-task ->rcu_read_lock_nesting counter.  Note that nested
rcu_read_lock()/rcu_read_unlock() pairs are still permitted, but they will
never see ->rcu_read_lock_nesting go to zero, and will therefore never
invoke rcu_read_unlock_special(), thus preventing them from seeing the
RCU_READ_UNLOCK_BLOCKED bit should it be set in ->rcu_read_unlock_special.
This patch also adds a check for ->rcu_read_lock_nesting being negative
in rcu_check_callbacks(), thus preventing the RCU_READ_UNLOCK_NEED_QS
bit from being set should a scheduling-clock interrupt occur while
__rcu_read_unlock() is exiting from an outermost RCU read-side critical
section.

Of course, __rcu_read_unlock() can be preempted during the time that
->rcu_read_lock_nesting is negative.  This could result in the setting
of the RCU_READ_UNLOCK_BLOCKED bit after __rcu_read_unlock() checks it,
and would also result in this task being queued on the corresponding
rcu_node structure's blkd_tasks list.  Therefore, some later RCU read-side
critical section would enter rcu_read_unlock_special() to clean up --
which could result in deadlock if that critical section happened to be in
the scheduler where the runqueue or priority-inheritance locks were held.

This situation is dealt with by making rcu_preempt_note_context_switch()
check for negative ->rcu_read_lock_nesting, thus refraining from
queuing the task (and from setting RCU_READ_UNLOCK_BLOCKED) if we are
already exiting from the outermost RCU read-side critical section (in
other words, we really are no longer actually in that RCU read-side
critical section).  In addition, rcu_preempt_note_context_switch()
invokes rcu_read_unlock_special() to carry out the cleanup in this case,
which clears out the ->rcu_read_unlock_special bits and dequeues the task
(if necessary), in turn avoiding needless delay of the current RCU grace
period and needless RCU priority boosting.

It is still illegal to call rcu_read_unlock() while holding a scheduler
lock if the prior RCU read-side critical section has ever had either
preemption or irqs enabled.  However, the common use case is legal,
namely where the entire RCU read-side critical section executes with
irqs disabled, for example, when the scheduler lock is held across the
entire lifetime of the RCU read-side critical section.
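
As a quick reference, the resulting ->rcu_read_lock_nesting protocol can
be summarized as follows (an illustrative sketch, not part of the patch):

	/*
	 * ->rcu_read_lock_nesting values (illustrative summary):
	 *
	 *	n > 0:    inside n nested RCU read-side critical sections
	 *	n == 1:   the outermost rcu_read_unlock() is about to clean up
	 *	INT_MIN:  cleanup in flight; nested rcu_read_lock()/
	 *		  rcu_read_unlock() pairs in irq handlers never see
	 *		  the counter reach zero, so they never invoke
	 *		  rcu_read_unlock_special()
	 *	n == 0:   fully outside any RCU read-side critical section
	 */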

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 9439a25..ad4539a 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -68,6 +68,7 @@ struct rcu_state rcu_preempt_state = RCU_STATE_INITIALIZER(rcu_preempt_state);
 DEFINE_PER_CPU(struct rcu_data, rcu_preempt_data);
 static struct rcu_state *rcu_state = &rcu_preempt_state;
 
+static void rcu_read_unlock_special(struct task_struct *t);
 static int rcu_preempted_readers_exp(struct rcu_node *rnp);
 
 /*
@@ -147,7 +148,7 @@ static void rcu_preempt_note_context_switch(int cpu)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 
-	if (t->rcu_read_lock_nesting &&
+	if (t->rcu_read_lock_nesting > 0 &&
 	    (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
 
 		/* Possibly blocking in an RCU read-side critical section. */
@@ -190,6 +191,14 @@ static void rcu_preempt_note_context_switch(int cpu)
 				rnp->gp_tasks = &t->rcu_node_entry;
 		}
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+	} else if (t->rcu_read_lock_nesting < 0 &&
+		   t->rcu_read_unlock_special) {
+
+		/*
+		 * Complete exit from RCU read-side critical section on
+		 * behalf of preempted instance of __rcu_read_unlock().
+		 */
+		rcu_read_unlock_special(t);
 	}
 
 	/*
@@ -391,10 +400,15 @@ void __rcu_read_unlock(void)
 	struct task_struct *t = current;
 
 	barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
-	if (--t->rcu_read_lock_nesting == 0) {
-		barrier();  /* decr before ->rcu_read_unlock_special load */
+	if (t->rcu_read_lock_nesting != 1)
+		--t->rcu_read_lock_nesting;
+	else {
+		t->rcu_read_lock_nesting = INT_MIN;
+		barrier();  /* assign before ->rcu_read_unlock_special load */
 		if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
 			rcu_read_unlock_special(t);
+		barrier();  /* ->rcu_read_unlock_special load before assign */
+		t->rcu_read_lock_nesting = 0;
 	}
 #ifdef CONFIG_PROVE_LOCKING
 	WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
@@ -593,7 +607,8 @@ static void rcu_preempt_check_callbacks(int cpu)
 		rcu_preempt_qs(cpu);
 		return;
 	}
-	if (per_cpu(rcu_preempt_data, cpu).qs_pending)
+	if (t->rcu_read_lock_nesting > 0 &&
+	    per_cpu(rcu_preempt_data, cpu).qs_pending)
 		t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
 }
 
-- 
1.7.3.2



* [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi()
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
                   ` (3 preceding siblings ...)
  2011-07-20  0:18 ` [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers Paul E. McKenney
@ 2011-07-20  0:18 ` Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity Paul E. McKenney
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:18 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Peter Zijlstra, Paul E. McKenney

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Ensure scheduler_ipi() calls irq_{enter,exit}() when it does any actual
work.  Traditionally we never did any actual work from the resched IPI,
and all the magic happened in the return-from-interrupt path.

Now that we do perform some work, we need to ensure irq_{enter,exit}()
are called so that we don't confuse things.

This affects things like timekeeping, NO_HZ and RCU, basically
everything with a hook in irq_enter/exit.

Explicit examples of things going wrong are:

  sched_clock_cpu() -- has a callback when leaving NO_HZ state to take
                    a new reading from GTOD and TSC. Without this
                    callback, time is stuck in the past.

  RCU -- needs in_irq() to work in order to avoid some nasty deadlocks
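
Condensed, the shape of the fix is as follows (a restatement of the diff
below; the actual patch also factors the wake-list processing out into
sched_ttwu_do_pending()):

	void scheduler_ipi(void)
	{
		struct rq *rq = this_rq();
		struct task_struct *list = xchg(&rq->wake_list, NULL);

		if (!list)
			return;		/* bare resched IPI: no work to do */

		irq_enter();		/* let NO_HZ/timekeeping/RCU see in_irq() */
		sched_ttwu_do_pending(list);
		irq_exit();
	}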

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/sched.c |   44 ++++++++++++++++++++++++++++++++++++++------
 1 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 9769c75..1930ee1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2544,13 +2544,9 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }
 
 #ifdef CONFIG_SMP
-static void sched_ttwu_pending(void)
+static void sched_ttwu_do_pending(struct task_struct *list)
 {
 	struct rq *rq = this_rq();
-	struct task_struct *list = xchg(&rq->wake_list, NULL);
-
-	if (!list)
-		return;
 
 	raw_spin_lock(&rq->lock);
 
@@ -2563,9 +2559,45 @@ static void sched_ttwu_pending(void)
 	raw_spin_unlock(&rq->lock);
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+
+static void sched_ttwu_pending(void)
+{
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	sched_ttwu_do_pending(list);
+}
+
+#endif /* CONFIG_HOTPLUG_CPU */
+
 void scheduler_ipi(void)
 {
-	sched_ttwu_pending();
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	/*
+	 * Not all reschedule IPI handlers call irq_enter/irq_exit, since
+	 * traditionally all their work was done from the interrupt return
+	 * path. Now that we actually do some work, we need to make sure
+	 * we do call them.
+	 *
+	 * Some archs already do call them, luckily irq_enter/exit nest
+	 * properly.
+	 *
+	 * Arguably we should visit all archs and update all handlers,
+	 * however a fair share of IPIs are still resched only so this would
+	 * somewhat pessimize the simple resched case.
+	 */
+	irq_enter();
+	sched_ttwu_do_pending(list);
+	irq_exit();
 }
 
 static void ttwu_queue_remote(struct task_struct *p, int cpu)
-- 
1.7.3.2



* [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
                   ` (4 preceding siblings ...)
  2011-07-20  0:18 ` [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi() Paul E. McKenney
@ 2011-07-20  0:18 ` Paul E. McKenney
  2011-07-20  0:18 ` [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU Paul E. McKenney
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:18 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Peter Zijlstra, Paul E. McKenney

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

The rcu_read_unlock_special() function relies on in_irq() to exclude
scheduler activity from interrupt level.  This fails because irq_exit()
can invoke the scheduler after clearing the preempt_count() bits that
in_irq() uses to determine that it is at interrupt level.  This situation
can result in failures as follows:

 $task			IRQ		SoftIRQ

 rcu_read_lock()

 /* do stuff */

 <preempt> |= UNLOCK_BLOCKED

 rcu_read_unlock()
   --t->rcu_read_lock_nesting

			irq_enter();
			/* do stuff, don't use RCU */
			irq_exit();
			  sub_preempt_count(IRQ_EXIT_OFFSET);
			  invoke_softirq()

					ttwu();
					  spin_lock_irq(&pi->lock)
					  rcu_read_lock();
					  /* do stuff */
					  rcu_read_unlock();
					    rcu_read_unlock_special()
					      rcu_report_exp_rnp()
					        ttwu()
					          spin_lock_irq(&pi->lock) /* deadlock */

   rcu_read_unlock_special(t);

Ed can trigger this easily because, with threaded irqs, invoke_softirq()
immediately does a ttwu() of ksoftirqd/# instead of doing the in-place
softirq work first, but even without that the above can happen.

Cure this by also excluding softirqs from the
rcu_read_unlock_special() handler and ensuring the force_irqthreads
ksoftirqd/# wakeup is done from full softirq context.
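
Restated as code (an illustrative condensation of the softirq.c hunks
below), the wakeup is wrapped in an explicit softirq section so that
in_serving_softirq() is true across the ttwu():

	__local_bh_disable((unsigned long)__builtin_return_address(0),
			   SOFTIRQ_OFFSET);
	/*
	 * Any rcu_read_unlock_special() reached via ttwu() now sees
	 * in_serving_softirq() and takes the early exit rather than
	 * recursing into the scheduler.
	 */
	wakeup_softirqd();
	__local_bh_enable(SOFTIRQ_OFFSET);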

[ Alternatively, delaying the ->rcu_read_lock_nesting decrement
  until after the special handling would make the thing more robust
  in the face of interrupts as well.  And there is a separate patch
  for that. ]

Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-and-tested-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    2 +-
 kernel/softirq.c        |   12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index ad4539a..6c96c67 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -318,7 +318,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 	}
 
 	/* Hardware IRQ handlers cannot block. */
-	if (in_irq()) {
+	if (in_irq() || in_serving_softirq()) {
 		local_irq_restore(flags);
 		return;
 	}
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 40cf63d..fca82c3 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -315,16 +315,24 @@ static inline void invoke_softirq(void)
 {
 	if (!force_irqthreads)
 		__do_softirq();
-	else
+	else {
+		__local_bh_disable((unsigned long)__builtin_return_address(0),
+				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
+		__local_bh_enable(SOFTIRQ_OFFSET);
+	}
 }
 #else
 static inline void invoke_softirq(void)
 {
 	if (!force_irqthreads)
 		do_softirq();
-	else
+	else {
+		__local_bh_disable((unsigned long)__builtin_return_address(0),
+				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
+		__local_bh_enable(SOFTIRQ_OFFSET);
+	}
 }
 #endif
 
-- 
1.7.3.2



* [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
                   ` (5 preceding siblings ...)
  2011-07-20  0:18 ` [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity Paul E. McKenney
@ 2011-07-20  0:18 ` Paul E. McKenney
  2011-07-20  1:10 ` [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Ben Greear
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  0:18 UTC
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney,
	Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The __lock_task_sighand() function calls rcu_read_lock() with interrupts
and preemption enabled, but later calls rcu_read_unlock() with interrupts
disabled.  It is therefore possible that this RCU read-side critical
section will be preempted and later RCU priority boosted, which means that
rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but
with interrupts disabled. This results in lockdep splats, so this commit
nests the RCU read-side critical section within the interrupt-disabled
region of code.  This prevents the RCU read-side critical section from
being preempted, and thus prevents the attempt to deboost with interrupts
disabled.

It is quite possible that a better long-term fix is to make rt_mutex_unlock()
disable irqs when acquiring the rt_mutex structure's ->wait_lock.
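
The pattern being enforced can be sketched as follows (illustrative; the
actual diff is below).  The entire RCU read-side critical section sits
inside the irq-disabled region, so it can never be preempted and thus
never needs to deboost in rcu_read_unlock():

	local_irq_save(*flags);
	rcu_read_lock();	/* cannot be preempted from here on ... */
	sighand = rcu_dereference(tsk->sighand);
	/* ... validate sighand and take ->siglock ... */
	rcu_read_unlock();	/* ... so no rt_mutex_unlock() is needed here */
	local_irq_restore(*flags);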

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/signal.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ff76786..a0eb019 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1178,18 +1178,24 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
 	struct sighand_struct *sighand;
 
-	rcu_read_lock();
 	for (;;) {
+		local_irq_save(*flags);
+		rcu_read_lock();
 		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL))
+		if (unlikely(sighand == NULL)) {
+			rcu_read_unlock();
+			local_irq_restore(*flags);
 			break;
+		}
 
-		spin_lock_irqsave(&sighand->siglock, *flags);
-		if (likely(sighand == tsk->sighand))
+		spin_lock(&sighand->siglock);
+		if (likely(sighand == tsk->sighand)) {
+			rcu_read_unlock();
 			break;
+		}
+		rcu_read_unlock();
 		spin_unlock_irqrestore(&sighand->siglock, *flags);
 	}
-	rcu_read_unlock();
 
 	return sighand;
 }
-- 
1.7.3.2



* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
                   ` (6 preceding siblings ...)
  2011-07-20  0:18 ` [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU Paul E. McKenney
@ 2011-07-20  1:10 ` Ben Greear
  2011-07-20  1:30 ` Ed Tomlinson
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
  9 siblings, 0 replies; 55+ messages in thread
From: Ben Greear @ 2011-07-20  1:10 UTC
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, edt

On 07/19/2011 05:17 PM, Paul E. McKenney wrote:
> Hello!
>
> This patch set contains fixes for a trainwreck involving RCU, the
> scheduler, and threaded interrupts.  This trainwreck involved RCU failing
> to properly protect one of its bit fields, use of RCU by the scheduler
> from portions of irq_exit() where in_irq() returns false, uses of the
> scheduler by RCU colliding with uses of RCU by the scheduler, threaded
> interrupts exercising the problematic portions of irq_exit() more heavily,
> and so on.  The patches are as follows:
>
> 1.	Properly protect current->rcu_read_unlock_special.
> 	Lack of protection was causing RCU to recurse on itself, which
> 	in turn resulted in deadlocks involving RCU and the scheduler.
> 	This affects only RCU_BOOST=y configurations.
> 2.	Streamline code produced by __rcu_read_unlock().  This one is
> 	an innocent bystander that is being carried due to conflicts
> 	with other patches.  (A later version will likely merge it
> 	with #3 below.)
> 3.	Make __rcu_read_unlock() delay setting the per-task
> 	->rcu_read_lock_nesting counter to zero until all cleanup for the
> 	just-ended RCU read-side critical section has completed.  This
> 	prevents a number of other cases that could result in deadlock
> 	due to self recursion.	This affects only TREE_PREEMPT_RCU=y
> 	configurations.
> 4.	Make scheduler_ipi() correctly identify itself as being
> 	in_irq() when it needs to do anything that might involve RCU,
> 	thus enabling RCU to avoid yet another class of potential
> 	self-recursions and deadlocks.	This affects PREEMPT_RCU=y
> 	configurations.
> 5.	Make irq_exit() inform RCU when it is invoking the scheduler
> 	in situations where in_irq() would return false, thus
> 	allowing RCU to correctly avoid self-recursion.  This affects
> 	TREE_PREEMPT_RCU=y configurations.
> 6.	Make __lock_task_sighand() execute the entire RCU read-side
> 	critical section with irqs disabled.  (An experimental patch at
> 	http://marc.info/?l=linux-kernel&m=131110647222185 might possibly
> 	make it legal to have an RCU read-side critical section where
> 	the rcu_read_unlock() is executed with interrupts disabled,
> 	but where some portion of the RCU read-side critical section
> 	was preemptible.)  This affects TREE_PREEMPT_RCU=y configurations.
>
> TINY_PREEMPT_RCU will also need a few of these changes, but in the
> meantime this patch stack helps organize things better for testing.
> These are also available from the following subject-to-rebase git branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent

I pulled these in, and I see this bug on startup (my user-space app appears to be unloading
the bridge module here).  I don't recall seeing it before, and I'm not sure whether it's
related to your changes or to other changes since I last pulled -rc7 a few days back:

BUG: scheduling while atomic: rmmod/1870/0x00000005
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_MASQUERADE iptable_nat nf_nat bridge(-) stp llc nfs lockd fscache auth_rpcgss 
nfs_acl sunrpc ipv6 kvm_intel kvm uinput i5k_amb i5000_edac edac_core e1000e ioatdma iTCO_wdt shpchp iTCO_vendor_support i2c_i801 dca pcspkr microcode floppy 
radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 1870, comm: rmmod Not tainted 3.0.0-rc7+ #23
Call Trace:
  [<ffffffff8103e89f>] __schedule_bug+0x5c/0x60
  [<ffffffff81428bfc>] schedule+0xa0/0x617
  [<ffffffff8105c96d>] ? prepare_to_wait+0x71/0x7c
  [<ffffffff8109607f>] synchronize_rcu_expedited+0x1b1/0x1c2
  [<ffffffff8105c72b>] ? wake_up_bit+0x25/0x25
  [<ffffffff81049f82>] ? local_bh_enable_ip+0x9/0xb
  [<ffffffff81376f33>] synchronize_net+0x25/0x2e
  [<ffffffff81379dee>] rollback_registered_many+0x122/0x216
  [<ffffffff81379ef8>] unregister_netdevice_many+0x16/0x62
  [<ffffffffa0320e1c>] br_net_exit+0x6d/0x7d [bridge]
  [<ffffffff81373f9d>] ops_exit_list+0x25/0x4e
  [<ffffffff813740ff>] unregister_pernet_operations+0x83/0xb1
  [<ffffffff81374191>] unregister_pernet_subsys+0x20/0x31
  [<ffffffffa03293e0>] br_deinit+0x34/0x50 [bridge]
  [<ffffffff81071f4a>] sys_delete_module+0x1a6/0x20a
  [<ffffffff810f77d9>] ? path_put+0x1d/0x22
  [<ffffffff8108c57c>] ? audit_syscall_entry+0x119/0x145
  [<ffffffff8142f892>] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0




-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
                   ` (7 preceding siblings ...)
  2011-07-20  1:10 ` [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Ben Greear
@ 2011-07-20  1:30 ` Ed Tomlinson
  2011-07-20  2:07   ` Ed Tomlinson
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
  9 siblings, 1 reply; 55+ messages in thread
From: Ed Tomlinson @ 2011-07-20  1:30 UTC
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb

On Tuesday 19 July 2011 20:17:38 Paul E. McKenney wrote:

Paul,

Pulling this on top of master, rebuilding with boost enabled, and booting with threadirqs does not
produce a correctly booting system here.

 * Starting APC UPS daemon ...
[   46.007217]
[   46.007221] =================================
[   46.008013] [ INFO: inconsistent lock state ]
[   46.008013] 3.0.0-rc7-crc+ #340
[   46.008013] ---------------------------------
[   46.008013] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[   46.008013] ip/2982 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   46.008013]  (rcu_node_level_0){+.?...}, at: [<ffffffff810ba279>] rcu_report_exp_rnp+0x19/0x50
[   46.008013] {IN-SOFTIRQ-W} state was registered at:
[   46.008013]   [<ffffffff8108a1e5>] __lock_acquire+0x5a5/0x16a0
[   46.008013]   [<ffffffff8108b8d5>] lock_acquire+0x95/0x140
[   46.008013]   [<ffffffff815793a6>] _raw_spin_lock_irqsave+0x46/0x60
[   46.008013]   [<ffffffff810bbf80>] __rcu_process_callbacks+0x190/0x2a0
[   46.008013]   [<ffffffff810bc0ed>] rcu_process_callbacks+0x5d/0x60
[   46.008013]   [<ffffffff81052e14>] __do_softirq+0x194/0x300
[   46.008013]   [<ffffffff8105311d>] run_ksoftirqd+0x19d/0x240
[   46.008013]   [<ffffffff81071296>] kthread+0xb6/0xc0
[   46.008013]   [<ffffffff81582294>] kernel_thread_helper+0x4/0x10
[   46.008013] irq event stamp: 6802
[   46.008013] hardirqs last  enabled at (6801): [<ffffffff815776bd>] __mutex_unlock_slowpath+0x10d/0x1c0
[   46.008013] hardirqs last disabled at (6802): [<ffffffff8157937c>] _raw_spin_lock_irqsave+0x1c/0x60
[   46.008013] softirqs last  enabled at (6698): [<ffffffff814c274a>] sk_common_release+0x6a/0x120
[   46.008013] softirqs last disabled at (6696): [<ffffffff81579476>] _raw_write_lock_bh+0x16/0x50
[   46.008013]
[   46.008013] other info that might help us debug this:
[   46.008013]  Possible unsafe locking scenario:
[   46.008013]
[   46.008013]        CPU0
[   46.008013]        ----
[   46.008013]   lock(rcu_node_level_0);
[   46.008013]   <Interrupt>
[   46.008013]     lock(rcu_node_level_0);
[   46.008013]
[   46.008013]  *** DEADLOCK ***
[   46.008013]
[   46.008013] 3 locks held by ip/2982:
[   46.008013]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814e4fd7>] rtnl_lock+0x17/0x20
[   46.008013]  #1:  (sync_rcu_preempt_exp_mutex){+.+...}, at: [<ffffffff810bc9e7>] synchronize_rcu_expedited+0x37/0x210
[   46.008013]  #2:  (rcu_node_level_0){+.?...}, at: [<ffffffff810ba279>] rcu_report_exp_rnp+0x19/0x50
[   46.008013]
[   46.008013] stack backtrace:
[   46.008013] Pid: 2982, comm: ip Tainted: G        W   3.0.0-rc7-crc+ #340
[   46.008013] Call Trace:
[   46.008013]  [<ffffffff81088c33>] print_usage_bug+0x223/0x270
[   46.008013]  [<ffffffff81089558>] mark_lock+0x328/0x400
[   46.008013]  [<ffffffff8108969e>] mark_held_locks+0x6e/0xa0
[   46.008013]  [<ffffffff81579a5d>] ? _raw_spin_unlock_irqrestore+0x7d/0x90
[   46.008013]  [<ffffffff8108999d>] trace_hardirqs_on_caller+0x14d/0x190
[   46.008013]  [<ffffffff810899ed>] trace_hardirqs_on+0xd/0x10
[   46.008013]  [<ffffffff81579a5d>] _raw_spin_unlock_irqrestore+0x7d/0x90
[   46.008013]  [<ffffffff810bcb18>] synchronize_rcu_expedited+0x168/0x210
[   46.008013]  [<ffffffff8111a073>] ? might_fault+0x53/0xb0
[   46.008013]  [<ffffffff8125200a>] ? security_capable+0x2a/0x30
[   46.008013]  [<ffffffff814d0985>] synchronize_net+0x45/0x50
[   46.008013]  [<ffffffffa0359613>] ipip6_tunnel_ioctl+0x5f3/0x800 [sit]
[   46.008013]  [<ffffffffa0359020>] ? ipip6_tunnel_locate+0x2e0/0x2e0 [sit] 
[   46.008013]  [<ffffffff814d43ba>] dev_ifsioc+0x11a/0x2c0
[   46.008013]  [<ffffffff814d678a>] dev_ioctl+0x35a/0x810
[   46.008013]  [<ffffffff81089a8d>] ? debug_check_no_locks_freed+0x9d/0x150
[   46.008013]  [<ffffffff8108999d>] ? trace_hardirqs_on_caller+0x14d/0x190
[   46.008013]  [<ffffffff810899ed>] ? trace_hardirqs_on+0xd/0x10
[   46.008013]  [<ffffffff814bb83a>] sock_ioctl+0xea/0x2b0
[   46.008013]  [<ffffffff81162474>] do_vfs_ioctl+0xa4/0x5a0
[   46.008013]  [<ffffffff815799cc>] ? _raw_spin_unlock+0x5c/0x70
[   46.008013]  [<ffffffff8114c1cc>] ? fd_install+0x7c/0x90
[   46.008013]  [<ffffffff8158149c>] ? sysret_check+0x27/0x62
[   46.008013]  [<ffffffff81162a09>] sys_ioctl+0x99/0xa0
[   46.008013]  [<ffffffff8158146b>] system_call_fastpath+0x16/0x1b
[   46.786123] BUG: sleeping function called from invalid context at net/ipv4/route.c:757
[   46.799548] in_atomic(): 1, irqs_disabled(): 0, pid: 2982, name: ip
[   46.799553] INFO: lockdep is turned off.
[   46.799561] Pid: 2982, comm: ip Tainted: G        W   3.0.0-rc7-crc+ #340
[   46.799565] Call Trace:
[   46.799576]  [<ffffffff81036b39>] __might_sleep+0xf9/0x120
[   46.799586]  [<ffffffff814fea38>] rt_do_flush+0x178/0x1b0
[   46.799594]  [<ffffffff814feafc>] rt_cache_flush+0x5c/0x70
[   46.799606]  [<ffffffff81542d92>] fib_netdev_event+0x72/0xf0
[   46.799615]  [<ffffffff8157d576>] notifier_call_chain+0x56/0x80
[   46.799625]  [<ffffffff81077ff6>] raw_notifier_call_chain+0x16/0x20
[   46.799633]  [<ffffffff814d1cd4>] call_netdevice_notifiers+0x74/0x90
[   46.799642]  [<ffffffff814d2c37>] netdev_state_change+0x27/0x40
[   46.799653]  [<ffffffffa0359686>] ipip6_tunnel_ioctl+0x666/0x800 [sit]
[   46.799663]  [<ffffffffa0359020>] ? ipip6_tunnel_locate+0x2e0/0x2e0 [sit]
[   46.799673]  [<ffffffff814d43ba>] dev_ifsioc+0x11a/0x2c0
[   46.799682]  [<ffffffff814d678a>] dev_ioctl+0x35a/0x810
[   46.799690]  [<ffffffff81089a8d>] ? debug_check_no_locks_freed+0x9d/0x150
[   46.799700]  [<ffffffff8108999d>] ? trace_hardirqs_on_caller+0x14d/0x190
[   46.799707]  [<ffffffff810899ed>] ? trace_hardirqs_on+0xd/0x10
[   46.799718]  [<ffffffff814bb83a>] sock_ioctl+0xea/0x2b0
[   46.799726]  [<ffffffff81162474>] do_vfs_ioctl+0xa4/0x5a0
[   46.799735]  [<ffffffff815799cc>] ? _raw_spin_unlock+0x5c/0x70
[   46.799744]  [<ffffffff8114c1cc>] ? fd_install+0x7c/0x90
[   46.799752]  [<ffffffff8158149c>] ? sysret_check+0x27/0x62
[   46.799760]  [<ffffffff81162a09>] sys_ioctl+0x99/0xa0
[   46.799769]  [<ffffffff8158146b>] system_call_fastpath+0x16/0x1b

This was followed by more BUGs, yielding a box that needed its interrupt button pressed to
force a reboot.

I've previously tested patches 5 and 6 from Peter Z with threadirqs and
boost disabled, with good results.

Ed

> This patch set contains fixes for a trainwreck involving RCU, the
> scheduler, and threaded interrupts.  This trainwreck involved RCU failing
> to properly protect one of its bit fields, use of RCU by the scheduler
> from portions of irq_exit() where in_irq() returns false, uses of the
> scheduler by RCU colliding with uses of RCU by the scheduler, threaded
> interrupts exercising the problematic portions of irq_exit() more heavily,
> and so on.  The patches are as follows:
> 
> 1.	Properly protect current->rcu_read_unlock_special.
> 	Lack of protection was causing RCU to recurse on itself, which
> 	in turn resulted in deadlocks involving RCU and the scheduler.
> 	This affects only RCU_BOOST=y configurations.
> 2.	Streamline code produced by __rcu_read_unlock().  This one is
> 	an innocent bystander that is being carried due to conflicts
> 	with other patches.  (A later version will likely merge it
> 	with #3 below.)
> 3.	Make __rcu_read_unlock() delay setting the per-task
> 	->rcu_read_lock_nesting counter to zero until all cleanup for the
> 	just-ended RCU read-side critical section has completed.  This
> 	prevents a number of other cases that could result in deadlock
> 	due to self recursion.	This affects only TREE_PREEMPT_RCU=y
> 	configurations.
> 4.	Make scheduler_ipi() correctly identify itself as being
> 	in_irq() when it needs to do anything that might involve RCU,
> 	thus enabling RCU to avoid yet another class of potential
> 	self-recursions and deadlocks.	This affects PREEMPT_RCU=y
> 	configurations.
> 5.	Make irq_exit() inform RCU when it is invoking the scheduler
> 	in situations where in_irq() would return false, thus
> 	allowing RCU to correctly avoid self-recursion.  This affects
> 	TREE_PREEMPT_RCU=y configurations.
> 6.	Make __lock_task_sighand() execute the entire RCU read-side
> 	critical section with irqs disabled.  (An experimental patch at
> 	http://marc.info/?l=linux-kernel&m=131110647222185 might possibly
> 	make it legal to have an RCU read-side critical section where
> 	the rcu_read_unlock() is executed with interrupts disabled,
> 	but where some portion of the RCU read-side critical section
> 	was preemptible.)  This affects TREE_PREEMPT_RCU=y configurations.
> 
> TINY_PREEMPT_RCU will also need a few of these changes, but in the
> meantime this patch stack helps organize things better for testing.
> These are also available from the following subject-to-rebase git branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
>  b/include/linux/sched.h   |    3 +++
>  b/kernel/rcutree_plugin.h |    3 ++-
>  b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++++++------
>  b/kernel/signal.c         |   16 +++++++++++-----
>  b/kernel/softirq.c        |   12 ++++++++++--
>  kernel/rcutree_plugin.h   |   45 ++++++++++++++++++++++++++++++++-------------
>  6 files changed, 96 insertions(+), 27 deletions(-)

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  1:30 ` Ed Tomlinson
@ 2011-07-20  2:07   ` Ed Tomlinson
  2011-07-20  4:44     ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Ed Tomlinson @ 2011-07-20  2:07 UTC
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb

On Tuesday 19 July 2011 21:30:00 Ed Tomlinson wrote:
> On Tuesday 19 July 2011 20:17:38 Paul E. McKenney wrote:

Paul,

Two things to add.  It's possible an earlier error is missing from the log below.  My serial console was missing
entries from between 0 and 40-ish seconds, and I did think I saw a bunch of FFFFFFFs scroll by earlier...

Second, booting with threadirqs and boost enabled in .config with patches 2, 5, and 6 is OK so far.

Thanks,
Ed

> Pulling this on top of master, rebuilding with boost enabled, and booting with threadirqs does not
> produce a correctly booting system here.
> 
>  * Starting APC UPS daemon ...
> [   46.007217]
> [   46.007221] =================================
> [   46.008013] [ INFO: inconsistent lock state ]
> [   46.008013] 3.0.0-rc7-crc+ #340
> [   46.008013] ---------------------------------
> [   46.008013] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> [   46.008013] ip/2982 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [   46.008013]  (rcu_node_level_0){+.?...}, at: [<ffffffff810ba279>] rcu_report_exp_rnp+0x19/0x50
> [   46.008013] {IN-SOFTIRQ-W} state was registered at:
> [   46.008013]   [<ffffffff8108a1e5>] __lock_acquire+0x5a5/0x16a0
> [   46.008013]   [<ffffffff8108b8d5>] lock_acquire+0x95/0x140
> [   46.008013]   [<ffffffff815793a6>] _raw_spin_lock_irqsave+0x46/0x60
> [   46.008013]   [<ffffffff810bbf80>] __rcu_process_callbacks+0x190/0x2a0
> [   46.008013]   [<ffffffff810bc0ed>] rcu_process_callbacks+0x5d/0x60
> [   46.008013]   [<ffffffff81052e14>] __do_softirq+0x194/0x300
> [   46.008013]   [<ffffffff8105311d>] run_ksoftirqd+0x19d/0x240
> [   46.008013]   [<ffffffff81071296>] kthread+0xb6/0xc0
> [   46.008013]   [<ffffffff81582294>] kernel_thread_helper+0x4/0x10
> [   46.008013] irq event stamp: 6802
> [   46.008013] hardirqs last  enabled at (6801): [<ffffffff815776bd>] __mutex_unlock_slowpath+0x10d/0x1c0
> [   46.008013] hardirqs last disabled at (6802): [<ffffffff8157937c>] _raw_spin_lock_irqsave+0x1c/0x60
> [   46.008013] softirqs last  enabled at (6698): [<ffffffff814c274a>] sk_common_release+0x6a/0x120
> [   46.008013] softirqs last disabled at (6696): [<ffffffff81579476>] _raw_write_lock_bh+0x16/0x50
> [   46.008013]
> [   46.008013] other info that might help us debug this:
> [   46.008013]  Possible unsafe locking scenario:
> [   46.008013]
> [   46.008013]        CPU0
> [   46.008013]        ----
> [   46.008013]   lock(rcu_node_level_0);
> [   46.008013]   <Interrupt>
> [   46.008013]     lock(rcu_node_level_0);
> [   46.008013]
> [   46.008013]  *** DEADLOCK ***
> [   46.008013]
> [   46.008013] 3 locks held by ip/2982:
> [   46.008013]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814e4fd7>] rtnl_lock+0x17/0x20
> [   46.008013]  #1:  (sync_rcu_preempt_exp_mutex){+.+...}, at: [<ffffffff810bc9e7>] synchronize_rcu_expedited+0x37/0x210
> [   46.008013]  #2:  (rcu_node_level_0){+.?...}, at: [<ffffffff810ba279>] rcu_report_exp_rnp+0x19/0x50
> [   46.008013]
> [   46.008013] stack backtrace:
> [   46.008013] Pid: 2982, comm: ip Tainted: G        W   3.0.0-rc7-crc+ #340
> [   46.008013] Call Trace:
> [   46.008013]  [<ffffffff81088c33>] print_usage_bug+0x223/0x270
> [   46.008013]  [<ffffffff81089558>] mark_lock+0x328/0x400
> [   46.008013]  [<ffffffff8108969e>] mark_held_locks+0x6e/0xa0
> [   46.008013]  [<ffffffff81579a5d>] ? _raw_spin_unlock_irqrestore+0x7d/0x90
> [   46.008013]  [<ffffffff8108999d>] trace_hardirqs_on_caller+0x14d/0x190
> [   46.008013]  [<ffffffff810899ed>] trace_hardirqs_on+0xd/0x10
> [   46.008013]  [<ffffffff81579a5d>] _raw_spin_unlock_irqrestore+0x7d/0x90
> [   46.008013]  [<ffffffff810bcb18>] synchronize_rcu_expedited+0x168/0x210
> [   46.008013]  [<ffffffff8111a073>] ? might_fault+0x53/0xb0
> [   46.008013]  [<ffffffff8125200a>] ? security_capable+0x2a/0x30
> [   46.008013]  [<ffffffff814d0985>] synchronize_net+0x45/0x50
> [   46.008013]  [<ffffffffa0359613>] ipip6_tunnel_ioctl+0x5f3/0x800 [sit]
> [   46.008013]  [<ffffffffa0359020>] ? ipip6_tunnel_locate+0x2e0/0x2e0 [sit] 
> [   46.008013]  [<ffffffff814d43ba>] dev_ifsioc+0x11a/0x2c0
> [   46.008013]  [<ffffffff814d678a>] dev_ioctl+0x35a/0x810
> [   46.008013]  [<ffffffff81089a8d>] ? debug_check_no_locks_freed+0x9d/0x150
> [   46.008013]  [<ffffffff8108999d>] ? trace_hardirqs_on_caller+0x14d/0x190
> [   46.008013]  [<ffffffff810899ed>] ? trace_hardirqs_on+0xd/0x10
> [   46.008013]  [<ffffffff814bb83a>] sock_ioctl+0xea/0x2b0
> [   46.008013]  [<ffffffff81162474>] do_vfs_ioctl+0xa4/0x5a0
> [   46.008013]  [<ffffffff815799cc>] ? _raw_spin_unlock+0x5c/0x70
> [   46.008013]  [<ffffffff8114c1cc>] ? fd_install+0x7c/0x90
> [   46.008013]  [<ffffffff8158149c>] ? sysret_check+0x27/0x62
> [   46.008013]  [<ffffffff81162a09>] sys_ioctl+0x99/0xa0
> [   46.008013]  [<ffffffff8158146b>] system_call_fastpath+0x16/0x1b
> [   46.786123] BUG: sleeping function called from invalid context at net/ipv4/route.c:757
> [   46.799548] in_atomic(): 1, irqs_disabled(): 0, pid: 2982, name: ip
> [   46.799553] INFO: lockdep is turned off.
> [   46.799561] Pid: 2982, comm: ip Tainted: G        W   3.0.0-rc7-crc+ #340
> [   46.799565] Call Trace:
> [   46.799576]  [<ffffffff81036b39>] __might_sleep+0xf9/0x120
> [   46.799586]  [<ffffffff814fea38>] rt_do_flush+0x178/0x1b0
> [   46.799594]  [<ffffffff814feafc>] rt_cache_flush+0x5c/0x70
> [   46.799606]  [<ffffffff81542d92>] fib_netdev_event+0x72/0xf0
> [   46.799615]  [<ffffffff8157d576>] notifier_call_chain+0x56/0x80
> [   46.799625]  [<ffffffff81077ff6>] raw_notifier_call_chain+0x16/0x20
> [   46.799633]  [<ffffffff814d1cd4>] call_netdevice_notifiers+0x74/0x90
> [   46.799642]  [<ffffffff814d2c37>] netdev_state_change+0x27/0x40
> [   46.799653]  [<ffffffffa0359686>] ipip6_tunnel_ioctl+0x666/0x800 [sit]
> [   46.799663]  [<ffffffffa0359020>] ? ipip6_tunnel_locate+0x2e0/0x2e0 [sit]
> [   46.799673]  [<ffffffff814d43ba>] dev_ifsioc+0x11a/0x2c0
> [   46.799682]  [<ffffffff814d678a>] dev_ioctl+0x35a/0x810
> [   46.799690]  [<ffffffff81089a8d>] ? debug_check_no_locks_freed+0x9d/0x150
> [   46.799700]  [<ffffffff8108999d>] ? trace_hardirqs_on_caller+0x14d/0x190
> [   46.799707]  [<ffffffff810899ed>] ? trace_hardirqs_on+0xd/0x10
> [   46.799718]  [<ffffffff814bb83a>] sock_ioctl+0xea/0x2b0
> [   46.799726]  [<ffffffff81162474>] do_vfs_ioctl+0xa4/0x5a0
> [   46.799735]  [<ffffffff815799cc>] ? _raw_spin_unlock+0x5c/0x70
> [   46.799744]  [<ffffffff8114c1cc>] ? fd_install+0x7c/0x90
> [   46.799752]  [<ffffffff8158149c>] ? sysret_check+0x27/0x62
> [   46.799760]  [<ffffffff81162a09>] sys_ioctl+0x99/0xa0
> [   46.799769]  [<ffffffff8158146b>] system_call_fastpath+0x16/0x1b
> 
> This was followed by more BUGs, yielding a box that needed its interrupt button pressed to
> force a reboot.
> 
> I've previously tested patches 5 and 6 from Peter Z with threadirqs and
> boost disabled, with good results.
> 
> Ed
> 
> > This patch set contains fixes for a trainwreck involving RCU, the
> > scheduler, and threaded interrupts.  This trainwreck involved RCU failing
> > to properly protect one of its bit fields, use of RCU by the scheduler
> > from portions of irq_exit() where in_irq() returns false, uses of the
> > scheduler by RCU colliding with uses of RCU by the scheduler, threaded
> > interrupts exercising the problematic portions of irq_exit() more heavily,
> > and so on.  The patches are as follows:
> > 
> > 1.	Properly protect current->rcu_read_unlock_special.
> > 	Lack of protection was causing RCU to recurse on itself, which
> > 	in turn resulted in deadlocks involving RCU and the scheduler.
> > 	This affects only RCU_BOOST=y configurations.
> > 2.	Streamline code produced by __rcu_read_unlock().  This one is
> > 	an innocent bystander that is being carried due to conflicts
> > 	with other patches.  (A later version will likely merge it
> > 	with #3 below.)
> > 3.	Make __rcu_read_unlock() delay setting the per-task
> > 	->rcu_read_lock_nesting counter to zero until all cleanup for the
> > 	just-ended RCU read-side critical section has completed.  This
> > 	prevents a number of other cases that could result in deadlock
> > 	due to self recursion.	This affects only TREE_PREEMPT_RCU=y
> > 	configurations.
> > 4.	Make scheduler_ipi() correctly identify itself as being
> > 	in_irq() when it needs to do anything that might involve RCU,
> > 	thus enabling RCU to avoid yet another class of potential
> > 	self-recursions and deadlocks.	This affects PREEMPT_RCU=y
> > 	configurations.
> > 5.	Make irq_exit() inform RCU when it is invoking the scheduler
> > 	in situations where in_irq() would return false, thus
> > 	allowing RCU to correctly avoid self-recursion.  This affects
> > 	TREE_PREEMPT_RCU=y configurations.
> > 6.	Make __lock_task_sighand() execute the entire RCU read-side
> > 	critical section with irqs disabled.  (An experimental patch at
> > 	http://marc.info/?l=linux-kernel&m=131110647222185 might possibly
> > 	make it legal to have an RCU read-side critical section where
> > 	the rcu_read_unlock() is executed with interrupts disabled,
> > 	but where some protion of the RCU read-side critical section
> > 	was preemptible.)  This affects TREE_PREEMPT_RCU=y configurations.
> > 
> > TINY_PREEMPT_RCU will also need a few of these changes, but in the
> > meantime this patch stack helps organize things better for testing.
> > These are also available from the following subject-to-rebase git branch:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> >  b/include/linux/sched.h   |    3 +++
> >  b/kernel/rcutree_plugin.h |    3 ++-
> >  b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++++++------
> >  b/kernel/signal.c         |   16 +++++++++++-----
> >  b/kernel/softirq.c        |   12 ++++++++++--
> >  kernel/rcutree_plugin.h   |   45 ++++++++++++++++++++++++++++++++-------------
> >  6 files changed, 96 insertions(+), 27 deletions(-)
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20  0:18 ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
@ 2011-07-20  2:40   ` Peter Zijlstra
  2011-07-20  4:54     ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2011-07-20  2:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb, edt

On Tue, 2011-07-19 at 17:18 -0700, Paul E. McKenney wrote:
> +++ b/kernel/rcutree_plugin.h
> @@ -696,8 +696,10 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
>         raw_spin_lock_irqsave(&rnp->lock, flags);
>         for (;;) {
>                 if (!sync_rcu_preempt_exp_done(rnp))
> +                       raw_spin_unlock_irqrestore(&rnp->lock, flags);
>                         break;

I bet that'll all work much better if you wrap it in curly braces like:

		if (!sync_rcu_preempt_exp_done(rnp)) {
			raw_spin_unlock_irqrestore(&rnp->lock, flags);
			break;
		}

That might also explain those explosions Ed and Ben have been seeing.
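
Without the braces, C attaches only the first statement to the if, so
the hunk as posted parses like this sketch of the mis-parse:

	if (!sync_rcu_preempt_exp_done(rnp))
		raw_spin_unlock_irqrestore(&rnp->lock, flags);	/* conditional */
	break;	/* unconditional: the loop always exits on its first pass */

So whenever sync_rcu_preempt_exp_done() returns true, the function
returns with rnp->lock still held and without ever reaching the
wake_up(), which lines up with the in_atomic() BUGs above.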

>                 if (rnp->parent == NULL) {
> +                       raw_spin_unlock_irqrestore(&rnp->lock, flags);
>                         wake_up(&sync_rcu_preempt_exp_wq);
>                         break;
>                 }
> @@ -707,7 +709,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
>                 raw_spin_lock(&rnp->lock); /* irqs already disabled */
>                 rnp->expmask &= ~mask;
>         }
> -       raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  } 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  2:07   ` Ed Tomlinson
@ 2011-07-20  4:44     ` Paul E. McKenney
  2011-07-20  5:03       ` Linus Torvalds
  2011-07-20 10:49       ` Ed Tomlinson
  0 siblings, 2 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  4:44 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb

On Tue, Jul 19, 2011 at 10:07:32PM -0400, Ed Tomlinson wrote:
> On Tuesday 19 July 2011 21:30:00 Ed Tomlinson wrote:
> > On Tuesday 19 July 2011 20:17:38 Paul E. McKenney wrote:
> 
> Paul,
> 
> Two things to add.  It's possible an earlier error is missing from the log below.  My serial console was missing
> entries from before the 40-ish second mark, and I did think I saw a bunch of FFFFFFFs scroll by earlier...
> 
> Second, booting with threadirqs and boost enabled in .config is ok so far with patches 2, 5, and 6.

Patches 3 and 4 resulted in failures?  (Patch 1 certainly did.)

							Thanx, Paul

> Thanks,
> Ed
> 
> > [ . . . ]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20  2:40   ` Peter Zijlstra
@ 2011-07-20  4:54     ` Paul E. McKenney
  2011-07-20 11:23       ` Ed Tomlinson
  0 siblings, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20  4:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb, edt

On Wed, Jul 20, 2011 at 04:40:18AM +0200, Peter Zijlstra wrote:
> On Tue, 2011-07-19 at 17:18 -0700, Paul E. McKenney wrote:
> > +++ b/kernel/rcutree_plugin.h
> > @@ -696,8 +696,10 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
> >         raw_spin_lock_irqsave(&rnp->lock, flags);
> >         for (;;) {
> >                 if (!sync_rcu_preempt_exp_done(rnp))
> > +                       raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >                         break;
> 
> I bet that'll all work much better if you wrap it in curly braces like:
> 
> 		if (!sync_rcu_preempt_exp_done(rnp)) {
> 			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> 			break;
> 		}
> 
> That might also explain those explosions Ed and Ben have been seeing.

Indeed.  Must be the call of the snake.  :-(

Thank you for catching this!

> >                 if (rnp->parent == NULL) {
> > +                       raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >                         wake_up(&sync_rcu_preempt_exp_wq);
> >                         break;
> >                 }
> > @@ -707,7 +709,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
> >                 raw_spin_lock(&rnp->lock); /* irqs already disabled */
> >                 rnp->expmask &= ~mask;
> >         }
> > -       raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >  } 

So this time I am testing the exact patch series before resending.
In the meantime, here is the updated version of this patch.

							Thanx, Paul

------------------------------------------------------------------------

rcu: decrease rcu_report_exp_rnp coupling with scheduler

PREEMPT_RCU read-side critical sections blocking an expedited grace
period invoke rcu_report_exp_rnp().  When the last such critical section
has completed, rcu_report_exp_rnp() invokes the scheduler to wake up the
task that invoked synchronize_rcu_expedited() -- needlessly holding the
root rcu_node structure's lock while doing so, thus needlessly providing
a way for RCU and the scheduler to deadlock.

This commit therefore releases the root rcu_node structure's lock before
calling wake_up().

Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 75113cb..6abef3c 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -695,9 +695,12 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	for (;;) {
-		if (!sync_rcu_preempt_exp_done(rnp))
+		if (!sync_rcu_preempt_exp_done(rnp)) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			break;
+		}
 		if (rnp->parent == NULL) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			wake_up(&sync_rcu_preempt_exp_wq);
 			break;
 		}
@@ -707,7 +710,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 		raw_spin_lock(&rnp->lock); /* irqs already disabled */
 		rnp->expmask &= ~mask;
 	}
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  4:44     ` Paul E. McKenney
@ 2011-07-20  5:03       ` Linus Torvalds
  2011-07-20 13:34         ` Paul E. McKenney
  2011-07-20 10:49       ` Ed Tomlinson
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2011-07-20  5:03 UTC (permalink / raw)
  To: paulmck
  Cc: Ed Tomlinson, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches,
	greearb

On Tue, Jul 19, 2011 at 9:44 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> [ . . . ]
>
> Patches 3 and 4 resulted in failures?  (Patch 1 certainly did.)

Gaah. Any possibility of getting this resolved soon?

I've merged the pathname lookup fix. I was hoping all the RCU issues
were behind us..

                    Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  4:44     ` Paul E. McKenney
  2011-07-20  5:03       ` Linus Torvalds
@ 2011-07-20 10:49       ` Ed Tomlinson
  1 sibling, 0 replies; 55+ messages in thread
From: Ed Tomlinson @ 2011-07-20 10:49 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb

On Wednesday 20 July 2011 00:44:35 Paul E. McKenney wrote:
> [ . . . ]
> 
> Patches 3 and 4 resulted in failures?  (Patch 1 certainly did.)

I tried with all 7 and got an unstable system.  I decided to start from the two I had already tested
and add one at a time.  Things seem ok with 2 (fix boost).  I've not yet tried with 2, 3, 4, 5, 6.
Given that you have changed patch 1, I'll reboot with all 7 again.

Ed

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20  4:54     ` Paul E. McKenney
@ 2011-07-20 11:23       ` Ed Tomlinson
  2011-07-20 11:31         ` Ed Tomlinson
                           ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Ed Tomlinson @ 2011-07-20 11:23 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, greearb

On Wednesday 20 July 2011 00:54:15 Paul E. McKenney wrote:
> [ . . . ]

Paul/Peter,

With all 7 fixes applied, including the above version of patch 1, I get a warning during boot:

[    3.443140] EXT4-fs (sdc3): mounted filesystem with ordered data mode. Opts: (null)
[    3.456097] VFS: Mounted root (ext4 filesystem) readonly on device 8:35.
[    3.469734] devtmpfs: mounted
[    3.478488] Freeing unused kernel memory: 628k freed
[    3.488590] Write protecting the kernel read-only data: 10240k
[    3.500809] Freeing unused kernel memory: 468k freed
[    3.514158] Freeing unused kernel memory: 1240k freed
[    3.552337] ------------[ cut here ]------------
[    3.553004] ------------[ cut here ]------------
[    3.553004] WARNING: at kernel/rcutree_plugin.h:414 __rcu_read_unlock+0xc9/0x120()
[    3.553004] Hardware name: System Product Name
[    3.553004] Modules linked in:
[    3.553004] Pid: 1, comm: init Not tainted 3.0.0-rcr-crc+ #342
[    3.553004] Call Trace:
[    3.553004]  [<ffffffff8104b00f>] warn_slowpath_common+0x7f/0xc0
[    3.553004]  [<ffffffff8104b06a>] warn_slowpath_null+0x1a/0x20
[    3.553004]  [<ffffffff810bb479>] __rcu_read_unlock+0xc9/0x120
[    3.553004]  [<ffffffff8157d6b1>] __atomic_notifier_call_chain+0x91/0xb0
[    3.553004]  [<ffffffff8157d620>] ? notifier_call_chain+0x80/0x80
[    3.553004]  [<ffffffff812c5860>] ? bit_putcs+0x5b0/0x5b0
[    3.553004]  [<ffffffff8157d6e6>] atomic_notifier_call_chain+0x16/0x20
[    3.553004]  [<ffffffff813203c9>] notify_write+0x29/0x30
[    3.553004]  [<ffffffff81322024>] vt_console_print+0x164/0x3b0
[    3.553004]  [<ffffffff8104b29e>] __call_console_drivers+0x8e/0xb0
[    3.553004]  [<ffffffff8104b30a>] _call_console_drivers+0x4a/0x80
[    3.553004]  [<ffffffff8104b9ed>] console_unlock+0xfd/0x210
[    3.553004]  [<ffffffff8104c246>] vprintk+0x3f6/0x530
[    3.553004]  [<ffffffff810bb479>] ? __rcu_read_unlock+0xc9/0x120
[    3.553004]  [<ffffffff8157585c>] printk+0x41/0x45
[    3.553004]  [<ffffffff810bb479>] ? __rcu_read_unlock+0xc9/0x120
[    3.553004]  [<ffffffff8104afcd>] warn_slowpath_common+0x3d/0xc0
[    3.553004]  [<ffffffff8104b06a>] warn_slowpath_null+0x1a/0x20
[    3.553004]  [<ffffffff810bb479>] __rcu_read_unlock+0xc9/0x120
[    3.553004]  [<ffffffff8103fed8>] cpuacct_charge+0xc8/0xe0
[    3.553004]  [<ffffffff8103fe58>] ? cpuacct_charge+0x48/0xe0
[    3.553004]  [<ffffffff810326b7>] ? task_of+0x97/0xd0
[    3.553004]  [<ffffffff81040ef5>] update_curr+0x1a5/0x210
[    3.553004]  [<ffffffff81576d78>] ? preempt_schedule_irq+0x68/0xa0
[    3.553004]  [<ffffffff810419e0>] put_prev_task_fair+0x110/0x120
[    3.553004]  [<ffffffff81575f3a>] schedule+0x1da/0xc50
[    3.553004]  [<ffffffff81576d72>] ? preempt_schedule_irq+0x62/0xa0
[    3.553004]  [<ffffffff81576d78>] preempt_schedule_irq+0x68/0xa0
[    3.553004]  [<ffffffff8157a316>] retint_kernel+0x26/0x30
[    3.553004]  [<ffffffff810da494>] ? ftrace_likely_update+0x14/0x20
[    3.553004]  [<ffffffff810bb4ab>] __rcu_read_unlock+0xfb/0x120
[    3.553004]  [<ffffffff810f8190>] find_get_page+0x170/0x190
[    3.553004]  [<ffffffff810f8020>] ? find_get_pages+0x280/0x280
[    3.553004]  [<ffffffff81180a30>] __find_get_block_slow+0x40/0x130
[    3.553004]  [<ffffffff81180d5f>] __find_get_block+0xdf/0x210
[    3.553004]  [<ffffffff8157d541>] ? sub_preempt_count+0x51/0x60
[    3.553004]  [<ffffffff81184c5e>] __getblk+0x11e/0x300
[    3.553004]  [<ffffffff81184e55>] __breadahead+0x15/0x60
[    3.553004]  [<ffffffff811e5e9e>] __ext4_get_inode_loc+0x36e/0x3f0
[    3.553004]  [<ffffffff811e7dfe>] ext4_iget+0x7e/0x870
[    3.553004]  [<ffffffff8129247e>] ? do_raw_spin_lock+0xde/0x1c0
[    3.553004]  [<ffffffff811f25e5>] ext4_lookup+0xb5/0x150
[    3.553004]  [<ffffffff8115a5c3>] d_alloc_and_lookup+0xc3/0x100
[    3.553004]  [<ffffffff8115cd00>] do_lookup+0x260/0x480
[    3.553004]  [<ffffffff8157d541>] ? sub_preempt_count+0x51/0x60
[    3.553004]  [<ffffffff8115dc58>] link_path_walk+0x248/0x930
[    3.553004]  [<ffffffff812925f6>] ? __raw_spin_lock_init+0x36/0x60
[    3.553004]  [<ffffffff81160417>] path_openat+0x107/0x4f0
[    3.553004]  [<ffffffff8108b392>] ? lock_release_non_nested+0xb2/0x330
[    3.553004]  [<ffffffff81160979>] do_filp_open+0x49/0x100
[    3.553004]  [<ffffffff8129247e>] ? do_raw_spin_lock+0xde/0x1c0
[    3.553004]  [<ffffffff81579a3c>] ? _raw_spin_unlock+0x5c/0x70
[    3.553004]  [<ffffffff8116db4a>] ? alloc_fd+0xfa/0x140
[    3.553004]  [<ffffffff8114d892>] do_sys_open+0x172/0x220
[    3.553004]  [<ffffffff8114d980>] sys_open+0x20/0x30
[    3.553004]  [<ffffffff815814eb>] system_call_fastpath+0x16/0x1b
[    3.553004] ---[ end trace 9137bd8a48443ea9 ]---
[    3.553004] WARNING: at kernel/rcutree_plugin.h:414 __rcu_read_unlock+0xc9/0x120()
[    3.553004] Hardware name: System Product Name
[    3.553004] Modules linked in:
[    3.553004] Pid: 1, comm: init Tainted: G        W   3.0.0-rcr-crc+ #342
[    3.553004] Call Trace:
[    3.553004]  [<ffffffff8104b00f>] warn_slowpath_common+0x7f/0xc0
[    3.553004]  [<ffffffff8104b06a>] warn_slowpath_null+0x1a/0x20
[    3.553004]  [<ffffffff810bb479>] __rcu_read_unlock+0xc9/0x120
[    3.553004]  [<ffffffff8103fed8>] cpuacct_charge+0xc8/0xe0
[    3.553004]  [<ffffffff8103fe58>] ? cpuacct_charge+0x48/0xe0
[    3.553004]  [<ffffffff810326b7>] ? task_of+0x97/0xd0
[    3.553004]  [<ffffffff81040ef5>] update_curr+0x1a5/0x210
[    3.553004]  [<ffffffff81576d78>] ? preempt_schedule_irq+0x68/0xa0
[    3.553004]  [<ffffffff810419e0>] put_prev_task_fair+0x110/0x120
[    3.553004]  [<ffffffff81575f3a>] schedule+0x1da/0xc50
[    3.553004]  [<ffffffff81576d72>] ? preempt_schedule_irq+0x62/0xa0
[    3.553004]  [<ffffffff81576d78>] preempt_schedule_irq+0x68/0xa0
[    3.553004]  [<ffffffff8157a316>] retint_kernel+0x26/0x30
[    3.553004]  [<ffffffff810da494>] ? ftrace_likely_update+0x14/0x20
[    3.553004]  [<ffffffff810bb4ab>] __rcu_read_unlock+0xfb/0x120
[    3.553004]  [<ffffffff810f8190>] find_get_page+0x170/0x190
[    3.553004]  [<ffffffff810f8020>] ? find_get_pages+0x280/0x280
[    3.553004]  [<ffffffff81180a30>] __find_get_block_slow+0x40/0x130
[    3.553004]  [<ffffffff81180d5f>] __find_get_block+0xdf/0x210
[    3.553004]  [<ffffffff8157d541>] ? sub_preempt_count+0x51/0x60
[    3.553004]  [<ffffffff81184c5e>] __getblk+0x11e/0x300
[    3.553004]  [<ffffffff81184e55>] __breadahead+0x15/0x60
[    3.553004]  [<ffffffff811e5e9e>] __ext4_get_inode_loc+0x36e/0x3f0
[    3.553004]  [<ffffffff811e7dfe>] ext4_iget+0x7e/0x870
[    3.553004]  [<ffffffff8129247e>] ? do_raw_spin_lock+0xde/0x1c0
[    3.553004]  [<ffffffff811f25e5>] ext4_lookup+0xb5/0x150
[    3.553004]  [<ffffffff8115a5c3>] d_alloc_and_lookup+0xc3/0x100
[    3.553004]  [<ffffffff8115cd00>] do_lookup+0x260/0x480
[    3.553004]  [<ffffffff8157d541>] ? sub_preempt_count+0x51/0x60
[    3.553004]  [<ffffffff8115dc58>] link_path_walk+0x248/0x930
[    3.553004]  [<ffffffff812925f6>] ? __raw_spin_lock_init+0x36/0x60
[    3.553004]  [<ffffffff81160417>] path_openat+0x107/0x4f0
[    3.553004]  [<ffffffff8108b392>] ? lock_release_non_nested+0xb2/0x330
[    3.553004]  [<ffffffff81160979>] do_filp_open+0x49/0x100
[    3.553004]  [<ffffffff8129247e>] ? do_raw_spin_lock+0xde/0x1c0
[    3.553004]  [<ffffffff81579a3c>] ? _raw_spin_unlock+0x5c/0x70
[    3.553004]  [<ffffffff8116db4a>] ? alloc_fd+0xfa/0x140
[    3.553004]  [<ffffffff8114d892>] do_sys_open+0x172/0x220
[    3.553004]  [<ffffffff8114d980>] sys_open+0x20/0x30
[    3.553004]  [<ffffffff815814eb>] system_call_fastpath+0x16/0x1b
[    3.553004] ---[ end trace 9137bd8a48443eaa ]---
INIT: version 2.88 booting
G
   OpenRC 0.8.3 is starting up Gentoo Linux (x86_64)

RCU settings are (grep RCU .config):
# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
# CONFIG_PROVE_RCU is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_CPU_STALL_VERBOSE is not set
 
and threadirqs is enabled on the boot command line.

Hope this helps,
Ed

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20 11:23       ` Ed Tomlinson
@ 2011-07-20 11:31         ` Ed Tomlinson
  2011-07-20 12:35         ` Peter Zijlstra
  2011-07-20 13:23         ` Paul E. McKenney
  2 siblings, 0 replies; 55+ messages in thread
From: Ed Tomlinson @ 2011-07-20 11:31 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, greearb

On Wednesday 20 July 2011 07:23:32 Ed Tomlinson wrote:

[omitted] 

> Paul/Peter,
> 
> With all 7 fixes applied, including the above version of patch 1, I get a warning during boot:

[omitted]

I rebooted to see if the warning was solid, and it is.  I'm now running with all 7 patches and will see what happens.
I'll send you a report in a few hours from my work email if all seems ok.  Meanwhile I'll try to keep this
box busy for a few hours.

Ed

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20 11:23       ` Ed Tomlinson
  2011-07-20 11:31         ` Ed Tomlinson
@ 2011-07-20 12:35         ` Peter Zijlstra
  2011-07-20 13:23         ` Paul E. McKenney
  2 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2011-07-20 12:35 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: paulmck, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, greearb

On Wed, 2011-07-20 at 07:23 -0400, Ed Tomlinson wrote:
> [    3.553004]  [<ffffffff8104b06a>] warn_slowpath_null+0x1a/0x20
> [    3.553004]  [<ffffffff810bb479>] __rcu_read_unlock+0xc9/0x120
> [    3.553004]  [<ffffffff8103fed8>] cpuacct_charge+0xc8/0xe0
> [    3.553004]  [<ffffffff8103fe58>] ? cpuacct_charge+0x48/0xe0
> [    3.553004]  [<ffffffff810326b7>] ? task_of+0x97/0xd0
> [    3.553004]  [<ffffffff81040ef5>] update_curr+0x1a5/0x210
> [    3.553004]  [<ffffffff81576d78>] ? preempt_schedule_irq+0x68/0xa0
> [    3.553004]  [<ffffffff810419e0>] put_prev_task_fair+0x110/0x120
> [    3.553004]  [<ffffffff81575f3a>] schedule+0x1da/0xc50
> [    3.553004]  [<ffffffff81576d72>] ? preempt_schedule_irq+0x62/0xa0
> [    3.553004]  [<ffffffff81576d78>] preempt_schedule_irq+0x68/0xa0
> [    3.553004]  [<ffffffff8157a316>] retint_kernel+0x26/0x30
> [    3.553004]  [<ffffffff810da494>] ? ftrace_likely_update+0x14/0x20
> [    3.553004]  [<ffffffff810bb4ab>] __rcu_read_unlock+0xfb/0x120
> [    3.553004]  [<ffffffff810f8190>] find_get_page+0x170/0x190 

Ok, so we're running some task that does rcu_read_unlock(), right in the
middle of __rcu_read_unlock() we get preempted, the scheduler calls
rcu_note_context_switch()->rcu_preempt_note_context_switch() which sets
->rcu_read_unlock_special |= UNLOCK_BLOCKED.

Then before finishing the context switch, the cpuacct muck uses rcu, and
its rcu_read_unlock() triggers __rcu_read_unlock() and goes bang.

That rcu usage isn't new, that's been there since March 2009.

AFAICT even .39 should suffer from this particular issue, or am I
missing something here.. Paul?
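
One plausible reconstruction of the interleaving behind the trace,
assuming the INT_MIN-sentinel form of __rcu_read_unlock() from patch 4
(a sketch only, not a verified trace; the actual cause is pinned down
later in the thread):

	rcu_read_unlock();				/* task in find_get_page() */
	  __rcu_read_unlock();
	    t->rcu_read_lock_nesting = INT_MIN;		/* sentinel during special processing */
	    /* irq arrives; the irq return path calls preempt_schedule_irq() */
	      schedule();
	        update_curr();
	          cpuacct_charge();			/* the March-2009 rcu usage above */
	            rcu_read_lock();			/* nesting: INT_MIN + 1 */
	            rcu_read_unlock();			/* nesting: back to INT_MIN */
	              WARN_ON_ONCE(nesting < 0);	/* INT_MIN < 0, so the warning fires */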

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
  2011-07-20  0:18 ` [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers Paul E. McKenney
@ 2011-07-20 12:54   ` Peter Zijlstra
  2011-07-20 13:25     ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2011-07-20 12:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb, edt, Paul E. McKenney

On Tue, 2011-07-19 at 17:18 -0700, Paul E. McKenney wrote:
> @@ -391,10 +400,15 @@ void __rcu_read_unlock(void)
>         struct task_struct *t = current;
>  
>         barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
> -       if (--t->rcu_read_lock_nesting == 0) {
> -               barrier();  /* decr before ->rcu_read_unlock_special load */
> +       if (t->rcu_read_lock_nesting != 1)
> +               --t->rcu_read_lock_nesting;
> +       else {
> +               t->rcu_read_lock_nesting = INT_MIN;
> +               barrier();  /* assign before ->rcu_read_unlock_special load */
>                 if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
>                         rcu_read_unlock_special(t);
> +               barrier();  /* ->rcu_read_unlock_special load before assign */
> +               t->rcu_read_lock_nesting = 0;
>         }
>  #ifdef CONFIG_PROVE_LOCKING
>         WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0); 

But won't the above change make that WARN_ON_ONCE() invalid?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20 11:23       ` Ed Tomlinson
  2011-07-20 11:31         ` Ed Tomlinson
  2011-07-20 12:35         ` Peter Zijlstra
@ 2011-07-20 13:23         ` Paul E. McKenney
  2 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 13:23 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: Peter Zijlstra, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, rostedt, Valdis.Kletnieks,
	dhowells, eric.dumazet, darren, patches, greearb,
	edward.tomlinson

On Wed, Jul 20, 2011 at 07:23:32AM -0400, Ed Tomlinson wrote:
> On Wednesday 20 July 2011 00:54:15 Paul E. McKenney wrote:
> > On Wed, Jul 20, 2011 at 04:40:18AM +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-07-19 at 17:18 -0700, Paul E. McKenney wrote:

[ . . . ]

> Paul/Peter,
> 
> With all 7 fixes applied, including the above version of patch 1, I get a warning during boot:
> 
> [ . . . ]
> [    3.553004] WARNING: at kernel/rcutree_plugin.h:414 __rcu_read_unlock+0xc9/0x120()
> [ . . . ]

The WARN_ON_ONCE() is no longer correct.  Here is a patch that goes
on top of #4 that fixes it.  The bug is in the warning, so if you are
running stably with all seven fixes and only get this warning, then that
is a -very- good sign!

							Thanx, Paul

------------------------------------------------------------------------

Fix warning to allow for negative values of ->rcu_read_lock_nesting.
    
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index cb33705..2b5f3e1 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -411,7 +411,11 @@ void __rcu_read_unlock(void)
 		t->rcu_read_lock_nesting = 0;
 	}
 #ifdef CONFIG_PROVE_LOCKING
-	WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
+	{
+		int rrln = ACCESS_ONCE(t->rcu_read_lock_nesting);
+
+		WARN_ON_ONCE(rrln < 0 && rrln > INT_MIN / 2);
+	}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);
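
To make the revised check concrete, here is how a few illustrative
values of rrln fare under it (assuming the INT_MIN sentinel from
patch #4; the values are examples, not part of the patch):

	rrln == 1 or 2		/* normal nesting: rrln < 0 is false, no warning */
	rrln == INT_MIN		/* sentinel: INT_MIN > INT_MIN / 2 is false, no warning */
	rrln == INT_MIN + 1	/* nested section during special handling: no warning */
	rrln == -1		/* genuine underflow: -1 < 0 && -1 > INT_MIN / 2, warning fires */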

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
  2011-07-20 12:54   ` Peter Zijlstra
@ 2011-07-20 13:25     ` Paul E. McKenney
  0 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 13:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb, edt, Paul E. McKenney

On Wed, Jul 20, 2011 at 02:54:27PM +0200, Peter Zijlstra wrote:
> [ . . . ]
> 
> But won't the above change make that WARN_ON_ONCE() invalid?

Yes, please see the patch I just sent.  So that warning was spurious,
and if that was the only problem...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  5:03       ` Linus Torvalds
@ 2011-07-20 13:34         ` Paul E. McKenney
  2011-07-20 17:02           ` Ben Greear
  0 siblings, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 13:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ed Tomlinson, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches,
	greearb, edward.tomlinson

On Tue, Jul 19, 2011 at 10:03:05PM -0700, Linus Torvalds wrote:
> On Tue, Jul 19, 2011 at 9:44 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > [ . . . ]
> >
> > Patches 3 and 4 resulted in failures?  (Patch 1 certainly did.)
> 
> Gaah. Any possibility of getting this resolved soon?

I believe that we are very close.  These failures all involve unusual
configurations, which is why they showed up so late.

> I've merged the pathname lookup fix. I was hoping all the RCU issues
> were behind us..

If we have time, I would like to see what Ed gets with the most recent
fix.  If we absolutely need to release today, then I will push patches
2, 5, and 6, which cover Ed's particular situation.  My concern with
doing so is that other failures might pop up.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 13:34         ` Paul E. McKenney
@ 2011-07-20 17:02           ` Ben Greear
  2011-07-20 17:15             ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Ben Greear @ 2011-07-20 17:02 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Ed Tomlinson, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, peterz,
	rostedt, Valdis.Kletnieks, dhowells, eric.dumazet, darren,
	patches, edward.tomlinson

On 07/20/2011 06:34 AM, Paul E. McKenney wrote:
> On Tue, Jul 19, 2011 at 10:03:05PM -0700, Linus Torvalds wrote:
>> On Tue, Jul 19, 2011 at 9:44 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com>  wrote:
>>> [ . . . ]
>>>
>>> Patches 3 and 4 resulted in failures?  (Patch 1 certainly did.)
>>
>> Gaah. Any possibility of getting this resolved soon?
>
> I believe that we are very close.  These failures all involve unusual
> configurations, which is why they showed up so late.

What is strange about my configuration?  Just that I enabled the
RCU boost and other new RCU features?

Surely it's not that I'm just doing lots of mounts and unmounts
and file-io with lots of processes....

And I was posting lockdep warnings on this since -rc1 or so, though granted
I was running a few extra patches and having other bugs at the same time....

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 17:02           ` Ben Greear
@ 2011-07-20 17:15             ` Paul E. McKenney
  2011-07-20 18:44               ` Ingo Molnar
  0 siblings, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 17:15 UTC (permalink / raw)
  To: Ben Greear
  Cc: Linus Torvalds, Ed Tomlinson, linux-kernel, mingo, laijs,
	dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, peterz,
	rostedt, Valdis.Kletnieks, dhowells, eric.dumazet, darren,
	patches, edward.tomlinson

On Wed, Jul 20, 2011 at 10:02:06AM -0700, Ben Greear wrote:
> On 07/20/2011 06:34 AM, Paul E. McKenney wrote:
> > [ . . . ]
> >
> >I believe that we are very close.  These failures all involve unusual
> >configurations, which is why they showed up so late.
> 
> What is strange about my configuration?  Just that I enabled the
> RCU boost and other new RCU features?

RCU boost is new and counts as strange for another few days at least.
And I do very much appreciate your testing efforts.  For whatever reason,
you are finding things that my testing did not.  Which means that my
testing needs to improve, but getting these bugs fixed is higher priority
for the moment.

> Surely it's not that I'm just doing lots of mounts and unmounts
> and file-io with lots of processes....
> 
> And I was posting lockdep warnings on this since -rc1 or so, though granted
> I was running a few extra patches and having other bugs at the same time....

Indeed, there were other bugs that were hiding the ones that we are
now hunting down.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH rcu/urgent 0/7 v2] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
                   ` (8 preceding siblings ...)
  2011-07-20  1:30 ` Ed Tomlinson
@ 2011-07-20 18:25 ` Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
                     ` (6 more replies)
  9 siblings, 7 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt

Hello!

This patch set contains version 2 of fixes for a trainwreck involving
RCU, the scheduler, and threaded interrupts.  This trainwreck involved
RCU failing to properly protect one of its bit fields, use of RCU by
the scheduler from portions of irq_exit() where in_irq() returns false,
uses of the scheduler by RCU colliding with uses of RCU by the scheduler,
threaded interrupts exercising the problematic portions of irq_exit()
more heavily, and so on.  The patches are as follows:

1.	Properly protect current->rcu_read_unlock_special().
	Lack of protection was causing RCU to recurse on itself, which
	in turn resulted in deadlocks involving RCU and the scheduler.
	This affects only RCU_BOOST=y configurations.
2.	Streamline code produced by __rcu_read_unlock().  This one is
	an innocent bystander that is being carried due to conflicts
	with other patches.  (A later version will likely merge it
	with #3 below.)
3.	Make __rcu_read_unlock() delay counting the per-task
	->rcu_read_lock_nesting variable to zero until all cleanup for the
	just-ended RCU read-side critical section has completed.  This
	prevents a number of other cases that could result in deadlock
	due to self recursion.	This affects only TREE_PREEMPT_RCU=y
	configurations.
4.	Make scheduler_ipi() correctly identify itself as being
	in_irq() when it needs to do anything that might involve RCU,
	thus enabling RCU to avoid yet another class of potential
	self-recursions and deadlocks.	This affects PREEMPT_RCU=y
	configurations.
5.	Make irq_exit() inform RCU when it is invoking the scheduler
	in situations where in_irq() would return false, thus
	allowing RCU to correctly avoid self-recursion.  This affects
	TREE_PREEMPT_RCU=y configurations.
6.	Make __lock_task_sighand() execute the entire RCU read-side
	critical section with irqs disabled; a sketch of the resulting
	loop shape appears just below this list.  (An experimental patch at
	http://marc.info/?l=linux-kernel&m=131110647222185 might possibly
	make it legal to have an RCU read-side critical section where
	the rcu_read_unlock() is executed with interrupts disabled,
	but where some portion of the RCU read-side critical section
	was preemptible.)  This affects TREE_PREEMPT_RCU=y configurations.
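
For concreteness, here is a sketch of the loop shape that #6 describes
for __lock_task_sighand() -- reconstructed from the description above
for illustration, not copied from the patch (the authoritative hunk is
the kernel/signal.c change in the diffstat below):

	struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
						   unsigned long *flags)
	{
		struct sighand_struct *sighand;

		for (;;) {
			local_irq_save(*flags);	/* irqs off before the RCU section begins */
			rcu_read_lock();
			sighand = rcu_dereference(tsk->sighand);
			if (unlikely(sighand == NULL)) {
				rcu_read_unlock();
				local_irq_restore(*flags);
				break;	/* task is exiting, no sighand */
			}
			spin_lock(&sighand->siglock);
			if (likely(sighand == tsk->sighand)) {
				rcu_read_unlock();	/* entire RCU section ran with irqs off */
				break;	/* success: siglock held, irqs still off */
			}
			spin_unlock(&sighand->siglock);
			rcu_read_unlock();
			local_irq_restore(*flags);	/* lost a race with exit; retry */
		}
		return sighand;
	}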

TINY_PREEMPT_RCU will also need a few of these changes, but in the
meantime this patch stack helps organize things better for testing.

Changes from v1 (lkml.kernel.org/r/20110720001738.GA16369@linux.vnet.ibm.com):

o	Fix an embarrassing missing-braces bug in rcu_report_exp_rnp().

o	Fix the WARN_ON_ONCE() in __rcu_read_unlock() to correctly handle
	the new negative values.

o	Make the RCU read-side critical section in __lock_task_sighand()
	cover the full sighand->siglock critical section.

Many thanks to Peter for his review and to Ed and Ben for their testing.

These are also available from the following subject-to-rebase git branch:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent

Or will be once kernel.org's mirrors have updated.

							Thanx, Paul

------------------------------------------------------------------------

 b/include/linux/sched.h   |    3 ++
 b/kernel/rcutree_plugin.h |    6 +++--
 b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++-----
 b/kernel/signal.c         |   19 +++++++++++------
 b/kernel/softirq.c        |   12 +++++++++-
 kernel/rcutree_plugin.h   |   51 +++++++++++++++++++++++++++++++++-------------
 6 files changed, 105 insertions(+), 30 deletions(-)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
@ 2011-07-20 18:26   ` Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special Paul E. McKenney
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney

PREEMPT_RCU read-side critical sections blocking an expedited grace
period invoke rcu_report_exp_rnp().  When the last such critical section
has completed, rcu_report_exp_rnp() invokes the scheduler to wake up the
task that invoked synchronize_rcu_expedited() -- needlessly holding the
root rcu_node structure's lock while doing so, thus needlessly providing
a way for RCU and the scheduler to deadlock.

This commit therefore releases the root rcu_node structure's lock before
calling wake_up().
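
The shape of the fix, as a standalone illustration (userspace C with
hypothetical names, not the kernel code -- the actual deadlock cycle
additionally involves the scheduler's rq and pi locks):

	#include <pthread.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t  waitq = PTHREAD_COND_INITIALIZER;
	static int all_done;

	static void report_done(void)
	{
		pthread_mutex_lock(&lock);
		all_done = 1;
		pthread_mutex_unlock(&lock);
		/*
		 * Wake the waiter only after dropping the lock: if the
		 * wakeup path can acquire locks that are elsewhere taken
		 * while this lock is held, waking under the lock closes
		 * a deadlock cycle -- the analogue of calling wake_up()
		 * while still holding the root rcu_node ->lock.
		 */
		pthread_cond_signal(&waitq);
	}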

Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 75113cb..6abef3c 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -695,9 +695,12 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	for (;;) {
-		if (!sync_rcu_preempt_exp_done(rnp))
+		if (!sync_rcu_preempt_exp_done(rnp)) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			break;
+		}
 		if (rnp->parent == NULL) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			wake_up(&sync_rcu_preempt_exp_wq);
 			break;
 		}
@@ -707,7 +710,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 		raw_spin_lock(&rnp->lock); /* irqs already disabled */
 		rnp->expmask &= ~mask;
 	}
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
 
 /*
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
@ 2011-07-20 18:26   ` Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock() Paul E. McKenney
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney,
	Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The RCU_BOOST commits for TREE_PREEMPT_RCU introduced an other-task
write to a new RCU_READ_UNLOCK_BOOSTED bit in the task_struct structure's
->rcu_read_unlock_special field, but, as noted by Steven Rostedt, without
correctly synchronizing all accesses to ->rcu_read_unlock_special.
This could result in bits in ->rcu_read_unlock_special being spuriously
set and cleared due to conflicting accesses, which in turn could result
in deadlocks between the rcu_node structure's ->lock and the scheduler's
rq and pi locks.  These deadlocks would result from RCU incorrectly
believing that the just-ended RCU read-side critical section had been
preempted and/or boosted.  If that RCU read-side critical section was
executed with either rq or pi locks held, RCU's ensuing (incorrect)
calls to the scheduler would cause the scheduler to attempt to once
again acquire the rq and pi locks, resulting in deadlock.  More complex
deadlock cycles are also possible, involving multiple rq and pi locks
as well as locks from multiple rcu_node structures.

This commit fixes the synchronization by creating a ->rcu_boosted field in
task_struct that is accessed and modified only when holding the ->lock
in the rcu_node structure on which the task is queued (on that rcu_node
structure's ->blkd_tasks list).  This results in tasks accessing only
their own current->rcu_read_unlock_special fields, making unsynchronized
access once again legal, and keeping the rcu_read_unlock() fastpath free
of atomic instructions and memory barriers.

The reason that the rcu_read_unlock() fastpath does not need to access
the new current->rcu_boosted field is that this new field cannot
be non-zero unless the RCU_READ_UNLOCK_BLOCKED bit is set in the
current->rcu_read_unlock_special field.  Therefore, rcu_read_unlock()
need only test current->rcu_read_unlock_special: if that is zero, then
current->rcu_boosted must also be zero.
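
A condensed sketch of that fastpath reasoning (simplified; the actual
fastpath code appears in patches 3 and 4 of this series):

	/*
	 * Invariant: ->rcu_boosted is set only for a task that has blocked
	 * within its RCU read-side critical section, and blocking sets
	 * RCU_READ_UNLOCK_BLOCKED in ->rcu_read_unlock_special.  So
	 * t->rcu_boosted != 0 implies t->rcu_read_unlock_special != 0,
	 * and testing ->rcu_read_unlock_special alone suffices here:
	 */
	if (--t->rcu_read_lock_nesting == 0 &&
	    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
		rcu_read_unlock_special(t);  /* slowpath handles ->rcu_boosted */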

This bug does not affect TINY_PREEMPT_RCU because this implementation
of RCU accesses current->rcu_read_unlock_special with irqs disabled,
thus preventing races on the !SMP systems that TINY_PREEMPT_RCU runs on.

Maybe-reported-by: Dave Jones <davej@redhat.com>
Maybe-reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/sched.h   |    3 +++
 kernel/rcutree_plugin.h |    8 ++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..76676a4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1254,6 +1254,9 @@ struct task_struct {
 #ifdef CONFIG_PREEMPT_RCU
 	int rcu_read_lock_nesting;
 	char rcu_read_unlock_special;
+#if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU)
+	int rcu_boosted;
+#endif /* #if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU) */
 	struct list_head rcu_node_entry;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 #ifdef CONFIG_TREE_PREEMPT_RCU
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 6abef3c..3a0ae03 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -342,6 +342,11 @@ static void rcu_read_unlock_special(struct task_struct *t)
 #ifdef CONFIG_RCU_BOOST
 		if (&t->rcu_node_entry == rnp->boost_tasks)
 			rnp->boost_tasks = np;
+		/* Snapshot and clear ->rcu_boosted with rcu_node lock held. */
+		if (t->rcu_boosted) {
+			special |= RCU_READ_UNLOCK_BOOSTED;
+			t->rcu_boosted = 0;
+		}
 #endif /* #ifdef CONFIG_RCU_BOOST */
 		t->rcu_blocked_node = NULL;
 
@@ -358,7 +363,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 #ifdef CONFIG_RCU_BOOST
 		/* Unboost if we were boosted. */
 		if (special & RCU_READ_UNLOCK_BOOSTED) {
-			t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BOOSTED;
 			rt_mutex_unlock(t->rcu_boost_mutex);
 			t->rcu_boost_mutex = NULL;
 		}
@@ -1176,7 +1180,7 @@ static int rcu_boost(struct rcu_node *rnp)
 	t = container_of(tb, struct task_struct, rcu_node_entry);
 	rt_mutex_init_proxy_locked(&mtx, t);
 	t->rcu_boost_mutex = &mtx;
-	t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BOOSTED;
+	t->rcu_boosted = 1;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
 	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock()
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special Paul E. McKenney
@ 2011-07-20 18:26   ` Paul E. McKenney
  2011-07-20 22:44     ` Linus Torvalds
  2011-07-20 18:26   ` [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers Paul E. McKenney
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney

Given some common flag combinations, particularly -Os, gcc will inline
rcu_read_unlock_special() despite its being in an unlikely() clause.
Use noinline to prohibit this misoptimization.

In addition, move the second barrier() in __rcu_read_unlock() so that
it is not on the common-case code path.  This will allow the compiler to
generate better code for the common-case path through __rcu_read_unlock().
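
A minimal standalone illustration of the noinline idiom (hypothetical
names, plain GCC/Clang C rather than the kernel source):

	#define noinline __attribute__((__noinline__))

	static noinline void cold_cleanup(void)
	{
		/* Rare work; forcing a call keeps the caller's fastpath small. */
	}

	void hot_path(int special)
	{
		if (__builtin_expect(special != 0, 0))	/* i.e. unlikely() */
			cold_cleanup();
	}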

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 kernel/rcutree_plugin.h |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 3a0ae03..4d2c068 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -284,7 +284,7 @@ static struct list_head *rcu_next_node_entry(struct task_struct *t,
  * notify RCU core processing or task having blocked during the RCU
  * read-side critical section.
  */
-static void rcu_read_unlock_special(struct task_struct *t)
+static noinline void rcu_read_unlock_special(struct task_struct *t)
 {
 	int empty;
 	int empty_exp;
@@ -391,11 +391,11 @@ void __rcu_read_unlock(void)
 	struct task_struct *t = current;
 
 	barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
-	--t->rcu_read_lock_nesting;
-	barrier();  /* decrement before load of ->rcu_read_unlock_special */
-	if (t->rcu_read_lock_nesting == 0 &&
-	    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
-		rcu_read_unlock_special(t);
+	if (--t->rcu_read_lock_nesting == 0) {
+		barrier();  /* decr before ->rcu_read_unlock_special load */
+		if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
+			rcu_read_unlock_special(t);
+	}
 #ifdef CONFIG_PROVE_LOCKING
 	WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
                     ` (2 preceding siblings ...)
  2011-07-20 18:26   ` [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock() Paul E. McKenney
@ 2011-07-20 18:26   ` Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi() Paul E. McKenney
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney,
	Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The addition of RCU read-side critical sections within runqueue and
priority-inheritance lock critical sections introduced some deadlock
cycles, for example, involving interrupts from __rcu_read_unlock()
where the interrupt handlers call wake_up().  This situation can cause
the instance of __rcu_read_unlock() invoked from interrupt to do some
of the processing that would otherwise have been carried out by the
task-level instance of __rcu_read_unlock().  When the interrupt-level
instance of __rcu_read_unlock() is called with a scheduler lock held
from interrupt-entry/exit situations where in_irq() returns false,
deadlock can result.

This commit resolves these deadlocks by using negative values of
the per-task ->rcu_read_lock_nesting counter to indicate that an
instance of __rcu_read_unlock() is in flight, which in turn prevents
instances invoked from interrupt handlers from doing any special processing.
This patch is inspired by Steven Rostedt's earlier patch that similarly
made __rcu_read_unlock() guard against interrupt-mediated recursion
(see https://lkml.org/lkml/2011/7/15/326), but this commit refines
Steven's approach to avoid the need for preemption disabling on the
__rcu_read_unlock() fastpath and to also avoid the need for manipulating
a separate per-CPU variable.

This patch avoids the need for preempt_disable() by instead using negative
values of the per-task ->rcu_read_lock_nesting counter.  Note that nested
rcu_read_lock()/rcu_read_unlock() pairs are still permitted, but they will
never see ->rcu_read_lock_nesting go to zero, and will therefore never
invoke rcu_read_unlock_special(), thus preventing them from seeing the
RCU_READ_UNLOCK_BLOCKED bit should it be set in ->rcu_read_unlock_special.
This patch also adds a check for ->rcu_read_lock_nesting being negative
in rcu_check_callbacks(), thus preventing the RCU_READ_UNLOCK_NEED_QS
bit from being set should a scheduling-clock interrupt occur while
__rcu_read_unlock() is exiting from an outermost RCU read-side critical
section.

Of course, __rcu_read_unlock() can be preempted during the time that
->rcu_read_lock_nesting is negative.  This could result in the setting
of the RCU_READ_UNLOCK_BLOCKED bit after __rcu_read_unlock() checks it,
and would also result in this task being queued on the corresponding
rcu_node structure's blkd_tasks list.  Therefore, some later RCU read-side
critical section would enter rcu_read_unlock_special() to clean up --
which could result in deadlock if that critical section happened to be in
the scheduler where the runqueue or priority-inheritance locks were held.

This situation is dealt with by making rcu_preempt_note_context_switch()
check for negative ->rcu_read_lock_nesting, thus refraining from
queuing the task (and from setting RCU_READ_UNLOCK_BLOCKED) if we are
already exiting from the outermost RCU read-side critical section (in
other words, we really are no longer actually in that RCU read-side
critical section).  In addition, rcu_preempt_note_context_switch()
invokes rcu_read_unlock_special() to carry out the cleanup in this case,
which clears out the ->rcu_read_unlock_special bits and dequeues the task
(if necessary), in turn avoiding needless delay of the current RCU grace
period and needless RCU priority boosting.

It is still illegal to call rcu_read_unlock() while holding a scheduler
lock if the prior RCU read-side critical section has ever had either
preemption or irqs enabled.  However, the common use case is legal,
namely where the entire RCU read-side critical section executes with
irqs disabled, for example, when the scheduler lock is held across the
entire lifetime of the RCU read-side critical section.
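
The core idea as a standalone sketch (userspace C, hypothetical names;
the real code also needs the barrier() calls shown in the diff below,
which are omitted here):

	#include <limits.h>

	extern int special_pending(void);		/* hypothetical */
	extern void do_special_processing(void);	/* hypothetical */

	static __thread int nesting;	/* models t->rcu_read_lock_nesting */

	void sketch_read_lock(void)
	{
		nesting++;
	}

	void sketch_read_unlock(void)
	{
		if (nesting != 1) {
			--nesting;	/* nested unlock: never reaches zero */
		} else {
			nesting = INT_MIN;	/* outermost unlock in flight */
			/*
			 * An unlock running from an interrupt handler here
			 * sees a negative count and so skips the special
			 * processing entirely.
			 */
			if (special_pending())
				do_special_processing();
			nesting = 0;	/* exit complete */
		}
	}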

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |   29 ++++++++++++++++++++++++-----
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4d2c068..d9d7a89 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -68,6 +68,7 @@ struct rcu_state rcu_preempt_state = RCU_STATE_INITIALIZER(rcu_preempt_state);
 DEFINE_PER_CPU(struct rcu_data, rcu_preempt_data);
 static struct rcu_state *rcu_state = &rcu_preempt_state;
 
+static void rcu_read_unlock_special(struct task_struct *t);
 static int rcu_preempted_readers_exp(struct rcu_node *rnp);
 
 /*
@@ -147,7 +148,7 @@ static void rcu_preempt_note_context_switch(int cpu)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 
-	if (t->rcu_read_lock_nesting &&
+	if (t->rcu_read_lock_nesting > 0 &&
 	    (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
 
 		/* Possibly blocking in an RCU read-side critical section. */
@@ -190,6 +191,14 @@ static void rcu_preempt_note_context_switch(int cpu)
 				rnp->gp_tasks = &t->rcu_node_entry;
 		}
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+	} else if (t->rcu_read_lock_nesting < 0 &&
+		   t->rcu_read_unlock_special) {
+
+		/*
+		 * Complete exit from RCU read-side critical section on
+		 * behalf of preempted instance of __rcu_read_unlock().
+		 */
+		rcu_read_unlock_special(t);
 	}
 
 	/*
@@ -391,13 +400,22 @@ void __rcu_read_unlock(void)
 	struct task_struct *t = current;
 
 	barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
-	if (--t->rcu_read_lock_nesting == 0) {
-		barrier();  /* decr before ->rcu_read_unlock_special load */
+	if (t->rcu_read_lock_nesting != 1)
+		--t->rcu_read_lock_nesting;
+	else {
+		t->rcu_read_lock_nesting = INT_MIN;
+		barrier();  /* assign before ->rcu_read_unlock_special load */
 		if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
 			rcu_read_unlock_special(t);
+		barrier();  /* ->rcu_read_unlock_special load before assign */
+		t->rcu_read_lock_nesting = 0;
 	}
 #ifdef CONFIG_PROVE_LOCKING
-	WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
+	{
+		int rrln = ACCESS_ONCE(t->rcu_read_lock_nesting);
+
+		WARN_ON_ONCE(rrln < 0 && rrln > INT_MIN / 2);
+	}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);
@@ -593,7 +611,8 @@ static void rcu_preempt_check_callbacks(int cpu)
 		rcu_preempt_qs(cpu);
 		return;
 	}
-	if (per_cpu(rcu_preempt_data, cpu).qs_pending)
+	if (t->rcu_read_lock_nesting > 0 &&
+	    per_cpu(rcu_preempt_data, cpu).qs_pending)
 		t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
 }
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi()
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
                     ` (3 preceding siblings ...)
  2011-07-20 18:26   ` [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers Paul E. McKenney
@ 2011-07-20 18:26   ` Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU Paul E. McKenney
  6 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Peter Zijlstra, Paul E. McKenney

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Ensure scheduler_ipi() calls irq_{enter,exit} when it does some actual
work. Traditionally we never did any actual work from the resched IPI
and all magic happened in the return from interrupt path.

Now that we do do some work, we need to ensure irq_{enter,exit} are
called so that we don't confuse things.

This affects things like timekeeping, NO_HZ and RCU, basically
everything with a hook in irq_enter/exit.

Explicit examples of things going wrong are:

  sched_clock_cpu() -- has a callback when leaving NO_HZ state to take
                    a new reading from GTOD and TSC. Without this
                    callback, time is stuck in the past.

  RCU -- needs in_irq() to work in order to avoid some nasty deadlocks
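
For reference, the resulting handler reads as follows (reconstructed
from the diff below, with the long comment elided -- not a separate
change):

	void scheduler_ipi(void)
	{
		struct rq *rq = this_rq();
		struct task_struct *list = xchg(&rq->wake_list, NULL);

		if (!list)
			return;	/* pure resched IPI: no work, no irq_enter() */

		irq_enter();
		sched_ttwu_do_pending(list);
		irq_exit();
	}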

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/sched.c |   44 ++++++++++++++++++++++++++++++++++++++------
 1 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 9769c75..1930ee1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2544,13 +2544,9 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }
 
 #ifdef CONFIG_SMP
-static void sched_ttwu_pending(void)
+static void sched_ttwu_do_pending(struct task_struct *list)
 {
 	struct rq *rq = this_rq();
-	struct task_struct *list = xchg(&rq->wake_list, NULL);
-
-	if (!list)
-		return;
 
 	raw_spin_lock(&rq->lock);
 
@@ -2563,9 +2559,45 @@ static void sched_ttwu_pending(void)
 	raw_spin_unlock(&rq->lock);
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+
+static void sched_ttwu_pending(void)
+{
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	sched_ttwu_do_pending(list);
+}
+
+#endif /* CONFIG_HOTPLUG_CPU */
+
 void scheduler_ipi(void)
 {
-	sched_ttwu_pending();
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	/*
+	 * Not all reschedule IPI handlers call irq_enter/irq_exit, since
+	 * traditionally all their work was done from the interrupt return
+	 * path. Now that we actually do some work, we need to make sure
+	 * we do call them.
+	 *
+	 * Some archs already do call them, luckily irq_enter/exit nest
+	 * properly.
+	 *
+	 * Arguably we should visit all archs and update all handlers,
+	 * however a fair share of IPIs are still resched only so this would
+	 * somewhat pessimize the simple resched case.
+	 */
+	irq_enter();
+	sched_ttwu_do_pending(list);
+	irq_exit();
 }
 
 static void ttwu_queue_remote(struct task_struct *p, int cpu)
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
                     ` (4 preceding siblings ...)
  2011-07-20 18:26   ` [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi() Paul E. McKenney
@ 2011-07-20 18:26   ` Paul E. McKenney
  2011-07-20 18:26   ` [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU Paul E. McKenney
  6 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Peter Zijlstra, Paul E. McKenney

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

The rcu_read_unlock_special() function relies on in_irq() to exclude
scheduler activity from interrupt level.  This fails because irq_exit()
can invoke the scheduler after clearing the preempt_count() bits that
in_irq() uses to determine that it is at interrupt level.  This situation
can result in failures as follows:

 $task			IRQ		SoftIRQ

 rcu_read_lock()

 /* do stuff */

 <preempt> |= UNLOCK_BLOCKED

 rcu_read_unlock()
   --t->rcu_read_lock_nesting

			irq_enter();
			/* do stuff, don't use RCU */
			irq_exit();
			  sub_preempt_count(IRQ_EXIT_OFFSET);
			  invoke_softirq()

					ttwu();
					  spin_lock_irq(&pi->lock)
					  rcu_read_lock();
					  /* do stuff */
					  rcu_read_unlock();
					    rcu_read_unlock_special()
					      rcu_report_exp_rnp()
					        ttwu()
					          spin_lock_irq(&pi->lock) /* deadlock */

   rcu_read_unlock_special(t);

Ed can trigger this easily because invoke_softirq() immediately
does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff
first, but even without that the above happens.

Cure this by also excluding softirqs from the
rcu_read_unlock_special() handler and ensuring the force_irqthreads
ksoftirqd/# wakeup is done from full softirq context.

[ Alternatively, delaying the ->rcu_read_lock_nesting decrement
  until after the special handling would make the thing more robust
  in the face of interrupts as well.  And there is a separate patch
  for that. ]
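
A toy model of the preempt_count bookkeeping involved (the names and
the offset value below are illustrative only, not the kernel's):

	enum { SOFTIRQ_OFFSET_MODEL = 1 << 8 };	/* illustrative value */

	extern void wake_ksoftirqd_model(void);	/* hypothetical */

	static int preempt_count_model;

	static int model_in_serving_softirq(void)
	{
		return preempt_count_model & SOFTIRQ_OFFSET_MODEL;
	}

	static void model_invoke_softirq_threaded(void)
	{
		/* models __local_bh_disable(..., SOFTIRQ_OFFSET) */
		preempt_count_model += SOFTIRQ_OFFSET_MODEL;
		/*
		 * The ksoftirqd wakeup (a ttwu) now runs where
		 * model_in_serving_softirq() is true, so the unlock-special
		 * path bails out instead of recursing into the scheduler.
		 */
		wake_ksoftirqd_model();
		/* models __local_bh_enable(SOFTIRQ_OFFSET) */
		preempt_count_model -= SOFTIRQ_OFFSET_MODEL;
	}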

Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-and-tested-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    2 +-
 kernel/softirq.c        |   12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index d9d7a89..8aafbb8 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -318,7 +318,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 	}
 
 	/* Hardware IRQ handlers cannot block. */
-	if (in_irq()) {
+	if (in_irq() || in_serving_softirq()) {
 		local_irq_restore(flags);
 		return;
 	}
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 40cf63d..fca82c3 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -315,16 +315,24 @@ static inline void invoke_softirq(void)
 {
 	if (!force_irqthreads)
 		__do_softirq();
-	else
+	else {
+		__local_bh_disable((unsigned long)__builtin_return_address(0),
+				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
+		__local_bh_enable(SOFTIRQ_OFFSET);
+	}
 }
 #else
 static inline void invoke_softirq(void)
 {
 	if (!force_irqthreads)
 		do_softirq();
-	else
+	else {
+		__local_bh_disable((unsigned long)__builtin_return_address(0),
+				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
+		__local_bh_enable(SOFTIRQ_OFFSET);
+	}
 }
 #endif
 
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU
  2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
                     ` (5 preceding siblings ...)
  2011-07-20 18:26   ` [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity Paul E. McKenney
@ 2011-07-20 18:26   ` Paul E. McKenney
  6 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, greearb, edt, Paul E. McKenney,
	Paul E. McKenney

From: Paul E. McKenney <paul.mckenney@linaro.org>

The __lock_task_sighand() function calls rcu_read_lock() with interrupts
and preemption enabled, but later calls rcu_read_unlock() with interrupts
disabled.  It is therefore possible that this RCU read-side critical
section will be preempted and later RCU priority boosted, which means that
rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but
with interrupts disabled. This results in lockdep splats, so this commit
nests the RCU read-side critical section within the interrupt-disabled
region of code.  This prevents the RCU read-side critical section from
being preempted, and thus prevents the attempt to deboost with interrupts
disabled.

It is quite possible that a better long-term fix is to make rt_mutex_unlock()
disable irqs when acquiring the rt_mutex structure's ->wait_lock.
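
For reference, the loop as it reads after this patch (reconstructed
from the diff below, no separate change):

	struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
						   unsigned long *flags)
	{
		struct sighand_struct *sighand;

		for (;;) {
			local_irq_save(*flags);
			rcu_read_lock();
			sighand = rcu_dereference(tsk->sighand);
			if (unlikely(sighand == NULL)) {
				rcu_read_unlock();
				local_irq_restore(*flags);
				break;
			}

			spin_lock(&sighand->siglock);
			if (likely(sighand == tsk->sighand)) {
				rcu_read_unlock();
				break;
			}
			spin_unlock(&sighand->siglock);
			rcu_read_unlock();
			local_irq_restore(*flags);
		}

		return sighand;
	}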

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/signal.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ff76786..415d85d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1178,18 +1178,25 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
 	struct sighand_struct *sighand;
 
-	rcu_read_lock();
 	for (;;) {
+		local_irq_save(*flags);
+		rcu_read_lock();
 		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL))
+		if (unlikely(sighand == NULL)) {
+			rcu_read_unlock();
+			local_irq_restore(*flags);
 			break;
+		}
 
-		spin_lock_irqsave(&sighand->siglock, *flags);
-		if (likely(sighand == tsk->sighand))
+		spin_lock(&sighand->siglock);
+		if (likely(sighand == tsk->sighand)) {
+			rcu_read_unlock();
 			break;
-		spin_unlock_irqrestore(&sighand->siglock, *flags);
+		}
+		spin_unlock(&sighand->siglock);
+		rcu_read_unlock();
+		local_irq_restore(*flags);
 	}
-	rcu_read_unlock();
 
 	return sighand;
 }
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 17:15             ` Paul E. McKenney
@ 2011-07-20 18:44               ` Ingo Molnar
  2011-07-20 18:52                 ` Peter Zijlstra
  0 siblings, 1 reply; 55+ messages in thread
From: Ingo Molnar @ 2011-07-20 18:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ben Greear, Linus Torvalds, Ed Tomlinson, linux-kernel, laijs,
	dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, peterz,
	rostedt, Valdis.Kletnieks, dhowells, eric.dumazet, darren,
	patches, edward.tomlinson


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> RCU boost is new and counts as strange for another few days at least.

I'd rather prefer a oneliner patch that turns it off in the Kconfig 
for now.

We are awfully late in the -rc cycle and this diffstat:

 b/include/linux/sched.h   |    3 ++
 b/kernel/rcutree_plugin.h |    6 +++--
 b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++-----
 b/kernel/signal.c         |   19 +++++++++++------
 b/kernel/softirq.c        |   12 +++++++++-
 kernel/rcutree_plugin.h   |   51 +++++++++++++++++++++++++++++++++-------------
 6 files changed, 105 insertions(+), 30 deletions(-)

looks very scary to me.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 18:44               ` Ingo Molnar
@ 2011-07-20 18:52                 ` Peter Zijlstra
  2011-07-20 19:01                   ` Paul E. McKenney
  2011-07-20 19:02                   ` Linus Torvalds
  0 siblings, 2 replies; 55+ messages in thread
From: Peter Zijlstra @ 2011-07-20 18:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul E. McKenney, Ben Greear, Linus Torvalds, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, 2011-07-20 at 20:44 +0200, Ingo Molnar wrote:
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > RCU boost is new and counts as strange for another few days at least.
> 
> I'd rather prefer a oneliner patch that turns it off in the Kconfig 
> for now.
> 
> We are awfully late in the -rc cycle and this diffstat:
> 
>  b/include/linux/sched.h   |    3 ++
>  b/kernel/rcutree_plugin.h |    6 +++--
>  b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++-----
>  b/kernel/signal.c         |   19 +++++++++++------
>  b/kernel/softirq.c        |   12 +++++++++-
>  kernel/rcutree_plugin.h   |   51 +++++++++++++++++++++++++++++++++-------------
>  6 files changed, 105 insertions(+), 30 deletions(-)
> 
> looks very scary to me.

A lot of that is also relevant to !BOOST.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 18:52                 ` Peter Zijlstra
@ 2011-07-20 19:01                   ` Paul E. McKenney
  2011-07-20 19:25                     ` Peter Zijlstra
  2011-07-20 19:02                   ` Linus Torvalds
  1 sibling, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 19:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ben Greear, Linus Torvalds, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 08:52:58PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-07-20 at 20:44 +0200, Ingo Molnar wrote:
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > RCU boost is new and counts as strange for another few days at least.
> > 
> > I'd rather prefer a oneliner patch that turns it off in the Kconfig 
> > for now.
> > 
> > We are awfully late in the -rc cycle and this diffstat:
> > 
> >  b/include/linux/sched.h   |    3 ++
> >  b/kernel/rcutree_plugin.h |    6 +++--
> >  b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++-----
> >  b/kernel/signal.c         |   19 +++++++++++------
> >  b/kernel/softirq.c        |   12 +++++++++-
> >  kernel/rcutree_plugin.h   |   51 +++++++++++++++++++++++++++++++++-------------
> >  6 files changed, 105 insertions(+), 30 deletions(-)
> > 
> > looks very scary to me.
> 
> A lot of that is also relevant to !BOOST.

Unfortunately, agreed.  :-(

The expedited grace periods also cause rcu_read_unlock() to call
into the scheduler.  This can interact badly with the recently
added RCU read-side critical sections in the scheduler that have
either the runqueue or the priority-inheritance locks held, especially
when interrupts occur towards the end of __rcu_read_unlock().  But
there are some failure cases that don't involve interrupts (though
I believe all of them involve RCU_BOOST).

That said, turning off RCU_BOOST would allow dropping this patch:

7765be2fec0f4 Fix RCU_BOOST race handling current->rcu_read_unlock_special



							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 18:52                 ` Peter Zijlstra
  2011-07-20 19:01                   ` Paul E. McKenney
@ 2011-07-20 19:02                   ` Linus Torvalds
  2011-07-20 19:29                     ` Paul E. McKenney
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2011-07-20 19:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Paul E. McKenney, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 11:52 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>  6 files changed, 105 insertions(+), 30 deletions(-)
>>
>> looks very scary to me.
>
> A lot of that is also relevant to !BOOST.

Can we limit this somehow? Or split it up? Which part of this is "fix
new BOOST features, not ever even executed without BOOST", and which
part of this is "fixes core stuff"?

I *really* hate the timing of this. The code that is only impacted by
BOOST I cannot find it in myself to care about, and I'd be willing to
consider it basically EXPERIMENTAL and just pulling it.

IOW, is the core non-boost fix just a few obvious oneliners?

The "it all broke completely" in the previous version of this also doesn't
make me get all the warm fuzzies. Which all makes me go "what is
minimal and really really SAFE?"

            Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:01                   ` Paul E. McKenney
@ 2011-07-20 19:25                     ` Peter Zijlstra
  2011-07-20 20:06                       ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2011-07-20 19:25 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, Ben Greear, Linus Torvalds, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, 2011-07-20 at 12:01 -0700, Paul E. McKenney wrote:
>   This can interact badly with the recently
> added RCU read-side critical sections in the scheduler that have
> either the runqueue or the priority-inheritance locks held, especially
> when interrupts occur towards the end of __rcu_read_unlock(). 

Right, so while I recently added a lot more, there have been rcu usage
sites under rq->lock for a long while, see commits

a18b83b7ef ("cpuacct: make cpuacct hierarchy walk in cpuacct_charge()
safe when rcupreempt is used -v2") -- March 2009.

f3b577dec1 ("rcu: apply RCU protection to wake_affine()") -- Jun 2010

b0a0f667 ("sched: suppress RCU lockdep splat in task_fork_fair") -- Oct
2010

So I'm not quite seeing how the problems we're hitting now are new.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:02                   ` Linus Torvalds
@ 2011-07-20 19:29                     ` Paul E. McKenney
  2011-07-20 19:39                       ` Ingo Molnar
  2011-07-20 21:05                       ` Peter Zijlstra
  0 siblings, 2 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 19:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Ingo Molnar, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 12:02:47PM -0700, Linus Torvalds wrote:
> On Wed, Jul 20, 2011 at 11:52 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>  6 files changed, 105 insertions(+), 30 deletions(-)
> >>
> >> looks very scary to me.
> >
> > A lot of that is also relevant to !BOOST.
> 
> Can we limit this somehow? Or split it up? Which part of this is "fix
> new BOOST features, not ever even executed without BOOST", and which
> part of this is "fixes core stuff"?

#2 (Fix RCU_BOOST race handling current->rcu_read_unlock_special) and
#7 (align __lock_task_sighand() irq disabling and RCU) are needed only
for RCU_BOOST.  The rest fix problems that can occur even with !RCU_BOOST.
I believe that #4 (protect __rcu_read_unlock() against scheduler-using 
irq handlers) turns #1 (decrease rcu_report_exp_rnp coupling with scheduler)
into a longer-term maintenance issue rather than an urgent fix.

> I *really* hate the timing of this. The code that is only impacted by
> BOOST I cannot find it in myself to care about, and I'd be willing to
> consider it basically EXPERIMENTAL and just pulling it.

I can only say that I completely failed in my goal of making my code
go in without a ripple.  :-(

> IOW, is the core non-boost fix just a few obvious oneliners?
> 
> The "it all broke completely" in previous version of this also doesn't
> make me get all the warm fuzzies. Which all makes me go "what is
> minimal and really really SAFE?"

Peter, does #4 (protect __rcu_read_unlock() against scheduler-using
irq handlers) remove the need for #5 (Add irq_{enter,exit}() to
scheduler_ipi()) and #6 (Inform RCU of irq_exit() activity)?  My guess is
"no" for #5 and "yes" for #6.

If my guess is correct, then the minimal non-RCU_BOOST fix is #4 (which
drags along #3) and #6.  Which are not one-liners, but somewhat smaller:

 b/kernel/rcutree_plugin.h |   12 ++++++------
 b/kernel/softirq.c        |   12 ++++++++++--
 kernel/rcutree_plugin.h   |   31 +++++++++++++++++++++++++------
 3 files changed, 41 insertions(+), 14 deletions(-)

How would you like to proceed?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:29                     ` Paul E. McKenney
@ 2011-07-20 19:39                       ` Ingo Molnar
  2011-07-20 19:57                         ` Ingo Molnar
  2011-07-20 20:08                         ` [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
  2011-07-20 21:05                       ` Peter Zijlstra
  1 sibling, 2 replies; 55+ messages in thread
From: Ingo Molnar @ 2011-07-20 19:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Peter Zijlstra, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> If my guess is correct, then the minimal non-RCU_BOOST fix is #4 
> (which drags along #3) and #6.  Which are not one-liners, but 
> somewhat smaller:
> 
>  b/kernel/rcutree_plugin.h |   12 ++++++------
>  b/kernel/softirq.c        |   12 ++++++++++--
>  kernel/rcutree_plugin.h   |   31 +++++++++++++++++++++++++------
>  3 files changed, 41 insertions(+), 14 deletions(-)

That's half the patch size and half the patch count.

PeterZ's question is relevant: since we apparently had similar bugs 
in v2.6.39 as well, what changed in v3.0 that makes them so urgent
to fix?

If it's just better instrumentation that exposes them better, then i'd 
suggest fixing this in v3.1 and not risking v3.0 with an unintended 
side effect.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:39                       ` Ingo Molnar
@ 2011-07-20 19:57                         ` Ingo Molnar
  2011-07-20 20:33                           ` Paul E. McKenney
  2011-07-20 21:04                           ` [GIT PULL] RCU fixes for v3.0 Ingo Molnar
  2011-07-20 20:08                         ` [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
  1 sibling, 2 replies; 55+ messages in thread
From: Ingo Molnar @ 2011-07-20 19:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Peter Zijlstra, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > If my guess is correct, then the minimal non-RCU_BOOST fix is #4 
> > (which drags along #3) and #6.  Which are not one-liners, but 
> > somewhat smaller:
> > 
> >  b/kernel/rcutree_plugin.h |   12 ++++++------
> >  b/kernel/softirq.c        |   12 ++++++++++--
> >  kernel/rcutree_plugin.h   |   31 +++++++++++++++++++++++++------
> >  3 files changed, 41 insertions(+), 14 deletions(-)
> 
> That's half the patch size and half the patch count.
> 
> PeterZ's question is relevant: since we apparently had similar bugs 
> in v2.6.39 as well, what changed in v3.0 that makes them so urgent 
> to fix?
> 
> If it's just better instrumentation that exposes them better, then 
> i'd suggest fixing this in v3.1 and not risking v3.0 with an 
> unintended side effect.

Ok, i looked some more at the background and the symptoms that people 
are seeing: kernel crashes and lockups. I think we want these 
problems fixed in v3.0, even if it was the recent introduction of 
RCU_BOOST that made it really prominent.

Having put some testing into your rcu/urgent branch today i'd feel 
more comfortable with taking this plus perhaps an RCU_BOOST disabling 
patch. That makes it all fundamentally tested by a number of people 
(including those who reported/reproduced the problems).

Linus, would that approach be fine with you? I'll send an RFC pull 
request for the 6 patches as a reply to this mail, in a couple of 
minutes.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:25                     ` Peter Zijlstra
@ 2011-07-20 20:06                       ` Paul E. McKenney
  0 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 20:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ben Greear, Linus Torvalds, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 09:25:46PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-07-20 at 12:01 -0700, Paul E. McKenney wrote:
> >   This can interact badly with the recently
> > added RCU read-side critical sections in the scheduler that have
> > either the runqueue or the priority-inheritance locks held, especially
> > when interrupts occur towards the end of __rcu_read_unlock(). 
> 
> Right, so while I recently added a lot more, there have been rcu usage
> sites under rq->lock for a long while, see commits
> 
> a18b83b7ef ("cpuacct: make cpuacct hierarchy walk in cpuacct_charge()
> safe when rcupreempt is used -v2") -- March 2009.
> 
> f3b577dec1 ("rcu: apply RCU protection to wake_affine()") -- Jun 2010
> 
> b0a0f667 ("sched: suppress RCU lockdep splat in task_fork_fair") -- Oct
> 2010
> 
> So I'm not quite seeing how the problems we're hitting now are new.

My guess is that the problem has been there for quite some time, but
that the probability of hitting it has increased, at least for some
combinations of config options.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:39                       ` Ingo Molnar
  2011-07-20 19:57                         ` Ingo Molnar
@ 2011-07-20 20:08                         ` Paul E. McKenney
  1 sibling, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 20:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Peter Zijlstra, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 09:39:25PM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > If my guess is correct, then the minimal non-RCU_BOOST fix is #4 
> > (which drags along #3) and #6.  Which are not one-liners, but 
> > somewhat smaller:
> > 
> >  b/kernel/rcutree_plugin.h |   12 ++++++------
> >  b/kernel/softirq.c        |   12 ++++++++++--
> >  kernel/rcutree_plugin.h   |   31 +++++++++++++++++++++++++------
> >  3 files changed, 41 insertions(+), 14 deletions(-)
> 
> That's half the patch size and half the patch count.
> 
> PeterZ's question is relevant: since we apparently had similar bugs 
> in v2.6.39 as well, what changed in v3.0 that makes them so urgent
> to fix?
> 
> If it's just better instrumentation that exposes them better, then i'd 
> suggest fixing this in v3.1 and not risking v3.0 with an unintended 
> side effect.

Agreed, a fix in v3.1 and a backport to v3.0-stable after serious testing
would make a lot of sense.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:57                         ` Ingo Molnar
@ 2011-07-20 20:33                           ` Paul E. McKenney
  2011-07-20 20:54                             ` Ben Greear
  2011-07-20 21:04                           ` [GIT PULL] RCU fixes for v3.0 Ingo Molnar
  1 sibling, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 20:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Peter Zijlstra, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 09:57:42PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > If my guess is correct, then the minimal non-RCU_BOOST fix is #4 
> > > (which drags along #3) and #6.  Which are not one-liners, but 
> > > somewhat smaller:
> > > 
> > >  b/kernel/rcutree_plugin.h |   12 ++++++------
> > >  b/kernel/softirq.c        |   12 ++++++++++--
> > >  kernel/rcutree_plugin.h   |   31 +++++++++++++++++++++++++------
> > >  3 files changed, 41 insertions(+), 14 deletions(-)
> > 
> > That's half the patch size and half the patch count.
> > 
> > PeterZ's question is relevant: since we apparently had similar bugs 
> > in v2.6.39 as well, what changed in v3.0 that makes them so urgent 
> > to fix?
> > 
> > If it's just better instrumentation that exposes them better, then 
> > i'd suggest fixing this in v3.1 and not risking v3.0 with an 
> > unintended side effect.
> 
> Ok, i looked some more at the background and the symptoms that people 
> are seeing: kernel crashes and lockups. I think we want these 
> problems fixed in v3.0, even if it was the recent introduction of 
> RCU_BOOST that made it really prominent.
> 
> Having put some testing into your rcu/urgent branch today i'd feel 
> more comfortable with taking this plus perhaps an RCU_BOOST disabling 
> patch. That makes it all fundamentally tested by a number of people 
> (including those who reported/reproduced the problems).

RCU_BOOST is currently default=n.  Is that sufficient?  If not, one
low-risk approach would be for me to just remove RCU_BOOST from init/Kconfig
in 3.0 and add it back in for 3.1.  Please let me know what works best.

							Thanx, Paul

> Linus, would that approach be fine with you? I'll send an RFC pull 
> request for the 6 patches as a reply to this mail, in a couple of 
> minutes.
> 
> Thanks,
> 
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 20:33                           ` Paul E. McKenney
@ 2011-07-20 20:54                             ` Ben Greear
  2011-07-20 21:12                               ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Ben Greear @ 2011-07-20 20:54 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On 07/20/2011 01:33 PM, Paul E. McKenney wrote:
> On Wed, Jul 20, 2011 at 09:57:42PM +0200, Ingo Molnar wrote:
>>
>> * Ingo Molnar<mingo@elte.hu>  wrote:
>>
>>>
>>> * Paul E. McKenney<paulmck@linux.vnet.ibm.com>  wrote:
>>>
>>>> If my guess is correct, then the minimal non-RCU_BOOST fix is #4
>>>> (which drags along #3) and #6.  Which are not one-liners, but
>>>> somewhat smaller:
>>>>
>>>>   b/kernel/rcutree_plugin.h |   12 ++++++------
>>>>   b/kernel/softirq.c        |   12 ++++++++++--
>>>>   kernel/rcutree_plugin.h   |   31 +++++++++++++++++++++++++------
>>>>   3 files changed, 41 insertions(+), 14 deletions(-)
>>>
>>> That's half the patch size and half the patch count.
>>>
>>> PeterZ's question is relevant: since we apparently had similar bugs
>>> in v2.6.39 as well, what changed in v3.0 that makes them so urgent
>>> to fix?
>>>
>>> If it's just better instrumentation that proves them better then
>>> i'd suggest fixing this in v3.1 and not risking v3.0 with an
>>> unintended side effect.
>>
>> Ok, i looked some more at the background and the symptoms that people
>> are seeing: kernel crashes and lockups. I think we want these
>> problems fixed in v3.0, even if it was the recent introduction of
>> RCU_BOOST that made it really prominent.
>>
>> Having put some testing into your rcu/urgent branch today i'd feel
>> more comfortable with taking this plus perhaps an RCU_BOOST disabling
>> patch. That makes it all fundamentally tested by a number of people
>> (including those who reported/reproduced the problems).
>
> RCU_BOOST is currently default=n.  Is that sufficient?  If not, one

Not if it remains broken, I think...unless you put it under CONFIG_BROKEN
or something.  Otherwise, folks are liable to turn it on and not realize
it's the cause of subtle bugs.

For what it's worth, my tests have been running clean for around 2
hours, so the full set of fixes with RCU_BOOST appears good, so far.
I'll let it continue to run at least overnight to make sure I'm not
just getting lucky...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [GIT PULL] RCU fixes for v3.0
  2011-07-20 19:57                         ` Ingo Molnar
  2011-07-20 20:33                           ` Paul E. McKenney
@ 2011-07-20 21:04                           ` Ingo Molnar
  2011-07-20 21:55                             ` Ed Tomlinson
  1 sibling, 1 reply; 55+ messages in thread
From: Ingo Molnar @ 2011-07-20 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linus Torvalds, Peter Zijlstra, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson, Paul E. McKenney


* Ingo Molnar <mingo@elte.hu> wrote:

> Having put some testing into your rcu/urgent branch today i'd feel 
> more comfortable with taking this plus perhaps an RCU_BOOST 
> disabling patch. That makes it all fundamentally tested by a number 
> of people (including those who reported/reproduced the problems).
> 
> Linus, would that approach be fine with you? I'll send an RFC pull 
> request for the 6 patches as a reply to this mail, in a couple of 
> minutes.

here it is:

Linus,

Please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-urgent-for-linus

In addition to the testing Paul & co has performed i have boot tested 
x86 defconfig/allno/allyes/allmod and a fair number of randconfigs, 
and i build tested a dozen architecture defconfigs, amongst them the 
key ones.

( We could also mark RCU_BOOST in init/Kconfig as EXPERIMENTAL, to
  further de-risk it - but that change is not part of this pull
  request. )

 Thanks,

	Ingo

------------------>
Paul E. McKenney (5):
      rcu: decrease rcu_report_exp_rnp coupling with scheduler
      rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special
      rcu: Streamline code produced by __rcu_read_unlock()
      rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
      signal: align __lock_task_sighand() irq disabling and RCU

Peter Zijlstra (2):
      sched: Add irq_{enter,exit}() to scheduler_ipi()
      softirq,rcu: Inform RCU of irq_exit() activity


 include/linux/sched.h   |    3 ++
 kernel/rcutree_plugin.h |   53 ++++++++++++++++++++++++++++++++++------------
 kernel/sched.c          |   44 +++++++++++++++++++++++++++++++++-----
 kernel/signal.c         |   19 +++++++++++-----
 kernel/softirq.c        |   12 ++++++++-
 5 files changed, 103 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..76676a4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1254,6 +1254,9 @@ struct task_struct {
 #ifdef CONFIG_PREEMPT_RCU
 	int rcu_read_lock_nesting;
 	char rcu_read_unlock_special;
+#if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU)
+	int rcu_boosted;
+#endif /* #if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU) */
 	struct list_head rcu_node_entry;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 #ifdef CONFIG_TREE_PREEMPT_RCU
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 75113cb..8aafbb8 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -68,6 +68,7 @@ struct rcu_state rcu_preempt_state = RCU_STATE_INITIALIZER(rcu_preempt_state);
 DEFINE_PER_CPU(struct rcu_data, rcu_preempt_data);
 static struct rcu_state *rcu_state = &rcu_preempt_state;
 
+static void rcu_read_unlock_special(struct task_struct *t);
 static int rcu_preempted_readers_exp(struct rcu_node *rnp);
 
 /*
@@ -147,7 +148,7 @@ static void rcu_preempt_note_context_switch(int cpu)
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 
-	if (t->rcu_read_lock_nesting &&
+	if (t->rcu_read_lock_nesting > 0 &&
 	    (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
 
 		/* Possibly blocking in an RCU read-side critical section. */
@@ -190,6 +191,14 @@ static void rcu_preempt_note_context_switch(int cpu)
 				rnp->gp_tasks = &t->rcu_node_entry;
 		}
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+	} else if (t->rcu_read_lock_nesting < 0 &&
+		   t->rcu_read_unlock_special) {
+
+		/*
+		 * Complete exit from RCU read-side critical section on
+		 * behalf of preempted instance of __rcu_read_unlock().
+		 */
+		rcu_read_unlock_special(t);
 	}
 
 	/*
@@ -284,7 +293,7 @@ static struct list_head *rcu_next_node_entry(struct task_struct *t,
  * notify RCU core processing or task having blocked during the RCU
  * read-side critical section.
  */
-static void rcu_read_unlock_special(struct task_struct *t)
+static noinline void rcu_read_unlock_special(struct task_struct *t)
 {
 	int empty;
 	int empty_exp;
@@ -309,7 +318,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	}
 
 	/* Hardware IRQ handlers cannot block. */
-	if (in_irq()) {
+	if (in_irq() || in_serving_softirq()) {
 		local_irq_restore(flags);
 		return;
 	}
@@ -342,6 +351,11 @@ static void rcu_read_unlock_special(struct task_struct *t)
 #ifdef CONFIG_RCU_BOOST
 		if (&t->rcu_node_entry == rnp->boost_tasks)
 			rnp->boost_tasks = np;
+		/* Snapshot and clear ->rcu_boosted with rcu_node lock held. */
+		if (t->rcu_boosted) {
+			special |= RCU_READ_UNLOCK_BOOSTED;
+			t->rcu_boosted = 0;
+		}
 #endif /* #ifdef CONFIG_RCU_BOOST */
 		t->rcu_blocked_node = NULL;
 
@@ -358,7 +372,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 #ifdef CONFIG_RCU_BOOST
 		/* Unboost if we were boosted. */
 		if (special & RCU_READ_UNLOCK_BOOSTED) {
-			t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BOOSTED;
 			rt_mutex_unlock(t->rcu_boost_mutex);
 			t->rcu_boost_mutex = NULL;
 		}
@@ -387,13 +400,22 @@ void __rcu_read_unlock(void)
 	struct task_struct *t = current;
 
 	barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
-	--t->rcu_read_lock_nesting;
-	barrier();  /* decrement before load of ->rcu_read_unlock_special */
-	if (t->rcu_read_lock_nesting == 0 &&
-	    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
-		rcu_read_unlock_special(t);
+	if (t->rcu_read_lock_nesting != 1)
+		--t->rcu_read_lock_nesting;
+	else {
+		t->rcu_read_lock_nesting = INT_MIN;
+		barrier();  /* assign before ->rcu_read_unlock_special load */
+		if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
+			rcu_read_unlock_special(t);
+		barrier();  /* ->rcu_read_unlock_special load before assign */
+		t->rcu_read_lock_nesting = 0;
+	}
 #ifdef CONFIG_PROVE_LOCKING
-	WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
+	{
+		int rrln = ACCESS_ONCE(t->rcu_read_lock_nesting);
+
+		WARN_ON_ONCE(rrln < 0 && rrln > INT_MIN / 2);
+	}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 }
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);
@@ -589,7 +611,8 @@ static void rcu_preempt_check_callbacks(int cpu)
 		rcu_preempt_qs(cpu);
 		return;
 	}
-	if (per_cpu(rcu_preempt_data, cpu).qs_pending)
+	if (t->rcu_read_lock_nesting > 0 &&
+	    per_cpu(rcu_preempt_data, cpu).qs_pending)
 		t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
 }
 
@@ -695,9 +718,12 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	for (;;) {
-		if (!sync_rcu_preempt_exp_done(rnp))
+		if (!sync_rcu_preempt_exp_done(rnp)) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			break;
+		}
 		if (rnp->parent == NULL) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			wake_up(&sync_rcu_preempt_exp_wq);
 			break;
 		}
@@ -707,7 +733,6 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp)
 		raw_spin_lock(&rnp->lock); /* irqs already disabled */
 		rnp->expmask &= ~mask;
 	}
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
 
 /*
@@ -1174,7 +1199,7 @@ static int rcu_boost(struct rcu_node *rnp)
 	t = container_of(tb, struct task_struct, rcu_node_entry);
 	rt_mutex_init_proxy_locked(&mtx, t);
 	t->rcu_boost_mutex = &mtx;
-	t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BOOSTED;
+	t->rcu_boosted = 1;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
 	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
diff --git a/kernel/sched.c b/kernel/sched.c
index 3dc716f..31e92ae 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2544,13 +2544,9 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }
 
 #ifdef CONFIG_SMP
-static void sched_ttwu_pending(void)
+static void sched_ttwu_do_pending(struct task_struct *list)
 {
 	struct rq *rq = this_rq();
-	struct task_struct *list = xchg(&rq->wake_list, NULL);
-
-	if (!list)
-		return;
 
 	raw_spin_lock(&rq->lock);
 
@@ -2563,9 +2559,45 @@ static void sched_ttwu_pending(void)
 	raw_spin_unlock(&rq->lock);
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+
+static void sched_ttwu_pending(void)
+{
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	sched_ttwu_do_pending(list);
+}
+
+#endif /* CONFIG_HOTPLUG_CPU */
+
 void scheduler_ipi(void)
 {
-	sched_ttwu_pending();
+	struct rq *rq = this_rq();
+	struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+	if (!list)
+		return;
+
+	/*
+	 * Not all reschedule IPI handlers call irq_enter/irq_exit, since
+	 * traditionally all their work was done from the interrupt return
+	 * path. Now that we actually do some work, we need to make sure
+	 * we do call them.
+	 *
+	 * Some archs already do call them, luckily irq_enter/exit nest
+	 * properly.
+	 *
+	 * Arguably we should visit all archs and update all handlers,
+	 * however a fair share of IPIs are still resched only so this would
+	 * somewhat pessimize the simple resched case.
+	 */
+	irq_enter();
+	sched_ttwu_do_pending(list);
+	irq_exit();
 }
 
 static void ttwu_queue_remote(struct task_struct *p, int cpu)
diff --git a/kernel/signal.c b/kernel/signal.c
index ff76786..415d85d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1178,18 +1178,25 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 {
 	struct sighand_struct *sighand;
 
-	rcu_read_lock();
 	for (;;) {
+		local_irq_save(*flags);
+		rcu_read_lock();
 		sighand = rcu_dereference(tsk->sighand);
-		if (unlikely(sighand == NULL))
+		if (unlikely(sighand == NULL)) {
+			rcu_read_unlock();
+			local_irq_restore(*flags);
 			break;
+		}
 
-		spin_lock_irqsave(&sighand->siglock, *flags);
-		if (likely(sighand == tsk->sighand))
+		spin_lock(&sighand->siglock);
+		if (likely(sighand == tsk->sighand)) {
+			rcu_read_unlock();
 			break;
-		spin_unlock_irqrestore(&sighand->siglock, *flags);
+		}
+		spin_unlock(&sighand->siglock);
+		rcu_read_unlock();
+		local_irq_restore(*flags);
 	}
-	rcu_read_unlock();
 
 	return sighand;
 }
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 40cf63d..fca82c3 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -315,16 +315,24 @@ static inline void invoke_softirq(void)
 {
 	if (!force_irqthreads)
 		__do_softirq();
-	else
+	else {
+		__local_bh_disable((unsigned long)__builtin_return_address(0),
+				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
+		__local_bh_enable(SOFTIRQ_OFFSET);
+	}
 }
 #else
 static inline void invoke_softirq(void)
 {
 	if (!force_irqthreads)
 		do_softirq();
-	else
+	else {
+		__local_bh_disable((unsigned long)__builtin_return_address(0),
+				SOFTIRQ_OFFSET);
 		wakeup_softirqd();
+		__local_bh_enable(SOFTIRQ_OFFSET);
+	}
 }
 #endif
 

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 19:29                     ` Paul E. McKenney
  2011-07-20 19:39                       ` Ingo Molnar
@ 2011-07-20 21:05                       ` Peter Zijlstra
  2011-07-20 21:39                         ` Paul E. McKenney
  1 sibling, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2011-07-20 21:05 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, Ingo Molnar, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, 2011-07-20 at 12:29 -0700, Paul E. McKenney wrote:

> Peter, does #4 (protect __rcu_read_unlock() against scheduler-using
> irq handlers) remove the need for #5 (Add irq_{enter,exit}() to
> scheduler_ipi()) and #6 (Inform RCU of irq_exit() activity)?  My guess is
> "no" for #5 and "yes" for #6.

More or less, we want to keep #5 for it does more than just fix RCU, but
yeah, I _think_ #4 obsoletes the direct need for #6.





^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 20:54                             ` Ben Greear
@ 2011-07-20 21:12                               ` Paul E. McKenney
  2011-07-21  3:25                                 ` Ben Greear
  0 siblings, 1 reply; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 21:12 UTC (permalink / raw)
  To: Ben Greear
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 01:54:49PM -0700, Ben Greear wrote:
> On 07/20/2011 01:33 PM, Paul E. McKenney wrote:
> >On Wed, Jul 20, 2011 at 09:57:42PM +0200, Ingo Molnar wrote:
> >>
> >>* Ingo Molnar<mingo@elte.hu>  wrote:
> >>
> >>>
> >>>* Paul E. McKenney<paulmck@linux.vnet.ibm.com>  wrote:
> >>>
> >>>>If my guess is correct, then the minimal non-RCU_BOOST fix is #4
> >>>>(which drags along #3) and #6.  Which are not one-liners, but
> >>>>somewhat smaller:
> >>>>
> >>>>  b/kernel/rcutree_plugin.h |   12 ++++++------
> >>>>  b/kernel/softirq.c        |   12 ++++++++++--
> >>>>  kernel/rcutree_plugin.h   |   31 +++++++++++++++++++++++++------
> >>>>  3 files changed, 41 insertions(+), 14 deletions(-)
> >>>
> >>>That's half the patch size and half the patch count.
> >>>
> >>>PeterZ's question is relevant: since we apparently had similar bugs
> >>>in v2.6.39 as well, what changed in v3.0 that makes them so urgent
> >>>to fix?
> >>>
> >>>If it's just better instrumentation that proves them better, then
> >>>i'd suggest fixing this in v3.1 and not risking v3.0 with an
> >>>unintended side effect.
> >>
> >>Ok, i looked some more at the background and the symptoms that people
> >>are seeing: kernel crashes and lockups. I think we want these
> >>problems fixed in v3.0, even if it was the recent introduction of
> >>RCU_BOOST that made it really prominent.
> >>
> >>Having put some testing into your rcu/urgent branch today i'd feel
> >>more comfortable with taking this plus perhaps an RCU_BOOST disabling
> >>patch. That makes it all fundamentally tested by a number of people
> >>(including those who reported/reproduced the problems).
> >
> >RCU_BOOST is currently default=n.  Is that sufficient?  If not, one
> 
> Not if it remains broken, I think... unless you put it under CONFIG_BROKEN
> or something.  Otherwise, folks are liable to turn it on and not realize
> it's the cause of subtle bugs.

Good point, I could easily add "depends on BROKEN".

> For what it's worth, my tests have been running clean for around 2 hours, so the full set of
> fixes with RCU_BOOST appears good, so far.  I'll let it continue to run
> at least overnight to make sure I'm not just getting lucky...

Continuing to think good thoughts...  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 21:05                       ` Peter Zijlstra
@ 2011-07-20 21:39                         ` Paul E. McKenney
  0 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 21:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Ben Greear, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 11:05:53PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-07-20 at 12:29 -0700, Paul E. McKenney wrote:
> 
> > Peter, does #4 (protect __rcu_read_unlock() against scheduler-using
> > irq handlers) remove the need for #5 (Add irq_{enter,exit}() to
> > scheduler_ipi()) and #6 (Inform RCU of irq_exit() activity)?  My guess is
> > "no" for #5 and "yes" for #6.
> 
> More or less, we want to keep #5 for it does more than just fix RCU, but
> yeah, I _think_ #4 obsoletes the direct need for #6.

Heh.  So the lowest risk is keeping #6 for now and deciding later
whether we really need it.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [GIT PULL] RCU fixes for v3.0
  2011-07-20 21:04                           ` [GIT PULL] RCU fixes for v3.0 Ingo Molnar
@ 2011-07-20 21:55                             ` Ed Tomlinson
  2011-07-20 22:06                               ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Ed Tomlinson @ 2011-07-20 21:55 UTC (permalink / raw)
  To: Ingo Molnar, Paul E. McKenney
  Cc: Linus Torvalds, Peter Zijlstra, Ben Greear, linux-kernel, laijs,
	dipankar, akpm, mathieu.desnoyers, josh, niv, tglx, rostedt,
	Valdis.Kletnieks, dhowells, eric.dumazet, darren, patches

On Wednesday 20 July 2011 17:04:26 Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > Having put some testing into your rcu/urgent branch today i'd feel 
> > more comfortable with taking this plus perhaps an RCU_BOOST 
> > disabling patch. That makes it all fundamentally tested by a number 
> > of people (including those who reported/reproduced the problems).
> > 
> > Linus, would that approach be fine with you? I'll send an RFC pull 
> > request for the 6 patches as a reply to this mail, in a couple of 
> > minutes.
> 
> here it is:
> 
> Linus,
> 
> Please pull the latest core-urgent-for-linus git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-urgent-for-linus

I've been running on the previous version of this tree along with a fix for the first patch.  An 
additional patch was added this morning to prevent an invalid warning from triggering.  Aside 
from that warning the system was stable with no other unexpected output in dmesg.  

Resetting to master and pulling the above tree gives me a clean boot.  On at least one box that
was seeing problems this seems to be making things better (boost and threadirqs enabled).

Thanks
Ed

> In addition to the testing Paul & co has performed i have boot tested 
> x86 defconfig/allno/allyes/allmod and a fair number of randconfigs, 
> and i build tested a dozen architecture defconfigs, amongst them the 
> key ones.
> 
> ( We could also mark RCU_BOOST in init/Kconfig as EXPERIMENTAL, to
>   further de-risk it - but that change is not part of this pull
>   request. )
> 
>  Thanks,
> 
> 	Ingo
> 
> ------------------>
> Paul E. McKenney (5):
>       rcu: decrease rcu_report_exp_rnp coupling with scheduler
>       rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special
>       rcu: Streamline code produced by __rcu_read_unlock()
>       rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
>       signal: align __lock_task_sighand() irq disabling and RCU
> 
> Peter Zijlstra (2):
>       sched: Add irq_{enter,exit}() to scheduler_ipi()
>       softirq,rcu: Inform RCU of irq_exit() activity
> 
> 
>  include/linux/sched.h   |    3 ++
>  kernel/rcutree_plugin.h |   53 ++++++++++++++++++++++++++++++++++------------
>  kernel/sched.c          |   44 +++++++++++++++++++++++++++++++++-----
>  kernel/signal.c         |   19 +++++++++++-----
>  kernel/softirq.c        |   12 ++++++++-
>  5 files changed, 103 insertions(+), 28 deletions(-)
> 
> [full patch snipped; identical to the diff posted earlier in the thread]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [GIT PULL] RCU fixes for v3.0
  2011-07-20 21:55                             ` Ed Tomlinson
@ 2011-07-20 22:06                               ` Paul E. McKenney
  0 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-20 22:06 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Ben Greear,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches

On Wed, Jul 20, 2011 at 05:55:37PM -0400, Ed Tomlinson wrote:
> On Wednesday 20 July 2011 17:04:26 Ingo Molnar wrote:
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > Having put some testing into your rcu/urgent branch today i'd feel 
> > > more comfortable with taking this plus perhaps an RCU_BOOST 
> > > disabling patch. That makes it all fundamentally tested by a number 
> > > of people (including those who reported/reproduced the problems).
> > > 
> > > Linus, would that approach be fine with you? I'll send an RFC pull 
> > > request for the 6 patches as a reply to this mail, in a couple of 
> > > minutes.
> > 
> > here it is:
> > 
> > Linus,
> > 
> > Please pull the latest core-urgent-for-linus git tree from:
> > 
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core-urgent-for-linus
> 
> I've been running on the previous version of this tree along with a fix for the first patch.  An 
> additional patch was added this morning to prevent an invalid warning from triggering.  Aside 
> from that warning the system was stable with no other unexpected output in dmesg.  
> 
> Resetting to master and pulling the above tree gives me a clean boot.  On at least one box that
> was seeing problems this seems to be making things better (boost and threadirqs enabled).

Thank you for all the testing, Ed!!!  As always, keeping my fingers crossed.

							Thanx, Paul

> [rest of quoted message snipped]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock()
  2011-07-20 18:26   ` [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock() Paul E. McKenney
@ 2011-07-20 22:44     ` Linus Torvalds
  2011-07-21  5:09       ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2011-07-20 22:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb, edt

On Wed, Jul 20, 2011 at 11:26 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> Given some common flag combinations, particularly -Os, gcc will inline
> rcu_read_unlock_special() despite its being in an unlikely() clause.
> Use noinline to prohibit this misoptimization.

Btw, I suspect that we should at least look at what it would mean if
we make the rcu_read_lock_nesting and the preempt counters both be
per-cpu variables instead of making them per-thread/process counters.

Then, when we switch threads, we'd just save/restore them from the
process register save area.

There are a lot of critical code sequences (spin-lock/unlock, rcu
read-lock/unlock) that currently fetch the thread/process pointer
only to then offset it and increment the count. I get the strong
feeling that code generation could be improved and we could avoid one
level of indirection by just making it a per-cpu counter.

For example, instead of __rcu_read_lock: looking like this (and being
an external function, partly because of header file dependencies on
the data structures involved):

  push   %rbp
  mov    %rsp,%rbp
  mov    %gs:0xb580,%rax
  incl   0x100(%rax)
  leaveq
  retq

it should inline to just something like

  incl %gs:0x100

instead. Same for the preempt counter.

Of course, it would need to involve making sure that we pick a good
cacheline etc that is already always dirty. But other than that, is
there any real downside?

                       Linus
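
A minimal sketch of the per-cpu variant being discussed, assuming
<linux/percpu.h> and the generic this_cpu operations; the counter name
and the task_struct save field below are hypothetical, not taken from
any actual patch:

  /* Hypothetical per-CPU nesting counter. */
  DEFINE_PER_CPU(int, rcu_nesting_pcpu);

  static inline void __rcu_read_lock_pcpu(void)
  {
  	/* On x86 this can compile to a single "incl %gs:offset". */
  	__this_cpu_inc(rcu_nesting_pcpu);
  }

  /* At context switch the counter would travel with the task. */
  static inline void rcu_switch_nesting(struct task_struct *prev,
  				      struct task_struct *next)
  {
  	prev->rcu_nesting_save = __this_cpu_read(rcu_nesting_pcpu);
  	__this_cpu_write(rcu_nesting_pcpu, next->rcu_nesting_save);
  }

(rcu_nesting_save would be a new task_struct field holding the value
while the task is switched out, per the save/restore idea above.)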

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-20 21:12                               ` Paul E. McKenney
@ 2011-07-21  3:25                                 ` Ben Greear
  2011-07-21 16:04                                   ` Paul E. McKenney
  0 siblings, 1 reply; 55+ messages in thread
From: Ben Greear @ 2011-07-21  3:25 UTC (permalink / raw)
  To: paulmck
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On 07/20/2011 02:12 PM, Paul E. McKenney wrote:
> [snip]
>
>> For what it's worth, my tests have been running clean for around 2 hours, so the full set of
>> fixes with RCU_BOOST appears good, so far.  I'll let it continue to run
>> at least overnight to make sure I'm not just getting lucky...
>
> Continuing to think good thoughts...  ;-)

My test is still going strong with no splats or errors, so I think that
nailed the problems I was seeing...

Thanks,
Ben

>
> 							Thanx, Paul


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock()
  2011-07-20 22:44     ` Linus Torvalds
@ 2011-07-21  5:09       ` Paul E. McKenney
  0 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-21  5:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, darren, patches, greearb, edt

On Wed, Jul 20, 2011 at 03:44:55PM -0700, Linus Torvalds wrote:
> On Wed, Jul 20, 2011 at 11:26 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > Given some common flag combinations, particularly -Os, gcc will inline
> > rcu_read_unlock_special() despite its being in an unlikely() clause.
> > Use noinline to prohibit this misoptimization.
> 
> [snip]
> Of course, it would need to involve making sure that we pick a good
> cacheline etc that is already always dirty. But other than that, is
> there any real downside?

We would need a form of per-CPU variable access that generated
efficient code, but that didn't complain about being used when
preemption was enabled.  __this_cpu_add_4() might do the trick,
but I haven't dug fully through it yet.

						Thanx, Paul
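
(A sketch of the kind of accessor discussed above, assuming the
generic this_cpu operations of this era; the counter name is
illustrative only:)

  DEFINE_PER_CPU(int, rcu_nesting_pcpu);

  static inline void rcu_nesting_add(int n)
  {
  	/*
  	 * Intended to generate a single %gs-prefixed add on x86
  	 * without insisting that preemption be disabled -- the
  	 * property being discussed above.
  	 */
  	__this_cpu_add(rcu_nesting_pcpu, n);
  }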

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
  2011-07-21  3:25                                 ` Ben Greear
@ 2011-07-21 16:04                                   ` Paul E. McKenney
  0 siblings, 0 replies; 55+ messages in thread
From: Paul E. McKenney @ 2011-07-21 16:04 UTC (permalink / raw)
  To: Ben Greear
  Cc: Ingo Molnar, Linus Torvalds, Peter Zijlstra, Ed Tomlinson,
	linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	niv, tglx, rostedt, Valdis.Kletnieks, dhowells, eric.dumazet,
	darren, patches, edward.tomlinson

On Wed, Jul 20, 2011 at 08:25:24PM -0700, Ben Greear wrote:
> On 07/20/2011 02:12 PM, Paul E. McKenney wrote:
> [snip]
> 
> My test is still going strong with no splats or errors, so I think that
> nailed the problems I was seeing...

Excellent news!!!

And again, thank you for all the testing!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2011-07-21 16:13 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-20  0:17 [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
2011-07-20  0:18 ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
2011-07-20  2:40   ` Peter Zijlstra
2011-07-20  4:54     ` Paul E. McKenney
2011-07-20 11:23       ` Ed Tomlinson
2011-07-20 11:31         ` Ed Tomlinson
2011-07-20 12:35         ` Peter Zijlstra
2011-07-20 13:23         ` Paul E. McKenney
2011-07-20  0:18 ` [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special Paul E. McKenney
2011-07-20  0:18 ` [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock() Paul E. McKenney
2011-07-20  0:18 ` [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers Paul E. McKenney
2011-07-20 12:54   ` Peter Zijlstra
2011-07-20 13:25     ` Paul E. McKenney
2011-07-20  0:18 ` [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi() Paul E. McKenney
2011-07-20  0:18 ` [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity Paul E. McKenney
2011-07-20  0:18 ` [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU Paul E. McKenney
2011-07-20  1:10 ` [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Ben Greear
2011-07-20  1:30 ` Ed Tomlinson
2011-07-20  2:07   ` Ed Tomlinson
2011-07-20  4:44     ` Paul E. McKenney
2011-07-20  5:03       ` Linus Torvalds
2011-07-20 13:34         ` Paul E. McKenney
2011-07-20 17:02           ` Ben Greear
2011-07-20 17:15             ` Paul E. McKenney
2011-07-20 18:44               ` Ingo Molnar
2011-07-20 18:52                 ` Peter Zijlstra
2011-07-20 19:01                   ` Paul E. McKenney
2011-07-20 19:25                     ` Peter Zijlstra
2011-07-20 20:06                       ` Paul E. McKenney
2011-07-20 19:02                   ` Linus Torvalds
2011-07-20 19:29                     ` Paul E. McKenney
2011-07-20 19:39                       ` Ingo Molnar
2011-07-20 19:57                         ` Ingo Molnar
2011-07-20 20:33                           ` Paul E. McKenney
2011-07-20 20:54                             ` Ben Greear
2011-07-20 21:12                               ` Paul E. McKenney
2011-07-21  3:25                                 ` Ben Greear
2011-07-21 16:04                                   ` Paul E. McKenney
2011-07-20 21:04                           ` [GIT PULL] RCU fixes for v3.0 Ingo Molnar
2011-07-20 21:55                             ` Ed Tomlinson
2011-07-20 22:06                               ` Paul E. McKenney
2011-07-20 20:08                         ` [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck Paul E. McKenney
2011-07-20 21:05                       ` Peter Zijlstra
2011-07-20 21:39                         ` Paul E. McKenney
2011-07-20 10:49       ` Ed Tomlinson
2011-07-20 18:25 ` [PATCH rcu/urgent 0/7 v2] " Paul E. McKenney
2011-07-20 18:26   ` [PATCH tip/core/urgent 1/7] rcu: decrease rcu_report_exp_rnp coupling with scheduler Paul E. McKenney
2011-07-20 18:26   ` [PATCH tip/core/urgent 2/7] rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special Paul E. McKenney
2011-07-20 18:26   ` [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock() Paul E. McKenney
2011-07-20 22:44     ` Linus Torvalds
2011-07-21  5:09       ` Paul E. McKenney
2011-07-20 18:26   ` [PATCH tip/core/urgent 4/7] rcu: protect __rcu_read_unlock() against scheduler-using irq handlers Paul E. McKenney
2011-07-20 18:26   ` [PATCH tip/core/urgent 5/7] sched: Add irq_{enter,exit}() to scheduler_ipi() Paul E. McKenney
2011-07-20 18:26   ` [PATCH tip/core/urgent 6/7] softirq,rcu: Inform RCU of irq_exit() activity Paul E. McKenney
2011-07-20 18:26   ` [PATCH tip/core/urgent 7/7] signal: align __lock_task_sighand() irq disabling and RCU Paul E. McKenney
