* [PATCH RT 00/14] Linux 3.10.32-rt31-rc2
@ 2014-03-01  3:52 Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 01/14] rcu: Don't activate RCU core on NO_HZ_FULL CPUs Steven Rostedt
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker


Dear RT Folks,

This is the RT stable review cycle of patch 3.10.32-rt31-rc2.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository; only the
final release will be.

If all goes well, this patch will be converted to the next main release
on 3/3/2014.

Enjoy,

-- Steve


To build 3.10.32-rt31-rc2 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.10.32.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.10/patch-3.10.32-rt31-rc2.patch.xz

You can also build from 3.10.32-rt30 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.10/incr/patch-3.10.32-rt30-rt31-rc2.patch.xz


Changes from 3.10.32-rt30:

---


Nicholas Mc Guire (1):
      net: ip_send_unicast_reply: add missing local serialization

Paul E. McKenney (2):
      rcu: Don't activate RCU core on NO_HZ_FULL CPUs
      rcu: Eliminate softirq processing from rcutree

Sebastian Andrzej Siewior (5):
      Revert "x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT"
      kernel/hrtimer: be non-freezeable in cpu_chill()
      irq_work: allow certain work in hard irq context
      arm/unwind: use a raw_spin_lock
      leds: trigger: disable CPU trigger on -RT

Steven Rostedt (3):
      timer: Raise softirq if there's irq_work
      timer/rt: Always raise the softirq if there's irq_work to be done
      rt: Make cpu_chill() use hrtimer instead of msleep()

Steven Rostedt (Red Hat) (1):
      Linux 3.10.32-rt31-rc2

Thomas Gleixner (1):
      timers: do not raise softirq unconditionally

Tiejun Chen (1):
      rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs()

----
 arch/arm/kernel/unwind.c             |  14 ++--
 arch/powerpc/kernel/time.c           |   2 +-
 arch/sparc/kernel/pcr.c              |   2 +
 arch/x86/include/asm/page_64_types.h |  21 ++---
 arch/x86/kernel/cpu/common.c         |   2 -
 arch/x86/kernel/dumpstack_64.c       |   4 -
 drivers/leds/trigger/Kconfig         |   2 +-
 include/linux/delay.h                |   2 +-
 include/linux/hrtimer.h              |   3 +-
 include/linux/irq_work.h             |   1 +
 kernel/hrtimer.c                     |  50 ++++++------
 kernel/irq_work.c                    |  22 ++++-
 kernel/rcutree.c                     | 122 +++++++++++++++++++++++----
 kernel/rcutree.h                     |   4 +-
 kernel/rcutree_plugin.h              | 154 ++++++++---------------------------
 kernel/time/tick-sched.c             |   1 +
 kernel/timer.c                       |  37 ++++++++-
 localversion-rt                      |   2 +-
 net/ipv4/ip_output.c                 |   9 +-
 19 files changed, 249 insertions(+), 205 deletions(-)


* [PATCH RT 01/14] rcu: Don't activate RCU core on NO_HZ_FULL CPUs
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 02/14] timers: do not raise softirq unconditionally Steven Rostedt
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt, Mike Galbraith,
	Paul E. McKenney, Frederic Weisbecker

[-- Attachment #1: 0001-rcu-Don-t-activate-RCU-core-on-NO_HZ_FULL-CPUs.patch --]
[-- Type: text/plain, Size: 3834 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see
if the RCU core needs anything from this CPU.  If so, RCU raises
RCU_SOFTIRQ to carry out any needed processing.

This approach has worked well historically, but it is undesirable on
NO_HZ_FULL CPUs.  Such CPUs are expected to spend almost all of their time
in userspace, so that scheduling-clock interrupts can be disabled while
there is only one runnable task on the CPU in question.  Unfortunately,
raising any softirq has the potential to wake up ksoftirqd, which would
provide the second runnable task on that CPU, preventing disabling of
scheduling-clock interrupts.

What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone,
relying on the grace-period kthreads' quiescent-state forcing to
do any needed RCU work on behalf of those CPUs.

This commit therefore refrains from raising RCU_SOFTIRQ on any
NO_HZ_FULL CPUs during any grace periods that have been in effect
for less than one second.  The one-second limit handles the case
where an inappropriate workload is running on a NO_HZ_FULL CPU
that features lots of scheduling-clock interrupts, but no idle
or userspace time.

Cc: stable-rt@vger.kernel.org
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <bitbucket@online.de>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c        |  4 ++++
 kernel/rcutree.h        |  1 +
 kernel/rcutree_plugin.h | 20 ++++++++++++++++++++
 3 files changed, 25 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 55915b1..75743cb 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2749,6 +2749,10 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
 	/* Check for CPU stalls, if enabled. */
 	check_cpu_stall(rsp, rdp);
 
+	/* Is this CPU a NO_HZ_FULL CPU that should ignore RCU? */
+	if (rcu_nohz_full_cpu(rsp))
+		return 0;
+
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
 	if (rcu_scheduler_fully_active &&
 	    rdp->qs_pending && !rdp->passed_quiesce) {
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 8491f47..7e0b397 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -541,6 +541,7 @@ static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
 static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp);
 static void rcu_kick_nohz_cpu(int cpu);
 static bool init_nocb_callback_list(struct rcu_data *rdp);
+static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
 
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index dc0c4b2..481a124 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2356,3 +2356,23 @@ static void rcu_kick_nohz_cpu(int cpu)
 		smp_send_reschedule(cpu);
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
+
+/*
+ * Is this CPU a NO_HZ_FULL CPU that should ignore RCU so that the
+ * grace-period kthread will do force_quiescent_state() processing?
+ * The idea is to avoid waking up RCU core processing on such a
+ * CPU unless the grace period has extended for too long.
+ *
+ * This code relies on the fact that all NO_HZ_FULL CPUs are also
+ * CONFIG_RCU_NOCB_CPUs.
+ */
+static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	if (tick_nohz_full_cpu(smp_processor_id()) &&
+	    (!rcu_gp_in_progress(rsp) ||
+	     ULONG_CMP_LT(jiffies, ACCESS_ONCE(rsp->gp_start) + HZ)))
+		return 1;
+#endif /* #ifdef CONFIG_NO_HZ_FULL */
+	return 0;
+}
-- 
1.8.5.3




* [PATCH RT 02/14] timers: do not raise softirq unconditionally
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 01/14] rcu: Don't activate RCU core on NO_HZ_FULL CPUs Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 03/14] timer: Raise softirq if there's irq_work Steven Rostedt
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0002-timers-do-not-raise-softirq-unconditionally.patch --]
[-- Type: text/plain, Size: 5560 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

Mike,

On Thu, 7 Nov 2013, Mike Galbraith wrote:

> On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote:
> > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote:
>
> > > I bet you are trying to work around some of the side effects of the
> > > occasional tick which is still necessary despite of full nohz, right?
> >
> > Nope, I wanted to check out cost of nohz_full for rt, and found that it
> > doesn't work at all instead, looked, and found that the sole running
> > task has just awakened ksoftirqd when it wants to shut the tick down, so
> > that shutdown never happens.
>
> Like so in virgin 3.10-rt.  Box is x3550 M3 booted nowatchdog
> rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via
> cpusets as well.

well, that very same problem is in mainline if you add "threadirqs" to
the command line. But we can be smart about this. The untested patch
below should address that issue. If that works on mainline we can
adapt it for RT (needs a trylock(&base->lock) there).

Though it's not a full solution; it needs some thought with regard to
the timer softirq code. Assume we have only one timer queued, 1000
ticks into the future. This change will cause the timer softirq not
to be called until that timer expires, and then the timer softirq is
going to do 1000 loops until it catches up with jiffies. That's
anything but pretty ...
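
For reference, a minimal sketch of that catch-up behavior in
__run_timers() (kernel/timer.c, per the 3.10 source):

	/* base->timer_jiffies trails jiffies; one jiffy is processed
	 * per iteration, so a 1000-tick gap means 1000 iterations: */
	while (time_after_eq(jiffies, base->timer_jiffies)) {
		/* ... cascade and expire this jiffy's timers ... */
		base->timer_jiffies++;
	}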

What worries me more is this one:

  pert-5229  [003] d..h1..   684.482618: softirq_raise: vec=9 [action=RCU]

The CPU has no callbacks as you shoved them over to cpu 0, so why is
the RCU softirq raised?

Thanks,

	tglx
------------------
Message-id: <alpine.DEB.2.02.1311071158350.23353@ionos.tec.linutronix.de>
|CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/hrtimer.h |  3 +--
 kernel/hrtimer.c        | 31 +++++++------------------------
 kernel/timer.c          | 28 +++++++++++++++++++++++++---
 3 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 79a7a35..bdbf77db 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -461,9 +461,8 @@ extern int schedule_hrtimeout_range_clock(ktime_t *expires,
 		unsigned long delta, const enum hrtimer_mode mode, int clock);
 extern int schedule_hrtimeout(ktime_t *expires, const enum hrtimer_mode mode);
 
-/* Soft interrupt function to run the hrtimer queues: */
+/* Called from the periodic timer tick */
 extern void hrtimer_run_queues(void);
-extern void hrtimer_run_pending(void);
 
 /* Bootup initialization: */
 extern void __init hrtimers_init(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index a63cfaf..a7e90b2 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1691,30 +1691,6 @@ static void run_hrtimer_softirq(struct softirq_action *h)
 }
 
 /*
- * Called from timer softirq every jiffy, expire hrtimers:
- *
- * For HRT its the fall back code to run the softirq in the timer
- * softirq context in case the hrtimer initialization failed or has
- * not been done yet.
- */
-void hrtimer_run_pending(void)
-{
-	if (hrtimer_hres_active())
-		return;
-
-	/*
-	 * This _is_ ugly: We have to check in the softirq context,
-	 * whether we can switch to highres and / or nohz mode. The
-	 * clocksource switch happens in the timer interrupt with
-	 * xtime_lock held. Notification from there only sets the
-	 * check bit in the tick_oneshot code, otherwise we might
-	 * deadlock vs. xtime_lock.
-	 */
-	if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
-		hrtimer_switch_to_hres();
-}
-
-/*
  * Called from hardirq context every jiffy
  */
 void hrtimer_run_queues(void)
@@ -1727,6 +1703,13 @@ void hrtimer_run_queues(void)
 	if (hrtimer_hres_active())
 		return;
 
+	/*
+	 * Check whether we can switch to highres mode.
+	 */
+	if (tick_check_oneshot_change(!hrtimer_is_hres_enabled())
+	    && hrtimer_switch_to_hres())
+		return;
+
 	for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
 		base = &cpu_base->clock_base[index];
 		if (!timerqueue_getnext(&base->active))
diff --git a/kernel/timer.c b/kernel/timer.c
index 48652cc..3d7313c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1443,8 +1443,6 @@ static void run_timer_softirq(struct softirq_action *h)
 	irq_work_run();
 #endif
 
-	hrtimer_run_pending();
-
 	if (time_after_eq(jiffies, base->timer_jiffies))
 		__run_timers(base);
 }
@@ -1454,8 +1452,32 @@ static void run_timer_softirq(struct softirq_action *h)
  */
 void run_local_timers(void)
 {
+	struct tvec_base *base = __this_cpu_read(tvec_bases);
+
 	hrtimer_run_queues();
-	raise_softirq(TIMER_SOFTIRQ);
+	/*
+	 * We can access this lockless as we are in the timer
+	 * interrupt. If there are no timers queued, nothing to do in
+	 * the timer softirq.
+	 */
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (!spin_do_trylock(&base->lock)) {
+		raise_softirq(TIMER_SOFTIRQ);
+		return;
+	}
+#endif
+	if (!base->active_timers)
+		goto out;
+
+	/* Check whether the next pending timer has expired */
+	if (time_before_eq(base->next_timer, jiffies))
+		raise_softirq(TIMER_SOFTIRQ);
+out:
+#ifdef CONFIG_PREEMPT_RT_FULL
+	rt_spin_unlock_after_trylock_in_irq(&base->lock);
+#endif
+	/* The ; ensures that gcc won't complain in the !RT case */
+	;
 }
 
 #ifdef __ARCH_WANT_SYS_ALARM
-- 
1.8.5.3




* [PATCH RT 03/14] timer: Raise softirq if there's irq_work
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 01/14] rcu: Don't activate RCU core on NO_HZ_FULL CPUs Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 02/14] timers: do not raise softirq unconditionally Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 04/14] timer/rt: Always raise the softirq if there's irq_work to be done Steven Rostedt
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0003-timer-Raise-softirq-if-there-s-irq_work.patch --]
[-- Type: text/plain, Size: 1750 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Steven Rostedt <rostedt@goodmis.org>

[ Talking with Sebastian on IRC, it seems that doing the irq_work_run()
  from the interrupt in -rt is a bad thing. Here we simply raise the
  softirq if there's irq work to do. This too boots on my i7 ]

After trying hard to figure out why my i7 box was locking up with the
new active_timers code, which does not run the timer softirq if there
are no active timers, I took an extra look at the softirq handler and
noticed that it doesn't just run timer softirqs, it also runs irq work.

This was the bug that was locking up the system. It wasn't a missing
timer, it was missing irq work. By always doing the irq work callbacks,
the system boots fine. The missing irq work callback was RCU's
sp_wakeup() function.

No need to check for defined(CONFIG_IRQ_WORK). When that's not set,
"irq_work_needs_cpu()" is a static inline that returns false.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/timer.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 3d7313c..25a38c4 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1466,8 +1466,13 @@ void run_local_timers(void)
 		return;
 	}
 #endif
-	if (!base->active_timers)
-		goto out;
+	if (!base->active_timers) {
+#ifdef CONFIG_PREEMPT_RT_FULL
+		/* On RT, irq work runs from softirq */
+		if (!irq_work_needs_cpu())
+#endif
+			goto out;
+	}
 
 	/* Check whether the next pending timer has expired */
 	if (time_before_eq(base->next_timer, jiffies))
-- 
1.8.5.3




* [PATCH RT 04/14] timer/rt: Always raise the softirq if there's irq_work to be done
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (2 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 03/14] timer: Raise softirq if there's irq_work Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 05/14] rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs() Steven Rostedt
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0004-timer-rt-Always-raise-the-softirq-if-there-s-irq_wor.patch --]
[-- Type: text/plain, Size: 2482 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Steven Rostedt <rostedt@goodmis.org>

It was previously discovered that some systems would hang on boot up
with an earlier version of 3.12-rt. This was due to RCU using irq_work,
and RT deferring the irq_work to a softirq. But if there are no active
timers, the softirq will not be raised, and the RCU work will not get
done, causing the system to hang.  The fix was to raise the softirq if
there were no active timers but there was irq_work to be done.

But this fix was not 100% correct. It left out the case where there
were active timers that had not expired yet. Then the softirq would not
get raised even if there was irq work to be done.

If there is irq_work to be done, then we must raise the timer softirq
regardless of whether there are active timers or whether they have
expired or not. The softirq can handle those cases. But we can never
ignore irq_work.

As it is only PREEMPT_RT_FULL that requires irq_work to be done in the
softirq, we can pull the check out of the active_timers condition, and
make the code a bit cleaner by keeping the irq_work check separate and
putting it in with the other #ifdef PREEMPT_RT code. If there is
irq_work to be done, there's no need to check the active timers or
whether they have expired. Just raise the timer softirq and be done
with it. Otherwise, we can do the timer checks just as we do on non-rt.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/timer.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 25a38c4..f63a793 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1461,18 +1461,20 @@ void run_local_timers(void)
 	 * the timer softirq.
 	 */
 #ifdef CONFIG_PREEMPT_RT_FULL
+	/* On RT, irq work runs from softirq */
+	if (irq_work_needs_cpu()) {
+		raise_softirq(TIMER_SOFTIRQ);
+		return;
+	}
+
 	if (!spin_do_trylock(&base->lock)) {
 		raise_softirq(TIMER_SOFTIRQ);
 		return;
 	}
 #endif
-	if (!base->active_timers) {
-#ifdef CONFIG_PREEMPT_RT_FULL
-		/* On RT, irq work runs from softirq */
-		if (!irq_work_needs_cpu())
-#endif
-			goto out;
-	}
+
+	if (!base->active_timers)
+		goto out;
 
 	/* Check whether the next pending timer has expired */
 	if (time_before_eq(base->next_timer, jiffies))
-- 
1.8.5.3




* [PATCH RT 05/14] rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs()
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (3 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 04/14] timer/rt: Always raise the softirq if there's irq_work to be done Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 06/14] Revert "x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT" Steven Rostedt
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt, Tiejun Chen, Bin Jiang

[-- Attachment #1: 0005-rcutree-rcu_bh_qs-disable-irq-while-calling-rcu_pree.patch --]
[-- Type: text/plain, Size: 1597 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Tiejun Chen <tiejun.chen@windriver.com>

Any callers to the function rcu_preempt_qs() must disable irqs in
order to protect the assignment to ->rcu_read_unlock_special. In the
RT case, rcu_bh_qs(), as the wrapper of rcu_preempt_qs(), is called
in some scenarios where irqs are enabled, like this path:

do_single_softirq()
    |
    + local_irq_enable();
    + handle_softirq()
    |    |
    |    + rcu_bh_qs()
    |        |
    |        + rcu_preempt_qs()
    |
    + local_irq_disable()

So here we'd better disable irqs directly inside of rcu_bh_qs() to
fix this; otherwise the kernel may occasionally freeze, as observed.
This is also safe for any potential rcu_bh_qs() usage elsewhere in
the future.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Bin Jiang <bin.jiang@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 75743cb..5439cee 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -187,7 +187,12 @@ static void rcu_preempt_qs(int cpu);
 
 void rcu_bh_qs(int cpu)
 {
+	unsigned long flags;
+
+	/* Callers to this function, rcu_preempt_qs(), must disable irqs. */
+	local_irq_save(flags);
 	rcu_preempt_qs(cpu);
+	local_irq_restore(flags);
 }
 #else
 void rcu_bh_qs(int cpu)
-- 
1.8.5.3




* [PATCH RT 06/14] Revert "x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT"
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (4 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 05/14] rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs() Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 07/14] rt: Make cpu_chill() use hrtimer instead of msleep() Steven Rostedt
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt, Brian Silverman,
	Andi Kleen

[-- Attachment #1: 0006-Revert-x86-Disable-IST-stacks-for-debug-int-3-stack-.patch --]
[-- Type: text/plain, Size: 4379 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Where do I start? Let me explain what is going on here. The code
sequence
| pushf
| pop    %edx
| or     $0x1,%dh
| push   %edx
| mov    $0xe0,%eax
| popf
| sysenter

triggers the bug. On a 64bit kernel we see the double fault (with 32bit
and 64bit userland); on a 32bit kernel there is no problem. The reporter
said that the double fault does not happen on a 64bit kernel with 64bit
userland, because in that case the VDSO uses the "syscall" interface
instead of "sysenter".

The bug. "popf" loads the flags with the TF bit set which enables
"single stepping" and this leads to a debug exception. Usually on 64bit
we have a special IST stack for the debug exception. Due to patch [0] we
do not use the IST stack but the kernel stack instead. On 64bit the
sysenter instruction starts in kernel with the stack address NULL. The
code sequence above enters the debug exception (TF flag) after the
sysenter instruction was executed which sets the stack pointer to NULL
and we have a fault (it seems that the debug exception saves some bytes
on the stack).
To fix the double fault I'm going to drop patch [0]. It is completely
pointless. In do_debug() and do_stack_segment() we disable preemption
which means the task can't leave the CPU. So it does not matter if we run
on IST or on kernel stack.
There is a patch [1] which drops preempt_disable() call for a 32bit
kernel but not for 64bit so there should be no regression.
And [1] seems valid even for this code sequence. We enter the debug
exception with a 256bytes long per cpu stack and migrate to the kernel
stack before calling do_debug().

[0] x86-disable-debug-stack.patch
[1] fix-rt-int3-x86_32-3.2-rt.patch

Cc: stable-rt@vger.kernel.org
Reported-by: Brian Silverman <bsilver16384@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/page_64_types.h | 21 ++++++---------------
 arch/x86/kernel/cpu/common.c         |  2 --
 arch/x86/kernel/dumpstack_64.c       |  4 ----
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 3cedb22..6c896fb 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -14,21 +14,12 @@
 #define IRQ_STACK_ORDER 2
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-# define STACKFAULT_STACK 0
-# define DOUBLEFAULT_STACK 1
-# define NMI_STACK 2
-# define DEBUG_STACK 0
-# define MCE_STACK 3
-# define N_EXCEPTION_STACKS 3  /* hw limit: 7 */
-#else
-# define STACKFAULT_STACK 1
-# define DOUBLEFAULT_STACK 2
-# define NMI_STACK 3
-# define DEBUG_STACK 4
-# define MCE_STACK 5
-# define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
-#endif
+#define STACKFAULT_STACK 1
+#define DOUBLEFAULT_STACK 2
+#define NMI_STACK 3
+#define DEBUG_STACK 4
+#define MCE_STACK 5
+#define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
 #define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
 #define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d39993b..deeb48d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1109,9 +1109,7 @@ DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
  */
 static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
 	  [0 ... N_EXCEPTION_STACKS - 1]	= EXCEPTION_STKSZ,
-#if DEBUG_STACK > 0
 	  [DEBUG_STACK - 1]			= DEBUG_STKSZ
-#endif
 };
 
 static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 52b4bcd..addb207 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -21,14 +21,10 @@
 		(N_EXCEPTION_STACKS + DEBUG_STKSZ/EXCEPTION_STKSZ - 2)
 
 static char x86_stack_ids[][8] = {
-#if DEBUG_STACK > 0
 		[ DEBUG_STACK-1			]	= "#DB",
-#endif
 		[ NMI_STACK-1			]	= "NMI",
 		[ DOUBLEFAULT_STACK-1		]	= "#DF",
-#if STACKFAULT_STACK > 0
 		[ STACKFAULT_STACK-1		]	= "#SS",
-#endif
 		[ MCE_STACK-1			]	= "#MC",
 #if DEBUG_STKSZ > EXCEPTION_STKSZ
 		[ N_EXCEPTION_STACKS ...
-- 
1.8.5.3




* [PATCH RT 07/14] rt: Make cpu_chill() use hrtimer instead of msleep()
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (5 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 06/14] Revert "x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT" Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 08/14] kernel/hrtimer: be non-freezeable in cpu_chill() Steven Rostedt
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0007-rt-Make-cpu_chill-use-hrtimer-instead-of-msleep.patch --]
[-- Type: text/plain, Size: 3760 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Steven Rostedt <rostedt@goodmis.org>

Ulrich Obergfell pointed out that cpu_chill() calls msleep(), which is
woken up by ksoftirqd running the TIMER softirq. But as cpu_chill() is
called from softirq context, it may block ksoftirqd from running, in
which case the msleep() may never be woken up, causing a deadlock.

I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
running on CPU 8. The one ksoftirqd that is stuck happens to be the one
that runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx.
As that ksoftirqd is the one that will wake up irq/74-qla2xxx, and it
happens to be blocked on a lock that irq/74-qla2xxx holds, we have our
deadlock.

The solution is not to convert the cpu_chill() back to a cpu_relax(), as
that would re-create a possible livelock that the cpu_chill() fixed
earlier, and may also leave this bug open on other softirqs. The fix is
to remove the dependency on ksoftirqd from cpu_chill(). That is, instead
of calling msleep(), which requires ksoftirqd to wake it up, use the
hrtimer_nanosleep() code, which does the wakeup from hard irq context.
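
To illustrate the intended use (a hypothetical retry loop, not code from
this patch): cpu_chill() is the back-off primitive for trylock loops
where the lock holder must be given time to make progress:

	again:
		spin_lock(&a);
		if (!spin_trylock(&b)) {
			spin_unlock(&a);
			/* RT: ~1ms hrtimer sleep; non-RT: cpu_relax() */
			cpu_chill();
			goto again;
		}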

|Looks to be the lock of the block softirq. I don't have the core dump
|anymore, but from what I could tell the ksoftirqd was blocked on the
|block softirq lock, where the block softirq handler did a msleep
|(called by the qla2xxx interrupt handler).
|
|Looking at trigger_softirq() in block/blk-softirq.c, it can do a
|smp_callfunction() to another cpu to run the block softirq. If that
|happens to be the cpu where the qla2xx irq handler is doing the block
|softirq and is in a middle of a msleep(), I believe the ksoftirqd will
|try to run the softirq. If it does that, then BOOM, it's deadlocked
|because the ksoftirqd will never run the timer softirq either.

|I should have also stated that it was only one lock that was involved.
|But the lock owner was doing a msleep() that requires a wakeup by
|ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
|held by the msleep() caller, then you have your deadlock.
|
|It's best not to have any softirqs going to sleep requiring another
|softirq to wake it up. Note, if we ever require a timer softirq to do a
|cpu_chill() it will most definitely hit this deadlock.

Cc: stable-rt@vger.kernel.org
Found-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: add the 4 | chapters from email]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/delay.h |  2 +-
 kernel/hrtimer.c      | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/delay.h b/include/linux/delay.h
index e23a7c0..37caab3 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -53,7 +53,7 @@ static inline void ssleep(unsigned int seconds)
 }
 
 #ifdef CONFIG_PREEMPT_RT_FULL
-# define cpu_chill()	msleep(1)
+extern void cpu_chill(void);
 #else
 # define cpu_chill()	cpu_relax()
 #endif
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index a7e90b2..b569c6d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1887,6 +1887,21 @@ SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
 	return hrtimer_nanosleep(&tu, rmtp, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * Sleep for 1 ms in hope whoever holds what we want will let it go.
+ */
+void cpu_chill(void)
+{
+	struct timespec tu = {
+		.tv_nsec = NSEC_PER_MSEC,
+	};
+
+	hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+}
+EXPORT_SYMBOL(cpu_chill);
+#endif
+
 /*
  * Functions related to boot-time initialization:
  */
-- 
1.8.5.3




* [PATCH RT 08/14] kernel/hrtimer: be non-freezeable in cpu_chill()
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (6 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 07/14] rt: Make cpu_chill() use hrtimer instead of msleep() Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 09/14] irq_work: allow certain work in hard irq context Steven Rostedt
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0008-kernel-hrtimer-be-non-freezeable-in-cpu_chill.patch --]
[-- Type: text/plain, Size: 2392 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Since we replaced msleep() by an hrtimer, I see this now and then (rarely):

| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
|  #0:  (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)

For now I see no better way but to disable the freezer for the duration
of the sleep.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/hrtimer.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index b569c6d..eb4c8831c 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1896,8 +1896,12 @@ void cpu_chill(void)
 	struct timespec tu = {
 		.tv_nsec = NSEC_PER_MSEC,
 	};
+	unsigned int freeze_flag = current->flags & PF_NOFREEZE;
 
+	current->flags |= PF_NOFREEZE;
 	hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+	if (!freeze_flag)
+		current->flags &= ~PF_NOFREEZE;
 }
 EXPORT_SYMBOL(cpu_chill);
 #endif
-- 
1.8.5.3




* [PATCH RT 09/14] irq_work: allow certain work in hard irq context
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (7 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 08/14] kernel/hrtimer: be non-freezeable in cpu_chill() Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 10/14] arm/unwind: use a raw_spin_lock Steven Rostedt
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0009-irq_work-allow-certain-work-in-hard-irq-context.patch --]
[-- Type: text/plain, Size: 5070 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

irq_work is processed in softirq context on -RT because we want to avoid
long latencies which might arise from processing lots of perf events.
The noHZ-full mode requires its callback to be called from real hardirq
context (commit 76c24fb ("nohz: New APIs to re-evaluate the tick on full
dynticks CPUs")). If it is called from a thread context we might get
wrong results for checks like "is_idle_task(current)".
This patch introduces a second list (hirq_work_list) which will be used
if irq_work_run() has been invoked from hardirq context and process only
work items marked with IRQ_WORK_HARD_IRQ.

This patch also removes arch_irq_work_raise() from sparc & powerpc, as
is already done for x86. At least for powerpc it is somewhat
superfluous, because it is called from the timer interrupt, which
should invoke update_process_times().
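
As an illustration of the opt-in (hypothetical names; the tick-sched
hunk below does the same for the nohz kick):

	static void my_kick_func(struct irq_work *work)
	{
		/* with IRQ_WORK_HARD_IRQ set, this callback runs from
		 * real hardirq context even on -RT */
	}

	static struct irq_work my_kick = {
		.func	= my_kick_func,
		.flags	= IRQ_WORK_HARD_IRQ,
	};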

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/powerpc/kernel/time.c |  2 +-
 arch/sparc/kernel/pcr.c    |  2 ++
 include/linux/irq_work.h   |  1 +
 kernel/irq_work.c          | 22 +++++++++++++++++++---
 kernel/time/tick-sched.c   |  1 +
 kernel/timer.c             |  2 +-
 6 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 5fc29ad..7cc55b2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -423,7 +423,7 @@ unsigned long profile_pc(struct pt_regs *regs)
 EXPORT_SYMBOL(profile_pc);
 #endif
 
-#ifdef CONFIG_IRQ_WORK
+#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
 
 /*
  * 64-bit uses a byte in the PACA, 32-bit uses a per-cpu variable...
diff --git a/arch/sparc/kernel/pcr.c b/arch/sparc/kernel/pcr.c
index 269af58..dbb51a6 100644
--- a/arch/sparc/kernel/pcr.c
+++ b/arch/sparc/kernel/pcr.c
@@ -43,10 +43,12 @@ void __irq_entry deferred_pcr_work_irq(int irq, struct pt_regs *regs)
 	set_irq_regs(old_regs);
 }
 
+#ifndef CONFIG_PREEMPT_RT_FULL
 void arch_irq_work_raise(void)
 {
 	set_softint(1 << PIL_DEFERRED_PCR_WORK);
 }
+#endif
 
 const struct pcr_ops *pcr_ops;
 EXPORT_SYMBOL_GPL(pcr_ops);
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 6601702..60c19ee 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -16,6 +16,7 @@
 #define IRQ_WORK_BUSY		2UL
 #define IRQ_WORK_FLAGS		3UL
 #define IRQ_WORK_LAZY		4UL /* Doesn't want IPI, wait for tick */
+#define IRQ_WORK_HARD_IRQ	8UL /* Run hard IRQ context, even on RT */
 
 struct irq_work {
 	unsigned long flags;
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index f6e4377..35d21f9 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -20,6 +20,9 @@
 
 
 static DEFINE_PER_CPU(struct llist_head, irq_work_list);
+#ifdef CONFIG_PREEMPT_RT_FULL
+static DEFINE_PER_CPU(struct llist_head, hirq_work_list);
+#endif
 static DEFINE_PER_CPU(int, irq_work_raised);
 
 /*
@@ -48,7 +51,11 @@ static bool irq_work_claim(struct irq_work *work)
 	return true;
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+void arch_irq_work_raise(void)
+#else
 void __weak arch_irq_work_raise(void)
+#endif
 {
 	/*
 	 * Lame architectures will get the timer tick callback
@@ -70,8 +77,12 @@ void irq_work_queue(struct irq_work *work)
 	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
 
-	llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
-
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (work->flags & IRQ_WORK_HARD_IRQ)
+		llist_add(&work->llnode, &__get_cpu_var(hirq_work_list));
+	else
+#endif
+		llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
 	/*
 	 * If the work is not "lazy" or the tick is stopped, raise the irq
 	 * work interrupt (if supported by the arch), otherwise, just wait
@@ -115,7 +126,12 @@ static void __irq_work_run(void)
 	__this_cpu_write(irq_work_raised, 0);
 	barrier();
 
-	this_list = &__get_cpu_var(irq_work_list);
+#ifdef CONFIG_PREEMPT_RT_FULL
+	if (in_irq())
+		this_list = &__get_cpu_var(hirq_work_list);
+	else
+#endif
+		this_list = &__get_cpu_var(irq_work_list);
 	if (llist_empty(this_list))
 		return;
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4e657e5..aa1e4b2 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -214,6 +214,7 @@ static void nohz_full_kick_work_func(struct irq_work *work)
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
 	.func = nohz_full_kick_work_func,
+	.flags = IRQ_WORK_HARD_IRQ,
 };
 
 /*
diff --git a/kernel/timer.c b/kernel/timer.c
index f63a793..76846a1 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1425,7 +1425,7 @@ void update_process_times(int user_tick)
 	scheduler_tick();
 	run_local_timers();
 	rcu_check_callbacks(cpu, user_tick);
-#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
+#if defined(CONFIG_IRQ_WORK)
 	if (in_irq())
 		irq_work_run();
 #endif
-- 
1.8.5.3




* [PATCH RT 10/14] arm/unwind: use a raw_spin_lock
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (8 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 09/14] irq_work: allow certain work in hard irq context Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 11/14] net: ip_send_unicast_reply: add missing local serialization Steven Rostedt
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0010-arm-unwind-use-a-raw_spin_lock.patch --]
[-- Type: text/plain, Size: 2884 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Mostly, unwinding is done with irqs enabled; however, SLUB may call it
with irqs disabled while creating a new SLUB cache.

I had a system freeze while loading a module which called
kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
interrupts and then

->new_slab_objects()
 ->new_slab()
  ->setup_object()
   ->setup_object_debug()
    ->init_tracking()
     ->set_track()
      ->save_stack_trace()
       ->save_stack_trace_tsk()
        ->walk_stackframe()
         ->unwind_frame()
          ->unwind_find_idx()
           =>spin_lock_irqsave(&unwind_lock);

On RT, spinlock_t is a sleeping lock, so taking unwind_lock in this
irqs-off path is invalid; a raw_spinlock_t remains a true spinning lock
there.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/arm/kernel/unwind.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 00df012..bbafc67 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -87,7 +87,7 @@ extern const struct unwind_idx __start_unwind_idx[];
 static const struct unwind_idx *__origin_unwind_idx;
 extern const struct unwind_idx __stop_unwind_idx[];
 
-static DEFINE_SPINLOCK(unwind_lock);
+static DEFINE_RAW_SPINLOCK(unwind_lock);
 static LIST_HEAD(unwind_tables);
 
 /* Convert a prel31 symbol to an absolute address */
@@ -195,7 +195,7 @@ static const struct unwind_idx *unwind_find_idx(unsigned long addr)
 		/* module unwind tables */
 		struct unwind_table *table;
 
-		spin_lock_irqsave(&unwind_lock, flags);
+		raw_spin_lock_irqsave(&unwind_lock, flags);
 		list_for_each_entry(table, &unwind_tables, list) {
 			if (addr >= table->begin_addr &&
 			    addr < table->end_addr) {
@@ -207,7 +207,7 @@ static const struct unwind_idx *unwind_find_idx(unsigned long addr)
 				break;
 			}
 		}
-		spin_unlock_irqrestore(&unwind_lock, flags);
+		raw_spin_unlock_irqrestore(&unwind_lock, flags);
 	}
 
 	pr_debug("%s: idx = %p\n", __func__, idx);
@@ -469,9 +469,9 @@ struct unwind_table *unwind_table_add(unsigned long start, unsigned long size,
 	tab->begin_addr = text_addr;
 	tab->end_addr = text_addr + text_size;
 
-	spin_lock_irqsave(&unwind_lock, flags);
+	raw_spin_lock_irqsave(&unwind_lock, flags);
 	list_add_tail(&tab->list, &unwind_tables);
-	spin_unlock_irqrestore(&unwind_lock, flags);
+	raw_spin_unlock_irqrestore(&unwind_lock, flags);
 
 	return tab;
 }
@@ -483,9 +483,9 @@ void unwind_table_del(struct unwind_table *tab)
 	if (!tab)
 		return;
 
-	spin_lock_irqsave(&unwind_lock, flags);
+	raw_spin_lock_irqsave(&unwind_lock, flags);
 	list_del(&tab->list);
-	spin_unlock_irqrestore(&unwind_lock, flags);
+	raw_spin_unlock_irqrestore(&unwind_lock, flags);
 
 	kfree(tab);
 }
-- 
1.8.5.3




* [PATCH RT 11/14] net: ip_send_unicast_reply: add missing local serialization
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (9 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 10/14] arm/unwind: use a raw_spin_lock Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 12/14] leds: trigger: disable CPU trigger on -RT Steven Rostedt
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt, Nicholas Mc Guire

[-- Attachment #1: 0011-net-ip_send_unicast_reply-add-missing-local-serializ.patch --]
[-- Type: text/plain, Size: 3660 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Nicholas Mc Guire <der.herr@hofr.at>

In response to the oops in ip_output.c:ip_send_unicast_reply under high
network load with CONFIG_PREEMPT_RT_FULL=y, reported by Sami Pietikainen
<Sami.Pietikainen@wapice.com>, this patch adds local serialization in
ip_send_unicast_reply.

from ip_output.c:
/*
 *      Generic function to send a packet as reply to another packet.
 *      Used to send some TCP resets/acks so far.
 *
 *      Use a fake percpu inet socket to avoid false sharing and contention.
 */
static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
...

which was added in commit be9f4a44 in linux-stable. The git log of the
commit which introduced the PER_CPU unicast_sock states:
<snip>
commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jul 19 07:34:03 2012 +0000

    ipv4: tcp: remove per net tcp_sock

    tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
    per network namespace.

    This leads to bad behavior on multiqueue NICS, because many cpus
    contend for the socket lock and once socket lock is acquired, extra
    false sharing on various socket fields slow down the operations.

    To better resist to attacks, we use a percpu socket. Each cpu can
    run without contention, using appropriate memory (local node)
<snip>

The per-cpu usage here is thus assuming per-CPU exclusivity for
serialization - so the use of get_cpu_light() introduced in
net-use-cpu-light-in-ip-send-unicast-reply.patch, which dropped the
preempt_disable() in favor of a migrate_disable(), is probably wrong,
as this only handles the referential consistency but not the
serialization. To avoid a preempt_disable() here, a local lock is
needed.

Remedy:
 * add a local lock
 * and re-introduce local serialization
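
A minimal usage sketch of that pattern (hypothetical names; assumes the
-rt locallock.h semantics, where the lock maps to a per-CPU sleeping
lock on RT and to a plain preempt_disable() on non-RT):

	static DEFINE_LOCAL_IRQ_LOCK(my_lock);
	static DEFINE_PER_CPU(struct my_state, my_state);

	void my_reply_path(void)
	{
		/* take this CPU's lock and get a stable reference */
		struct my_state *s = &get_locked_var(my_lock, my_state);

		/* ... exclusive use of *s, even if RT preempts us ... */
		put_locked_var(my_lock, my_state);
	}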

Tested on x86 with high network load using the testcase from Sami Pietikainen
  while : ; do wget -O - ftp://LOCAL_SERVER/empty_file > /dev/null 2>&1; done

Link: http://www.spinics.net/lists/linux-rt-users/msg11007.html
Cc: stable-rt@vger.kernel.org
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 net/ipv4/ip_output.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 2d69b1b..ab674e8 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -79,6 +79,7 @@
 #include <linux/mroute.h>
 #include <linux/netlink.h>
 #include <linux/tcp.h>
+#include <linux/locallock.h>
 
 int sysctl_ip_default_ttl __read_mostly = IPDEFTTL;
 EXPORT_SYMBOL(sysctl_ip_default_ttl);
@@ -1471,6 +1472,9 @@ static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
 	.uc_ttl		= -1,
 };
 
+/* serialize concurrent calls on the same CPU to ip_send_unicast_reply */
+static DEFINE_LOCAL_IRQ_LOCK(unicast_lock);
+
 void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 			   __be32 saddr, const struct ip_reply_arg *arg,
 			   unsigned int len)
@@ -1508,8 +1512,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 	if (IS_ERR(rt))
 		return;
 
-	get_cpu_light();
-	inet = &__get_cpu_var(unicast_sock);
+	inet = &get_locked_var(unicast_lock, unicast_sock);
 
 	inet->tos = arg->tos;
 	sk = &inet->sk;
@@ -1533,7 +1536,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 		ip_push_pending_frames(sk, &fl4);
 	}
 
-	put_cpu_light();
+	put_locked_var(unicast_lock, unicast_sock);
 
 	ip_rt_put(rt);
 }
-- 
1.8.5.3




* [PATCH RT 12/14] leds: trigger: disable CPU trigger on -RT
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (10 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 11/14] net: ip_send_unicast_reply: add missing local serialization Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 13/14] rcu: Eliminate softirq processing from rcutree Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 14/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt

[-- Attachment #1: 0012-leds-trigger-disable-CPU-trigger-on-RT.patch --]
[-- Type: text/plain, Size: 1910 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 drivers/leds/trigger/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
index 49794b4..3d7245d 100644
--- a/drivers/leds/trigger/Kconfig
+++ b/drivers/leds/trigger/Kconfig
@@ -61,7 +61,7 @@ config LEDS_TRIGGER_BACKLIGHT
 
 config LEDS_TRIGGER_CPU
 	bool "LED CPU Trigger"
-	depends on LEDS_TRIGGERS
+	depends on LEDS_TRIGGERS && !PREEMPT_RT_BASE
 	help
 	  This allows LEDs to be controlled by active CPUs. This shows
 	  the active CPUs across an array of LEDs so you can see which
-- 
1.8.5.3




* [PATCH RT 13/14] rcu: Eliminate softirq processing from rcutree
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (11 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 12/14] leds: trigger: disable CPU trigger on -RT Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  2014-03-01  3:52 ` [PATCH RT 14/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker, stable-rt, Mike Galbraith,
	Paul E. McKenney

[-- Attachment #1: 0013-rcu-Eliminate-softirq-processing-from-rcutree.patch --]
[-- Type: text/plain, Size: 12581 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Running RCU out of softirq is a problem for some workloads that would
like to manage RCU core processing independently of other softirq work,
for example, setting kthread priority.  This commit therefore moves the
RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread
named rcuc.  The SCHED_OTHER approach avoids the scalability problems
that appeared with the earlier attempt to move RCU core processing
from softirq to kthreads.  That said, kernels built with RCU_BOOST=y
will run the rcuc kthreads at the RCU-boosting priority.
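
The rcuc kthreads ride on the generic smpboot infrastructure; a sketch
of the assumed registration (the hunks below define rcu_cpu_thread_spec
and rcu_spawn_core_kthreads(); ksoftirqd uses the same API):

	/* one "rcuc/%u" kthread per CPU, parked/unparked automatically
	 * across CPU hotplug */
	ret = smpboot_register_percpu_thread(&rcu_cpu_thread_spec);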

Cc: stable-rt@vger.kernel.org
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c        | 113 +++++++++++++++++++++++++++++++++++-----
 kernel/rcutree.h        |   3 +-
 kernel/rcutree_plugin.h | 134 +++++-------------------------------------------
 3 files changed, 113 insertions(+), 137 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5439cee..f3bc6eb 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -53,6 +53,11 @@
 #include <linux/delay.h>
 #include <linux/stop_machine.h>
 #include <linux/random.h>
+#include <linux/delay.h>
+#include <linux/gfp.h>
+#include <linux/oom.h>
+#include <linux/smpboot.h>
+#include "time/tick-internal.h"
 
 #include "rcutree.h"
 #include <trace/events/rcu.h>
@@ -128,8 +133,6 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
  */
 static int rcu_scheduler_fully_active __read_mostly;
 
-#ifdef CONFIG_RCU_BOOST
-
 /*
  * Control variables for per-CPU and per-rcu_node kthreads.  These
  * handle all flavors of RCU.
@@ -139,8 +142,6 @@ DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
 DEFINE_PER_CPU(char, rcu_cpu_has_work);
 
-#endif /* #ifdef CONFIG_RCU_BOOST */
-
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
 static void invoke_rcu_core(void);
 static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
@@ -2312,16 +2313,14 @@ __rcu_process_callbacks(struct rcu_state *rsp)
 /*
  * Do RCU core processing for the current CPU.
  */
-static void rcu_process_callbacks(struct softirq_action *unused)
+static void rcu_process_callbacks(void)
 {
 	struct rcu_state *rsp;
 
 	if (cpu_is_offline(smp_processor_id()))
 		return;
-	trace_rcu_utilization("Start RCU core");
 	for_each_rcu_flavor(rsp)
 		__rcu_process_callbacks(rsp);
-	trace_rcu_utilization("End RCU core");
 }
 
 /*
@@ -2335,18 +2334,105 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
 {
 	if (unlikely(!ACCESS_ONCE(rcu_scheduler_fully_active)))
 		return;
-	if (likely(!rsp->boost)) {
-		rcu_do_batch(rsp, rdp);
+	rcu_do_batch(rsp, rdp);
+}
+
+static void rcu_wake_cond(struct task_struct *t, int status)
+{
+	/*
+	 * If the thread is yielding, only wake it when this
+	 * is invoked from idle
+	 */
+	if (t && (status != RCU_KTHREAD_YIELDING || is_idle_task(current)))
+		wake_up_process(t);
+}
+
+/*
+ * Wake up this CPU's rcuc kthread to do RCU core processing.
+ */
+static void invoke_rcu_core(void)
+{
+	unsigned long flags;
+	struct task_struct *t;
+
+	if (!cpu_online(smp_processor_id()))
 		return;
+	local_irq_save(flags);
+	__this_cpu_write(rcu_cpu_has_work, 1);
+	t = __this_cpu_read(rcu_cpu_kthread_task);
+	if (t != NULL && current != t)
+		rcu_wake_cond(t, __this_cpu_read(rcu_cpu_kthread_status));
+	local_irq_restore(flags);
+}
+
+static void rcu_cpu_kthread_park(unsigned int cpu)
+{
+	per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU;
+}
+
+static int rcu_cpu_kthread_should_run(unsigned int cpu)
+{
+	return __this_cpu_read(rcu_cpu_has_work);
+}
+
+/*
+ * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
+ * RCU softirq used in flavors and configurations of RCU that do not
+ * support RCU priority boosting.
+ */
+static void rcu_cpu_kthread(unsigned int cpu)
+{
+	unsigned int *statusp = &__get_cpu_var(rcu_cpu_kthread_status);
+	char work, *workp = &__get_cpu_var(rcu_cpu_has_work);
+	int spincnt;
+
+	for (spincnt = 0; spincnt < 10; spincnt++) {
+		trace_rcu_utilization("Start CPU kthread@rcu_wait");
+		local_bh_disable();
+		*statusp = RCU_KTHREAD_RUNNING;
+		this_cpu_inc(rcu_cpu_kthread_loops);
+		local_irq_disable();
+		work = *workp;
+		*workp = 0;
+		local_irq_enable();
+		if (work)
+			rcu_process_callbacks();
+		local_bh_enable();
+		if (*workp == 0) {
+			trace_rcu_utilization("End CPU kthread@rcu_wait");
+			*statusp = RCU_KTHREAD_WAITING;
+			return;
+		}
 	}
-	invoke_rcu_callbacks_kthread();
+	*statusp = RCU_KTHREAD_YIELDING;
+	trace_rcu_utilization("Start CPU kthread@rcu_yield");
+	schedule_timeout_interruptible(2);
+	trace_rcu_utilization("End CPU kthread@rcu_yield");
+	*statusp = RCU_KTHREAD_WAITING;
 }
 
-static void invoke_rcu_core(void)
+static struct smp_hotplug_thread rcu_cpu_thread_spec = {
+	.store			= &rcu_cpu_kthread_task,
+	.thread_should_run	= rcu_cpu_kthread_should_run,
+	.thread_fn		= rcu_cpu_kthread,
+	.thread_comm		= "rcuc/%u",
+	.setup			= rcu_cpu_kthread_setup,
+	.park			= rcu_cpu_kthread_park,
+};
+
+/*
+ * Spawn per-CPU RCU core processing kthreads.
+ */
+static int __init rcu_spawn_core_kthreads(void)
 {
-	if (cpu_online(smp_processor_id()))
-		raise_softirq(RCU_SOFTIRQ);
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		per_cpu(rcu_cpu_has_work, cpu) = 0;
+	BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
+	return 0;
 }
+early_initcall(rcu_spawn_core_kthreads);
 
 /*
  * Handle any core-RCU processing required by a call_rcu() invocation.
@@ -3355,7 +3441,6 @@ void __init rcu_init(void)
 	rcu_init_one(&rcu_sched_state, &rcu_sched_data);
 	rcu_init_one(&rcu_bh_state, &rcu_bh_data);
 	__rcu_init_preempt();
-	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 
 	/*
 	 * We don't need protection against CPU-hotplug here because
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 7e0b397..75035be 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -513,10 +513,9 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
 static void __init __rcu_init_preempt(void);
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
 static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
-static void invoke_rcu_callbacks_kthread(void);
 static bool rcu_is_callbacks_kthread(void);
+static void rcu_cpu_kthread_setup(unsigned int cpu);
 #ifdef CONFIG_RCU_BOOST
-static void rcu_preempt_do_callbacks(void);
 static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 						 struct rcu_node *rnp);
 #endif /* #ifdef CONFIG_RCU_BOOST */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 481a124..e2d68e5 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -24,12 +24,6 @@
  *	   Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  */
 
-#include <linux/delay.h>
-#include <linux/gfp.h>
-#include <linux/oom.h>
-#include <linux/smpboot.h>
-#include <linux/tick.h>
-
 #define RCU_KTHREAD_PRIO 1
 
 #ifdef CONFIG_RCU_BOOST
@@ -659,15 +653,6 @@ static void rcu_preempt_check_callbacks(int cpu)
 		t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
 }
 
-#ifdef CONFIG_RCU_BOOST
-
-static void rcu_preempt_do_callbacks(void)
-{
-	rcu_do_batch(&rcu_preempt_state, &__get_cpu_var(rcu_preempt_data));
-}
-
-#endif /* #ifdef CONFIG_RCU_BOOST */
-
 /*
  * Queue a preemptible-RCU callback for invocation after a grace period.
  */
@@ -1103,6 +1088,19 @@ static void __init __rcu_init_preempt(void)
 
 #endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
 
+/*
+ * If boosting, set rcuc kthreads to realtime priority.
+ */
+static void rcu_cpu_kthread_setup(unsigned int cpu)
+{
+#ifdef CONFIG_RCU_BOOST
+	struct sched_param sp;
+
+	sp.sched_priority = RCU_KTHREAD_PRIO;
+	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
+#endif /* #ifdef CONFIG_RCU_BOOST */
+}
+
 #ifdef CONFIG_RCU_BOOST
 
 #include "rtmutex_common.h"
@@ -1134,16 +1132,6 @@ static void rcu_initiate_boost_trace(struct rcu_node *rnp)
 
 #endif /* #else #ifdef CONFIG_RCU_TRACE */
 
-static void rcu_wake_cond(struct task_struct *t, int status)
-{
-	/*
-	 * If the thread is yielding, only wake it when this
-	 * is invoked from idle
-	 */
-	if (status != RCU_KTHREAD_YIELDING || is_idle_task(current))
-		wake_up_process(t);
-}
-
 /*
  * Carry out RCU priority boosting on the task indicated by ->exp_tasks
  * or ->boost_tasks, advancing the pointer to the next task in the
@@ -1287,23 +1275,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
 }
 
 /*
- * Wake up the per-CPU kthread to invoke RCU callbacks.
- */
-static void invoke_rcu_callbacks_kthread(void)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	__this_cpu_write(rcu_cpu_has_work, 1);
-	if (__this_cpu_read(rcu_cpu_kthread_task) != NULL &&
-	    current != __this_cpu_read(rcu_cpu_kthread_task)) {
-		rcu_wake_cond(__this_cpu_read(rcu_cpu_kthread_task),
-			      __this_cpu_read(rcu_cpu_kthread_status));
-	}
-	local_irq_restore(flags);
-}
-
-/*
  * Is the current CPU running the RCU-callbacks kthread?
  * Caller must have preemption disabled.
  */
@@ -1357,67 +1328,6 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 	return 0;
 }
 
-static void rcu_kthread_do_work(void)
-{
-	rcu_do_batch(&rcu_sched_state, &__get_cpu_var(rcu_sched_data));
-	rcu_do_batch(&rcu_bh_state, &__get_cpu_var(rcu_bh_data));
-	rcu_preempt_do_callbacks();
-}
-
-static void rcu_cpu_kthread_setup(unsigned int cpu)
-{
-	struct sched_param sp;
-
-	sp.sched_priority = RCU_KTHREAD_PRIO;
-	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
-}
-
-static void rcu_cpu_kthread_park(unsigned int cpu)
-{
-	per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU;
-}
-
-static int rcu_cpu_kthread_should_run(unsigned int cpu)
-{
-	return __get_cpu_var(rcu_cpu_has_work);
-}
-
-/*
- * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
- * RCU softirq used in flavors and configurations of RCU that do not
- * support RCU priority boosting.
- */
-static void rcu_cpu_kthread(unsigned int cpu)
-{
-	unsigned int *statusp = &__get_cpu_var(rcu_cpu_kthread_status);
-	char work, *workp = &__get_cpu_var(rcu_cpu_has_work);
-	int spincnt;
-
-	for (spincnt = 0; spincnt < 10; spincnt++) {
-		trace_rcu_utilization("Start CPU kthread@rcu_wait");
-		local_bh_disable();
-		*statusp = RCU_KTHREAD_RUNNING;
-		this_cpu_inc(rcu_cpu_kthread_loops);
-		local_irq_disable();
-		work = *workp;
-		*workp = 0;
-		local_irq_enable();
-		if (work)
-			rcu_kthread_do_work();
-		local_bh_enable();
-		if (*workp == 0) {
-			trace_rcu_utilization("End CPU kthread@rcu_wait");
-			*statusp = RCU_KTHREAD_WAITING;
-			return;
-		}
-	}
-	*statusp = RCU_KTHREAD_YIELDING;
-	trace_rcu_utilization("Start CPU kthread@rcu_yield");
-	schedule_timeout_interruptible(2);
-	trace_rcu_utilization("End CPU kthread@rcu_yield");
-	*statusp = RCU_KTHREAD_WAITING;
-}
-
 /*
  * Set the per-rcu_node kthread's affinity to cover all CPUs that are
  * served by the rcu_node in question.  The CPU hotplug lock is still
@@ -1451,27 +1361,14 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 	free_cpumask_var(cm);
 }
 
-static struct smp_hotplug_thread rcu_cpu_thread_spec = {
-	.store			= &rcu_cpu_kthread_task,
-	.thread_should_run	= rcu_cpu_kthread_should_run,
-	.thread_fn		= rcu_cpu_kthread,
-	.thread_comm		= "rcuc/%u",
-	.setup			= rcu_cpu_kthread_setup,
-	.park			= rcu_cpu_kthread_park,
-};
-
 /*
  * Spawn all kthreads -- called as soon as the scheduler is running.
  */
 static int __init rcu_spawn_kthreads(void)
 {
 	struct rcu_node *rnp;
-	int cpu;
 
 	rcu_scheduler_fully_active = 1;
-	for_each_possible_cpu(cpu)
-		per_cpu(rcu_cpu_has_work, cpu) = 0;
-	BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
 	rnp = rcu_get_root(rcu_state);
 	(void)rcu_spawn_one_boost_kthread(rcu_state, rnp);
 	if (NUM_RCU_NODES > 1) {
@@ -1499,11 +1396,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 }
 
-static void invoke_rcu_callbacks_kthread(void)
-{
-	WARN_ON_ONCE(1);
-}
-
 static bool rcu_is_callbacks_kthread(void)
 {
 	return false;
-- 
1.8.5.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RT 14/14] Linux 3.10.32-rt31-rc2
  2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
                   ` (12 preceding siblings ...)
  2014-03-01  3:52 ` [PATCH RT 13/14] rcu: Eliminate softirq processing from rcutree Steven Rostedt
@ 2014-03-01  3:52 ` Steven Rostedt
  13 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2014-03-01  3:52 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Paul Gortmaker

[-- Attachment #1: 0014-Linux-3.10.32-rt31-rc2.patch --]
[-- Type: text/plain, Size: 406 bytes --]

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

------------------

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index b72862e..c6273a4 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt30
+-rt31-rc2
-- 
1.8.5.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-03-01  3:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-01  3:52 [PATCH RT 00/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 01/14] rcu: Dont activate RCU core on NO_HZ_FULL CPUs Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 02/14] timers: do not raise softirq unconditionally Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 03/14] timer: Raise softirq if theres irq_work Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 04/14] timer/rt: Always raise the softirq if theres irq_work to be done Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 05/14] rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs() Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 06/14] Revert "x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT" Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 07/14] rt: Make cpu_chill() use hrtimer instead of msleep() Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 08/14] kernel/hrtimer: be non-freezeable in cpu_chill() Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 09/14] irq_work: allow certain work in hard irq context Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 10/14] arm/unwind: use a raw_spin_lock Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 11/14] net: ip_send_unicast_reply: add missing local serialization Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 12/14] leds: trigger: disable CPU trigger on -RT Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 13/14] rcu: Eliminate softirq processing from rcutree Steven Rostedt
2014-03-01  3:52 ` [PATCH RT 14/14] Linux 3.10.32-rt31-rc2 Steven Rostedt
