* [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
@ 2012-07-18 22:39 Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 01/12] Latency histogramms: Cope with backwards running local trace clock Steven Rostedt
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur


Dear RT Folks,

This is the RT stable review cycle of patch 3.0.36-rt58-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository; only the
final release will be.

If all goes well, this patch will be converted to the next main release
on 7/20/2012.

Enjoy,

-- Steve


To build 3.0.36-rt58-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.0.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.0/patch-3.0.36.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.0/patch-3.0.36-rt58-rc1.patch.xz

You can also build from 3.0.36-rt57 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.0/incr/patch-3.0.36-rt57-rt58-rc1.patch.xz


Changes from 3.0.36-rt57:

---


Carsten Emde (4):
      Latency histogramms: Cope with backwards running local trace clock
      Latency histograms: Adjust timer, if already elapsed when programmed
      Disable RT_GROUP_SCHED in PREEMPT_RT_FULL
      Latency histograms: Detect another yet overlooked sharedprio condition

Mike Galbraith (1):
      fs, jbd: pull your plug when waiting for space

Steven Rostedt (5):
      cpu/rt: Rework cpu down for PREEMPT_RT
      cpu/rt: Fix cpu_hotplug variable initialization
      workqueue: Revert workqueue: Fix PF_THREAD_BOUND abuse
      workqueue: Revert workqueue: Fix cpuhotplug trainwreck
      Linux 3.0.36-rt58-rc1

Thomas Gleixner (1):
      slab: Prevent local lock deadlock

Yong Zhang (1):
      perf: Make swevent hrtimer run in irq instead of softirq

----
 fs/jbd/checkpoint.c         |    2 +
 include/linux/cpu.h         |   14 +-
 include/linux/hrtimer.h     |    3 +
 include/linux/sched.h       |    9 +-
 include/linux/workqueue.h   |    5 +-
 init/Kconfig                |    1 +
 kernel/cpu.c                |  240 ++++++++++++++----
 kernel/events/core.c        |    1 +
 kernel/hrtimer.c            |   16 +-
 kernel/sched.c              |   82 +++++-
 kernel/trace/latency_hist.c |   74 +++---
 kernel/workqueue.c          |  578 +++++++++++++++++++++++++++++++------------
 localversion-rt             |    2 +-
 mm/slab.c                   |   26 +-
 14 files changed, 792 insertions(+), 261 deletions(-)


* [PATCH RT 01/12] Latency histogramms: Cope with backwards running local trace clock
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 02/12] Latency histograms: Adjust timer, if already elapsed when programmed Steven Rostedt
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0001-Latency-histogramms-Cope-with-backwards-running-loca.patch --]
[-- Type: text/plain, Size: 9414 bytes --]

From: Carsten Emde <C.Emde@osadl.org>

Thanks to the wonders of modern technology, the local trace clock can
now run backwards. Since this never happened before, the time difference
between now and some earlier point was assumed to never become negative
and was therefore stored in an unsigned integer variable. Nowadays, we
need a signed integer to ensure that such a value is recorded as an
underflow in the related histogram. (In cases where this is not a
malfunction, bipolar histograms can be used.)

This patch ensures that all latency variables are represented as signed
integers and that negative values are recorded as histogram underflows.
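
For illustration only (not part of the patch), here is a minimal
user-space sketch of the arithmetic, with made-up timestamp values: an
unsigned difference turns a backwards step of the clock into a huge
bogus latency, while a signed difference stays negative and can be
filed as an underflow.

#include <stdio.h>

#define NSECS_PER_USECS 1000LL

int main(void)
{
	/* hypothetical trace clock readings, in nanoseconds */
	unsigned long long start = 2000000;
	unsigned long long stop  = 1999000;	/* clock stepped backwards */

	/* old scheme: the difference is kept unsigned and wraps around */
	unsigned long long bogus = (stop - start) / NSECS_PER_USECS;

	/* new scheme: a signed difference stays negative */
	long long latency = ((long long)stop - (long long)start) /
	    NSECS_PER_USECS;

	printf("unsigned: %llu us\n", bogus);	/* huge bogus sample */
	printf("signed:   %lld us\n", latency);	/* -1, filed as an underflow */
	return 0;
}

The patch applies the same idea with the new NSECS_PER_USECS constant
and the div64_s64()-based average.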

On one of the misbehaving processors, switching to the global clock
solved the problem:
  echo global >/sys/kernel/debug/tracing/trace_clock

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/sched.h       |    2 +-
 kernel/trace/latency_hist.c |   71 ++++++++++++++++++++++---------------------
 2 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77e132f..beffba3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1580,7 +1580,7 @@ struct task_struct {
 #ifdef CONFIG_WAKEUP_LATENCY_HIST
 	u64 preempt_timestamp_hist;
 #ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
-	unsigned long timer_offset;
+	long timer_offset;
 #endif
 #endif
 #endif /* CONFIG_TRACING */
diff --git a/kernel/trace/latency_hist.c b/kernel/trace/latency_hist.c
index 9d49fcb..d514eef 100644
--- a/kernel/trace/latency_hist.c
+++ b/kernel/trace/latency_hist.c
@@ -27,6 +27,8 @@
 #include "trace.h"
 #include <trace/events/sched.h>
 
+#define NSECS_PER_USECS 1000L
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/hist.h>
 
@@ -46,11 +48,11 @@ enum {
 struct hist_data {
 	atomic_t hist_mode; /* 0 log, 1 don't log */
 	long offset; /* set it to MAX_ENTRY_NUM/2 for a bipolar scale */
-	unsigned long min_lat;
-	unsigned long max_lat;
+	long min_lat;
+	long max_lat;
 	unsigned long long below_hist_bound_samples;
 	unsigned long long above_hist_bound_samples;
-	unsigned long long accumulate_lat;
+	long long accumulate_lat;
 	unsigned long long total_samples;
 	unsigned long long hist_array[MAX_ENTRY_NUM];
 };
@@ -152,8 +154,8 @@ static struct enable_data timerandwakeup_enabled_data = {
 static DEFINE_PER_CPU(struct maxlatproc_data, timerandwakeup_maxlatproc);
 #endif
 
-void notrace latency_hist(int latency_type, int cpu, unsigned long latency,
-			  unsigned long timeroffset, cycle_t stop,
+void notrace latency_hist(int latency_type, int cpu, long latency,
+			  long timeroffset, cycle_t stop,
 			  struct task_struct *p)
 {
 	struct hist_data *my_hist;
@@ -224,7 +226,7 @@ void notrace latency_hist(int latency_type, int cpu, unsigned long latency,
 		my_hist->hist_array[latency]++;
 
 	if (unlikely(latency > my_hist->max_lat ||
-	    my_hist->min_lat == ULONG_MAX)) {
+	    my_hist->min_lat == LONG_MAX)) {
 #if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
     defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
 		if (latency_type == WAKEUP_LATENCY ||
@@ -263,15 +265,14 @@ static void *l_start(struct seq_file *m, loff_t *pos)
 		atomic_dec(&my_hist->hist_mode);
 
 		if (likely(my_hist->total_samples)) {
-			unsigned long avg = (unsigned long)
-			    div64_u64(my_hist->accumulate_lat,
+			long avg = (long) div64_s64(my_hist->accumulate_lat,
 			    my_hist->total_samples);
 			snprintf(minstr, sizeof(minstr), "%ld",
-			    (long) my_hist->min_lat - my_hist->offset);
+			    my_hist->min_lat - my_hist->offset);
 			snprintf(avgstr, sizeof(avgstr), "%ld",
-			    (long) avg - my_hist->offset);
+			    avg - my_hist->offset);
 			snprintf(maxstr, sizeof(maxstr), "%ld",
-			    (long) my_hist->max_lat - my_hist->offset);
+			    my_hist->max_lat - my_hist->offset);
 		} else {
 			strcpy(minstr, "<undef>");
 			strcpy(avgstr, minstr);
@@ -376,10 +377,10 @@ static void hist_reset(struct hist_data *hist)
 	memset(hist->hist_array, 0, sizeof(hist->hist_array));
 	hist->below_hist_bound_samples = 0ULL;
 	hist->above_hist_bound_samples = 0ULL;
-	hist->min_lat = ULONG_MAX;
-	hist->max_lat = 0UL;
+	hist->min_lat = LONG_MAX;
+	hist->max_lat = LONG_MIN;
 	hist->total_samples = 0ULL;
-	hist->accumulate_lat = 0ULL;
+	hist->accumulate_lat = 0LL;
 
 	atomic_inc(&hist->hist_mode);
 }
@@ -790,9 +791,9 @@ static notrace void probe_preemptirqsoff_hist(void *v, int reason,
 
 			stop = ftrace_now(cpu);
 			time_set++;
-			if (start && stop >= start) {
-				unsigned long latency =
-				    nsecs_to_usecs(stop - start);
+			if (start) {
+				long latency = ((long) (stop - start)) /
+				    NSECS_PER_USECS;
 
 				latency_hist(IRQSOFF_LATENCY, cpu, latency, 0,
 				    stop, NULL);
@@ -808,9 +809,9 @@ static notrace void probe_preemptirqsoff_hist(void *v, int reason,
 
 			if (!(time_set++))
 				stop = ftrace_now(cpu);
-			if (start && stop >= start) {
-				unsigned long latency =
-				    nsecs_to_usecs(stop - start);
+			if (start) {
+				long latency = ((long) (stop - start)) /
+				    NSECS_PER_USECS;
 
 				latency_hist(PREEMPTOFF_LATENCY, cpu, latency,
 				    0, stop, NULL);
@@ -827,9 +828,10 @@ static notrace void probe_preemptirqsoff_hist(void *v, int reason,
 
 			if (!time_set)
 				stop = ftrace_now(cpu);
-			if (start && stop >= start) {
-				unsigned long latency =
-				    nsecs_to_usecs(stop - start);
+			if (start) {
+				long latency = ((long) (stop - start)) /
+				    NSECS_PER_USECS;
+
 				latency_hist(PREEMPTIRQSOFF_LATENCY, cpu,
 				    latency, 0, stop, NULL);
 			}
@@ -908,7 +910,7 @@ static notrace void probe_wakeup_latency_hist_stop(void *v,
 {
 	unsigned long flags;
 	int cpu = task_cpu(next);
-	unsigned long latency;
+	long latency;
 	cycle_t stop;
 	struct task_struct *cpu_wakeup_task;
 
@@ -939,7 +941,8 @@ static notrace void probe_wakeup_latency_hist_stop(void *v,
 	 */
 	stop = ftrace_now(raw_smp_processor_id());
 
-	latency = nsecs_to_usecs(stop - next->preempt_timestamp_hist);
+	latency = ((long) (stop - next->preempt_timestamp_hist)) /
+	    NSECS_PER_USECS;
 
 	if (per_cpu(wakeup_sharedprio, cpu)) {
 		latency_hist(WAKEUP_LATENCY_SHAREDPRIO, cpu, latency, 0, stop,
@@ -975,7 +978,7 @@ static notrace void probe_hrtimer_interrupt(void *v, int cpu,
 	    (task->prio < curr->prio ||
 	    (task->prio == curr->prio &&
 	    !cpumask_test_cpu(cpu, &task->cpus_allowed)))) {
-		unsigned long latency;
+		long latency;
 		cycle_t now;
 
 		if (missed_timer_offsets_pid) {
@@ -985,7 +988,7 @@ static notrace void probe_hrtimer_interrupt(void *v, int cpu,
 		}
 
 		now = ftrace_now(cpu);
-		latency = (unsigned long) div_s64(-latency_ns, 1000);
+		latency = (long) div_s64(-latency_ns, NSECS_PER_USECS);
 		latency_hist(MISSED_TIMER_OFFSETS, cpu, latency, latency, now,
 		    task);
 #ifdef CONFIG_WAKEUP_LATENCY_HIST
@@ -1026,7 +1029,7 @@ static __init int latency_hist_init(void)
 		    &per_cpu(irqsoff_hist, i), &latency_hist_fops);
 		my_hist = &per_cpu(irqsoff_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 	}
 	entry = debugfs_create_file("reset", 0644, dentry,
 	    (void *)IRQSOFF_LATENCY, &latency_hist_reset_fops);
@@ -1041,7 +1044,7 @@ static __init int latency_hist_init(void)
 		    &per_cpu(preemptoff_hist, i), &latency_hist_fops);
 		my_hist = &per_cpu(preemptoff_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 	}
 	entry = debugfs_create_file("reset", 0644, dentry,
 	    (void *)PREEMPTOFF_LATENCY, &latency_hist_reset_fops);
@@ -1056,7 +1059,7 @@ static __init int latency_hist_init(void)
 		    &per_cpu(preemptirqsoff_hist, i), &latency_hist_fops);
 		my_hist = &per_cpu(preemptirqsoff_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 	}
 	entry = debugfs_create_file("reset", 0644, dentry,
 	    (void *)PREEMPTIRQSOFF_LATENCY, &latency_hist_reset_fops);
@@ -1081,14 +1084,14 @@ static __init int latency_hist_init(void)
 		    &latency_hist_fops);
 		my_hist = &per_cpu(wakeup_latency_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		entry = debugfs_create_file(name, 0444, dentry_sharedprio,
 		    &per_cpu(wakeup_latency_hist_sharedprio, i),
 		    &latency_hist_fops);
 		my_hist = &per_cpu(wakeup_latency_hist_sharedprio, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		sprintf(name, cpufmt_maxlatproc, i);
 
@@ -1122,7 +1125,7 @@ static __init int latency_hist_init(void)
 		    &per_cpu(missed_timer_offsets, i), &latency_hist_fops);
 		my_hist = &per_cpu(missed_timer_offsets, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		sprintf(name, cpufmt_maxlatproc, i);
 		mp = &per_cpu(missed_timer_offsets_maxlatproc, i);
@@ -1150,7 +1153,7 @@ static __init int latency_hist_init(void)
 		    &latency_hist_fops);
 		my_hist = &per_cpu(timerandwakeup_latency_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		sprintf(name, cpufmt_maxlatproc, i);
 		mp = &per_cpu(timerandwakeup_maxlatproc, i);
-- 
1.7.10.4




* [PATCH RT 02/12] Latency histograms: Adjust timer, if already elapsed when programmed
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 01/12] Latency histogramms: Cope with backwards running local trace clock Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 03/12] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL Steven Rostedt
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0002-Latency-histograms-Adjust-timer-if-already-elapsed-w.patch --]
[-- Type: text/plain, Size: 2131 bytes --]

From: Carsten Emde <C.Emde@osadl.org>

Nothing prevents a programmer from calling clock_nanosleep() with an
already elapsed wakeup time in absolute time mode, or with a delay that
is too small in relative time mode. Such timers cannot wake up in time
and, thus, need to be corrected when they are entered into the missed
timer offsets latency histogram (CONFIG_MISSED_TIMER_OFFSETS_HIST).

This patch marks such timers and uses a corrected expiration time.
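
The already-elapsed case can be provoked from user space; a minimal
sketch (illustrative only, error handling omitted):

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec wakeup;

	clock_gettime(CLOCK_MONOTONIC, &wakeup);
	wakeup.tv_sec -= 1;	/* absolute wakeup time one second in the past */

	/*
	 * The timer programmed here has already elapsed when it is started.
	 * Without the adjustment it would show up in the missed timer
	 * offsets histogram with an offset of roughly one second, although
	 * the kernel serviced it as quickly as it could.
	 */
	clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wakeup, NULL);

	puts("returned immediately");
	return 0;
}

With the patch, timer->praecox records the programming time for such
timers, and the reported offset is computed against it.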

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/hrtimer.h |    3 +++
 kernel/hrtimer.c        |   16 ++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 0e37086..7408760 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -113,6 +113,9 @@ struct hrtimer {
 	unsigned long			state;
 	struct list_head		cb_entry;
 	int				irqsafe;
+#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
+	ktime_t 			praecox;
+#endif
 #ifdef CONFIG_TIMER_STATS
 	int				start_pid;
 	void				*start_site;
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index bb07742..363965f 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1021,6 +1021,17 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 #endif
 	}
 
+#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
+	{
+		ktime_t now = new_base->get_time();
+
+		if (ktime_to_ns(tim) < ktime_to_ns(now))
+			timer->praecox = now;
+		else
+			timer->praecox = ktime_set(0, 0);
+	}
+#endif
+
 	hrtimer_set_expires_range_ns(timer, tim, delta_ns);
 
 	timer_stats_hrtimer_set_start_info(timer);
@@ -1458,8 +1469,9 @@ retry:
 			timer = container_of(node, struct hrtimer, node);
 
 			trace_hrtimer_interrupt(raw_smp_processor_id(),
-			    ktime_to_ns(ktime_sub(
-				hrtimer_get_expires(timer), basenow)),
+			    ktime_to_ns(ktime_sub(ktime_to_ns(timer->praecox) ?
+				timer->praecox : hrtimer_get_expires(timer),
+				basenow)),
 			    current,
 			    timer->function == hrtimer_wakeup ?
 			    container_of(timer, struct hrtimer_sleeper,
-- 
1.7.10.4




* [PATCH RT 03/12] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 01/12] Latency histogramms: Cope with backwards running local trace clock Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 02/12] Latency histograms: Adjust timer, if already elapsed when programmed Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 04/12] Latency histograms: Detect another yet overlooked sharedprio condition Steven Rostedt
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0003-Disable-RT_GROUP_SCHED-in-PREEMPT_RT_FULL.patch --]
[-- Type: text/plain, Size: 728 bytes --]

From: Carsten Emde <C.Emde@osadl.org>

Strange CPU stalls have been observed in RT when RT_GROUP_SCHED
was configured.

Disable it for now.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 init/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/init/Kconfig b/init/Kconfig
index 89e40a4..5ed453f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -719,6 +719,7 @@ config RT_GROUP_SCHED
 	bool "Group scheduling for SCHED_RR/FIFO"
 	depends on EXPERIMENTAL
 	depends on CGROUP_SCHED
+	depends on !PREEMPT_RT_FULL
 	default n
 	help
 	  This feature lets you explicitly allocate real CPU bandwidth
-- 
1.7.10.4




* [PATCH RT 04/12] Latency histograms: Detect another yet overlooked sharedprio condition
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (2 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 03/12] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 05/12] slab: Prevent local lock deadlock Steven Rostedt
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0004-Latency-histograms-Detect-another-yet-overlooked-sha.patch --]
[-- Type: text/plain, Size: 1161 bytes --]

From: Carsten Emde <C.Emde@osadl.org>

While waiting for an RT process to be woken up, the previous process may
block and switch to another one with the same priority, which then
becomes current. This condition was not correctly recognized and led to
erroneously high latency recordings during periods of low CPU load.

This patch correctly marks such latencies as sharedprio and prevents
them from being recorded as actual system latency.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/latency_hist.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/trace/latency_hist.c b/kernel/trace/latency_hist.c
index d514eef..6a4c869 100644
--- a/kernel/trace/latency_hist.c
+++ b/kernel/trace/latency_hist.c
@@ -935,6 +935,9 @@ static notrace void probe_wakeup_latency_hist_stop(void *v,
 		goto out;
 	}
 
+	if (current->prio == cpu_wakeup_task->prio)
+		per_cpu(wakeup_sharedprio, cpu) = 1;
+
 	/*
 	 * The task we are waiting for is about to be switched to.
 	 * Calculate latency and store it in histogram.
-- 
1.7.10.4




* [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (3 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 04/12] Latency histograms: Detect another yet overlooked sharedprio condition Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-27  0:15   ` Frank Rowand
  2012-07-18 22:39 ` [PATCH RT 06/12] fs, jbd: pull your plug when waiting for space Steven Rostedt
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0005-slab-Prevent-local-lock-deadlock.patch --]
[-- Type: text/plain, Size: 2000 bytes --]

From: Thomas Gleixner <tglx@linutronix.de>

On RT we avoid the cross-CPU function calls and take the per-CPU local
locks instead. The code missed that taking the local lock on the CPU
which runs the code must use the proper local lock functions and not a
plain spin_lock(). Otherwise it deadlocks later when trying to acquire
the local lock with the proper function.
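
Not part of the patch, but a small user-space analogy of the failure
mode (names are made up): taking the underlying lock directly and then
acquiring it again through the proper wrapper on the same thread is a
self-deadlock, which an error-checking mutex makes visible instead of
hanging.

#include <stdio.h>
#include <string.h>
#include <pthread.h>

static pthread_mutex_t slab_lock;	/* stands in for the per-CPU slab lock */

/* the "proper" locking function everybody is supposed to go through */
static void lock_slab_local(void)
{
	int err = pthread_mutex_lock(&slab_lock);

	if (err)
		printf("lock_slab_local: %s\n", strerror(err));
}

int main(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&slab_lock, &attr);

	/* take the underlying lock directly, bypassing the wrapper ... */
	pthread_mutex_lock(&slab_lock);

	/*
	 * ... then acquire it again through the proper function on the same
	 * thread: a self-deadlock.  An error-checking mutex reports EDEADLK
	 * here instead of hanging.
	 */
	lock_slab_local();

	pthread_mutex_unlock(&slab_lock);
	return 0;
}

In the patch, lock_slab_on()/unlock_slab_on() pick the right primitive
depending on whether the target CPU is the local one.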

Reported-and-tested-by: Chris Pringle <chris.pringle@miranda.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 mm/slab.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 5251b99..827e2d6 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -737,8 +737,26 @@ slab_on_each_cpu(void (*func)(void *arg, int this_cpu), void *arg)
 {
 	unsigned int i;
 
+	get_cpu_light();
 	for_each_online_cpu(i)
 		func(arg, i);
+	put_cpu_light();
+}
+
+static void lock_slab_on(unsigned int cpu)
+{
+	if (cpu == smp_processor_id())
+		local_lock_irq(slab_lock);
+	else
+		local_spin_lock_irq(slab_lock, &per_cpu(slab_lock, cpu).lock);
+}
+
+static void unlock_slab_on(unsigned int cpu)
+{
+	if (cpu == smp_processor_id())
+		local_unlock_irq(slab_lock);
+	else
+		local_spin_unlock_irq(slab_lock, &per_cpu(slab_lock, cpu).lock);
 }
 #endif
 
@@ -2625,10 +2643,10 @@ static void do_drain(void *arg, int cpu)
 {
 	LIST_HEAD(tmp);
 
-	spin_lock_irq(&per_cpu(slab_lock, cpu).lock);
+	lock_slab_on(cpu);
 	__do_drain(arg, cpu);
 	list_splice_init(&per_cpu(slab_free_list, cpu), &tmp);
-	spin_unlock_irq(&per_cpu(slab_lock, cpu).lock);
+	unlock_slab_on(cpu);
 	free_delayed(&tmp);
 }
 #endif
@@ -4099,9 +4117,9 @@ static void do_ccupdate_local(void *info)
 #else
 static void do_ccupdate_local(void *info, int cpu)
 {
-	spin_lock_irq(&per_cpu(slab_lock, cpu).lock);
+	lock_slab_on(cpu);
 	__do_ccupdate_local(info, cpu);
-	spin_unlock_irq(&per_cpu(slab_lock, cpu).lock);
+	unlock_slab_on(cpu);
 }
 #endif
 
-- 
1.7.10.4




* [PATCH RT 06/12] fs, jbd: pull your plug when waiting for space
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (4 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 05/12] slab: Prevent local lock deadlock Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 07/12] perf: Make swevent hrtimer run in irq instead of softirq Steven Rostedt
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, John Kacur, Mike Galbraith, Theodore Tso

[-- Attachment #1: 0006-fs-jbd-pull-your-plug-when-waiting-for-space.patch --]
[-- Type: text/plain, Size: 1004 bytes --]

From: Mike Galbraith <mgalbraith@suse.de>

With an -rt kernel and a heavy sync IO load, tasks can jam
up on journal locks without unplugging, which can lead to
terminal IO starvation. Unplug and schedule when waiting for space.

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Theodore Tso <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 fs/jbd/checkpoint.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
index e4b87bc..e24b2d5 100644
--- a/fs/jbd/checkpoint.c
+++ b/fs/jbd/checkpoint.c
@@ -123,6 +123,8 @@ void __log_wait_for_space(journal_t *journal)
 		if (journal->j_flags & JFS_ABORT)
 			return;
 		spin_unlock(&journal->j_state_lock);
+		if (current->plug)
+			io_schedule();
 		mutex_lock(&journal->j_checkpoint_mutex);
 
 		/*
-- 
1.7.10.4




* [PATCH RT 07/12] perf: Make swevent hrtimer run in irq instead of softirq
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (5 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 06/12] fs, jbd: pull your plug when waiting for space Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 08/12] cpu/rt: Rework cpu down for PREEMPT_RT Steven Rostedt
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, John Kacur, Yong Zhang, Peter Zijlstra

[-- Attachment #1: 0007-perf-Make-swevent-hrtimer-run-in-irq-instead-of-soft.patch --]
[-- Type: text/plain, Size: 3604 bytes --]

From: Yong Zhang <yong.zhang@windriver.com>

Otherwise we get a deadlock like below:

[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
[ 1044.042752] INFO: lockdep is turned off.
[ 1044.042754] Modules linked in:
[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G        W    3.4.0-rc2-rt3-23676-ga723175-dirty #29
[ 1044.042759] Call Trace:
[ 1044.042761]  <IRQ>  [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
[ 1044.042770]  [<ffffffff8168978c>] __schedule+0x83c/0xa70
[ 1044.042775]  [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
[ 1044.042779]  [<ffffffff81689a5e>] schedule+0x2e/0xa0
[ 1044.042782]  [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
[ 1044.042786]  [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
[ 1044.042790]  [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
[ 1044.042794]  [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
[ 1044.042798]  [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
[ 1044.042802]  [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
[ 1044.042805]  [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
[ 1044.042809]  [<ffffffff8111c649>] group_sched_out+0x29/0x90
[ 1044.042813]  [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
[ 1044.042817]  [<ffffffff8111c343>] remote_function+0x63/0x70
[ 1044.042821]  [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
[ 1044.042826]  [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
[ 1044.042831]  [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
[ 1044.042833]  <EOI>  [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042840]  [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
[ 1044.042844]  [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
[ 1044.042848]  [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
[ 1044.042853]  [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042857]  [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
[ 1044.042862]  [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
[ 1044.042865]  [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
[ 1044.042869]  [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042873]  [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042877]  [<ffffffff8106b596>] kthread+0xb6/0xc0
[ 1044.042881]  [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042886]  [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
[ 1044.042889]  [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
[ 1044.042894]  [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042897]  [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
[ 1044.042900]  [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
[ 1044.042902]  [<ffffffff8168d990>] ? gs_change+0xb/0xb

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/events/core.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 08315ad..1c876e02 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5702,6 +5702,7 @@ static void perf_swevent_init_hrtimer(struct perf_event *event)
 
 	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hwc->hrtimer.function = perf_swevent_hrtimer;
+	hwc->hrtimer.irqsafe = 1;
 
 	/*
 	 * Since hrtimers have a fixed rate, we can do a static freq->period
-- 
1.7.10.4




* [PATCH RT 08/12] cpu/rt: Rework cpu down for PREEMPT_RT
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (6 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 07/12] perf: Make swevent hrtimer run in irq instead of softirq Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 09/12] cpu/rt: Fix cpu_hotplug variable initialization Steven Rostedt
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0008-cpu-rt-Rework-cpu-down-for-PREEMPT_RT.patch --]
[-- Type: text/plain, Size: 15696 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Bringing a CPU down is a pain with the PREEMPT_RT kernel because
tasks can be preempted in many more places than in non-RT. In
order to handle per_cpu variables, tasks may be pinned to a CPU
for a while, and even sleep. But these tasks need to be off the CPU
if that CPU is going down.

Several synchronization methods have been tried, but when stressed
they failed. This is a new approach.

A sync_tsk thread is still created and tasks may still block on a
lock when the CPU is going down, but how that works is a bit different.
When cpu_down() starts, it will create the sync_tsk and wait for it to
report that the tasks currently pinned to the CPU are no longer pinned.
But new tasks that are about to be pinned will still be allowed to do
so at this time.

Then the notifiers are called. Several notifiers will bring down tasks
that will enter these locations. Some of these tasks will take locks
held by other tasks that are on the CPU. If we don't let those other
tasks continue, but make them block until CPU down is done, the tasks
that the notifiers are waiting on will never complete, as they are
waiting for the locks held by the tasks that are blocked.

Thus we still let tasks pin the CPU until the notifiers are done.
After the notifiers run, we then make new tasks entering the pinned
CPU sections grab a mutex and wait. This mutex is now a per-CPU mutex
in the hotplug_pcp descriptor.

To help things along, a new function called migrate_me() is added to
the scheduler code. This function will try to migrate the current task
off the CPU that is going down, if possible. When the sync_tsk is
created, all tasks will then try to migrate off the CPU going down.
There are several cases where this won't work, but it helps in most
cases.
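
Not from the patch itself, just a small stand-alone sketch of the CPU
selection that migrate_me() performs, with plain bitmasks standing in
for cpumasks: mask the CPUs that are going down out of the task's
allowed set and pick any remaining CPU; if nothing is left, the task
cannot be moved.

#include <stdio.h>
#include <stdint.h>

/*
 * Pick a destination CPU: any bit set in 'allowed' but not in
 * 'going_down'.  Returns the CPU number, or -1 if the task cannot
 * be moved.
 */
static int pick_dest_cpu(uint64_t allowed, uint64_t going_down)
{
	uint64_t candidates = allowed & ~going_down;
	int cpu;

	if (!candidates)
		return -1;

	for (cpu = 0; cpu < 64; cpu++)
		if (candidates & (1ULL << cpu))
			return cpu;
	return -1;
}

int main(void)
{
	uint64_t allowed    = 0xf;	/* task may run on CPUs 0-3 */
	uint64_t going_down = 0x1;	/* CPU 0 is being unplugged */

	printf("dest cpu: %d\n", pick_dest_cpu(allowed, going_down));	/* 1 */
	printf("pinned:   %d\n", pick_dest_cpu(0x1, going_down));	/* -1 */
	return 0;
}

In the patch this corresponds to the cpumask_andnot() and
cpumask_any_and() calls done under the task's runqueue lock.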

After the notifiers are called, if a task can't migrate off but enters
the pinned CPU sections, it will be forced to wait on the hotplug_pcp
mutex until the CPU down is complete. Then the scheduler will force the
migration anyway.

Also, I found that THREAD_BOUND tasks need to also be accounted for in
the pinned CPU count, and migrate_disable no longer treats them as
special. This helps fix issues with ksoftirqd and workqueues that unbind
on CPU down.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/sched.h |    7 ++
 kernel/cpu.c          |  236 +++++++++++++++++++++++++++++++++++++++++--------
 kernel/sched.c        |   82 ++++++++++++++++-
 3 files changed, 285 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index beffba3..d25892e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1908,6 +1908,10 @@ extern void do_set_cpus_allowed(struct task_struct *p,
 
 extern int set_cpus_allowed_ptr(struct task_struct *p,
 				const struct cpumask *new_mask);
+int migrate_me(void);
+void tell_sched_cpu_down_begin(int cpu);
+void tell_sched_cpu_down_done(int cpu);
+
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p,
 				      const struct cpumask *new_mask)
@@ -1920,6 +1924,9 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p,
 		return -EINVAL;
 	return 0;
 }
+static inline int migrate_me(void) { return 0; }
+static inline void tell_sched_cpu_down_begin(int cpu) { }
+static inline void tell_sched_cpu_down_done(int cpu) { }
 #endif
 
 #ifndef CONFIG_CPUMASK_OFFSTACK
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 21c8380..11e6e9a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -46,12 +46,7 @@ static int cpu_hotplug_disabled;
 
 static struct {
 	struct task_struct *active_writer;
-#ifdef CONFIG_PREEMPT_RT_FULL
-	/* Makes the lock keep the task's state */
-	spinlock_t lock;
-#else
 	struct mutex lock; /* Synchronizes accesses to refcount, */
-#endif
 	/*
 	 * Also blocks the new readers during
 	 * an ongoing cpu hotplug operation.
@@ -67,20 +62,42 @@ static struct {
 	.refcount = 0,
 };
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-# define hotplug_lock() rt_spin_lock(&cpu_hotplug.lock)
-# define hotplug_unlock() rt_spin_unlock(&cpu_hotplug.lock)
-#else
-# define hotplug_lock() mutex_lock(&cpu_hotplug.lock)
-# define hotplug_unlock() mutex_unlock(&cpu_hotplug.lock)
-#endif
-
+/**
+ * hotplug_pcp - per cpu hotplug descriptor
+ * @unplug:	set when pin_current_cpu() needs to sync tasks
+ * @sync_tsk:	the task that waits for tasks to finish pinned sections
+ * @refcount:	counter of tasks in pinned sections
+ * @grab_lock:	set when the tasks entering pinned sections should wait
+ * @synced:	notifier for @sync_tsk to tell cpu_down it's finished
+ * @mutex:	the mutex to make tasks wait (used when @grab_lock is true)
+ * @mutex_init:	zero if the mutex hasn't been initialized yet.
+ *
+ * Although @unplug and @sync_tsk may point to the same task, the @unplug
+ * is used as a flag and still exists after @sync_tsk has exited and
+ * @sync_tsk set to NULL.
+ */
 struct hotplug_pcp {
 	struct task_struct *unplug;
+	struct task_struct *sync_tsk;
 	int refcount;
+	int grab_lock;
 	struct completion synced;
+#ifdef CONFIG_PREEMPT_RT_FULL
+	spinlock_t lock;
+#else
+	struct mutex mutex;
+#endif
+	int mutex_init;
 };
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+# define hotplug_lock(hp) rt_spin_lock(&(hp)->lock)
+# define hotplug_unlock(hp) rt_spin_unlock(&(hp)->lock)
+#else
+# define hotplug_lock(hp) mutex_lock(&(hp)->mutex)
+# define hotplug_unlock(hp) mutex_unlock(&(hp)->mutex)
+#endif
+
 static DEFINE_PER_CPU(struct hotplug_pcp, hotplug_pcp);
 
 /**
@@ -94,18 +111,40 @@ static DEFINE_PER_CPU(struct hotplug_pcp, hotplug_pcp);
 void pin_current_cpu(void)
 {
 	struct hotplug_pcp *hp;
+	int force = 0;
 
 retry:
 	hp = &__get_cpu_var(hotplug_pcp);
 
-	if (!hp->unplug || hp->refcount || preempt_count() > 1 ||
+	if (!hp->unplug || hp->refcount || force || preempt_count() > 1 ||
 	    hp->unplug == current || (current->flags & PF_STOMPER)) {
 		hp->refcount++;
 		return;
 	}
-	preempt_enable();
-	hotplug_lock();
-	hotplug_unlock();
+
+	if (hp->grab_lock) {
+		preempt_enable();
+		hotplug_lock(hp);
+		hotplug_unlock(hp);
+	} else {
+		preempt_enable();
+		/*
+		 * Try to push this task off of this CPU.
+		 */
+		if (!migrate_me()) {
+			preempt_disable();
+			hp = &__get_cpu_var(hotplug_pcp);
+			if (!hp->grab_lock) {
+				/*
+				 * Just let it continue it's already pinned
+				 * or about to sleep.
+				 */
+				force = 1;
+				goto retry;
+			}
+			preempt_enable();
+		}
+	}
 	preempt_disable();
 	goto retry;
 }
@@ -127,26 +166,84 @@ void unpin_current_cpu(void)
 		wake_up_process(hp->unplug);
 }
 
-/*
- * FIXME: Is this really correct under all circumstances ?
- */
+static void wait_for_pinned_cpus(struct hotplug_pcp *hp)
+{
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	while (hp->refcount) {
+		schedule_preempt_disabled();
+		set_current_state(TASK_UNINTERRUPTIBLE);
+	}
+}
+
 static int sync_unplug_thread(void *data)
 {
 	struct hotplug_pcp *hp = data;
 
 	preempt_disable();
 	hp->unplug = current;
+	wait_for_pinned_cpus(hp);
+
+	/*
+	 * This thread will synchronize the cpu_down() with threads
+	 * that have pinned the CPU. When the pinned CPU count reaches
+	 * zero, we inform the cpu_down code to continue to the next step.
+	 */
 	set_current_state(TASK_UNINTERRUPTIBLE);
-	while (hp->refcount) {
-		schedule_preempt_disabled();
+	preempt_enable();
+	complete(&hp->synced);
+
+	/*
+	 * If all succeeds, the next step will need tasks to wait till
+	 * the CPU is offline before continuing. To do this, the grab_lock
+	 * is set and tasks going into pin_current_cpu() will block on the
+	 * mutex. But we still need to wait for those that are already in
+	 * pinned CPU sections. If the cpu_down() failed, the kthread_should_stop()
+	 * will kick this thread out.
+	 */
+	while (!hp->grab_lock && !kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_UNINTERRUPTIBLE);
+	}
+
+	/* Make sure grab_lock is seen before we see a stale completion */
+	smp_mb();
+
+	/*
+	 * Now just before cpu_down() enters stop machine, we need to make
+	 * sure all tasks that are in pinned CPU sections are out, and new
+	 * tasks will now grab the lock, keeping them from entering pinned
+	 * CPU sections.
+	 */
+	if (!kthread_should_stop()) {
+		preempt_disable();
+		wait_for_pinned_cpus(hp);
+		preempt_enable();
+		complete(&hp->synced);
+	}
+
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		schedule();
 		set_current_state(TASK_UNINTERRUPTIBLE);
 	}
 	set_current_state(TASK_RUNNING);
-	preempt_enable();
-	complete(&hp->synced);
+
+	/*
+	 * Force this thread off this CPU as it's going down and
+	 * we don't want any more work on this CPU.
+	 */
+	current->flags &= ~PF_THREAD_BOUND;
+	do_set_cpus_allowed(current, cpu_present_mask);
+	migrate_me();
 	return 0;
 }
 
+static void __cpu_unplug_sync(struct hotplug_pcp *hp)
+{
+	wake_up_process(hp->sync_tsk);
+	wait_for_completion(&hp->synced);
+}
+
 /*
  * Start the sync_unplug_thread on the target cpu and wait for it to
  * complete.
@@ -154,23 +251,83 @@ static int sync_unplug_thread(void *data)
 static int cpu_unplug_begin(unsigned int cpu)
 {
 	struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
-	struct task_struct *tsk;
+	int err;
+
+	/* Protected by cpu_hotplug.lock */
+	if (!hp->mutex_init) {
+#ifdef CONFIG_PREEMPT_RT_FULL
+		spin_lock_init(&hp->lock);
+#else
+		mutex_init(&hp->mutex);
+#endif
+		hp->mutex_init = 1;
+	}
+
+	/* Inform the scheduler to migrate tasks off this CPU */
+	tell_sched_cpu_down_begin(cpu);
 
 	init_completion(&hp->synced);
-	tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d", cpu);
-	if (IS_ERR(tsk))
-		return (PTR_ERR(tsk));
-	kthread_bind(tsk, cpu);
-	wake_up_process(tsk);
-	wait_for_completion(&hp->synced);
+
+	hp->sync_tsk = kthread_create(sync_unplug_thread, hp, "sync_unplug/%d", cpu);
+	if (IS_ERR(hp->sync_tsk)) {
+		err = PTR_ERR(hp->sync_tsk);
+		hp->sync_tsk = NULL;
+		return err;
+	}
+	kthread_bind(hp->sync_tsk, cpu);
+
+	/*
+	 * Wait for tasks to get out of the pinned sections,
+	 * it's still OK if new tasks enter. Some CPU notifiers will
+	 * wait for tasks that are going to enter these sections and
+	 * we must not have them block.
+	 */
+	__cpu_unplug_sync(hp);
+
 	return 0;
 }
 
+static void cpu_unplug_sync(unsigned int cpu)
+{
+	struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
+
+	init_completion(&hp->synced);
+	/* The completion needs to be initialzied before setting grab_lock */
+	smp_wmb();
+
+	/* Grab the mutex before setting grab_lock */
+	hotplug_lock(hp);
+	hp->grab_lock = 1;
+
+	/*
+	 * The CPU notifiers have been completed.
+	 * Wait for tasks to get out of pinned CPU sections and have new
+	 * tasks block until the CPU is completely down.
+	 */
+	__cpu_unplug_sync(hp);
+
+	/* All done with the sync thread */
+	kthread_stop(hp->sync_tsk);
+	hp->sync_tsk = NULL;
+}
+
 static void cpu_unplug_done(unsigned int cpu)
 {
 	struct hotplug_pcp *hp = &per_cpu(hotplug_pcp, cpu);
 
 	hp->unplug = NULL;
+	/* Let all tasks know cpu unplug is finished before cleaning up */
+	smp_wmb();
+
+	if (hp->sync_tsk)
+		kthread_stop(hp->sync_tsk);
+
+	if (hp->grab_lock) {
+		hotplug_unlock(hp);
+		/* protected by cpu_hotplug.lock */
+		hp->grab_lock = 0;
+	}
+	tell_sched_cpu_down_done(cpu);
 }
 
 void get_online_cpus(void)
@@ -178,9 +335,9 @@ void get_online_cpus(void)
 	might_sleep();
 	if (cpu_hotplug.active_writer == current)
 		return;
-	hotplug_lock();
+	mutex_lock(&cpu_hotplug.lock);
 	cpu_hotplug.refcount++;
-	hotplug_unlock();
+	mutex_unlock(&cpu_hotplug.lock);
 
 }
 EXPORT_SYMBOL_GPL(get_online_cpus);
@@ -189,10 +346,10 @@ void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
-	hotplug_lock();
+	mutex_lock(&cpu_hotplug.lock);
 	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
 		wake_up_process(cpu_hotplug.active_writer);
-	hotplug_unlock();
+	mutex_unlock(&cpu_hotplug.lock);
 
 }
 EXPORT_SYMBOL_GPL(put_online_cpus);
@@ -224,11 +381,11 @@ static void cpu_hotplug_begin(void)
 	cpu_hotplug.active_writer = current;
 
 	for (;;) {
-		hotplug_lock();
+		mutex_lock(&cpu_hotplug.lock);
 		if (likely(!cpu_hotplug.refcount))
 			break;
 		__set_current_state(TASK_UNINTERRUPTIBLE);
-		hotplug_unlock();
+		mutex_unlock(&cpu_hotplug.lock);
 		schedule();
 	}
 }
@@ -236,7 +393,7 @@ static void cpu_hotplug_begin(void)
 static void cpu_hotplug_done(void)
 {
 	cpu_hotplug.active_writer = NULL;
-	hotplug_unlock();
+	mutex_unlock(&cpu_hotplug.lock);
 }
 
 #else /* #if CONFIG_HOTPLUG_CPU */
@@ -371,6 +528,9 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 		goto out_release;
 	}
 
+	/* Notifiers are done. Don't let any more tasks pin this CPU. */
+	cpu_unplug_sync(cpu);
+
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
diff --git a/kernel/sched.c b/kernel/sched.c
index c72e258..7e398c1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4232,7 +4232,7 @@ void migrate_disable(void)
 {
 	struct task_struct *p = current;
 
-	if (in_atomic() || p->flags & PF_THREAD_BOUND) {
+	if (in_atomic()) {
 #ifdef CONFIG_SCHED_DEBUG
 		p->migrate_disable_atomic++;
 #endif
@@ -4263,7 +4263,7 @@ void migrate_enable(void)
 	unsigned long flags;
 	struct rq *rq;
 
-	if (in_atomic() || p->flags & PF_THREAD_BOUND) {
+	if (in_atomic()) {
 #ifdef CONFIG_SCHED_DEBUG
 		p->migrate_disable_atomic--;
 #endif
@@ -6185,6 +6185,84 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 	cpumask_copy(&p->cpus_allowed, new_mask);
 }
 
+static DEFINE_PER_CPU(struct cpumask, sched_cpumasks);
+static DEFINE_MUTEX(sched_down_mutex);
+static cpumask_t sched_down_cpumask;
+
+void tell_sched_cpu_down_begin(int cpu)
+{
+	mutex_lock(&sched_down_mutex);
+	cpumask_set_cpu(cpu, &sched_down_cpumask);
+	mutex_unlock(&sched_down_mutex);
+}
+
+void tell_sched_cpu_down_done(int cpu)
+{
+	mutex_lock(&sched_down_mutex);
+	cpumask_clear_cpu(cpu, &sched_down_cpumask);
+	mutex_unlock(&sched_down_mutex);
+}
+
+/**
+ * migrate_me - try to move the current task off this cpu
+ *
+ * Used by the pin_current_cpu() code to try to get tasks
+ * to move off the current CPU as it is going down.
+ * It will only move the task if the task isn't pinned to
+ * the CPU (with migrate_disable, affinity or THREAD_BOUND)
+ * and the task has to be in a RUNNING state. Otherwise the
+ * movement of the task will wake it up (change its state
+ * to running) when the task did not expect it.
+ *
+ * Returns 1 if it succeeded in moving the current task
+ *         0 otherwise.
+ */
+int migrate_me(void)
+{
+	struct task_struct *p = current;
+	struct migration_arg arg;
+	struct cpumask *cpumask;
+	struct cpumask *mask;
+	unsigned long flags;
+	unsigned int dest_cpu;
+	struct rq *rq;
+
+	/*
+	 * We can not migrate tasks bounded to a CPU or tasks not
+	 * running. The movement of the task will wake it up.
+	 */
+	if (p->flags & PF_THREAD_BOUND || p->state)
+		return 0;
+
+	mutex_lock(&sched_down_mutex);
+	rq = task_rq_lock(p, &flags);
+
+	cpumask = &__get_cpu_var(sched_cpumasks);
+	mask = &p->cpus_allowed;
+
+	cpumask_andnot(cpumask, mask, &sched_down_cpumask);
+
+	if (!cpumask_weight(cpumask)) {
+		/* It's only on this CPU? */
+		task_rq_unlock(rq, p, &flags);
+		mutex_unlock(&sched_down_mutex);
+		return 0;
+	}
+
+	dest_cpu = cpumask_any_and(cpu_active_mask, cpumask);
+
+	arg.task = p;
+	arg.dest_cpu = dest_cpu;
+
+	task_rq_unlock(rq, p, &flags);
+
+	stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+	tlb_migrate_finish(p->mm);
+	mutex_unlock(&sched_down_mutex);
+
+	return 1;
+}
+
 /*
  * This is how migration works:
  *
-- 
1.7.10.4




* [PATCH RT 09/12] cpu/rt: Fix cpu_hotplug variable initialization
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (7 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 08/12] cpu/rt: Rework cpu down for PREEMPT_RT Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 10/12] workqueue: Revert workqueue: Fix PF_THREAD_BOUND abuse Steven Rostedt
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0009-cpu-rt-Fix-cpu_hotplug-variable-initialization.patch --]
[-- Type: text/plain, Size: 842 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The commit "cpu/rt: Rework cpu down for PREEMPT_RT" removed the dual
nature of cpu_hotplug.lock, which was a spinlock for RT and a mutex for
non-RT, and made it a mutex for both. But the initialization of the
variable was not updated to reflect this change.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/cpu.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 11e6e9a..3bcbf99 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -54,11 +54,7 @@ static struct {
 	int refcount;
 } cpu_hotplug = {
 	.active_writer = NULL,
-#ifdef CONFIG_PREEMPT_RT_FULL
-	.lock = __SPIN_LOCK_UNLOCKED(cpu_hotplug.lock),
-#else
 	.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
-#endif
 	.refcount = 0,
 };
 
-- 
1.7.10.4




* [PATCH RT 10/12] workqueue: Revert workqueue: Fix PF_THREAD_BOUND abuse
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (8 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 09/12] cpu/rt: Fix cpu_hotplug variable initialization Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 11/12] workqueue: Revert workqueue: Fix cpuhotplug trainwreck Steven Rostedt
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0010-workqueue-Revert-workqueue-Fix-PF_THREAD_BOUND-abuse.patch --]
[-- Type: text/plain, Size: 2413 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Revert commit

    Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Date:   Mon Oct 3 12:43:25 2011 +0200
    workqueue: Fix PF_THREAD_BOUND abuse

THREAD_BOUND no longer affects cpu down, and this code introduced
a lot of races when taking down a CPU.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/workqueue.c |   29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a08a963..53268e3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1287,14 +1287,8 @@ __acquires(&gcwq->lock)
 			return false;
 		if (task_cpu(task) == gcwq->cpu &&
 		    cpumask_equal(&current->cpus_allowed,
-				  get_cpu_mask(gcwq->cpu))) {
-			/*
-			 * Since we're binding to a particular cpu and need to
-			 * stay there for correctness, mark us PF_THREAD_BOUND.
-			 */
-			task->flags |= PF_THREAD_BOUND;
+				  get_cpu_mask(gcwq->cpu)))
 			return true;
-		}
 		spin_unlock_irq(&gcwq->lock);
 
 		/*
@@ -1308,18 +1302,6 @@ __acquires(&gcwq->lock)
 	}
 }
 
-static void worker_unbind_and_unlock(struct worker *worker)
-{
-	struct global_cwq *gcwq = worker->gcwq;
-	struct task_struct *task = worker->task;
-
-	/*
-	 * Its no longer required we're PF_THREAD_BOUND, the work is done.
-	 */
-	task->flags &= ~PF_THREAD_BOUND;
-	spin_unlock_irq(&gcwq->lock);
-}
-
 static struct worker *alloc_worker(void)
 {
 	struct worker *worker;
@@ -1382,9 +1364,15 @@ static struct worker *create_worker(struct global_cwq *gcwq, bool bind)
 	if (IS_ERR(worker->task))
 		goto fail;
 
+	/*
+	 * A rogue worker will become a regular one if CPU comes
+	 * online later on.  Make sure every worker has
+	 * PF_THREAD_BOUND set.
+	 */
 	if (bind && !on_unbound_cpu)
 		kthread_bind(worker->task, gcwq->cpu);
 	else {
+		worker->task->flags |= PF_THREAD_BOUND;
 		if (on_unbound_cpu)
 			worker->flags |= WORKER_UNBOUND;
 	}
@@ -2061,7 +2049,7 @@ repeat:
 		if (keep_working(gcwq))
 			wake_up_worker(gcwq);
 
-		worker_unbind_and_unlock(rescuer);
+		spin_unlock_irq(&gcwq->lock);
 	}
 
 	schedule();
@@ -2957,6 +2945,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *name,
 		if (IS_ERR(rescuer->task))
 			goto err;
 
+		rescuer->task->flags |= PF_THREAD_BOUND;
 		wake_up_process(rescuer->task);
 	}
 
-- 
1.7.10.4




* [PATCH RT 11/12] workqueue: Revert workqueue: Fix cpuhotplug trainwreck
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (9 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 10/12] workqueue: Revert workqueue: Fix PF_THREAD_BOUND abuse Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-18 22:39 ` [PATCH RT 12/12] Linux 3.0.36-rt58-rc1 Steven Rostedt
  2012-07-19  4:00 ` [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Mike Galbraith
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0011-workqueue-Revert-workqueue-Fix-cpuhotplug-trainwreck.patch --]
[-- Type: text/plain, Size: 25290 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Revert

    Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Date:   Fri Sep 30 11:57:58 2011 +0200
    workqueue: Fix cpuhotplug trainwreck

THREAD_BOUND no longer affects cpu down, and this code introduced
a lot of races when taking down a CPU.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Conflicts:

	include/linux/cpu.h
	include/linux/workqueue.h
---
 include/linux/cpu.h       |   14 +-
 include/linux/workqueue.h |    5 +-
 kernel/workqueue.c        |  561 +++++++++++++++++++++++++++++++++------------
 3 files changed, 419 insertions(+), 161 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index d7d7a12..c7823c5 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -60,16 +60,14 @@ enum {
 	 */
 	CPU_PRI_SCHED_ACTIVE	= INT_MAX,
 	CPU_PRI_CPUSET_ACTIVE	= INT_MAX - 1,
-
-	/* migration should happen before other stuff but after perf */
-	CPU_PRI_PERF			= 20,
-	CPU_PRI_MIGRATION		= 10,
-	CPU_PRI_WORKQUEUE_ACTIVE	= 5,  /* prepare workqueues for others */
-	CPU_PRI_NORMAL			= 0,
-	CPU_PRI_WORKQUEUE_INACTIVE	= -5, /* flush workqueues after others */
-
 	CPU_PRI_SCHED_INACTIVE	= INT_MIN + 1,
 	CPU_PRI_CPUSET_INACTIVE	= INT_MIN,
+
+	/* migration should happen before other stuff but after perf */
+	CPU_PRI_PERF		= 20,
+	CPU_PRI_MIGRATION	= 10,
+	/* prepare workqueues for other notifiers */
+	CPU_PRI_WORKQUEUE	= 5,
 };
 
 #ifdef CONFIG_SMP
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 5b86348..6c56a14 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -254,10 +254,9 @@ enum {
 	WQ_MEM_RECLAIM		= 1 << 3, /* may be used for memory reclaim */
 	WQ_HIGHPRI		= 1 << 4, /* high priority */
 	WQ_CPU_INTENSIVE	= 1 << 5, /* cpu instensive workqueue */
-	WQ_NON_AFFINE		= 1 << 6, /* free to move works around cpus */
 
-	WQ_DYING		= 1 << 7, /* internal: workqueue is dying */
-	WQ_RESCUER		= 1 << 8, /* internal: workqueue has rescuer */
+	WQ_DYING		= 1 << 6, /* internal: workqueue is dying */
+	WQ_RESCUER		= 1 << 7, /* internal: workqueue has rescuer */
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 53268e3..99be108 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -41,7 +41,6 @@
 #include <linux/debug_locks.h>
 #include <linux/lockdep.h>
 #include <linux/idr.h>
-#include <linux/delay.h>
 
 #include "workqueue_sched.h"
 
@@ -58,10 +57,20 @@ enum {
 	WORKER_DIE		= 1 << 1,	/* die die die */
 	WORKER_IDLE		= 1 << 2,	/* is idle */
 	WORKER_PREP		= 1 << 3,	/* preparing to run works */
-	WORKER_CPU_INTENSIVE	= 1 << 4,	/* cpu intensive */
-	WORKER_UNBOUND		= 1 << 5,	/* worker is unbound */
+	WORKER_ROGUE		= 1 << 4,	/* not bound to any cpu */
+	WORKER_REBIND		= 1 << 5,	/* mom is home, come back */
+	WORKER_CPU_INTENSIVE	= 1 << 6,	/* cpu intensive */
+	WORKER_UNBOUND		= 1 << 7,	/* worker is unbound */
 
-	WORKER_NOT_RUNNING	= WORKER_PREP | WORKER_CPU_INTENSIVE | WORKER_UNBOUND,
+	WORKER_NOT_RUNNING	= WORKER_PREP | WORKER_ROGUE | WORKER_REBIND |
+				  WORKER_CPU_INTENSIVE | WORKER_UNBOUND,
+
+	/* gcwq->trustee_state */
+	TRUSTEE_START		= 0,		/* start */
+	TRUSTEE_IN_CHARGE	= 1,		/* trustee in charge of gcwq */
+	TRUSTEE_BUTCHER		= 2,		/* butcher workers */
+	TRUSTEE_RELEASE		= 3,		/* release workers */
+	TRUSTEE_DONE		= 4,		/* trustee is done */
 
 	BUSY_WORKER_HASH_ORDER	= 6,		/* 64 pointers */
 	BUSY_WORKER_HASH_SIZE	= 1 << BUSY_WORKER_HASH_ORDER,
@@ -75,6 +84,7 @@ enum {
 						   (min two ticks) */
 	MAYDAY_INTERVAL		= HZ / 10,	/* and then every 100ms */
 	CREATE_COOLDOWN		= HZ,		/* time to breath after fail */
+	TRUSTEE_COOLDOWN	= HZ / 10,	/* for trustee draining */
 
 	/*
 	 * Rescue workers are used only on emergencies and shared by
@@ -126,6 +136,7 @@ struct worker {
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
 	int			id;		/* I: worker id */
+	struct work_struct	rebind_work;	/* L: rebind worker to cpu */
 	int			sleeping;	/* None */
 };
 
@@ -153,8 +164,10 @@ struct global_cwq {
 
 	struct ida		worker_ida;	/* L: for worker IDs */
 
+	struct task_struct	*trustee;	/* L: for gcwq shutdown */
+	unsigned int		trustee_state;	/* L: trustee state */
+	wait_queue_head_t	trustee_wait;	/* trustee wait */
 	struct worker		*first_idle;	/* L: first idle worker */
-	wait_queue_head_t	idle_wait;
 } ____cacheline_aligned_in_smp;
 
 /*
@@ -960,38 +973,13 @@ static bool is_chained_work(struct workqueue_struct *wq)
 	return false;
 }
 
-static void ___queue_work(struct workqueue_struct *wq, struct global_cwq *gcwq,
-			  struct work_struct *work)
-{
-	struct cpu_workqueue_struct *cwq;
-	struct list_head *worklist;
-	unsigned int work_flags;
-
-	/* gcwq determined, get cwq and queue */
-	cwq = get_cwq(gcwq->cpu, wq);
-	trace_workqueue_queue_work(gcwq->cpu, cwq, work);
-
-	BUG_ON(!list_empty(&work->entry));
-
-	cwq->nr_in_flight[cwq->work_color]++;
-	work_flags = work_color_to_flags(cwq->work_color);
-
-	if (likely(cwq->nr_active < cwq->max_active)) {
-		trace_workqueue_activate_work(work);
-		cwq->nr_active++;
-		worklist = gcwq_determine_ins_pos(gcwq, cwq);
-	} else {
-		work_flags |= WORK_STRUCT_DELAYED;
-		worklist = &cwq->delayed_works;
-	}
-
-	insert_work(cwq, work, worklist, work_flags);
-}
-
 static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 			 struct work_struct *work)
 {
 	struct global_cwq *gcwq;
+	struct cpu_workqueue_struct *cwq;
+	struct list_head *worklist;
+	unsigned int work_flags;
 	unsigned long flags;
 
 	debug_work_activate(work);
@@ -1037,32 +1025,27 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		spin_lock_irqsave(&gcwq->lock, flags);
 	}
 
-	___queue_work(wq, gcwq, work);
+	/* gcwq determined, get cwq and queue */
+	cwq = get_cwq(gcwq->cpu, wq);
+	trace_workqueue_queue_work(cpu, cwq, work);
 
-	spin_unlock_irqrestore(&gcwq->lock, flags);
-}
+	BUG_ON(!list_empty(&work->entry));
 
-/**
- * queue_work_on - queue work on specific cpu
- * @cpu: CPU number to execute work on
- * @wq: workqueue to use
- * @work: work to queue
- *
- * Returns 0 if @work was already on a queue, non-zero otherwise.
- *
- * We queue the work to a specific CPU, the caller must ensure it
- * can't go away.
- */
-static int
-__queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
-{
-	int ret = 0;
+	cwq->nr_in_flight[cwq->work_color]++;
+	work_flags = work_color_to_flags(cwq->work_color);
 
-	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
-		__queue_work(cpu, wq, work);
-		ret = 1;
+	if (likely(cwq->nr_active < cwq->max_active)) {
+		trace_workqueue_activate_work(work);
+		cwq->nr_active++;
+		worklist = gcwq_determine_ins_pos(gcwq, cwq);
+	} else {
+		work_flags |= WORK_STRUCT_DELAYED;
+		worklist = &cwq->delayed_works;
 	}
-	return ret;
+
+	insert_work(cwq, work, worklist, work_flags);
+
+	spin_unlock_irqrestore(&gcwq->lock, flags);
 }
 
 /**
@@ -1079,19 +1062,34 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
 	int ret;
 
-	ret = __queue_work_on(get_cpu_light(), wq, work);
+	ret = queue_work_on(get_cpu_light(), wq, work);
 	put_cpu_light();
 
 	return ret;
 }
 EXPORT_SYMBOL_GPL(queue_work);
 
+/**
+ * queue_work_on - queue work on specific cpu
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @work: work to queue
+ *
+ * Returns 0 if @work was already on a queue, non-zero otherwise.
+ *
+ * We queue the work to a specific CPU, the caller must ensure it
+ * can't go away.
+ */
 int
 queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
 {
-	WARN_ON(wq->flags & WQ_NON_AFFINE);
+	int ret = 0;
 
-	return __queue_work_on(cpu, wq, work);
+	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+		__queue_work(cpu, wq, work);
+		ret = 1;
+	}
+	return ret;
 }
 EXPORT_SYMBOL_GPL(queue_work_on);
 
@@ -1137,8 +1135,6 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
 
-	WARN_ON((wq->flags & WQ_NON_AFFINE) && cpu != -1);
-
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		unsigned int lcpu;
 
@@ -1204,13 +1200,12 @@ static void worker_enter_idle(struct worker *worker)
 	/* idle_list is LIFO */
 	list_add(&worker->entry, &gcwq->idle_list);
 
-	if (gcwq->nr_idle == gcwq->nr_workers)
-		wake_up_all(&gcwq->idle_wait);
-
-	if (too_many_workers(gcwq) && !timer_pending(&gcwq->idle_timer)) {
-		mod_timer(&gcwq->idle_timer,
-				jiffies + IDLE_WORKER_TIMEOUT);
-	}
+	if (likely(!(worker->flags & WORKER_ROGUE))) {
+		if (too_many_workers(gcwq) && !timer_pending(&gcwq->idle_timer))
+			mod_timer(&gcwq->idle_timer,
+				  jiffies + IDLE_WORKER_TIMEOUT);
+	} else
+		wake_up_all(&gcwq->trustee_wait);
 
 	/* sanity check nr_running */
 	WARN_ON_ONCE(gcwq->nr_workers == gcwq->nr_idle &&
@@ -1302,6 +1297,23 @@ __acquires(&gcwq->lock)
 	}
 }
 
+/*
+ * Function for worker->rebind_work used to rebind rogue busy workers
+ * to the associated cpu which is coming back online.  This is
+ * scheduled by cpu up but can race with other cpu hotplug operations
+ * and may be executed twice without intervening cpu down.
+ */
+static void worker_rebind_fn(struct work_struct *work)
+{
+	struct worker *worker = container_of(work, struct worker, rebind_work);
+	struct global_cwq *gcwq = worker->gcwq;
+
+	if (worker_maybe_bind_and_lock(worker))
+		worker_clr_flags(worker, WORKER_REBIND);
+
+	spin_unlock_irq(&gcwq->lock);
+}
+
 static struct worker *alloc_worker(void)
 {
 	struct worker *worker;
@@ -1310,6 +1322,7 @@ static struct worker *alloc_worker(void)
 	if (worker) {
 		INIT_LIST_HEAD(&worker->entry);
 		INIT_LIST_HEAD(&worker->scheduled);
+		INIT_WORK(&worker->rebind_work, worker_rebind_fn);
 		/* on creation a worker is in !idle && prep state */
 		worker->flags = WORKER_PREP;
 	}
@@ -1649,6 +1662,13 @@ static bool manage_workers(struct worker *worker)
 
 	gcwq->flags &= ~GCWQ_MANAGING_WORKERS;
 
+	/*
+	 * The trustee might be waiting to take over the manager
+	 * position, tell it we're done.
+	 */
+	if (unlikely(gcwq->trustee))
+		wake_up_all(&gcwq->trustee_wait);
+
 	return ret;
 }
 
@@ -3164,76 +3184,171 @@ EXPORT_SYMBOL_GPL(work_busy);
  * gcwqs serve mix of short, long and very long running works making
  * blocked draining impractical.
  *
+ * This is solved by allowing a gcwq to be detached from CPU, running
+ * it with unbound (rogue) workers and allowing it to be reattached
+ * later if the cpu comes back online.  A separate thread is created
+ * to govern a gcwq in such state and is called the trustee of the
+ * gcwq.
+ *
+ * Trustee states and their descriptions.
+ *
+ * START	Command state used on startup.  On CPU_DOWN_PREPARE, a
+ *		new trustee is started with this state.
+ *
+ * IN_CHARGE	Once started, trustee will enter this state after
+ *		assuming the manager role and making all existing
+ *		workers rogue.  DOWN_PREPARE waits for trustee to
+ *		enter this state.  After reaching IN_CHARGE, trustee
+ *		tries to execute the pending worklist until it's empty
+ *		and the state is set to BUTCHER, or the state is set
+ *		to RELEASE.
+ *
+ * BUTCHER	Command state which is set by the cpu callback after
+ *		the cpu has went down.  Once this state is set trustee
+ *		knows that there will be no new works on the worklist
+ *		and once the worklist is empty it can proceed to
+ *		killing idle workers.
+ *
+ * RELEASE	Command state which is set by the cpu callback if the
+ *		cpu down has been canceled or it has come online
+ *		again.  After recognizing this state, trustee stops
+ *		trying to drain or butcher and clears ROGUE, rebinds
+ *		all remaining workers back to the cpu and releases
+ *		manager role.
+ *
+ * DONE		Trustee will enter this state after BUTCHER or RELEASE
+ *		is complete.
+ *
+ *          trustee                 CPU                draining
+ *         took over                down               complete
+ * START -----------> IN_CHARGE -----------> BUTCHER -----------> DONE
+ *                        |                     |                  ^
+ *                        | CPU is back online  v   return workers |
+ *                         ----------------> RELEASE --------------
  */
 
-static int __devinit workqueue_cpu_up_callback(struct notifier_block *nfb,
-						unsigned long action,
-						void *hcpu)
-{
-	unsigned int cpu = (unsigned long)hcpu;
-	struct global_cwq *gcwq = get_gcwq(cpu);
-	struct worker *uninitialized_var(new_worker);
-	unsigned long flags;
+/**
+ * trustee_wait_event_timeout - timed event wait for trustee
+ * @cond: condition to wait for
+ * @timeout: timeout in jiffies
+ *
+ * wait_event_timeout() for trustee to use.  Handles locking and
+ * checks for RELEASE request.
+ *
+ * CONTEXT:
+ * spin_lock_irq(gcwq->lock) which may be released and regrabbed
+ * multiple times.  To be used by trustee.
+ *
+ * RETURNS:
+ * Positive indicating left time if @cond is satisfied, 0 if timed
+ * out, -1 if canceled.
+ */
+#define trustee_wait_event_timeout(cond, timeout) ({			\
+	long __ret = (timeout);						\
+	while (!((cond) || (gcwq->trustee_state == TRUSTEE_RELEASE)) &&	\
+	       __ret) {							\
+		spin_unlock_irq(&gcwq->lock);				\
+		__wait_event_timeout(gcwq->trustee_wait, (cond) ||	\
+			(gcwq->trustee_state == TRUSTEE_RELEASE),	\
+			__ret);						\
+		spin_lock_irq(&gcwq->lock);				\
+	}								\
+	gcwq->trustee_state == TRUSTEE_RELEASE ? -1 : (__ret);		\
+})
 
-	action &= ~CPU_TASKS_FROZEN;
+/**
+ * trustee_wait_event - event wait for trustee
+ * @cond: condition to wait for
+ *
+ * wait_event() for trustee to use.  Automatically handles locking and
+ * checks for CANCEL request.
+ *
+ * CONTEXT:
+ * spin_lock_irq(gcwq->lock) which may be released and regrabbed
+ * multiple times.  To be used by trustee.
+ *
+ * RETURNS:
+ * 0 if @cond is satisfied, -1 if canceled.
+ */
+#define trustee_wait_event(cond) ({					\
+	long __ret1;							\
+	__ret1 = trustee_wait_event_timeout(cond, MAX_SCHEDULE_TIMEOUT);\
+	__ret1 < 0 ? -1 : 0;						\
+})
 
-	switch (action) {
-	case CPU_UP_PREPARE:
-		BUG_ON(gcwq->first_idle);
-		new_worker = create_worker(gcwq, false);
-		if (!new_worker)
-			return NOTIFY_BAD;
-	case CPU_UP_CANCELED:
-	case CPU_ONLINE:
-		break;
-	default:
-		return notifier_from_errno(0);
-	}
+static int __cpuinit trustee_thread(void *__gcwq)
+{
+	struct global_cwq *gcwq = __gcwq;
+	struct worker *worker;
+	struct work_struct *work;
+	struct hlist_node *pos;
+	long rc;
+	int i;
 
-	/* some are called w/ irq disabled, don't disturb irq status */
-	spin_lock_irqsave(&gcwq->lock, flags);
+	BUG_ON(gcwq->cpu != smp_processor_id());
 
-	switch (action) {
-	case CPU_UP_PREPARE:
-		BUG_ON(gcwq->first_idle);
-		gcwq->first_idle = new_worker;
-		break;
+	spin_lock_irq(&gcwq->lock);
+	/*
+	 * Claim the manager position and make all workers rogue.
+	 * Trustee must be bound to the target cpu and can't be
+	 * cancelled.
+	 */
+	BUG_ON(gcwq->cpu != smp_processor_id());
+	rc = trustee_wait_event(!(gcwq->flags & GCWQ_MANAGING_WORKERS));
+	BUG_ON(rc < 0);
 
-	case CPU_UP_CANCELED:
-		destroy_worker(gcwq->first_idle);
-		gcwq->first_idle = NULL;
-		break;
+	gcwq->flags |= GCWQ_MANAGING_WORKERS;
 
-	case CPU_ONLINE:
-		spin_unlock_irq(&gcwq->lock);
-		kthread_bind(gcwq->first_idle->task, cpu);
-		spin_lock_irq(&gcwq->lock);
-		gcwq->flags |= GCWQ_MANAGE_WORKERS;
-		start_worker(gcwq->first_idle);
-		gcwq->first_idle = NULL;
-		break;
-	}
+	list_for_each_entry(worker, &gcwq->idle_list, entry)
+		worker->flags |= WORKER_ROGUE;
 
-	spin_unlock_irqrestore(&gcwq->lock, flags);
+	for_each_busy_worker(worker, i, pos, gcwq)
+		worker->flags |= WORKER_ROGUE;
 
-	return notifier_from_errno(0);
-}
+	/*
+	 * Call schedule() so that we cross rq->lock and thus can
+	 * guarantee sched callbacks see the rogue flag.  This is
+	 * necessary as scheduler callbacks may be invoked from other
+	 * cpus.
+	 */
+	spin_unlock_irq(&gcwq->lock);
+	schedule();
+	spin_lock_irq(&gcwq->lock);
 
-static void flush_gcwq(struct global_cwq *gcwq)
-{
-	struct work_struct *work, *nw;
-	struct worker *worker, *n;
-	LIST_HEAD(non_affine_works);
+	/*
+	 * Sched callbacks are disabled now.  Zap nr_running.  After
+	 * this, nr_running stays zero and need_more_worker() and
+	 * keep_working() are always true as long as the worklist is
+	 * not empty.
+	 */
+	atomic_set(get_gcwq_nr_running(gcwq->cpu), 0);
 
+	spin_unlock_irq(&gcwq->lock);
+	del_timer_sync(&gcwq->idle_timer);
 	spin_lock_irq(&gcwq->lock);
-	list_for_each_entry_safe(work, nw, &gcwq->worklist, entry) {
-		struct workqueue_struct *wq = get_work_cwq(work)->wq;
 
-		if (wq->flags & WQ_NON_AFFINE)
-			list_move(&work->entry, &non_affine_works);
-	}
+	/*
+	 * We're now in charge.  Notify and proceed to drain.  We need
+	 * to keep the gcwq running during the whole CPU down
+	 * procedure as other cpu hotunplug callbacks may need to
+	 * flush currently running tasks.
+	 */
+	gcwq->trustee_state = TRUSTEE_IN_CHARGE;
+	wake_up_all(&gcwq->trustee_wait);
 
-	while (!list_empty(&gcwq->worklist)) {
+	/*
+	 * The original cpu is in the process of dying and may go away
+	 * anytime now.  When that happens, we and all workers would
+	 * be migrated to other cpus.  Try draining any left work.  We
+	 * want to get it over with ASAP - spam rescuers, wake up as
+	 * many idlers as necessary and create new ones till the
+	 * worklist is empty.  Note that if the gcwq is frozen, there
+	 * may be frozen works in freezable cwqs.  Don't declare
+	 * completion while frozen.
+	 */
+	while (gcwq->nr_workers != gcwq->nr_idle ||
+	       gcwq->flags & GCWQ_FREEZING ||
+	       gcwq->trustee_state == TRUSTEE_IN_CHARGE) {
 		int nr_works = 0;
 
 		list_for_each_entry(work, &gcwq->worklist, entry) {
@@ -3247,55 +3362,200 @@ static void flush_gcwq(struct global_cwq *gcwq)
 			wake_up_process(worker->task);
 		}
 
-		spin_unlock_irq(&gcwq->lock);
-
 		if (need_to_create_worker(gcwq)) {
-			worker = create_worker(gcwq, true);
-			if (worker)
+			spin_unlock_irq(&gcwq->lock);
+			worker = create_worker(gcwq, false);
+			spin_lock_irq(&gcwq->lock);
+			if (worker) {
+				worker->flags |= WORKER_ROGUE;
 				start_worker(worker);
+			}
 		}
 
-		wait_event_timeout(gcwq->idle_wait,
-				gcwq->nr_idle == gcwq->nr_workers, HZ/10);
-
-		spin_lock_irq(&gcwq->lock);
+		/* give a breather */
+		if (trustee_wait_event_timeout(false, TRUSTEE_COOLDOWN) < 0)
+			break;
 	}
 
-	WARN_ON(gcwq->nr_workers != gcwq->nr_idle);
+	/*
+	 * Either all works have been scheduled and cpu is down, or
+	 * cpu down has already been canceled.  Wait for and butcher
+	 * all workers till we're canceled.
+	 */
+	do {
+		rc = trustee_wait_event(!list_empty(&gcwq->idle_list));
+		while (!list_empty(&gcwq->idle_list))
+			destroy_worker(list_first_entry(&gcwq->idle_list,
+							struct worker, entry));
+	} while (gcwq->nr_workers && rc >= 0);
 
-	list_for_each_entry_safe(worker, n, &gcwq->idle_list, entry)
-		destroy_worker(worker);
+	/*
+	 * At this point, either draining has completed and no worker
+	 * is left, or cpu down has been canceled or the cpu is being
+	 * brought back up.  There shouldn't be any idle one left.
+	 * Tell the remaining busy ones to rebind once it finishes the
+	 * currently scheduled works by scheduling the rebind_work.
+	 */
+	WARN_ON(!list_empty(&gcwq->idle_list));
 
-	WARN_ON(gcwq->nr_workers || gcwq->nr_idle);
+	for_each_busy_worker(worker, i, pos, gcwq) {
+		struct work_struct *rebind_work = &worker->rebind_work;
 
-	spin_unlock_irq(&gcwq->lock);
+		/*
+		 * Rebind_work may race with future cpu hotplug
+		 * operations.  Use a separate flag to mark that
+		 * rebinding is scheduled.
+		 */
+		worker->flags |= WORKER_REBIND;
+		worker->flags &= ~WORKER_ROGUE;
 
-	gcwq = get_gcwq(get_cpu_light());
-	spin_lock_irq(&gcwq->lock);
-	list_for_each_entry_safe(work, nw, &non_affine_works, entry) {
-		list_del_init(&work->entry);
-		___queue_work(get_work_cwq(work)->wq, gcwq, work);
+		/* queue rebind_work, wq doesn't matter, use the default one */
+		if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,
+				     work_data_bits(rebind_work)))
+			continue;
+
+		debug_work_activate(rebind_work);
+		insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work,
+			    worker->scheduled.next,
+			    work_color_to_flags(WORK_NO_COLOR));
 	}
+
+	/* relinquish manager role */
+	gcwq->flags &= ~GCWQ_MANAGING_WORKERS;
+
+	/* notify completion */
+	gcwq->trustee = NULL;
+	gcwq->trustee_state = TRUSTEE_DONE;
+	wake_up_all(&gcwq->trustee_wait);
 	spin_unlock_irq(&gcwq->lock);
-	put_cpu_light();
+	return 0;
 }
 
-static int __devinit workqueue_cpu_down_callback(struct notifier_block *nfb,
+/**
+ * wait_trustee_state - wait for trustee to enter the specified state
+ * @gcwq: gcwq the trustee of interest belongs to
+ * @state: target state to wait for
+ *
+ * Wait for the trustee to reach @state.  DONE is already matched.
+ *
+ * CONTEXT:
+ * spin_lock_irq(gcwq->lock) which may be released and regrabbed
+ * multiple times.  To be used by cpu_callback.
+ */
+static void __cpuinit wait_trustee_state(struct global_cwq *gcwq, int state)
+__releases(&gcwq->lock)
+__acquires(&gcwq->lock)
+{
+	if (!(gcwq->trustee_state == state ||
+	      gcwq->trustee_state == TRUSTEE_DONE)) {
+		spin_unlock_irq(&gcwq->lock);
+		__wait_event(gcwq->trustee_wait,
+			     gcwq->trustee_state == state ||
+			     gcwq->trustee_state == TRUSTEE_DONE);
+		spin_lock_irq(&gcwq->lock);
+	}
+}
+
+static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 						unsigned long action,
 						void *hcpu)
 {
 	unsigned int cpu = (unsigned long)hcpu;
 	struct global_cwq *gcwq = get_gcwq(cpu);
+	struct task_struct *new_trustee = NULL;
+	struct worker *uninitialized_var(new_worker);
+	unsigned long flags;
 
 	action &= ~CPU_TASKS_FROZEN;
 
-        switch (action) {
-        case CPU_DOWN_PREPARE:
-                flush_gcwq(gcwq);
-                break;
-        }
+	switch (action) {
+	case CPU_DOWN_PREPARE:
+		new_trustee = kthread_create(trustee_thread, gcwq,
+					     "workqueue_trustee/%d\n", cpu);
+		if (IS_ERR(new_trustee))
+			return notifier_from_errno(PTR_ERR(new_trustee));
+		kthread_bind(new_trustee, cpu);
+		/* fall through */
+	case CPU_UP_PREPARE:
+		BUG_ON(gcwq->first_idle);
+		new_worker = create_worker(gcwq, false);
+		if (!new_worker) {
+			if (new_trustee)
+				kthread_stop(new_trustee);
+			return NOTIFY_BAD;
+		}
+		break;
+	case CPU_POST_DEAD:
+	case CPU_UP_CANCELED:
+	case CPU_DOWN_FAILED:
+	case CPU_ONLINE:
+		break;
+	case CPU_DYING:
+		/*
+		 * We access this lockless. We are on the dying CPU
+		 * and called from stomp machine.
+		 *
+		 * Before this, the trustee and all workers except for
+		 * the ones which are still executing works from
+		 * before the last CPU down must be on the cpu.  After
+		 * this, they'll all be diasporas.
+		 */
+		gcwq->flags |= GCWQ_DISASSOCIATED;
+	default:
+		goto out;
+	}
+
+	/* some are called w/ irq disabled, don't disturb irq status */
+	spin_lock_irqsave(&gcwq->lock, flags);
+
+	switch (action) {
+	case CPU_DOWN_PREPARE:
+		/* initialize trustee and tell it to acquire the gcwq */
+		BUG_ON(gcwq->trustee || gcwq->trustee_state != TRUSTEE_DONE);
+		gcwq->trustee = new_trustee;
+		gcwq->trustee_state = TRUSTEE_START;
+		wake_up_process(gcwq->trustee);
+		wait_trustee_state(gcwq, TRUSTEE_IN_CHARGE);
+		/* fall through */
+	case CPU_UP_PREPARE:
+		BUG_ON(gcwq->first_idle);
+		gcwq->first_idle = new_worker;
+		break;
+
+	case CPU_POST_DEAD:
+		gcwq->trustee_state = TRUSTEE_BUTCHER;
+		/* fall through */
+	case CPU_UP_CANCELED:
+		destroy_worker(gcwq->first_idle);
+		gcwq->first_idle = NULL;
+		break;
 
+	case CPU_DOWN_FAILED:
+	case CPU_ONLINE:
+		gcwq->flags &= ~GCWQ_DISASSOCIATED;
+		if (gcwq->trustee_state != TRUSTEE_DONE) {
+			gcwq->trustee_state = TRUSTEE_RELEASE;
+			wake_up_process(gcwq->trustee);
+			wait_trustee_state(gcwq, TRUSTEE_DONE);
+		}
+
+		/*
+		 * Trustee is done and there might be no worker left.
+		 * Put the first_idle in and request a real manager to
+		 * take a look.
+		 */
+		spin_unlock_irq(&gcwq->lock);
+		kthread_bind(gcwq->first_idle->task, cpu);
+		spin_lock_irq(&gcwq->lock);
+		gcwq->flags |= GCWQ_MANAGE_WORKERS;
+		start_worker(gcwq->first_idle);
+		gcwq->first_idle = NULL;
+		break;
+	}
+
+	spin_unlock_irqrestore(&gcwq->lock, flags);
 
+out:
 	return notifier_from_errno(0);
 }
 
@@ -3492,8 +3752,7 @@ static int __init init_workqueues(void)
 	unsigned int cpu;
 	int i;
 
-	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_ACTIVE);
- 	hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_INACTIVE);
+	cpu_notifier(workqueue_cpu_callback, CPU_PRI_WORKQUEUE);
 
 	/* initialize gcwqs */
 	for_each_gcwq_cpu(cpu) {
@@ -3516,7 +3775,9 @@ static int __init init_workqueues(void)
 			    (unsigned long)gcwq);
 
 		ida_init(&gcwq->worker_ida);
-		init_waitqueue_head(&gcwq->idle_wait);
+
+		gcwq->trustee_state = TRUSTEE_DONE;
+		init_waitqueue_head(&gcwq->trustee_wait);
 	}
 
 	/* create the initial worker */
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread
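
The trustee state machine described in the patch above can be condensed
into a small user-space sketch. This is illustrative only: the enum
values mirror the kernel's trustee states, but step(), the
hotplug_event values and the driver loop in main() are invented for
this sketch and are not part of the patch; only the transitions shown
in the state diagram above are modeled.

#include <stdio.h>

/* Mirrors the gcwq->trustee_state values used by the patch above. */
enum trustee_state {
	TRUSTEE_START,		/* trustee created on CPU_DOWN_PREPARE */
	TRUSTEE_IN_CHARGE,	/* trustee holds the manager role, workers rogue */
	TRUSTEE_BUTCHER,	/* cpu is gone: drain, then kill idle workers */
	TRUSTEE_RELEASE,	/* cpu down canceled or cpu back online */
	TRUSTEE_DONE,		/* trustee finished */
};

/* Hypothetical events standing in for the cpu-notifier actions. */
enum hotplug_event { EV_TOOK_OVER, EV_CPU_DOWN, EV_CPU_BACK, EV_DRAINED };

static enum trustee_state step(enum trustee_state s, enum hotplug_event ev)
{
	switch (s) {
	case TRUSTEE_START:
		return ev == EV_TOOK_OVER ? TRUSTEE_IN_CHARGE : s;
	case TRUSTEE_IN_CHARGE:
		if (ev == EV_CPU_DOWN)		/* CPU_POST_DEAD */
			return TRUSTEE_BUTCHER;
		if (ev == EV_CPU_BACK)		/* CPU_DOWN_FAILED / CPU_ONLINE */
			return TRUSTEE_RELEASE;
		return s;
	case TRUSTEE_BUTCHER:
	case TRUSTEE_RELEASE:
		/* draining complete / workers returned */
		return ev == EV_DRAINED ? TRUSTEE_DONE : s;
	default:
		return s;
	}
}

int main(void)
{
	enum hotplug_event script[] = { EV_TOOK_OVER, EV_CPU_DOWN, EV_DRAINED };
	enum trustee_state s = TRUSTEE_START;
	size_t i;

	for (i = 0; i < sizeof(script) / sizeof(script[0]); i++) {
		s = step(s, script[i]);
		printf("state = %d\n", s);
	}
	return 0;
}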

* [PATCH RT 12/12] Linux 3.0.36-rt58-rc1
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (10 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 11/12] workqueue: Revert workqueue: Fix cpuhotplug trainwreck Steven Rostedt
@ 2012-07-18 22:39 ` Steven Rostedt
  2012-07-19  4:00 ` [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Mike Galbraith
  12 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-18 22:39 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0012-Linux-3.0.36-rt58-rc1.patch --]
[-- Type: text/plain, Size: 289 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

---
 localversion-rt |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index c06cc43..1fdcd49 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt57
+-rt58-rc1
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
  2012-07-18 22:39 [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Steven Rostedt
                   ` (11 preceding siblings ...)
  2012-07-18 22:39 ` [PATCH RT 12/12] Linux 3.0.36-rt58-rc1 Steven Rostedt
@ 2012-07-19  4:00 ` Mike Galbraith
  2012-07-19 13:05   ` Steven Rostedt
  12 siblings, 1 reply; 26+ messages in thread
From: Mike Galbraith @ 2012-07-19  4:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur

On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:

> Please test the patches too.

Your hotplug stress test script made x3550 M3 box fall over.  It took a
bit, but down she went.  64 core test box fell over quickly, but that's
very far from virgin source.. seems to be the same though.

[  255.016043] CPU 1 MCA<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
Pid: 9914, comm: migration/7 Not tainted 3.0.36-rt57 #49
Call Trace:
 <NMI>  [<ffffffff814a0f7b>] panic+0x9b/0x1b0
 [<ffffffff810b0627>] watchdog_overflow_callback+0xd7/0xe0
 [<ffffffff810c3dad>] __perf_event_overflow+0x9d/0x240
 [<ffffffff810c066b>] ? perf_event_update_userpage+0x9b/0xe0
 [<ffffffff810c41a4>] perf_event_overflow+0x14/0x20
 [<ffffffff81015707>] intel_pmu_handle_irq+0x177/0x230
 [<ffffffff814a5549>] perf_event_nmi_handler+0x39/0xc0
 [<ffffffff814a727d>] notifier_call_chain+0x4d/0x70
 [<ffffffff814a72e3>] __atomic_notifier_call_chain+0x43/0x60
 [<ffffffff814a7311>] atomic_notifier_call_chain+0x11/0x20
 [<ffffffff814a734e>] notify_die+0x2e/0x30
 [<ffffffff814a4699>] default_do_nmi+0x39/0x200
 [<ffffffff814a4a48>] do_nmi+0x78/0x80
 [<ffffffff814a44d0>] nmi+0x20/0x30
 [<ffffffff810a461a>] ? stop_machine_cpu_stop+0x6a/0xe0
 <<EOE>>  [<ffffffff810a47f4>] cpu_stopper_thread+0xf4/0x1d0
 [<ffffffff810a45b0>] ? wait_for_stop_done+0xa0/0xa0
 [<ffffffff814a1397>] ? __schedule+0x2c7/0x630
 [<ffffffff810a4700>] ? cpu_stop_queue_work+0x70/0x70
 [<ffffffff810a4700>] ? cpu_stop_queue_work+0x70/0x70
 [<ffffffff810702c6>] kthread+0xa6/0xb0
 [<ffffffff81056328>] ? do_exit+0x278/0x450
 [<ffffffff810016b2>] ? __switch_to+0xf2/0x370
 [<ffffffff81040f15>] ? finish_task_switch+0x55/0xd0
 [<ffffffff814aa6e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81070220>] ? __init_kthread_worker+0x50/0x50
 [<ffffffff814aa6e0>] ? gs_change+0x13/0x13



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
  2012-07-19  4:00 ` [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review Mike Galbraith
@ 2012-07-19 13:05   ` Steven Rostedt
  2012-07-19 13:51     ` Mike Galbraith
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2012-07-19 13:05 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur

On Thu, 2012-07-19 at 06:00 +0200, Mike Galbraith wrote:
> On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:
> 
> > Please test the patches too.
> 
> Your hotplug stress test script made x3550 M3 box fall over.  It took a
> bit, but down she went.  64 core test box fell over quickly, but that's
> very far from virgin source.. seems to be the same though.

Thanks for the report. I know of a few areas in the hotplug code that
can still deadlock (though they are hard to hit), and there's no easy
fix for them.
Basically, the only thing we can do is redesign cpu hotplug (I think
someone is already trying to do that ;-).

But these patches do fix the main issues of cpu hotplug (albeit, making
the code even uglier).

The panic below isn't telling much. We really need to know what the
other CPUs were up to. This call trace is just telling us that one of
the CPUs is waiting for other CPUs to stop or to finish something up.

-- Steve


> 
> [  255.016043] CPU 1 MCA<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
> Pid: 9914, comm: migration/7 Not tainted 3.0.36-rt57 #49
> Call Trace:
>  <NMI>  [<ffffffff814a0f7b>] panic+0x9b/0x1b0
>  [<ffffffff810b0627>] watchdog_overflow_callback+0xd7/0xe0
>  [<ffffffff810c3dad>] __perf_event_overflow+0x9d/0x240
>  [<ffffffff810c066b>] ? perf_event_update_userpage+0x9b/0xe0
>  [<ffffffff810c41a4>] perf_event_overflow+0x14/0x20
>  [<ffffffff81015707>] intel_pmu_handle_irq+0x177/0x230
>  [<ffffffff814a5549>] perf_event_nmi_handler+0x39/0xc0
>  [<ffffffff814a727d>] notifier_call_chain+0x4d/0x70
>  [<ffffffff814a72e3>] __atomic_notifier_call_chain+0x43/0x60
>  [<ffffffff814a7311>] atomic_notifier_call_chain+0x11/0x20
>  [<ffffffff814a734e>] notify_die+0x2e/0x30
>  [<ffffffff814a4699>] default_do_nmi+0x39/0x200
>  [<ffffffff814a4a48>] do_nmi+0x78/0x80
>  [<ffffffff814a44d0>] nmi+0x20/0x30
>  [<ffffffff810a461a>] ? stop_machine_cpu_stop+0x6a/0xe0
>  <<EOE>>  [<ffffffff810a47f4>] cpu_stopper_thread+0xf4/0x1d0
>  [<ffffffff810a45b0>] ? wait_for_stop_done+0xa0/0xa0
>  [<ffffffff814a1397>] ? __schedule+0x2c7/0x630
>  [<ffffffff810a4700>] ? cpu_stop_queue_work+0x70/0x70
>  [<ffffffff810a4700>] ? cpu_stop_queue_work+0x70/0x70
>  [<ffffffff810702c6>] kthread+0xa6/0xb0
>  [<ffffffff81056328>] ? do_exit+0x278/0x450
>  [<ffffffff810016b2>] ? __switch_to+0xf2/0x370
>  [<ffffffff81040f15>] ? finish_task_switch+0x55/0xd0
>  [<ffffffff814aa6e4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff81070220>] ? __init_kthread_worker+0x50/0x50
>  [<ffffffff814aa6e0>] ? gs_change+0x13/0x13
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread
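
A practical way to capture what the other CPUs were up to is the magic
SysRq 'l' trigger (backtrace of all active CPUs), assuming the kernel
was built with CONFIG_MAGIC_SYSRQ and sysrq is enabled. Below is a
minimal sketch a test script could run (as root) while the hotplug
stress test is looping; on a hard lockup the output usually only
survives on a serial console or netconsole.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Illustrative only: poke /proc/sysrq-trigger to dump a backtrace of
 * all active CPUs ('l') and the full task list ('t') to the console,
 * so a later hard-lockup panic comes with context from the other CPUs.
 */
static void sysrq(char cmd)
{
	int fd = open("/proc/sysrq-trigger", O_WRONLY);

	if (fd < 0) {
		perror("open /proc/sysrq-trigger");
		return;
	}
	if (write(fd, &cmd, 1) != 1)
		perror("write sysrq");
	close(fd);
}

int main(void)
{
	sysrq('l');	/* backtraces of all active CPUs */
	sysrq('t');	/* state and stack of every task */
	return 0;
}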

* Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
  2012-07-19 13:05   ` Steven Rostedt
@ 2012-07-19 13:51     ` Mike Galbraith
  2012-07-19 14:02       ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Mike Galbraith @ 2012-07-19 13:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur

On Thu, 2012-07-19 at 09:05 -0400, Steven Rostedt wrote: 
> On Thu, 2012-07-19 at 06:00 +0200, Mike Galbraith wrote:
> > On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:
> > 
> > > Please test the patches too.
> > 
> > Your hotplug stress test script made x3550 M3 box fall over.  It took a
> > bit, but down she went.  64 core test box fell over quickly, but that's
> > very far from virgin source.. seems to be the same though.
> 
> Thanks for the report. I know a few areas in the hotplug code that can
> still deadlock (but are hard to hit). But there's no easy fix for them.
> Basically, the only thing we can do is redesign cpu hotplug (I think
> someone is already trying to do that ;-).

Every kernel I've fed your script to has died sooner or later, so I wish
him fair sailing.  Here there be sea monsters ;-)

-Mike


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
  2012-07-19 13:51     ` Mike Galbraith
@ 2012-07-19 14:02       ` Steven Rostedt
  2012-07-20  3:49         ` Mike Galbraith
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2012-07-19 14:02 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur

On Thu, 2012-07-19 at 15:51 +0200, Mike Galbraith wrote:

> Every kernel I've fed your script to has died sooner or later, so I wish
> him fair sailing.  Here there be sea monsters ;-)

I'm curious. Can my script bring down a non-rt kernel?

-- Steve



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review
  2012-07-19 14:02       ` Steven Rostedt
@ 2012-07-20  3:49         ` Mike Galbraith
  0 siblings, 0 replies; 26+ messages in thread
From: Mike Galbraith @ 2012-07-20  3:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur

On Thu, 2012-07-19 at 10:02 -0400, Steven Rostedt wrote: 
> On Thu, 2012-07-19 at 15:51 +0200, Mike Galbraith wrote:
> 
> > Every kernel I've fed your script to has died sooner or later, so I wish
> > him fair sailing.  Here there be sea monsters ;-)
> 
> I'm curious. Can my script bring down a non-rt kernel?

Yeah, it took a couple non-rt (and virgin) kernels down on my 64 core
box.

-Mike


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-18 22:39 ` [PATCH RT 05/12] slab: Prevent local lock deadlock Steven Rostedt
@ 2012-07-27  0:15   ` Frank Rowand
  2012-07-31  1:22     ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Frank Rowand @ 2012-07-27  0:15 UTC (permalink / raw)
  To: Steven Rostedt, tglx, chris.pringle
  Cc: linux-kernel, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur

On 07/18/12 15:39, Steven Rostedt wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> On RT we avoid the cross cpu function calls and take the per cpu local
> locks instead. Now the code missed that taking the local lock on the
> cpu which runs the code must use the proper local lock functions and
> not a simple spin_lock(). Otherwise it deadlocks later when trying to
> acquire the local lock with the proper function.
> 
> Reported-and-tested-by: Chris Pringle <chris.pringle@miranda.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  mm/slab.c |   26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)


This patch leads to a warning during boot on the ARM pandaboard:

[    0.225097] Brought up 2 CPUs
[    0.225097] SMP: Total of 2 processors activated (2007.19 BogoMIPS).
[    0.225952] 
[    0.225982] =============================================
[    0.225982] [ INFO: possible recursive locking detected ]
[    0.225982] 3.0.36-rt58 #1
[    0.225982] ---------------------------------------------
[    0.225982] swapper/0/1 is trying to acquire lock:
[    0.226013]  (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [<c0147544>] do_ccupdate_local+0x18/0x44
[    0.226043] 
[    0.226043] but task is already holding lock:
[    0.226043]  (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [<c014737c>] lock_slab_on+0x48/0x134
[    0.226074] 
[    0.226074] other info that might help us debug this:
[    0.226074]  Possible unsafe locking scenario:
[    0.226074] 
[    0.226074]        CPU0
[    0.226074]        ----
[    0.226074]   lock(&per_cpu(slab_lock, __cpu).lock);
[    0.226104]   lock(&per_cpu(slab_lock, __cpu).lock);
[    0.226104] 
[    0.226104]  *** DEADLOCK ***
[    0.226104] 
[    0.226104]  May be due to missing lock nesting notation
[    0.226104] 
[    0.226104] 2 locks held by swapper/0/1:
[    0.226135]  #0:  (cache_chain_mutex){+.+.+.}, at: [<c014a618>] kmem_cache_create+0x74/0x4bc
[    0.226135]  #1:  (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [<c014737c>] lock_slab_on+0x48/0x134
[    0.226165] 
[    0.226165] stack backtrace:
[    0.226196] [<c00681f8>] (unwind_backtrace+0x0/0xf0) from [<c00da918>] (__lock_acquire+0x1984/0x1ce8)
[    0.226196] [<c00da918>] (__lock_acquire+0x1984/0x1ce8) from [<c00db29c>] (lock_acquire+0x100/0x120)
[    0.226226] [<c00db29c>] (lock_acquire+0x100/0x120) from [<c0485c10>] (rt_spin_lock+0x4c/0x5c)
[    0.226257] [<c0485c10>] (rt_spin_lock+0x4c/0x5c) from [<c0147544>] (do_ccupdate_local+0x18/0x44)
[    0.226257] [<c0147544>] (do_ccupdate_local+0x18/0x44) from [<c01476e8>] (slab_on_each_cpu+0x2c/0x64)
[    0.226287] [<c01476e8>] (slab_on_each_cpu+0x2c/0x64) from [<c0149c70>] (do_tune_cpucache+0xd8/0x3e8)
[    0.226287] [<c0149c70>] (do_tune_cpucache+0xd8/0x3e8) from [<c014a154>] (enable_cpucache+0x50/0xcc)
[    0.226318] [<c014a154>] (enable_cpucache+0x50/0xcc) from [<c014a974>] (kmem_cache_create+0x3d0/0x4bc)
[    0.226318] [<c014a974>] (kmem_cache_create+0x3d0/0x4bc) from [<c0021e54>] (init_tmpfs+0x3c/0xe8)
[    0.226348] [<c0021e54>] (init_tmpfs+0x3c/0xe8) from [<c00083b4>] (kernel_init+0x80/0x150)
[    0.226379] [<c00083b4>] (kernel_init+0x80/0x150) from [<c0061e30>] (kernel_thread_exit+0x0/0x8)
[    0.239776] omap_hwmod: _populate_mpu_rt_base found no _mpu_rt_va for emif_fw
[    0.239776] omap_hwmod: _populate_mpu_rt_base found no _mpu_rt_va for l3_instr



Config is from arch/arm/configs/omap2plus_defconfig
plus:

   CONFIG_USB_EHCI_HCD=y
   CONFIG_USB_NET_SMSC95XX=y
   CONFIG_PREEMPT_RT_FULL=y


-Frank


^ permalink raw reply	[flat|nested] 26+ messages in thread
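
The shape lockdep is flagging in the splat above is a call chain that
already holds a per-CPU lock (lock_slab_on) and then enters a helper
(do_ccupdate_local) that takes the same per-CPU lock again. A
user-space sketch of that shape follows; the names cpu_lock,
update_local and tune_caches are invented stand-ins, and the sketch
only shows why two acquisitions of one lock in a single call chain look
like a self-deadlock. The -rt local locks track an owner and a nesting
count, so the kernel case is presumably the "missing lock nesting
notation" situation the report mentions rather than a guaranteed hang.

#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch only.  tune_caches() plays the role of the outer
 * path that already holds the per-CPU lock; update_local() plays the
 * helper that blindly takes the same lock again.
 */
static pthread_mutex_t cpu_lock;

static void update_local(void)
{
	int err = pthread_mutex_lock(&cpu_lock);	/* second acquisition */

	if (err)
		printf("update_local: %s\n", strerror(err));
	else
		pthread_mutex_unlock(&cpu_lock);
}

static void tune_caches(void)
{
	pthread_mutex_lock(&cpu_lock);			/* first acquisition */
	update_local();
	pthread_mutex_unlock(&cpu_lock);
}

int main(void)
{
	pthread_mutexattr_t attr;

	/*
	 * An error-checking mutex reports EDEADLK on the nested lock
	 * instead of hanging, roughly the role lockdep plays here.
	 */
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&cpu_lock, &attr);

	tune_caches();
	return 0;
}

Built with gcc -pthread, the nested acquisition prints "Resource
deadlock avoided"; with a default mutex it would simply hang.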

* Re: [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-27  0:15   ` Frank Rowand
@ 2012-07-31  1:22     ` Steven Rostedt
  2012-07-31  2:22       ` Frank Rowand
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2012-07-31  1:22 UTC (permalink / raw)
  To: frank.rowand
  Cc: tglx, chris.pringle, linux-kernel, linux-rt-users, Carsten Emde,
	John Kacur

On Thu, 2012-07-26 at 17:15 -0700, Frank Rowand wrote:

> 
> Config is from arch/arm/configs/omap2plus_defconfig
> plus:
> 
>    CONFIG_USB_EHCI_HCD=y
>    CONFIG_USB_NET_SMSC95XX=y
>    CONFIG_PREEMPT_RT_FULL=y
> 

Interesting, I just booted my panda board against 3.0.36-rt58 with that
config and these three set, and I didn't get this error.

-- Steve



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-31  1:22     ` Steven Rostedt
@ 2012-07-31  2:22       ` Frank Rowand
  2012-07-31  2:32         ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Frank Rowand @ 2012-07-31  2:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rowand, Frank, tglx, chris.pringle, linux-kernel, linux-rt-users,
	Carsten Emde, John Kacur

On 07/30/12 18:22, Steven Rostedt wrote:
> On Thu, 2012-07-26 at 17:15 -0700, Frank Rowand wrote:
> 
>>
>> Config is from arch/arm/configs/omap2plus_defconfig
>> plus:
>>
>>    CONFIG_USB_EHCI_HCD=y
>>    CONFIG_USB_NET_SMSC95XX=y
>>    CONFIG_PREEMPT_RT_FULL=y
>>
> 
> Interesting, I just booted my panda board against 3.0.36-rt58 with that
> config and these three set, and I didn't get this error.

I don't know if it makes any difference, but my root fs is nfs mounted.

I'll try to look at this some more tomorrow.

-Frank


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-31  2:22       ` Frank Rowand
@ 2012-07-31  2:32         ` Steven Rostedt
  2012-07-31 19:00           ` Frank Rowand
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2012-07-31  2:32 UTC (permalink / raw)
  To: frank.rowand
  Cc: Rowand, Frank, tglx, chris.pringle, linux-kernel, linux-rt-users,
	Carsten Emde, John Kacur

On Mon, 2012-07-30 at 19:22 -0700, Frank Rowand wrote:
> On 07/30/12 18:22, Steven Rostedt wrote:
> > On Thu, 2012-07-26 at 17:15 -0700, Frank Rowand wrote:
> > 
> >>
> >> Config is from arch/arm/configs/omap2plus_defconfig
> >> plus:
> >>
> >>    CONFIG_USB_EHCI_HCD=y
> >>    CONFIG_USB_NET_SMSC95XX=y
> >>    CONFIG_PREEMPT_RT_FULL=y
> >>
> > 
> > Interesting, I just booted my panda board against 3.0.36-rt58 with that
> > config and these three set, and I didn't get this error.
> 
> I don't know if it makes any difference, but my root fs is nfs mounted.
> 
> I'll try to look at this some more tomorrow.
> 

Yeah, my root fs is on the sdcard. Did you get this bug every time or
was it sporadic?

-- Steve



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-31  2:32         ` Steven Rostedt
@ 2012-07-31 19:00           ` Frank Rowand
  2012-07-31 19:11             ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Frank Rowand @ 2012-07-31 19:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rowand, Frank, tglx, chris.pringle, linux-kernel, linux-rt-users,
	Carsten Emde, John Kacur

On 07/30/12 19:32, Steven Rostedt wrote:
> On Mon, 2012-07-30 at 19:22 -0700, Frank Rowand wrote:
>> On 07/30/12 18:22, Steven Rostedt wrote:
>>> On Thu, 2012-07-26 at 17:15 -0700, Frank Rowand wrote:
>>>
>>>>
>>>> Config is from arch/arm/configs/omap2plus_defconfig
>>>> plus:
>>>>
>>>>    CONFIG_USB_EHCI_HCD=y
>>>>    CONFIG_USB_NET_SMSC95XX=y
>>>>    CONFIG_PREEMPT_RT_FULL=y
>>>>
>>>
>>> Interesting, I just booted my panda board against 3.0.36-rt58 with that
>>> config and these three set, and I didn't get this error.
>>
>> I don't know if it makes any difference, but my root fs is nfs mounted.
>>
>> I'll try to look at this some more tomorrow.
>>
> 
> Yeah, my root fs is on the sdcard. Did you get this bug every time or
> was it sporadic?

I get it every boot.

-Frank


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-31 19:00           ` Frank Rowand
@ 2012-07-31 19:11             ` Steven Rostedt
  2012-07-31 21:52               ` Frank Rowand
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2012-07-31 19:11 UTC (permalink / raw)
  To: frank.rowand
  Cc: Rowand, Frank, tglx, chris.pringle, linux-kernel, linux-rt-users,
	Carsten Emde, John Kacur

On Tue, 2012-07-31 at 12:00 -0700, Frank Rowand wrote:

> > Yeah, my root fs is on the sdcard. Did you get this bug every time or
> > was it sporadic?
> 
> I get it every boot.

Can you try it with an SD card? I can set it up for nfs as well, but
that would take a bit of time.

-- Steve



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RT 05/12] slab: Prevent local lock deadlock
  2012-07-31 19:11             ` Steven Rostedt
@ 2012-07-31 21:52               ` Frank Rowand
  0 siblings, 0 replies; 26+ messages in thread
From: Frank Rowand @ 2012-07-31 21:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rowand, Frank, tglx, chris.pringle, linux-kernel, linux-rt-users,
	Carsten Emde, John Kacur

On 07/31/12 12:11, Steven Rostedt wrote:
> On Tue, 2012-07-31 at 12:00 -0700, Frank Rowand wrote:
> 
>>> Yeah, my root fs is on the sdcard. Did you get this bug every time or
>>> was it sporadic?
>>
>> I get it every boot.
> 
> Can you try it with an SD card? I can set it up for nfs as well, but
> that would take a bit of time.

That would turn into a project for me, not likely to get to it soon.
But I'll put it on my "try to get to" list.

-Frank


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH RT 09/12] cpu/rt: Fix cpu_hotplug variable initialization
  2012-07-17 15:31 [PATCH RT 00/12] [ANNOUNCE] 3.4.4-rt14-rc2 " Steven Rostedt
@ 2012-07-17 15:31 ` Steven Rostedt
  0 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2012-07-17 15:31 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users; +Cc: Thomas Gleixner, Carsten Emde, John Kacur

[-- Attachment #1: 0009-cpu-rt-Fix-cpu_hotplug-variable-initialization.patch --]
[-- Type: text/plain, Size: 842 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The commit "cpu/rt: Rework cpu down for PREEMPT_RT" changed the double
meaning of the cpu_hotplug.lock, where it was a spinlock for RT and a
mutex for non-RT, to just a mutex for both.  But the initialization of
the variable was not updated to reflect this change.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/cpu.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index c5b3273..3e722c0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -54,11 +54,7 @@ static struct {
 	int refcount;
 } cpu_hotplug = {
 	.active_writer = NULL,
-#ifdef CONFIG_PREEMPT_RT_FULL
-	.lock = __SPIN_LOCK_UNLOCKED(cpu_hotplug.lock),
-#else
 	.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
-#endif
 	.refcount = 0,
 };
 
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 26+ messages in thread
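
The fix above is just keeping a static initializer in sync with the
type of the lock it initializes. Below is a user-space analogue, with
hypothetical names, of the pattern the changelog describes: once the
guard lock is a mutex in every configuration, the one remaining
initializer has to be the mutex one.

#include <pthread.h>
#include <stdio.h>

/*
 * Illustrative sketch only: a statically initialized, refcounted
 * structure guarded by a single lock, in the same shape as cpu_hotplug.
 * The initializer must match the lock's actual type.
 */
struct hotplug_state {
	void *active_writer;
	pthread_mutex_t lock;
	int refcount;
};

static struct hotplug_state hotplug = {
	.active_writer = NULL,
	.lock = PTHREAD_MUTEX_INITIALIZER,	/* mutex field, mutex initializer */
	.refcount = 0,
};

int main(void)
{
	pthread_mutex_lock(&hotplug.lock);
	hotplug.refcount++;
	printf("refcount = %d\n", hotplug.refcount);
	pthread_mutex_unlock(&hotplug.lock);
	return 0;
}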
