* [patch RT 0/7] Various fixes for the stable RT series - part I
@ 2012-07-11 22:05 Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 1/7] Latency histograms: Cope with backwards running local trace clock Thomas Gleixner
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde

The following patch series is a collection of bug fixes, which should
go into the 3.x based stable RT trees.

I have them locally applied to my 3.5 devel queue, but I'm still
too distracted by other events (leap seconds and the like) to get a 3.5
devel queue released.

Steven, please pick up the lot.

I have a part II series pending which is addressing the widely
observed CPU hotplug issue. I'm going to send that out soon, despite
the fact that I fundamentally hate it. Though the hotplug rework which
I have in the queue has turned out to be more work than expected and
there is no real way to backport all of the necessary changes into any
of the existing 3.x based RT trees. So I bite the bullet and go with
the workarounds to get rid of the last obstacles in the 3.x based RT
series.

Thanks,

	tglx





* [patch RT 1/7] Latency histograms: Cope with backwards running local trace clock
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
@ 2012-07-11 22:05 ` Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 3/7] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL Thomas Gleixner
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde

[-- Attachment #1: latency-histogramms-cope-with-backwards-running-local-trace-clock.patch --]
[-- Type: text/plain, Size: 9180 bytes --]

Thanks to the wonders of modern technology, the local trace clock can
now run backwards. Since this never happened before, the time difference
between now and some earlier point was expected never to become negative
and was therefore stored in an unsigned integer variable. Nowadays, a
signed integer is needed to ensure that such a value is recorded as an
underflow in the related histogram. (In cases where this is not a
malfunction, bipolar histograms can be used.)

This patch ensures that all latency variables are represented as signed
integers and that negative values are accounted as histogram underflows.

On one of the misbehaving processors, switching to the global clock
solved the problem:
  echo global >/sys/kernel/debug/tracing/trace_clock
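
For illustration only (this small user-space program is not part of the
patch), the arithmetic behind the change: once the clock can run
backwards, an unsigned latency wraps to a huge bogus value, while a
signed one stays negative and can be accounted as an underflow. The
NSECS_PER_USECS constant mirrors the one introduced in the patch:

#include <stdio.h>

#define NSECS_PER_USECS 1000L

int main(void)
{
	/* simulated trace clock readings where the clock went backwards */
	unsigned long long start = 2000000ULL, stop = 1000000ULL;

	unsigned long ulat = (unsigned long)(stop - start) / NSECS_PER_USECS;
	long slat = ((long)(stop - start)) / NSECS_PER_USECS;

	printf("unsigned latency: %lu (bogus huge value)\n", ulat);
	printf("signed latency:   %ld (negative -> underflow bucket)\n", slat);
	return 0;
}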

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/sched.h       |    2 +-
 kernel/trace/latency_hist.c |   30 +++++++++++++++---------------
 2 files changed, 16 insertions(+), 16 deletions(-)

Index: linux-3.4.4-rt13/include/linux/sched.h
===================================================================
--- linux-3.4.4-rt13.orig/include/linux/sched.h
+++ linux-3.4.4-rt13/include/linux/sched.h
@@ -1629,7 +1629,7 @@ struct task_struct {
 #ifdef CONFIG_WAKEUP_LATENCY_HIST
 	u64 preempt_timestamp_hist;
 #ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
-	unsigned long timer_offset;
+	long timer_offset;
 #endif
 #endif
 #endif /* CONFIG_TRACING */
Index: linux-3.4.4-rt13/kernel/trace/latency_hist.c
===================================================================
--- linux-3.4.4-rt13.orig/kernel/trace/latency_hist.c
+++ linux-3.4.4-rt13/kernel/trace/latency_hist.c
@@ -27,6 +27,8 @@
 #include "trace.h"
 #include <trace/events/sched.h>
 
+#define NSECS_PER_USECS 1000L
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/hist.h>
 
@@ -46,11 +48,11 @@ enum {
 struct hist_data {
 	atomic_t hist_mode; /* 0 log, 1 don't log */
 	long offset; /* set it to MAX_ENTRY_NUM/2 for a bipolar scale */
-	unsigned long min_lat;
-	unsigned long max_lat;
+	long min_lat;
+	long max_lat;
 	unsigned long long below_hist_bound_samples;
 	unsigned long long above_hist_bound_samples;
-	unsigned long long accumulate_lat;
+	long long accumulate_lat;
 	unsigned long long total_samples;
 	unsigned long long hist_array[MAX_ENTRY_NUM];
 };
@@ -152,8 +154,8 @@ static struct enable_data timerandwakeup
 static DEFINE_PER_CPU(struct maxlatproc_data, timerandwakeup_maxlatproc);
 #endif
 
-void notrace latency_hist(int latency_type, int cpu, unsigned long latency,
-			  unsigned long timeroffset, cycle_t stop,
+void notrace latency_hist(int latency_type, int cpu, long latency,
+			  long timeroffset, cycle_t stop,
 			  struct task_struct *p)
 {
 	struct hist_data *my_hist;
@@ -224,7 +226,7 @@ void notrace latency_hist(int latency_ty
 		my_hist->hist_array[latency]++;
 
 	if (unlikely(latency > my_hist->max_lat ||
-	    my_hist->min_lat == ULONG_MAX)) {
+	    my_hist->min_lat == LONG_MAX)) {
 #if defined(CONFIG_WAKEUP_LATENCY_HIST) || \
     defined(CONFIG_MISSED_TIMER_OFFSETS_HIST)
 		if (latency_type == WAKEUP_LATENCY ||
@@ -263,15 +265,14 @@ static void *l_start(struct seq_file *m,
 		atomic_dec(&my_hist->hist_mode);
 
 		if (likely(my_hist->total_samples)) {
-			unsigned long avg = (unsigned long)
-			    div64_u64(my_hist->accumulate_lat,
+			long avg = (long) div64_s64(my_hist->accumulate_lat,
 			    my_hist->total_samples);
 			snprintf(minstr, sizeof(minstr), "%ld",
-			    (long) my_hist->min_lat - my_hist->offset);
+			    my_hist->min_lat - my_hist->offset);
 			snprintf(avgstr, sizeof(avgstr), "%ld",
-			    (long) avg - my_hist->offset);
+			    avg - my_hist->offset);
 			snprintf(maxstr, sizeof(maxstr), "%ld",
-			    (long) my_hist->max_lat - my_hist->offset);
+			    my_hist->max_lat - my_hist->offset);
 		} else {
 			strcpy(minstr, "<undef>");
 			strcpy(avgstr, minstr);
@@ -376,10 +377,10 @@ static void hist_reset(struct hist_data 
 	memset(hist->hist_array, 0, sizeof(hist->hist_array));
 	hist->below_hist_bound_samples = 0ULL;
 	hist->above_hist_bound_samples = 0ULL;
-	hist->min_lat = ULONG_MAX;
-	hist->max_lat = 0UL;
+	hist->min_lat = LONG_MAX;
+	hist->max_lat = LONG_MIN;
 	hist->total_samples = 0ULL;
-	hist->accumulate_lat = 0ULL;
+	hist->accumulate_lat = 0LL;
 
 	atomic_inc(&hist->hist_mode);
 }
@@ -790,9 +791,9 @@ static notrace void probe_preemptirqsoff
 
 			stop = ftrace_now(cpu);
 			time_set++;
-			if (start && stop >= start) {
-				unsigned long latency =
-				    nsecs_to_usecs(stop - start);
+			if (start) {
+				long latency = ((long) (stop - start)) /
+				    NSECS_PER_USECS;
 
 				latency_hist(IRQSOFF_LATENCY, cpu, latency, 0,
 				    stop, NULL);
@@ -808,9 +809,9 @@ static notrace void probe_preemptirqsoff
 
 			if (!(time_set++))
 				stop = ftrace_now(cpu);
-			if (start && stop >= start) {
-				unsigned long latency =
-				    nsecs_to_usecs(stop - start);
+			if (start) {
+				long latency = ((long) (stop - start)) /
+				    NSECS_PER_USECS;
 
 				latency_hist(PREEMPTOFF_LATENCY, cpu, latency,
 				    0, stop, NULL);
@@ -827,9 +828,10 @@ static notrace void probe_preemptirqsoff
 
 			if (!time_set)
 				stop = ftrace_now(cpu);
-			if (start && stop >= start) {
-				unsigned long latency =
-				    nsecs_to_usecs(stop - start);
+			if (start) {
+				long latency = ((long) (stop - start)) /
+				    NSECS_PER_USECS;
+
 				latency_hist(PREEMPTIRQSOFF_LATENCY, cpu,
 				    latency, 0, stop, NULL);
 			}
@@ -908,7 +910,7 @@ static notrace void probe_wakeup_latency
 {
 	unsigned long flags;
 	int cpu = task_cpu(next);
-	unsigned long latency;
+	long latency;
 	cycle_t stop;
 	struct task_struct *cpu_wakeup_task;
 
@@ -939,7 +941,8 @@ static notrace void probe_wakeup_latency
 	 */
 	stop = ftrace_now(raw_smp_processor_id());
 
-	latency = nsecs_to_usecs(stop - next->preempt_timestamp_hist);
+	latency = ((long) (stop - next->preempt_timestamp_hist)) /
+	    NSECS_PER_USECS;
 
 	if (per_cpu(wakeup_sharedprio, cpu)) {
 		latency_hist(WAKEUP_LATENCY_SHAREDPRIO, cpu, latency, 0, stop,
@@ -975,7 +978,7 @@ static notrace void probe_hrtimer_interr
 	    (task->prio < curr->prio ||
 	    (task->prio == curr->prio &&
 	    !cpumask_test_cpu(cpu, &task->cpus_allowed)))) {
-		unsigned long latency;
+		long latency;
 		cycle_t now;
 
 		if (missed_timer_offsets_pid) {
@@ -985,7 +988,7 @@ static notrace void probe_hrtimer_interr
 		}
 
 		now = ftrace_now(cpu);
-		latency = (unsigned long) div_s64(-latency_ns, 1000);
+		latency = (long) div_s64(-latency_ns, NSECS_PER_USECS);
 		latency_hist(MISSED_TIMER_OFFSETS, cpu, latency, latency, now,
 		    task);
 #ifdef CONFIG_WAKEUP_LATENCY_HIST
@@ -1026,7 +1029,7 @@ static __init int latency_hist_init(void
 		    &per_cpu(irqsoff_hist, i), &latency_hist_fops);
 		my_hist = &per_cpu(irqsoff_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 	}
 	entry = debugfs_create_file("reset", 0644, dentry,
 	    (void *)IRQSOFF_LATENCY, &latency_hist_reset_fops);
@@ -1041,7 +1044,7 @@ static __init int latency_hist_init(void
 		    &per_cpu(preemptoff_hist, i), &latency_hist_fops);
 		my_hist = &per_cpu(preemptoff_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 	}
 	entry = debugfs_create_file("reset", 0644, dentry,
 	    (void *)PREEMPTOFF_LATENCY, &latency_hist_reset_fops);
@@ -1056,7 +1059,7 @@ static __init int latency_hist_init(void
 		    &per_cpu(preemptirqsoff_hist, i), &latency_hist_fops);
 		my_hist = &per_cpu(preemptirqsoff_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 	}
 	entry = debugfs_create_file("reset", 0644, dentry,
 	    (void *)PREEMPTIRQSOFF_LATENCY, &latency_hist_reset_fops);
@@ -1081,14 +1084,14 @@ static __init int latency_hist_init(void
 		    &latency_hist_fops);
 		my_hist = &per_cpu(wakeup_latency_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		entry = debugfs_create_file(name, 0444, dentry_sharedprio,
 		    &per_cpu(wakeup_latency_hist_sharedprio, i),
 		    &latency_hist_fops);
 		my_hist = &per_cpu(wakeup_latency_hist_sharedprio, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		sprintf(name, cpufmt_maxlatproc, i);
 
@@ -1122,7 +1125,7 @@ static __init int latency_hist_init(void
 		    &per_cpu(missed_timer_offsets, i), &latency_hist_fops);
 		my_hist = &per_cpu(missed_timer_offsets, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		sprintf(name, cpufmt_maxlatproc, i);
 		mp = &per_cpu(missed_timer_offsets_maxlatproc, i);
@@ -1150,7 +1153,7 @@ static __init int latency_hist_init(void
 		    &latency_hist_fops);
 		my_hist = &per_cpu(timerandwakeup_latency_hist, i);
 		atomic_set(&my_hist->hist_mode, 1);
-		my_hist->min_lat = 0xFFFFFFFFUL;
+		my_hist->min_lat = LONG_MAX;
 
 		sprintf(name, cpufmt_maxlatproc, i);
 		mp = &per_cpu(timerandwakeup_maxlatproc, i);







* [patch RT 3/7] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 1/7] Latency histograms: Cope with backwards running local trace clock Thomas Gleixner
@ 2012-07-11 22:05 ` Thomas Gleixner
  2012-07-12  2:36   ` Mike Galbraith
  2012-07-11 22:05 ` [patch RT 2/7] Latency histograms: Adjust timer, if already elapsed when programmed Thomas Gleixner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde

[-- Attachment #1: disable-rt_group_sched-in-preempt_rt_full.patch --]
[-- Type: text/plain, Size: 709 bytes --]

Strange CPU stalls have been observed in RT when RT_GROUP_SCHED
was configured.

Disable it for now.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 init/Kconfig |    1 +
 1 file changed, 1 insertion(+)

Index: linux-3.4.4-rt13-64+/init/Kconfig
===================================================================
--- linux-3.4.4-rt13-64+.orig/init/Kconfig
+++ linux-3.4.4-rt13-64+/init/Kconfig
@@ -746,6 +746,7 @@ config RT_GROUP_SCHED
 	bool "Group scheduling for SCHED_RR/FIFO"
 	depends on EXPERIMENTAL
 	depends on CGROUP_SCHED
+	depends on !PREEMPT_RT_FULL
 	default n
 	help
 	  This feature lets you explicitly allocate real CPU bandwidth







* [patch RT 2/7] Latency histograms: Adjust timer, if already elapsed when programmed
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 1/7] Latency histograms: Cope with backwards running local trace clock Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 3/7] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL Thomas Gleixner
@ 2012-07-11 22:05 ` Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 5/7] slab: Prevent local lock deadlock Thomas Gleixner
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde

[-- Attachment #1: latency-histograms-adjust-timer-if-already-elapsed-when-programmed.patch --]
[-- Type: text/plain, Size: 2155 bytes --]

Nothing prevents a programmer from calling clock_nanosleep() with an
already elapsed wakeup time in absolute time mode, or with a delay that
is too small in relative time mode. Such timers cannot possibly expire
in time and, thus, need to be corrected when they are entered into the
missed timer offsets latency histogram (CONFIG_MISSED_TIMER_OFFSETS_HIST).

This patch marks such timers and uses a corrected expiration time.
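
For illustration only (not part of the patch), a minimal user-space
example of the case being corrected: an absolute wakeup time that has
already elapsed when the timer is programmed:

#define _POSIX_C_SOURCE 200809L
#include <time.h>

int main(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	ts.tv_sec -= 1;	/* wakeup time one second in the past */

	/*
	 * Returns almost immediately since the deadline has passed, but
	 * the timer is still programmed and fires right away. Without
	 * the correction below, the missed timer offsets histogram would
	 * account the whole elapsed second as timer latency.
	 */
	clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
	return 0;
}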

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/hrtimer.h |    3 +++
 kernel/hrtimer.c        |   16 ++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

Index: linux-3.4.2-rt10-64+/include/linux/hrtimer.h
===================================================================
--- linux-3.4.2-rt10-64+.orig/include/linux/hrtimer.h
+++ linux-3.4.2-rt10-64+/include/linux/hrtimer.h
@@ -113,6 +113,9 @@ struct hrtimer {
 	unsigned long			state;
 	struct list_head		cb_entry;
 	int				irqsafe;
+#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
+	ktime_t 			praecox;
+#endif
 #ifdef CONFIG_TIMER_STATS
 	int				start_pid;
 	void				*start_site;
Index: linux-3.4.2-rt10-64+/kernel/hrtimer.c
===================================================================
--- linux-3.4.2-rt10-64+.orig/kernel/hrtimer.c
+++ linux-3.4.2-rt10-64+/kernel/hrtimer.c
@@ -1021,6 +1021,17 @@ int __hrtimer_start_range_ns(struct hrti
 #endif
 	}
 
+#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST
+	{
+		ktime_t now = new_base->get_time();
+
+		if (ktime_to_ns(tim) < ktime_to_ns(now))
+			timer->praecox = now;
+		else
+			timer->praecox = ktime_set(0, 0);
+	}
+#endif
+
 	hrtimer_set_expires_range_ns(timer, tim, delta_ns);
 
 	timer_stats_hrtimer_set_start_info(timer);
@@ -1458,8 +1469,9 @@ retry:
 			timer = container_of(node, struct hrtimer, node);
 
 			trace_hrtimer_interrupt(raw_smp_processor_id(),
-			    ktime_to_ns(ktime_sub(
-				hrtimer_get_expires(timer), basenow)),
+			    ktime_to_ns(ktime_sub(ktime_to_ns(timer->praecox) ?
+				timer->praecox : hrtimer_get_expires(timer),
+				basenow)),
 			    current,
 			    timer->function == hrtimer_wakeup ?
 			    container_of(timer, struct hrtimer_sleeper,







* [patch RT 5/7] slab: Prevent local lock deadlock
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
                   ` (2 preceding siblings ...)
  2012-07-11 22:05 ` [patch RT 2/7] Latency histograms: Adjust timer, if already elapsed when programmed Thomas Gleixner
@ 2012-07-11 22:05 ` Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 4/7] Latency histograms: Detect another yet overlooked sharedprio condition Thomas Gleixner
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde

[-- Attachment #1: slab-fix-local-lock-wreckage.patch --]
[-- Type: text/plain, Size: 1933 bytes --]

On RT we avoid the cross-CPU function calls and take the per-CPU local
locks instead. The code, however, missed that taking the local lock on
the CPU which runs the code must use the proper local lock functions
rather than a plain spin_lock(). Otherwise it deadlocks later when the
local lock is acquired with the proper function.
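
A rough sketch of the failure mode (for illustration; it assumes the
usual -rt local lock bookkeeping of an owner on top of the underlying
spinlock and is not part of the patch):

/*
 * CPU X, before this patch:
 *
 *   do_drain()
 *     spin_lock_irq(&per_cpu(slab_lock, X).lock);
 *         acquires the raw lock directly; the local lock's owner
 *         bookkeeping still considers the lock free
 *     ...
 *   later on the same CPU:
 *     local_lock_irq(slab_lock);
 *         the owner check does not match, so this tries to acquire the
 *         already held raw lock again and deadlocks
 */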

Reported-and-tested-by: Chris Pringle <chris.pringle@miranda.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/slab.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Index: linux-stable-rt/mm/slab.c
===================================================================
--- linux-stable-rt.orig/mm/slab.c
+++ linux-stable-rt/mm/slab.c
@@ -743,8 +743,26 @@ slab_on_each_cpu(void (*func)(void *arg,
 {
 	unsigned int i;
 
+	get_cpu_light();
 	for_each_online_cpu(i)
 		func(arg, i);
+	put_cpu_light();
+}
+
+static void lock_slab_on(unsigned int cpu)
+{
+	if (cpu == smp_processor_id())
+		local_lock_irq(slab_lock);
+	else
+		local_spin_lock_irq(slab_lock, &per_cpu(slab_lock, cpu).lock);
+}
+
+static void unlock_slab_on(unsigned int cpu)
+{
+	if (cpu == smp_processor_id())
+		local_unlock_irq(slab_lock);
+	else
+		local_spin_unlock_irq(slab_lock, &per_cpu(slab_lock, cpu).lock);
 }
 #endif
 
@@ -2692,10 +2710,10 @@ static void do_drain(void *arg, int cpu)
 {
 	LIST_HEAD(tmp);
 
-	spin_lock_irq(&per_cpu(slab_lock, cpu).lock);
+	lock_slab_on(cpu);
 	__do_drain(arg, cpu);
 	list_splice_init(&per_cpu(slab_free_list, cpu), &tmp);
-	spin_unlock_irq(&per_cpu(slab_lock, cpu).lock);
+	unlock_slab_on(cpu);
 	free_delayed(&tmp);
 }
 #endif
@@ -4163,9 +4181,9 @@ static void do_ccupdate_local(void *info
 #else
 static void do_ccupdate_local(void *info, int cpu)
 {
-	spin_lock_irq(&per_cpu(slab_lock, cpu).lock);
+	lock_slab_on(cpu);
 	__do_ccupdate_local(info, cpu);
-	spin_unlock_irq(&per_cpu(slab_lock, cpu).lock);
+	unlock_slab_on(cpu);
 }
 #endif
 




* [patch RT 4/7] Latency histograms: Detect another yet overlooked sharedprio condition
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
                   ` (3 preceding siblings ...)
  2012-07-11 22:05 ` [patch RT 5/7] slab: Prevent local lock deadlock Thomas Gleixner
@ 2012-07-11 22:05 ` Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 6/7] fs, jbd: pull your plug when waiting for space Thomas Gleixner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde

[-- Attachment #1: latency-histograms-detect-another-yet-overlooked-sharedprio-condition.patch --]
[-- Type: text/plain, Size: 1109 bytes --]

While waiting for an RT process to be woken up, the previously running
process may go to sleep and the CPU may switch to another process with
the same priority, which then becomes current. This condition was not
correctly recognized and led to erroneously high latency recordings
during periods of low CPU load.

This patch correctly marks such latencies as sharedprio and prevents
them from being recorded as actual system latency.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/trace/latency_hist.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-3.4.2-rt10-64+/kernel/trace/latency_hist.c
===================================================================
--- linux-3.4.2-rt10-64+.orig/kernel/trace/latency_hist.c
+++ linux-3.4.2-rt10-64+/kernel/trace/latency_hist.c
@@ -935,6 +935,9 @@ static notrace void probe_wakeup_latency
 		goto out;
 	}
 
+	if (current->prio == cpu_wakeup_task->prio)
+		per_cpu(wakeup_sharedprio, cpu) = 1;
+
 	/*
 	 * The task we are waiting for is about to be switched to.
 	 * Calculate latency and store it in histogram.







* [patch RT 6/7] fs, jbd: pull your plug when waiting for space
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
                   ` (4 preceding siblings ...)
  2012-07-11 22:05 ` [patch RT 4/7] Latency histograms: Detect another yet overlooked sharedprio condition Thomas Gleixner
@ 2012-07-11 22:05 ` Thomas Gleixner
  2012-07-11 22:05 ` [patch RT 7/7] perf: Make swevent hrtimer run in irq instead of softirq Thomas Gleixner
  2012-07-11 23:31 ` [patch RT 0/7] Various fixes for the stable RT series - part I Steven Rostedt
  7 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde, Mike Galbraith, Theodore Tso

[-- Attachment #1: fs-jbd-pull-plug-when-waiting.patch --]
[-- Type: text/plain, Size: 950 bytes --]

With an -rt kernel, and a heavy sync IO load, tasks can jam
up on journal locks without unplugging, which can lead to
terminal IO starvation.  Unplug and schedule when waiting for space.
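
For context (an illustrative sketch, not part of the patch): current->plug
is only non-NULL while the task has an active block plug, which the I/O
submitter sets up roughly like this; io_schedule() flushes that plug
before the task goes to sleep, so the queued requests reach the device:

	struct blk_plug plug;

	blk_start_plug(&plug);	/* current->plug now points at 'plug' */
	/* ... submit a batch of bios; requests queue up on the plug ... */
	blk_finish_plug(&plug);	/* flush the queued requests to the device */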

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Theodore Tso <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 fs/jbd/checkpoint.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-stable-rt/fs/jbd/checkpoint.c
===================================================================
--- linux-stable-rt.orig/fs/jbd/checkpoint.c
+++ linux-stable-rt/fs/jbd/checkpoint.c
@@ -129,6 +129,8 @@ void __log_wait_for_space(journal_t *jou
 		if (journal->j_flags & JFS_ABORT)
 			return;
 		spin_unlock(&journal->j_state_lock);
+		if (current->plug)
+			io_schedule();
 		mutex_lock(&journal->j_checkpoint_mutex);
 
 		/*




* [patch RT 7/7] perf: Make swevent hrtimer run in irq instead of softirq
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
                   ` (5 preceding siblings ...)
  2012-07-11 22:05 ` [patch RT 6/7] fs, jbd: pull your plug when waiting for space Thomas Gleixner
@ 2012-07-11 22:05 ` Thomas Gleixner
  2012-07-11 23:31 ` [patch RT 0/7] Various fixes for the stable RT series - part I Steven Rostedt
  7 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-11 22:05 UTC (permalink / raw)
  To: LKML; +Cc: Steven Rostedt, RT-users, Carsten Emde, Yong Zhang, Peter Zijlstra

[-- Attachment #1: perf-make-swevent-hrtimer-run-in-irq-instead-of-softirq.patch --]
[-- Type: text/plain, Size: 3573 bytes --]

From: Yong Zhang <yong.zhang@windriver.com>

Otherwise we get a deadlock like below:

[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
[ 1044.042752] INFO: lockdep is turned off.
[ 1044.042754] Modules linked in:
[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G        W    3.4.0-rc2-rt3-23676-ga723175-dirty #29
[ 1044.042759] Call Trace:
[ 1044.042761]  <IRQ>  [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
[ 1044.042770]  [<ffffffff8168978c>] __schedule+0x83c/0xa70
[ 1044.042775]  [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
[ 1044.042779]  [<ffffffff81689a5e>] schedule+0x2e/0xa0
[ 1044.042782]  [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
[ 1044.042786]  [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
[ 1044.042790]  [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
[ 1044.042794]  [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
[ 1044.042798]  [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
[ 1044.042802]  [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
[ 1044.042805]  [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
[ 1044.042809]  [<ffffffff8111c649>] group_sched_out+0x29/0x90
[ 1044.042813]  [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
[ 1044.042817]  [<ffffffff8111c343>] remote_function+0x63/0x70
[ 1044.042821]  [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
[ 1044.042826]  [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
[ 1044.042831]  [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
[ 1044.042833]  <EOI>  [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042840]  [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
[ 1044.042844]  [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
[ 1044.042848]  [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
[ 1044.042853]  [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042857]  [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
[ 1044.042862]  [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
[ 1044.042865]  [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
[ 1044.042869]  [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042873]  [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042877]  [<ffffffff8106b596>] kthread+0xb6/0xc0
[ 1044.042881]  [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042886]  [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
[ 1044.042889]  [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
[ 1044.042894]  [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042897]  [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
[ 1044.042900]  [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
[ 1044.042902]  [<ffffffff8168d990>] ? gs_change+0xb/0xb

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/events/core.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-stable-rt/kernel/events/core.c
===================================================================
--- linux-stable-rt.orig/kernel/events/core.c
+++ linux-stable-rt/kernel/events/core.c
@@ -5403,6 +5403,7 @@ static void perf_swevent_init_hrtimer(st
 
 	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hwc->hrtimer.function = perf_swevent_hrtimer;
+	hwc->hrtimer.irqsafe = 1;
 
 	/*
 	 * Since hrtimers have a fixed rate, we can do a static freq->period




* Re: [patch RT 0/7] Various fixes for the stable RT series - part I
  2012-07-11 22:05 [patch RT 0/7] Various fixes for the stable RT series - part I Thomas Gleixner
                   ` (6 preceding siblings ...)
  2012-07-11 22:05 ` [patch RT 7/7] perf: Make swevent hrtimer run in irq instead of softirq Thomas Gleixner
@ 2012-07-11 23:31 ` Steven Rostedt
  2012-07-12  8:49   ` Thomas Gleixner
  7 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2012-07-11 23:31 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-users, Carsten Emde

On Wed, 2012-07-11 at 22:05 +0000, Thomas Gleixner wrote:
> The following patch series is a collection of bug fixes, which should
> go into the 3.x based stable RT trees.
> 
> I have them locally applied to my 3.5 devel queue, but I'm still
> too distracted by other events (leap seconds and the like) to get a 3.5
> devel queue released.
> 
> Steven, please pick up the lot.
> 
> I have a part II series pending which is addressing the widely
> observed CPU hotplug issue. I'm going to send that out soon, despite
> the fact that I fundamentally hate it. Though the hotplug rework which
> I have in the queue has turned out to be more work than expected and
> there is no real way to backport all of the necessary changes into any
> of the existing 3.x based RT trees. So I bite the bullet and go with
> the workarounds to get rid of the last obstacles in the 3.x based RT
> series.

I'll pull these in and start running my tests. I noticed that some of
them have Carsten's Signed-off-by first. Did he author them? If so, I'll
make sure his name goes in as the git author.

Same for Mike Galbraith on patch 6. Only patch 7 has a 'From:' tag.

-- Steve



* Re: [patch RT 3/7] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL
  2012-07-11 22:05 ` [patch RT 3/7] Disable RT_GROUP_SCHED in PREEMPT_RT_FULL Thomas Gleixner
@ 2012-07-12  2:36   ` Mike Galbraith
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2012-07-12  2:36 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Steven Rostedt, RT-users, Carsten Emde

On Wed, 2012-07-11 at 22:05 +0000, Thomas Gleixner wrote:
> plain text document attachment
> (disable-rt_group_sched-in-preempt_rt_full.patch)
> Strange CPU stalls have been observed in RT when RT_GROUP_SCHED
> was configured.
> 
> Disable it for now.
> 
> Signed-off-by: Carsten Emde <C.Emde@osadl.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
>  init/Kconfig |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-3.4.4-rt13-64+/init/Kconfig
> ===================================================================
> --- linux-3.4.4-rt13-64+.orig/init/Kconfig
> +++ linux-3.4.4-rt13-64+/init/Kconfig
> @@ -746,6 +746,7 @@ config RT_GROUP_SCHED
>  	bool "Group scheduling for SCHED_RR/FIFO"
>  	depends on EXPERIMENTAL
>  	depends on CGROUP_SCHED
> +	depends on !PREEMPT_RT_FULL
>  	default n
>  	help
>  	  This feature lets you explicitly allocate real CPU bandwidth
> 
> 
> 
> 
> 

I turn the thing off because it doesn't make any sense to me for -rt,
and because it's busted.  The below works around isolation bustage I
encountered.  Peter didn't like it (what's to like?) but it saves the
day, so shall live on in non-rt kernels until I hopefully someday see
RT_GROUP_SCHED being fed into a Bitwolf-9000 ;-)

sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled

Root task group bandwidth replenishment must service all CPUs regardless of
where it was last started.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/rt.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -782,6 +782,19 @@ static int do_sched_rt_period_timer(stru
 	const struct cpumask *span;
 
 	span = sched_rt_period_mask();
+#ifdef CONFIG_RT_GROUP_SCHED
+	/*
+	 * FIXME: isolated CPUs should really leave the root task group,
+	 * whether they are isolcpus or were isolated via cpusets, lest
+	 * the timer run on a CPU which does not service all runqueues,
+	 * potentially leaving other CPUs indefinitely throttled.  If
+	 * isolation is really required, the user will turn the throttle
+	 * off to kill the perturbations it causes anyway.  Meanwhile,
+	 * this maintains functionality for boot and/or troubleshooting.
+	 */
+	if (rt_b == &root_task_group.rt_bandwidth)
+		span = cpu_online_mask;
+#endif
 	for_each_cpu(i, span) {
 		int enqueue = 0;
 		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);




* Re: [patch RT 0/7] Various fixes for the stable RT series - part I
  2012-07-11 23:31 ` [patch RT 0/7] Various fixes for the stable RT series - part I Steven Rostedt
@ 2012-07-12  8:49   ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2012-07-12  8:49 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, RT-users, Carsten Emde

On Wed, 11 Jul 2012, Steven Rostedt wrote:

> On Wed, 2012-07-11 at 22:05 +0000, Thomas Gleixner wrote:
> > The following patch series is a collection of bug fixes, which should
> > go into the 3.x based stable RT trees.
> > 
> > I have them locally applied to my 3.5 devel queue, but I'm still
> > too distracted by other events (leap seconds and the like) to get a 3.5
> > devel queue released.
> > 
> > Steven, please pick up the lot.
> > 
> > I have a part II series pending which is addressing the widely
> > observed CPU hotplug issue. I'm going to send that out soon, despite
> > the fact that I fundamentally hate it. Though the hotplug rework which
> > I have in the queue has turned out to be more work than expected and
> > there is no real way to backport all of the necessary changes into any
> > of the existing 3.x based RT trees. So I bite the bullet and go with
> > the workarounds to get rid of the last obstacles in the 3.x based RT
> > series.
> 
> I'll pull these in and start running my tests. I noticed that some of
> them are Signed-off-by Carsten first. Did he author them? If so, I'll
> make it so his name gets the git author.
> 
> Same for Mike Galbraith on patch 6. Only patch 7 has a 'From:' tag.

Gah. Forgot to add the From lines. 1-4 are from Carsten, 5 is mine, 6
is from Mike and 7 is from Yong.

Thanks,

	tglx

