linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/2] NOHZ vs. profile/oprofile
@ 2009-05-28 15:04 Martin Schwidefsky
  2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky
  2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky
  0 siblings, 2 replies; 12+ messages in thread
From: Martin Schwidefsky @ 2009-05-28 15:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner,
	john stultz

Greetings,
Rob pointed me to a deficiency with the current profile/oprofile
code together with NOHZ. For us this problem crept in with the
conversion of s390 to generic clock events, git commit 5a62b192
If the system is running with the HZ-tick disabled and the cpu spents
time in idle we see skewed percentages e.g. with the oprofile output.
On an I/O bound system the number of idle ticks is way to small. The
reason is that the generic clock events code reports either zero or
one tick to profile/oprofile on wakeup from idle even if the cpu has
slept much longer.
I've tried to fix that with the two patches in this series and
another pure s390 specific fix (profile_tick is called from
clock_comparator_work which is nonsense). These three do correct the
oprofile output on s390. The best idea I had to get oprofile in good
shape again is to let the system tick at the HZ rate while oprofile
is working. Better ideas welcome.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 1/2] idle profile hits with NOHZ
  2009-05-28 15:04 [patch 0/2] NOHZ vs. profile/oprofile Martin Schwidefsky
@ 2009-05-28 15:04 ` Martin Schwidefsky
  2009-05-28 20:19   ` Thomas Gleixner
  2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky
  1 sibling, 1 reply; 12+ messages in thread
From: Martin Schwidefsky @ 2009-05-28 15:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner,
	john stultz, Martin Schwidefsky

[-- Attachment #1: profile-idle-tick.diff --]
[-- Type: text/plain, Size: 2963 bytes --]

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

On a NOHZ system the generic clock events code reports a maximum
of 1 tick to the kernel profiler. If the system slept for more than
a single tick or has not been woken by a timer interrupt the number
of ticks reported will be incorrect. This skewes the percentages in
the profile. A good place to report the profile ticks is in
tick_nohz_restart_sched_tick. We calculate the number of ticks for
the cpu accounting anyway. The only obstacle is that we need an
instruction address for the profile ticks. In order to get one
extend the tick_sched structure by an idle_pc field and set it in
tick_nohz_stop_idle which is called from tick_check_idle.
Note: this does not solve the same problem in oprofile.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 include/linux/tick.h     |    1 +
 kernel/time/tick-sched.c |   13 ++++++++-----
 2 files changed, 9 insertions(+), 5 deletions(-)

Index: quilt-2.6/include/linux/tick.h
===================================================================
--- quilt-2.6.orig/include/linux/tick.h
+++ quilt-2.6/include/linux/tick.h
@@ -64,6 +64,7 @@ struct tick_sched {
 	unsigned long			last_jiffies;
 	unsigned long			next_jiffies;
 	ktime_t				idle_expires;
+	unsigned long			idle_pc;
 };
 
 extern void __init tick_init(void);
Index: quilt-2.6/kernel/time/tick-sched.c
===================================================================
--- quilt-2.6.orig/kernel/time/tick-sched.c
+++ quilt-2.6/kernel/time/tick-sched.c
@@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu)
 		ts->idle_lastupdate = now;
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
 		ts->idle_active = 0;
+		ts->idle_pc = profile_pc(get_irq_regs());
 
 		sched_clock_idle_wakeup_event(0);
 	}
@@ -419,9 +420,7 @@ void tick_nohz_restart_sched_tick(void)
 {
 	int cpu = smp_processor_id();
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	unsigned long ticks;
-#endif
 	ktime_t now;
 
 	local_irq_disable();
@@ -443,19 +442,23 @@ void tick_nohz_restart_sched_tick(void)
 	tick_do_update_jiffies64(now);
 	cpumask_clear_cpu(cpu, nohz_cpu_mask);
 
+	ticks = jiffies - ts->idle_jiffies;
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	/*
 	 * We stopped the tick in idle. Update process times would miss the
 	 * time we slept as update_process_times does only a 1 tick
 	 * accounting. Enforce that this is accounted to idle !
-	 */
-	ticks = jiffies - ts->idle_jiffies;
-	/*
+	 *
 	 * We might be one off. Do not randomly account a huge number of ticks!
 	 */
 	if (ticks && ticks < LONG_MAX)
 		account_idle_ticks(ticks);
 #endif
+#ifdef CONFIG_PROFILING
+	/* Create profile hits for all ticks we slept in idle. */
+	if (ticks && ticks < LONG_MAX)
+		profile_hits(CPU_PROFILING, (void *) ts->idle_pc, ticks);
+#endif
 
 	touch_softlockup_watchdog();
 	/*

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2/2] keep on ticking if oprofile is active
  2009-05-28 15:04 [patch 0/2] NOHZ vs. profile/oprofile Martin Schwidefsky
  2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky
@ 2009-05-28 15:04 ` Martin Schwidefsky
  2009-05-28 20:29   ` Thomas Gleixner
  2009-06-01  8:09   ` Andi Kleen
  1 sibling, 2 replies; 12+ messages in thread
From: Martin Schwidefsky @ 2009-05-28 15:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner,
	john stultz, Martin Schwidefsky

[-- Attachment #1: profile-stop-nohz.diff --]
[-- Type: text/plain, Size: 3958 bytes --]

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

On a NOHZ system with oprofile enabled the timer tick should not be
stopped when a cpu goes idle. Oprofile needs the pt_regs structure
of the interrupt and allocates memory in the ring buffer for each
sample. Current a maximum of 1 tick is accounted with oprofile if a
cpu sleeps for a longer period of time. This does bad things to the
percentages in the oprofile output. To postpone the oprofile tick to
tick_nohz_restart_sched_tick analog to the in kernel profiler is not
possible as there is no pt_regs structure in the context the
tick_nohz_restart_sched_tick function is called and it is not a good
idea to create hundreds of samples at once.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 drivers/oprofile/oprof.c |    3 +++
 include/linux/tick.h     |    4 ++++
 kernel/time/tick-sched.c |   26 +++++++++++++++++++++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

Index: quilt-2.6/drivers/oprofile/oprof.c
===================================================================
--- quilt-2.6.orig/drivers/oprofile/oprof.c
+++ quilt-2.6/drivers/oprofile/oprof.c
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/oprofile.h>
 #include <linux/moduleparam.h>
+#include <linux/tick.h>
 #include <asm/mutex.h>
 
 #include "oprof.h"
@@ -103,6 +104,7 @@ int oprofile_start(void)
 	if (oprofile_started)
 		goto out;
 
+	tick_nohz_disable();
 	oprofile_reset_stats();
 
 	if ((err = oprofile_ops.start()))
@@ -123,6 +125,7 @@ void oprofile_stop(void)
 		goto out;
 	oprofile_ops.stop();
 	oprofile_started = 0;
+	tick_nohz_enable();
 	/* wake up the daemon to read what remains */
 	wake_up_buffer_waiter();
 out:
Index: quilt-2.6/include/linux/tick.h
===================================================================
--- quilt-2.6.orig/include/linux/tick.h
+++ quilt-2.6/include/linux/tick.h
@@ -117,6 +117,8 @@ extern void tick_nohz_stop_sched_tick(in
 extern void tick_nohz_restart_sched_tick(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern void tick_nohz_enable(void);
+extern void tick_nohz_disable(void);
 # else
 static inline void tick_nohz_stop_sched_tick(int inidle) { }
 static inline void tick_nohz_restart_sched_tick(void) { }
@@ -127,6 +129,8 @@ static inline ktime_t tick_nohz_get_slee
 	return len;
 }
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
+static inline void tick_nohz_enable(void) { }
+static inline void tick_nohz_disable(void) { }
 # endif /* !NO_HZ */
 
 #endif
Index: quilt-2.6/kernel/time/tick-sched.c
===================================================================
--- quilt-2.6.orig/kernel/time/tick-sched.c
+++ quilt-2.6/kernel/time/tick-sched.c
@@ -124,6 +124,29 @@ static int __init setup_tick_nohz(char *
 
 __setup("nohz=", setup_tick_nohz);
 
+/*
+ * NO HZ currently disabled ?
+ */
+static atomic_t tick_nohz_disable_counter = ATOMIC_INIT(0);
+
+void tick_nohz_enable(void)
+{
+	atomic_dec(&tick_nohz_disable_counter);
+}
+EXPORT_SYMBOL_GPL(tick_nohz_enable);
+
+static void __tick_nohz_disable(void *dummy)
+{
+}
+
+void tick_nohz_disable(void)
+{
+	if (atomic_inc_return(&tick_nohz_disable_counter) == 1)
+		/* Wake up all cpus to make them start ticking. */
+		smp_call_function(__tick_nohz_disable, NULL, 0);
+}
+EXPORT_SYMBOL_GPL(tick_nohz_disable);
+
 /**
  * tick_nohz_update_jiffies - update jiffies when idle was interrupted
  *
@@ -271,7 +294,8 @@ void tick_nohz_stop_sched_tick(int inidl
 	next_jiffies = get_next_timer_interrupt(last_jiffies);
 	delta_jiffies = next_jiffies - last_jiffies;
 
-	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
+	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
+	    atomic_read(&tick_nohz_disable_counter) > 0)
 		delta_jiffies = 1;
 	/*
 	 * Do not stop the tick, if we are only one off

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 1/2] idle profile hits with NOHZ
  2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky
@ 2009-05-28 20:19   ` Thomas Gleixner
  2009-05-29 12:56     ` Martin Schwidefsky
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2009-05-28 20:19 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz

On Thu, 28 May 2009, Martin Schwidefsky wrote:
> --- quilt-2.6.orig/kernel/time/tick-sched.c
> +++ quilt-2.6/kernel/time/tick-sched.c
> @@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu)
>  		ts->idle_lastupdate = now;
>  		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
>  		ts->idle_active = 0;
> +		ts->idle_pc = profile_pc(get_irq_regs());

  Hmm, tick_nohz_stop_idle() is called from
  tick_nohz_restart_sched_tick() as well in the context of the idle
  task.

  I think there is no guarantee that get_irq_regs() will return
  anything useful in thread context.

  So get_irq_regs() might return a NULL pointer which will explode
  some of the profile_pc() implementations. If not it can still feed
  total nonsense to the profile_hits() call.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2/2] keep on ticking if oprofile is active
  2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky
@ 2009-05-28 20:29   ` Thomas Gleixner
  2009-05-29 12:57     ` Martin Schwidefsky
  2009-06-01  8:09   ` Andi Kleen
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2009-05-28 20:29 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz

On Thu, 28 May 2009, Martin Schwidefsky wrote:
> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> 
> On a NOHZ system with oprofile enabled the timer tick should not be
> stopped when a cpu goes idle. Oprofile needs the pt_regs structure
> of the interrupt and allocates memory in the ring buffer for each
> sample. Current a maximum of 1 tick is accounted with oprofile if a
> cpu sleeps for a longer period of time. This does bad things to the
> percentages in the oprofile output. To postpone the oprofile tick to
> tick_nohz_restart_sched_tick analog to the in kernel profiler is not
> possible as there is no pt_regs structure in the context the
> tick_nohz_restart_sched_tick function is called and it is not a good
> idea to create hundreds of samples at once.

  Sigh. That's stupid.

  OTOH, thinking more about the patch itself it might be even useful
  for things aside oprofile. Runtime switching from and to nohz mode
  for debugging or evaluation purposes comes to my mind. That would
  need some sysfs interface, but that's not too hard to do.

  So yeah, I think we should satisfy oprofile needs and utilize it further.

  Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 1/2] idle profile hits with NOHZ
  2009-05-28 20:19   ` Thomas Gleixner
@ 2009-05-29 12:56     ` Martin Schwidefsky
  2009-05-29 13:15       ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Schwidefsky @ 2009-05-29 12:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz

On Thu, 28 May 2009 22:19:44 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 28 May 2009, Martin Schwidefsky wrote:
> > --- quilt-2.6.orig/kernel/time/tick-sched.c
> > +++ quilt-2.6/kernel/time/tick-sched.c
> > @@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu)
> >  		ts->idle_lastupdate = now;
> >  		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> >  		ts->idle_active = 0;
> > +		ts->idle_pc = profile_pc(get_irq_regs());
> 
>   Hmm, tick_nohz_stop_idle() is called from
>   tick_nohz_restart_sched_tick() as well in the context of the idle
>   task.
> 
>   I think there is no guarantee that get_irq_regs() will return
>   anything useful in thread context.
> 
>   So get_irq_regs() might return a NULL pointer which will explode
>   some of the profile_pc() implementations. If not it can still feed
>   total nonsense to the profile_hits() call.

Drat, but if it is called from idle we should be able to just skip the
assignment to ts->idle_pc, no? Then a simple in_interrupt() test is
missing.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2/2] keep on ticking if oprofile is active
  2009-05-28 20:29   ` Thomas Gleixner
@ 2009-05-29 12:57     ` Martin Schwidefsky
  2009-05-29 13:14       ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Schwidefsky @ 2009-05-29 12:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz

On Thu, 28 May 2009 22:29:38 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 28 May 2009, Martin Schwidefsky wrote:
> > From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > 
> > On a NOHZ system with oprofile enabled the timer tick should not be
> > stopped when a cpu goes idle. Oprofile needs the pt_regs structure
> > of the interrupt and allocates memory in the ring buffer for each
> > sample. Current a maximum of 1 tick is accounted with oprofile if a
> > cpu sleeps for a longer period of time. This does bad things to the
> > percentages in the oprofile output. To postpone the oprofile tick to
> > tick_nohz_restart_sched_tick analog to the in kernel profiler is not
> > possible as there is no pt_regs structure in the context the
> > tick_nohz_restart_sched_tick function is called and it is not a good
> > idea to create hundreds of samples at once.
> 
>   Sigh. That's stupid.

What is stupid, the bug or the fix?
 
>   OTOH, thinking more about the patch itself it might be even useful
>   for things aside oprofile. Runtime switching from and to nohz mode
>   for debugging or evaluation purposes comes to my mind. That would
>   need some sysfs interface, but that's not too hard to do.

That should be no problem. We used to have the hz_timer system control
with the old no-tick solution on s390. 

>   So yeah, I think we should satisfy oprofile needs and utilize it further.

Ok, so you are in principle fine with the patch?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2/2] keep on ticking if oprofile is active
  2009-05-29 12:57     ` Martin Schwidefsky
@ 2009-05-29 13:14       ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2009-05-29 13:14 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz

On Fri, 29 May 2009, Martin Schwidefsky wrote:
> On Thu, 28 May 2009 22:29:38 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Thu, 28 May 2009, Martin Schwidefsky wrote:
> > > From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > > 
> > > On a NOHZ system with oprofile enabled the timer tick should not be
> > > stopped when a cpu goes idle. Oprofile needs the pt_regs structure
> > > of the interrupt and allocates memory in the ring buffer for each
> > > sample. Current a maximum of 1 tick is accounted with oprofile if a
> > > cpu sleeps for a longer period of time. This does bad things to the
> > > percentages in the oprofile output. To postpone the oprofile tick to
> > > tick_nohz_restart_sched_tick analog to the in kernel profiler is not
> > > possible as there is no pt_regs structure in the context the
> > > tick_nohz_restart_sched_tick function is called and it is not a good
> > > idea to create hundreds of samples at once.
> > 
> >   Sigh. That's stupid.
> 
> What is stupid, the bug or the fix?

  The bug :)
  
> >   OTOH, thinking more about the patch itself it might be even useful
> >   for things aside oprofile. Runtime switching from and to nohz mode
> >   for debugging or evaluation purposes comes to my mind. That would
> >   need some sysfs interface, but that's not too hard to do.
> 
> That should be no problem. We used to have the hz_timer system control
> with the old no-tick solution on s390. 
> 
> >   So yeah, I think we should satisfy oprofile needs and utilize it further.
> 
> Ok, so you are in principle fine with the patch?

Yup.

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 1/2] idle profile hits with NOHZ
  2009-05-29 12:56     ` Martin Schwidefsky
@ 2009-05-29 13:15       ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2009-05-29 13:15 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz

On Fri, 29 May 2009, Martin Schwidefsky wrote:
> On Thu, 28 May 2009 22:19:44 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Thu, 28 May 2009, Martin Schwidefsky wrote:
> > > --- quilt-2.6.orig/kernel/time/tick-sched.c
> > > +++ quilt-2.6/kernel/time/tick-sched.c
> > > @@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu)
> > >  		ts->idle_lastupdate = now;
> > >  		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> > >  		ts->idle_active = 0;
> > > +		ts->idle_pc = profile_pc(get_irq_regs());
> > 
> >   Hmm, tick_nohz_stop_idle() is called from
> >   tick_nohz_restart_sched_tick() as well in the context of the idle
> >   task.
> > 
> >   I think there is no guarantee that get_irq_regs() will return
> >   anything useful in thread context.
> > 
> >   So get_irq_regs() might return a NULL pointer which will explode
> >   some of the profile_pc() implementations. If not it can still feed
> >   total nonsense to the profile_hits() call.
> 
> Drat, but if it is called from idle we should be able to just skip the
> assignment to ts->idle_pc, no? Then a simple in_interrupt() test is
> missing.

That should do the trick.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2/2] keep on ticking if oprofile is active
  2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky
  2009-05-28 20:29   ` Thomas Gleixner
@ 2009-06-01  8:09   ` Andi Kleen
  2009-06-01 10:22     ` Martin Schwidefsky
  1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2009-06-01  8:09 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar,
	Thomas Gleixner, john stultz

Martin Schwidefsky <schwidefsky@de.ibm.com> writes:

> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
>
> On a NOHZ system with oprofile enabled the timer tick should not be

The old style profile=... profiler should do that too.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2/2] keep on ticking if oprofile is active
  2009-06-01  8:09   ` Andi Kleen
@ 2009-06-01 10:22     ` Martin Schwidefsky
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Schwidefsky @ 2009-06-01 10:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar,
	Thomas Gleixner, john stultz

On Mon, 01 Jun 2009 10:09:27 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> Martin Schwidefsky <schwidefsky@de.ibm.com> writes:
> 
> > From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> >
> > On a NOHZ system with oprofile enabled the timer tick should not be
> 
> The old style profile=... profiler should do that too.

Technically the old style profiler and NOHZ can be combined (see patch
#1 plus a fix as discussed with Thomas). On the other hand it would make
the code simpler if we just fallback to the standard tick if profile= is
used. My personal preference would be to use my patch.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2/2] keep on ticking if oprofile is active
  2009-06-03 15:22 [patch 0/2] NOHZ vs. profile/oprofile v2 Martin Schwidefsky
@ 2009-06-03 15:22 ` Martin Schwidefsky
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Schwidefsky @ 2009-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner,
	john stultz, Andi Kleen, Martin Schwidefsky

[-- Attachment #1: profile-stop-nohz.diff --]
[-- Type: text/plain, Size: 3958 bytes --]

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

On a NOHZ system with oprofile enabled the timer tick should not be
stopped when a cpu goes idle. Oprofile needs the pt_regs structure
of the interrupt and allocates memory in the ring buffer for each
sample. Current a maximum of 1 tick is accounted with oprofile if a
cpu sleeps for a longer period of time. This does bad things to the
percentages in the oprofile output. To postpone the oprofile tick to
tick_nohz_restart_sched_tick analog to the in kernel profiler is not
possible as there is no pt_regs structure in the context the
tick_nohz_restart_sched_tick function is called and it is not a good
idea to create hundreds of samples at once.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 drivers/oprofile/oprof.c |    3 +++
 include/linux/tick.h     |    4 ++++
 kernel/time/tick-sched.c |   26 +++++++++++++++++++++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

Index: quilt-2.6/drivers/oprofile/oprof.c
===================================================================
--- quilt-2.6.orig/drivers/oprofile/oprof.c
+++ quilt-2.6/drivers/oprofile/oprof.c
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/oprofile.h>
 #include <linux/moduleparam.h>
+#include <linux/tick.h>
 #include <asm/mutex.h>
 
 #include "oprof.h"
@@ -103,6 +104,7 @@ int oprofile_start(void)
 	if (oprofile_started)
 		goto out;
 
+	tick_nohz_disable();
 	oprofile_reset_stats();
 
 	if ((err = oprofile_ops.start()))
@@ -123,6 +125,7 @@ void oprofile_stop(void)
 		goto out;
 	oprofile_ops.stop();
 	oprofile_started = 0;
+	tick_nohz_enable();
 	/* wake up the daemon to read what remains */
 	wake_up_buffer_waiter();
 out:
Index: quilt-2.6/include/linux/tick.h
===================================================================
--- quilt-2.6.orig/include/linux/tick.h
+++ quilt-2.6/include/linux/tick.h
@@ -117,6 +117,8 @@ extern void tick_nohz_stop_sched_tick(in
 extern void tick_nohz_restart_sched_tick(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern void tick_nohz_enable(void);
+extern void tick_nohz_disable(void);
 # else
 static inline void tick_nohz_stop_sched_tick(int inidle) { }
 static inline void tick_nohz_restart_sched_tick(void) { }
@@ -127,6 +129,8 @@ static inline ktime_t tick_nohz_get_slee
 	return len;
 }
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
+static inline void tick_nohz_enable(void) { }
+static inline void tick_nohz_disable(void) { }
 # endif /* !NO_HZ */
 
 #endif
Index: quilt-2.6/kernel/time/tick-sched.c
===================================================================
--- quilt-2.6.orig/kernel/time/tick-sched.c
+++ quilt-2.6/kernel/time/tick-sched.c
@@ -124,6 +124,29 @@ static int __init setup_tick_nohz(char *
 
 __setup("nohz=", setup_tick_nohz);
 
+/*
+ * NO HZ currently disabled ?
+ */
+static atomic_t tick_nohz_disable_counter = ATOMIC_INIT(0);
+
+void tick_nohz_enable(void)
+{
+	atomic_dec(&tick_nohz_disable_counter);
+}
+EXPORT_SYMBOL_GPL(tick_nohz_enable);
+
+static void __tick_nohz_disable(void *dummy)
+{
+}
+
+void tick_nohz_disable(void)
+{
+	if (atomic_inc_return(&tick_nohz_disable_counter) == 1)
+		/* Wake up all cpus to make them start ticking. */
+		smp_call_function(__tick_nohz_disable, NULL, 0);
+}
+EXPORT_SYMBOL_GPL(tick_nohz_disable);
+
 /**
  * tick_nohz_update_jiffies - update jiffies when idle was interrupted
  *
@@ -272,7 +295,8 @@ void tick_nohz_stop_sched_tick(int inidl
 	next_jiffies = get_next_timer_interrupt(last_jiffies);
 	delta_jiffies = next_jiffies - last_jiffies;
 
-	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
+	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
+	    atomic_read(&tick_nohz_disable_counter) > 0)
 		delta_jiffies = 1;
 	/*
 	 * Do not stop the tick, if we are only one off

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-06-03 15:23 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-28 15:04 [patch 0/2] NOHZ vs. profile/oprofile Martin Schwidefsky
2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky
2009-05-28 20:19   ` Thomas Gleixner
2009-05-29 12:56     ` Martin Schwidefsky
2009-05-29 13:15       ` Thomas Gleixner
2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky
2009-05-28 20:29   ` Thomas Gleixner
2009-05-29 12:57     ` Martin Schwidefsky
2009-05-29 13:14       ` Thomas Gleixner
2009-06-01  8:09   ` Andi Kleen
2009-06-01 10:22     ` Martin Schwidefsky
2009-06-03 15:22 [patch 0/2] NOHZ vs. profile/oprofile v2 Martin Schwidefsky
2009-06-03 15:22 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).