linux-kernel.vger.kernel.org archive mirror
* [PATCH 2/2 -tip] perf: Don't generate events for the idle task when exclude_idle is set.
       [not found] <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
@ 2009-10-22 16:34 ` Soeren Sandmann
  2009-10-22 16:38 ` [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU Soeren Sandmann
  2009-10-23  7:35 ` [PATCH 1/2 -tip] perf: Keep track of remaining time when enabling/disabling swevent hrtimers Ingo Molnar
  2 siblings, 0 replies; 11+ messages in thread
From: Soeren Sandmann @ 2009-10-22 16:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Frédéric Weisbecker,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Steven Rostedt

Getting samples for the idle task is often not interesting, so don't
generate them when exclude_idle is set for the event in question.

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
---
 kernel/perf_event.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index b492c55..7bc84cb 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3959,8 +3959,9 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 		regs = task_pt_regs(current);
 
 	if (regs) {
-		if (perf_event_overflow(event, 0, &data, regs))
-			ret = HRTIMER_NORESTART;
+		if (!(event->attr.exclude_idle && current->pid == 0))
+			if (perf_event_overflow(event, 0, &data, regs))
+				ret = HRTIMER_NORESTART;
 	}
 
 	period = max_t(u64, 10000, event->hw.sample_period);
-- 
1.6.5.rc2
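
For context, a minimal user-space sketch of how the exclude_idle flag above is
requested when opening a cpu-clock software event; it is not part of the patch,
and the perf_event_open() wrapper is only illustrative since the syscall has no
glibc wrapper:

#include <linux/perf_event.h>
#include <asm/unistd.h>
#include <string.h>
#include <unistd.h>

/* perf_event_open() has no glibc wrapper; invoke the syscall directly. */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			   int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;	/* hrtimer-based software event */
	attr.sample_period = 1000000;		/* sample roughly every 1 ms */
	attr.exclude_idle = 1;			/* the flag honoured by this patch */

	/* monitor the calling process on any CPU */
	fd = perf_event_open(&attr, 0, -1, -1, 0);

	return fd < 0 ? 1 : 0;
}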



* [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU
       [not found] <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
  2009-10-22 16:34 ` [PATCH 2/2 -tip] perf: Don't generate events for the idle task when exclude_idle is set Soeren Sandmann
@ 2009-10-22 16:38 ` Soeren Sandmann
  2009-10-23 10:50   ` Ingo Molnar
                     ` (2 more replies)
  2009-10-23  7:35 ` [PATCH 1/2 -tip] perf: Keep track of remaining time when enabling/disabling swevent hrtimers Ingo Molnar
  2 siblings, 3 replies; 11+ messages in thread
From: Soeren Sandmann @ 2009-10-22 16:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Frédéric Weisbecker,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Steven Rostedt

Passing 0 for bp causes dump_trace() to get bp directly from the
hardware register. This leads to the IRQ stack being included in the
generated call chains, which means the stack looks something like
this:

	[ ip ] [ IRQ stack ] [ rest of stack trace ]

which is incorrect and confusing to user space.

Getting bp from the IRQ regs instead makes the tracing start after the
IRQ stack:

	[ ip ] [ rest of stack trace ]

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
---
 arch/x86/kernel/cpu/perf_event.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index b5801c3..39b1d0c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2177,10 +2177,18 @@ static const struct stacktrace_ops backtrace_ops = {
 static void
 perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
 {
+	unsigned long bp;
+    
 	callchain_store(entry, PERF_CONTEXT_KERNEL);
 	callchain_store(entry, regs->ip);
 
-	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
+#ifdef CONFIG_FRAME_POINTER
+	bp = regs->bp;
+#else
+	bp = 0;
+#endif
+	
+	dump_trace(NULL, regs, NULL, bp, &backtrace_ops, entry);
 }
 
 /*
-- 
1.6.5.rc2



* Re: [PATCH 1/2 -tip] perf: Keep track of remaining time when enabling/disabling swevent hrtimers
       [not found] <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
  2009-10-22 16:34 ` [PATCH 2/2 -tip] perf: Don't generate events for the idle task when exclude_idle is set Soeren Sandmann
  2009-10-22 16:38 ` [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU Soeren Sandmann
@ 2009-10-23  7:35 ` Ingo Molnar
  2 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2009-10-23  7:35 UTC (permalink / raw)
  To: Soeren Sandmann
  Cc: linux-kernel, Peter Zijlstra, Frédéric Weisbecker,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Steven Rostedt,
	Paul Mackerras


* Soeren Sandmann <sandmann@daimi.au.dk> wrote:

> Hi,
> 
> These patches against perf/core make the hrtimer based events work for
> sysprof.
> 
> 1/2: Don't restart the timer on every scheduler tick
> 2/2: If exclude_idle is set, don't report idle events.

Applied to tip:perf/urgent, thanks a lot Soeren!

	Ingo


* Re: [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU
  2009-10-22 16:38 ` [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU Soeren Sandmann
@ 2009-10-23 10:50   ` Ingo Molnar
  2009-10-29 12:46     ` Soeren Sandmann
  2010-11-05 11:14   ` [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines Søren Sandmann Pedersen
  2010-11-05 11:14   ` [PATCH] " Søren Sandmann Pedersen
  2 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2009-10-23 10:50 UTC (permalink / raw)
  To: Soeren Sandmann, Arjan van de Ven, Thomas Gleixner, H. Peter Anvin
  Cc: linux-kernel, Peter Zijlstra, Frédéric Weisbecker,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Steven Rostedt


* Soeren Sandmann <sandmann@daimi.au.dk> wrote:

> Passing 0 for bp causes dump_trace() to get bp directly from the
> hardware register. This leads to the IRQ stack being included in the
> generated call chains, which means the stack looks something like
> this:
> 
> 	[ ip ] [ IRQ stack ] [ rest of stack trace ]
> 
> which is incorrect and confusing to user space.
> 
> Getting bp from the IRQ regs instead makes the tracing start after the
> IRQ stack:
> 
> 	[ ip ] [ rest of stack trace ]
> 
> Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>

Indeed, nice catch!

> ---
>  arch/x86/kernel/cpu/perf_event.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index b5801c3..39b1d0c 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -2177,10 +2177,18 @@ static const struct stacktrace_ops backtrace_ops = {
>  static void
>  perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
>  {
> +	unsigned long bp;
> +    
>  	callchain_store(entry, PERF_CONTEXT_KERNEL);
>  	callchain_store(entry, regs->ip);
>  
> -	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
> +#ifdef CONFIG_FRAME_POINTER
> +	bp = regs->bp;
> +#else
> +	bp = 0;
> +#endif
> +	
> +	dump_trace(NULL, regs, NULL, bp, &backtrace_ops, entry);
>  }

Wouldn't it be better to push this logic into dump_trace() itself? That
way other backtrace generation paths would be improved as well, not
just perf event call-chains.

	Ingo


* Re: [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU
  2009-10-23 10:50   ` Ingo Molnar
@ 2009-10-29 12:46     ` Soeren Sandmann
  0 siblings, 0 replies; 11+ messages in thread
From: Soeren Sandmann @ 2009-10-29 12:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	Peter Zijlstra, Frédéric Weisbecker, Arnaldo Carvalho de Melo,
	Steven Rostedt

Ingo Molnar <mingo@elte.hu> writes:

> >  arch/x86/kernel/cpu/perf_event.c |   10 +++++++++-
> >  1 files changed, 9 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> > index b5801c3..39b1d0c 100644
> > --- a/arch/x86/kernel/cpu/perf_event.c
> > +++ b/arch/x86/kernel/cpu/perf_event.c
> > @@ -2177,10 +2177,18 @@ static const struct stacktrace_ops backtrace_ops = {
> >  static void
> >  perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
> >  {
> > +	unsigned long bp;
> > +    
> >  	callchain_store(entry, PERF_CONTEXT_KERNEL);
> >  	callchain_store(entry, regs->ip);
> >  
> > -	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
> > +#ifdef CONFIG_FRAME_POINTER
> > +	bp = regs->bp;
> > +#else
> > +	bp = 0;
> > +#endif
> > +	
> > +	dump_trace(NULL, regs, NULL, bp, &backtrace_ops, entry);
> >  }
> 
> Wouldn't it be better to push this logic into dump_trace() itself? That
> way other backtrace generation paths would be improved as well, not
> just perf event call-chains.

Yes, it would, and I wrote the patch for that below, but then
discovered that getting bp from the IRQ registers makes stack tracing
not work at all on 64-bit. As I read entry_64.S, rbp ends up being
stored in regs->bx; maybe that's related.

I'll try to track down what is going on later.


Soren




From ce007534805fcfcc5d6b630bac5761f137fbbb1a Mon Sep 17 00:00:00 2001
From: Søren Sandmann Pedersen <sandmann@redhat.com>
Date: Thu, 22 Oct 2009 08:30:14 -0400
Subject: [PATCH] x86: Eliminate bp argument from the stack tracing routines

The various stack tracing routines take a 'bp' argument in which the
caller is supposed to provide the base pointer to use, or 0 if it doesn't
have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not
defined, this means all callers in principle should either always pass
0, or be conditional on CONFIG_FRAME_POINTER.

However, there are only really three use cases for stack tracing and
in all cases, dump_trace can figure out the desired bp for itself.

(a) Trace the current task, including IRQ stack if any
(b) Trace the current task, but skip IRQ stack
(c) Trace some other task

In all cases, if CONFIG_FRAME_POINTER is not defined, just use 0 for
bp.  If it _is_ defined, then

- in case (a) bp should be gotten directly from the CPU's register, so
  the caller should pass NULL for regs,

- in case (b) the caller should pass the IRQ registers to
  dump_trace(),

- in case (c) bp should be gotten from the top of the task's stack, so
  the caller should pass NULL for regs.

Hence, the bp argument is not necessary because the combination of
task and regs is sufficient to determine an appropriate value for bp.
---
 arch/x86/include/asm/kdebug.h     |    2 +-
 arch/x86/include/asm/stacktrace.h |    2 +-
 arch/x86/kernel/cpu/perf_event.c  |    2 +-
 arch/x86/kernel/dumpstack.c       |   12 ++++++------
 arch/x86/kernel/dumpstack.h       |    4 ++--
 arch/x86/kernel/dumpstack_32.c    |   18 +++++++++++-------
 arch/x86/kernel/dumpstack_64.c    |   17 +++++++++++------
 arch/x86/kernel/process_32.c      |    2 +-
 arch/x86/kernel/process_64.c      |    2 +-
 arch/x86/kernel/stacktrace.c      |    8 ++++----
 arch/x86/mm/kmemcheck/error.c     |    2 +-
 arch/x86/oprofile/backtrace.c     |    2 +-
 include/linux/stacktrace.h        |    4 +++-
 kernel/trace/trace_sysprof.c      |    8 +-------
 14 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index fa7c0b9..7017cad 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -28,7 +28,7 @@ extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
 extern void show_registers(struct pt_regs *regs);
 extern void show_trace(struct task_struct *t, struct pt_regs *regs,
-		       unsigned long *sp, unsigned long bp);
+		       unsigned long *sp);
 extern void __show_regs(struct pt_regs *regs, int all);
 extern void show_regs(struct pt_regs *regs);
 extern unsigned long oops_begin(void);
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index cf86a5e..761487a 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -17,7 +17,7 @@ struct stacktrace_ops {
 };
 
 void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
+		unsigned long *stack,
 		const struct stacktrace_ops *ops, void *data);
 
 #endif /* _ASM_X86_STACKTRACE_H */
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index b5801c3..5db9ae6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2180,7 +2180,7 @@ perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
 	callchain_store(entry, PERF_CONTEXT_KERNEL);
 	callchain_store(entry, regs->ip);
 
-	dump_trace(NULL, regs, NULL, 0, &backtrace_ops, entry);
+	dump_trace(NULL, regs, NULL, &backtrace_ops, entry);
 }
 
 /*
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 2d8a371..e180862 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -149,21 +149,21 @@ static const struct stacktrace_ops print_trace_ops = {
 
 void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp, char *log_lvl)
+		unsigned long *stack, char *log_lvl)
 {
 	printk("%sCall Trace:\n", log_lvl);
-	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
+	dump_trace(task, regs, stack, &print_trace_ops, log_lvl);
 }
 
 void show_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp)
+		unsigned long *stack)
 {
-	show_trace_log_lvl(task, regs, stack, bp, "");
+	show_trace_log_lvl(task, regs, stack, "");
 }
 
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
-	show_stack_log_lvl(task, NULL, sp, 0, "");
+	show_stack_log_lvl(task, NULL, sp, "");
 }
 
 /*
@@ -184,7 +184,7 @@ void dump_stack(void)
 		init_utsname()->release,
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
-	show_trace(NULL, NULL, &stack, bp);
+	show_trace(NULL, NULL, &stack);
 }
 EXPORT_SYMBOL(dump_stack);
 
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 81086c2..4c90b51 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -22,11 +22,11 @@ print_context_stack(struct thread_info *tinfo,
 
 extern void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp, char *log_lvl);
+		unsigned long *stack, char *log_lvl);
 
 extern void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *sp, unsigned long bp, char *log_lvl);
+		unsigned long *sp, char *log_lvl);
 
 extern unsigned int code_bytes;
 
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index f7dd2a7..2069bdf 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -24,11 +24,12 @@ int x86_is_stack_id(int id, char *name)
 	return 0;
 }
 
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
+void dump_trace(struct task_struct *task,
+		struct pt_regs *regs, unsigned long *stack,
 		const struct stacktrace_ops *ops, void *data)
 {
 	int graph = 0;
+	unsigned long bp;
 
 	if (!task)
 		task = current;
@@ -41,7 +42,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	}
 
 #ifdef CONFIG_FRAME_POINTER
-	if (!bp) {
+	if (regs) {
+		bp = regs->bp;
+	} else {
 		if (task == current) {
 			/* Grab bp right from our regs */
 			get_bp(bp);
@@ -50,6 +53,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			bp = *(unsigned long *) task->thread.sp;
 		}
 	}
+#else
+	bp = 0;
 #endif
 
 	for (;;) {
@@ -72,7 +77,7 @@ EXPORT_SYMBOL(dump_trace);
 
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *sp, unsigned long bp, char *log_lvl)
+		unsigned long *sp, char *log_lvl)
 {
 	unsigned long *stack;
 	int i;
@@ -94,7 +99,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		touch_nmi_watchdog();
 	}
 	printk("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 
@@ -119,8 +124,7 @@ void show_registers(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_EMERG "Stack:\n");
-		show_stack_log_lvl(NULL, regs, &regs->sp,
-				0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, &regs->sp, KERN_EMERG);
 
 		printk(KERN_EMERG "Code: ");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index a071e6b..d091eee 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -108,8 +108,8 @@ static unsigned long *in_exception_stack(unsigned cpu, unsigned long stack,
  * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
  */
 
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
+void dump_trace(struct task_struct *task,
+		struct pt_regs *regs, unsigned long *stack,
 		const struct stacktrace_ops *ops, void *data)
 {
 	const unsigned cpu = get_cpu();
@@ -118,6 +118,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	unsigned used = 0;
 	struct thread_info *tinfo;
 	int graph = 0;
+	unsigned long bp;
 
 	if (!task)
 		task = current;
@@ -130,7 +131,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	}
 
 #ifdef CONFIG_FRAME_POINTER
-	if (!bp) {
+	if (regs) {
+		bp = regs->bp;
+	} else {
 		if (task == current) {
 			/* Grab bp right from our regs */
 			get_bp(bp);
@@ -139,6 +142,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			bp = *(unsigned long *) task->thread.sp;
 		}
 	}
+#else
+	bp = 0;
 #endif
 
 	/*
@@ -202,7 +207,7 @@ EXPORT_SYMBOL(dump_trace);
 
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *sp, unsigned long bp, char *log_lvl)
+		unsigned long *sp, char *log_lvl)
 {
 	unsigned long *stack;
 	int i;
@@ -241,7 +246,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		touch_nmi_watchdog();
 	}
 	printk("\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 void show_registers(struct pt_regs *regs)
@@ -269,7 +274,7 @@ void show_registers(struct pt_regs *regs)
 
 		printk(KERN_EMERG "Stack:\n");
 		show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
-				regs->bp, KERN_EMERG);
+				   KERN_EMERG);
 
 		printk(KERN_EMERG "Code: ");
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4cf7956..c4f3753 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -188,7 +188,7 @@ void __show_regs(struct pt_regs *regs, int all)
 void show_regs(struct pt_regs *regs)
 {
 	__show_regs(regs, 1);
-	show_trace(NULL, regs, &regs->sp, regs->bp);
+	show_trace(NULL, regs, &regs->sp);
 }
 
 /*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ad535b6..ddfcd37 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -228,7 +228,7 @@ void show_regs(struct pt_regs *regs)
 {
 	printk(KERN_INFO "CPU %d:", smp_processor_id());
 	__show_regs(regs, 1);
-	show_trace(NULL, regs, (void *)(regs + 1), regs->bp);
+	show_trace(NULL, regs, (void *)(regs + 1));
 }
 
 void release_thread(struct task_struct *dead_task)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index c3eb207..d5c00d3 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -71,22 +71,22 @@ static const struct stacktrace_ops save_stack_ops_nosched = {
  */
 void save_stack_trace(struct stack_trace *trace)
 {
-	dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
+	dump_trace(current, NULL, NULL, &save_stack_ops, trace);
 	if (trace->nr_entries < trace->max_entries)
 		trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 
-void save_stack_trace_bp(struct stack_trace *trace, unsigned long bp)
+void save_stack_trace_regs(struct stack_trace *trace, struct pt_regs *regs)
 {
-	dump_trace(current, NULL, NULL, bp, &save_stack_ops, trace);
+	dump_trace(current, regs, NULL, &save_stack_ops, trace);
 	if (trace->nr_entries < trace->max_entries)
 		trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-	dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
+	dump_trace(tsk, NULL, NULL, &save_stack_ops_nosched, trace);
 	if (trace->nr_entries < trace->max_entries)
 		trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
diff --git a/arch/x86/mm/kmemcheck/error.c b/arch/x86/mm/kmemcheck/error.c
index 4901d0d..dc9b197 100644
--- a/arch/x86/mm/kmemcheck/error.c
+++ b/arch/x86/mm/kmemcheck/error.c
@@ -186,7 +186,7 @@ void kmemcheck_error_save(enum kmemcheck_shadow state,
 	e->trace.entries = e->trace_entries;
 	e->trace.max_entries = ARRAY_SIZE(e->trace_entries);
 	e->trace.skip = 0;
-	save_stack_trace_bp(&e->trace, regs->bp);
+	save_stack_trace_regs(&e->trace, regs);
 
 	/* Round address down to nearest 16 bytes */
 	shadow_copy = kmemcheck_shadow_lookup(address
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 044897b..00e33b5 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -80,7 +80,7 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	if (!user_mode_vm(regs)) {
 		unsigned long stack = kernel_stack_pointer(regs);
 		if (depth)
-			dump_trace(NULL, regs, (unsigned long *)stack, 0,
+			dump_trace(NULL, regs, (unsigned long *)stack,
 				   &backtrace_ops, &depth);
 		return;
 	}
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 51efbef..25310f1 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -2,6 +2,7 @@
 #define __LINUX_STACKTRACE_H
 
 struct task_struct;
+struct pt_regs;
 
 #ifdef CONFIG_STACKTRACE
 struct task_struct;
@@ -13,7 +14,8 @@ struct stack_trace {
 };
 
 extern void save_stack_trace(struct stack_trace *trace);
-extern void save_stack_trace_bp(struct stack_trace *trace, unsigned long bp);
+extern void save_stack_trace_regs(struct stack_trace *trace,
+				  struct pt_regs *regs);
 extern void save_stack_trace_tsk(struct task_struct *tsk,
 				struct stack_trace *trace);
 
diff --git a/kernel/trace/trace_sysprof.c b/kernel/trace/trace_sysprof.c
index f669396..3db232b 100644
--- a/kernel/trace/trace_sysprof.c
+++ b/kernel/trace/trace_sysprof.c
@@ -100,7 +100,6 @@ trace_kernel(struct pt_regs *regs, struct trace_array *tr,
 	     struct trace_array_cpu *data)
 {
 	struct backtrace_info info;
-	unsigned long bp;
 	char *stack;
 
 	info.tr = tr;
@@ -110,13 +109,8 @@ trace_kernel(struct pt_regs *regs, struct trace_array *tr,
 	__trace_special(info.tr, info.data, 1, regs->ip, 0);
 
 	stack = ((char *)regs + sizeof(struct pt_regs));
-#ifdef CONFIG_FRAME_POINTER
-	bp = regs->bp;
-#else
-	bp = 0;
-#endif
 
-	dump_trace(NULL, regs, (void *)stack, bp, &backtrace_ops, &info);
+	dump_trace(NULL, regs, (void *)stack, &backtrace_ops, &info);
 
 	return info.pos;
 }
-- 
1.6.5.1
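
The rule the patch above pushes into dump_trace() can be condensed into a
short illustrative helper covering cases (a), (b) and (c); this sketch is not
part of the patch, and stack_frame() is only a hypothetical name:

#ifdef CONFIG_FRAME_POINTER
static unsigned long stack_frame(struct task_struct *task, struct pt_regs *regs)
{
	unsigned long bp;

	if (regs)
		return regs->bp;	/* case (b): start past the IRQ stack */

	if (task == current) {
		get_bp(bp);		/* case (a): bp straight from the CPU */
		return bp;
	}

	/* case (c): another task; bp was saved at the top of its stack */
	return *(unsigned long *)task->thread.sp;
}
#else
static unsigned long stack_frame(struct task_struct *task, struct pt_regs *regs)
{
	return 0;			/* bp is meaningless without frame pointers */
}
#endif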



* [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines
  2009-10-22 16:38 ` [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU Soeren Sandmann
  2009-10-23 10:50   ` Ingo Molnar
@ 2010-11-05 11:14   ` Søren Sandmann Pedersen
  2010-11-07 21:24     ` Frederic Weisbecker
  2010-11-05 11:14   ` [PATCH] " Søren Sandmann Pedersen
  2 siblings, 1 reply; 11+ messages in thread
From: Søren Sandmann Pedersen @ 2010-11-05 11:14 UTC (permalink / raw)
  To: mingo, linux-kernel

Hi,

This is a resurrection of an old patch that I sent about a year ago:

     http://lkml.org/lkml/2009/10/22/192

At the time, I thought the patch broke perf callchains on 64 bit, but
it turns out that those are broken even without this patch. 

I don't know why that is, but I now think the patch is correct and not
to blame.

(FWIW, this

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 461a85d..d977d26 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1653,7 +1653,7 @@ static const struct stacktrace_ops backtrace_ops = {
 	.warning_symbol		= backtrace_warning_symbol,
 	.stack			= backtrace_stack,
 	.address		= backtrace_address,
-	.walk_stack		= print_context_stack_bp,
+	.walk_stack		= print_context_stack,
 };

makes it produce correct kernel callchains. And yes, I did compile the
kernel with CONFIG_FRAME_POINTER).


Soren




* [PATCH] x86: Eliminate bp argument from the stack tracing routines
  2009-10-22 16:38 ` [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU Soeren Sandmann
  2009-10-23 10:50   ` Ingo Molnar
  2010-11-05 11:14   ` [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines Søren Sandmann Pedersen
@ 2010-11-05 11:14   ` Søren Sandmann Pedersen
  2 siblings, 0 replies; 11+ messages in thread
From: Søren Sandmann Pedersen @ 2010-11-05 11:14 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: Søren Sandmann Pedersen, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Peter Zijlstra, Arjan van de Ven,
	Frederic Weisbecker, Arnaldo Carvalho de Melo, Steven Rostedt

The various stack tracing routines take a 'bp' argument in which the
caller is supposed to provide the base pointer to use, or 0 if it doesn't
have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not
defined, this means all callers in principle should either always pass
0, or be conditional on CONFIG_FRAME_POINTER.

However, there are only really three use cases for stack tracing and
in all cases, dump_trace can figure out the desired bp for itself.

(a) Trace the current task, including IRQ stack if any
(b) Trace the current task, but skip IRQ stack
(c) Trace some other task

In all cases, if CONFIG_FRAME_POINTER is not defined, just use 0 for
bp.  If it _is_ defined, then

- in case (a) bp should be gotten directly from the CPU's register, so
  the caller should pass NULL for regs,

- in case (b) the caller should pass the IRQ registers to
  dump_trace(),

- in case (c) bp should be gotten from the top of the task's stack, so
  the caller should pass NULL for regs.

Hence, the bp argument is not necessary because the combination of
task and regs is sufficient to determine an appropriate value for bp.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>,
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>,
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Soren Sandmann <ssp@redhat.com>
---
 arch/x86/include/asm/kdebug.h     |    2 +-
 arch/x86/include/asm/stacktrace.h |    6 +++---
 arch/x86/kernel/cpu/perf_event.c  |    2 +-
 arch/x86/kernel/dumpstack.c       |   12 ++++++------
 arch/x86/kernel/dumpstack_32.c    |   18 +++++++++++-------
 arch/x86/kernel/dumpstack_64.c    |   17 +++++++++++------
 arch/x86/kernel/process.c         |    3 +--
 arch/x86/kernel/stacktrace.c      |    8 ++++----
 arch/x86/mm/kmemcheck/error.c     |    2 +-
 arch/x86/oprofile/backtrace.c     |    2 +-
 include/linux/stacktrace.h        |    4 +++-
 11 files changed, 43 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h
index 5bdfca8..f23eb25 100644
--- a/arch/x86/include/asm/kdebug.h
+++ b/arch/x86/include/asm/kdebug.h
@@ -28,7 +28,7 @@ extern void die(const char *, struct pt_regs *,long);
 extern int __must_check __die(const char *, struct pt_regs *, long);
 extern void show_registers(struct pt_regs *regs);
 extern void show_trace(struct task_struct *t, struct pt_regs *regs,
-		       unsigned long *sp, unsigned long bp);
+		       unsigned long *sp);
 extern void __show_regs(struct pt_regs *regs, int all);
 extern void show_regs(struct pt_regs *regs);
 extern unsigned long oops_begin(void);
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 2b16a2a..f918c82 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -46,7 +46,7 @@ struct stacktrace_ops {
 };
 
 void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
+		unsigned long *stack,
 		const struct stacktrace_ops *ops, void *data);
 
 #ifdef CONFIG_X86_32
@@ -59,11 +59,11 @@ void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
 
 extern void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp, char *log_lvl);
+		   unsigned long *stack, char *log_lvl);
 
 extern void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *sp, unsigned long bp, char *log_lvl);
+		   unsigned long *sp, char *log_lvl);
 
 extern unsigned int code_bytes;
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index ed63101..461a85d 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1666,7 +1666,7 @@ perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
 
 	perf_callchain_store(entry, regs->ip);
 
-	dump_trace(NULL, regs, NULL, regs->bp, &backtrace_ops, entry);
+	dump_trace(NULL, regs, NULL, &backtrace_ops, entry);
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6e8752c..8474c99 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -175,21 +175,21 @@ static const struct stacktrace_ops print_trace_ops = {
 
 void
 show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp, char *log_lvl)
+		unsigned long *stack, char *log_lvl)
 {
 	printk("%sCall Trace:\n", log_lvl);
-	dump_trace(task, regs, stack, bp, &print_trace_ops, log_lvl);
+	dump_trace(task, regs, stack, &print_trace_ops, log_lvl);
 }
 
 void show_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp)
+		unsigned long *stack)
 {
-	show_trace_log_lvl(task, regs, stack, bp, "");
+	show_trace_log_lvl(task, regs, stack, "");
 }
 
 void show_stack(struct task_struct *task, unsigned long *sp)
 {
-	show_stack_log_lvl(task, NULL, sp, 0, "");
+	show_stack_log_lvl(task, NULL, sp, "");
 }
 
 /*
@@ -210,7 +210,7 @@ void dump_stack(void)
 		init_utsname()->release,
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
-	show_trace(NULL, NULL, &stack, bp);
+	show_trace(NULL, NULL, &stack);
 }
 EXPORT_SYMBOL(dump_stack);
 
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 1bc7f75..6d92b37 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -17,11 +17,12 @@
 #include <asm/stacktrace.h>
 
 
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
+void dump_trace(struct task_struct *task,
+		struct pt_regs *regs, unsigned long *stack,
 		const struct stacktrace_ops *ops, void *data)
 {
 	int graph = 0;
+	unsigned long bp;
 
 	if (!task)
 		task = current;
@@ -35,7 +36,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	}
 
 #ifdef CONFIG_FRAME_POINTER
-	if (!bp) {
+	if (regs) {
+		bp = regs->bp;
+	} else {
 		if (task == current) {
 			/* Grab bp right from our regs */
 			get_bp(bp);
@@ -44,6 +47,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			bp = *(unsigned long *) task->thread.sp;
 		}
 	}
+#else
+	bp = 0;
 #endif
 
 	for (;;) {
@@ -65,7 +70,7 @@ EXPORT_SYMBOL(dump_trace);
 
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+		   unsigned long *sp, char *log_lvl)
 {
 	unsigned long *stack;
 	int i;
@@ -87,7 +92,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		touch_nmi_watchdog();
 	}
 	printk(KERN_CONT "\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 
@@ -112,8 +117,7 @@ void show_registers(struct pt_regs *regs)
 		u8 *ip;
 
 		printk(KERN_EMERG "Stack:\n");
-		show_stack_log_lvl(NULL, regs, &regs->sp,
-				0, KERN_EMERG);
+		show_stack_log_lvl(NULL, regs, &regs->sp, KERN_EMERG);
 
 		printk(KERN_EMERG "Code: ");
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 6a34048..d00afc1 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -139,8 +139,8 @@ fixup_bp_irq_link(unsigned long bp, unsigned long *stack,
  * severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
  */
 
-void dump_trace(struct task_struct *task, struct pt_regs *regs,
-		unsigned long *stack, unsigned long bp,
+void dump_trace(struct task_struct *task,
+		struct pt_regs *regs, unsigned long *stack,
 		const struct stacktrace_ops *ops, void *data)
 {
 	const unsigned cpu = get_cpu();
@@ -149,6 +149,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	unsigned used = 0;
 	struct thread_info *tinfo;
 	int graph = 0;
+	unsigned long bp;
 
 	if (!task)
 		task = current;
@@ -161,7 +162,9 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	}
 
 #ifdef CONFIG_FRAME_POINTER
-	if (!bp) {
+	if (regs) {
+		bp = regs->bp;
+	} else {
 		if (task == current) {
 			/* Grab bp right from our regs */
 			get_bp(bp);
@@ -170,6 +173,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			bp = *(unsigned long *) task->thread.sp;
 		}
 	}
+#else
+	bp = 0;
 #endif
 
 	/*
@@ -235,7 +240,7 @@ EXPORT_SYMBOL(dump_trace);
 
 void
 show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
-		   unsigned long *sp, unsigned long bp, char *log_lvl)
+		   unsigned long *sp, char *log_lvl)
 {
 	unsigned long *irq_stack_end;
 	unsigned long *irq_stack;
@@ -279,7 +284,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	preempt_enable();
 
 	printk(KERN_CONT "\n");
-	show_trace_log_lvl(task, regs, sp, bp, log_lvl);
+	show_trace_log_lvl(task, regs, sp, log_lvl);
 }
 
 void show_registers(struct pt_regs *regs)
@@ -308,7 +313,7 @@ void show_registers(struct pt_regs *regs)
 
 		printk(KERN_EMERG "Stack:\n");
 		show_stack_log_lvl(NULL, regs, (unsigned long *)sp,
-				regs->bp, KERN_EMERG);
+				   KERN_EMERG);
 
 		printk(KERN_EMERG "Code: ");
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..96ed1aa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -91,8 +91,7 @@ void exit_thread(void)
 void show_regs(struct pt_regs *regs)
 {
 	show_registers(regs);
-	show_trace(NULL, regs, (unsigned long *)kernel_stack_pointer(regs),
-		   regs->bp);
+	show_trace(NULL, regs, (unsigned long *)kernel_stack_pointer(regs));
 }
 
 void show_regs_common(void)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index b53c525..938c8e1 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -73,22 +73,22 @@ static const struct stacktrace_ops save_stack_ops_nosched = {
  */
 void save_stack_trace(struct stack_trace *trace)
 {
-	dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
+	dump_trace(current, NULL, NULL, &save_stack_ops, trace);
 	if (trace->nr_entries < trace->max_entries)
 		trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 
-void save_stack_trace_bp(struct stack_trace *trace, unsigned long bp)
+void save_stack_trace_regs(struct stack_trace *trace, struct pt_regs *regs)
 {
-	dump_trace(current, NULL, NULL, bp, &save_stack_ops, trace);
+	dump_trace(current, regs, NULL, &save_stack_ops, trace);
 	if (trace->nr_entries < trace->max_entries)
 		trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-	dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
+	dump_trace(tsk, NULL, NULL, &save_stack_ops_nosched, trace);
 	if (trace->nr_entries < trace->max_entries)
 		trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
diff --git a/arch/x86/mm/kmemcheck/error.c b/arch/x86/mm/kmemcheck/error.c
index af3b6c8..704a37c 100644
--- a/arch/x86/mm/kmemcheck/error.c
+++ b/arch/x86/mm/kmemcheck/error.c
@@ -185,7 +185,7 @@ void kmemcheck_error_save(enum kmemcheck_shadow state,
 	e->trace.entries = e->trace_entries;
 	e->trace.max_entries = ARRAY_SIZE(e->trace_entries);
 	e->trace.skip = 0;
-	save_stack_trace_bp(&e->trace, regs->bp);
+	save_stack_trace_regs(&e->trace, regs);
 
 	/* Round address down to nearest 16 bytes */
 	shadow_copy = kmemcheck_shadow_lookup(address
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 2d49d4e..72cbec1 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -126,7 +126,7 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 	if (!user_mode_vm(regs)) {
 		unsigned long stack = kernel_stack_pointer(regs);
 		if (depth)
-			dump_trace(NULL, regs, (unsigned long *)stack, 0,
+			dump_trace(NULL, regs, (unsigned long *)stack,
 				   &backtrace_ops, &depth);
 		return;
 	}
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 51efbef..25310f1 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -2,6 +2,7 @@
 #define __LINUX_STACKTRACE_H
 
 struct task_struct;
+struct pt_regs;
 
 #ifdef CONFIG_STACKTRACE
 struct task_struct;
@@ -13,7 +14,8 @@ struct stack_trace {
 };
 
 extern void save_stack_trace(struct stack_trace *trace);
-extern void save_stack_trace_bp(struct stack_trace *trace, unsigned long bp);
+extern void save_stack_trace_regs(struct stack_trace *trace,
+				  struct pt_regs *regs);
 extern void save_stack_trace_tsk(struct task_struct *tsk,
 				struct stack_trace *trace);
 
-- 
1.7.3.1



* Re: [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines
  2010-11-05 11:14   ` [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines Søren Sandmann Pedersen
@ 2010-11-07 21:24     ` Frederic Weisbecker
  2010-11-08 11:38       ` Soeren Sandmann
  0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2010-11-07 21:24 UTC (permalink / raw)
  To: Søren Sandmann Pedersen; +Cc: mingo, linux-kernel

On Fri, Nov 05, 2010 at 07:14:33AM -0400, Søren Sandmann Pedersen wrote:
> Hi,
> 
> This is a resurrection of an old patch that I sent about a year ago:
> 
>      http://lkml.org/lkml/2009/10/22/192
> 
> At the time, I thought the patch broke perf callchains on 64 bit, but
> it turns out that those are broken even without this patch. 
> 
> I don't know why that is, but I now think the patch is correct and not
> to blame.
> 
> (FWIW, this
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 461a85d..d977d26 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1653,7 +1653,7 @@ static const struct stacktrace_ops backtrace_ops = {
>  	.warning_symbol		= backtrace_warning_symbol,
>  	.stack			= backtrace_stack,
>  	.address		= backtrace_address,
> -	.walk_stack		= print_context_stack_bp,
> +	.walk_stack		= print_context_stack,
>  };
> 
> makes it produce correct kernel callchains. And yes, I did compile the
> kernel with CONFIG_FRAME_POINTER).



What do you see broken in 64-bit perf callchains? Can you please provide
me with more details so that I can fix the issue?

Thanks.



* Re: [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines
  2010-11-07 21:24     ` Frederic Weisbecker
@ 2010-11-08 11:38       ` Soeren Sandmann
  2010-11-18 15:32         ` Frederic Weisbecker
  0 siblings, 1 reply; 11+ messages in thread
From: Soeren Sandmann @ 2010-11-08 11:38 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Søren Sandmann Pedersen, mingo, linux-kernel

Frederic Weisbecker <fweisbec@gmail.com> writes:

> > (FWIW, this
> > 
> > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> > index 461a85d..d977d26 100644
> > --- a/arch/x86/kernel/cpu/perf_event.c
> > +++ b/arch/x86/kernel/cpu/perf_event.c
> > @@ -1653,7 +1653,7 @@ static const struct stacktrace_ops backtrace_ops = {
> >  	.warning_symbol		= backtrace_warning_symbol,
> >  	.stack			= backtrace_stack,
> >  	.address		= backtrace_address,
> > -	.walk_stack		= print_context_stack_bp,
> > +	.walk_stack		= print_context_stack,
> >  };
> > 
> > makes it produce correct kernel callchains. And yes, I did compile the
> > kernel with CONFIG_FRAME_POINTER).
> 
> 
> 
> What do you see broken in 64-bit perf callchains? Can you please provide
> me with more details so that I can fix the issue?

Apparently, the problem only happens when using the hrtimer based
events. I don't think there is a way to make perf use those from the
command line, but if you apply this:

--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -294,6 +294,12 @@ static void create_counter(int counter, int cpu)
                attr->enable_on_exec = 1;
        }
 
+       if (attr->config == PERF_COUNT_HW_CPU_CYCLES && attr->type == PERF_TYPE_HARDWARE)
+       {
+           attr->type = PERF_TYPE_SOFTWARE;
+           attr->config = PERF_COUNT_SW_CPU_CLOCK;
+       }
+       
        for (thread_index = 0; thread_index < thread_num; thread_index++) {
 try_again:
                fd[nr_cpu][counter][thread_index] = sys_perf_event_open(attr,

you should be able to see the problem.  You can also try sysprof; it
always uses the software counters.


Soren
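
For reference, the event that sysprof and the patched perf record end up using
is essentially a software clock event that also requests call chains; a rough
sketch of the relevant perf_event_attr setup (illustrative only, not taken
from either tool) looks like this:

	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;		/* hrtimer-driven sampling */
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;
	attr.freq = 1;
	attr.sample_freq = 1000;			/* ~1000 samples per second */
	/* open with perf_event_open() and mmap the ring buffer as usual */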


* Re: [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines
  2010-11-08 11:38       ` Soeren Sandmann
@ 2010-11-18 15:32         ` Frederic Weisbecker
  0 siblings, 0 replies; 11+ messages in thread
From: Frederic Weisbecker @ 2010-11-18 15:32 UTC (permalink / raw)
  To: Soeren Sandmann; +Cc: Søren Sandmann Pedersen, mingo, linux-kernel

On Mon, Nov 08, 2010 at 12:38:22PM +0100, Soeren Sandmann wrote:
> Frederic Weisbecker <fweisbec@gmail.com> writes:
> 
> > > (FWIW, this
> > > 
> > > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> > > index 461a85d..d977d26 100644
> > > --- a/arch/x86/kernel/cpu/perf_event.c
> > > +++ b/arch/x86/kernel/cpu/perf_event.c
> > > @@ -1653,7 +1653,7 @@ static const struct stacktrace_ops backtrace_ops = {
> > >  	.warning_symbol		= backtrace_warning_symbol,
> > >  	.stack			= backtrace_stack,
> > >  	.address		= backtrace_address,
> > > -	.walk_stack		= print_context_stack_bp,
> > > +	.walk_stack		= print_context_stack,
> > >  };
> > > 
> > > makes it produce correct kernel callchains. And yes, I did compile the
> > > kernel with CONFIG_FRAME_POINTER).
> > 
> > 
> > 
> > What do you see broken in 64-bit perf callchains? Can you please provide
> > me with more details so that I can fix the issue?
> 
> Apparently, the problem only happens when using the hrtimer based
> events. I don't think there is a way to make perf use those from the
> command line, but if you apply this:


You can, with perf record -e cpu-clock :)


> 
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -294,6 +294,12 @@ static void create_counter(int counter, int cpu)
>                 attr->enable_on_exec = 1;
>         }
>  
> +       if (attr->config == PERF_COUNT_HW_CPU_CYCLES && attr->type == PERF_TYPE_HARDWARE)
> +       {
> +           attr->type = PERF_TYPE_SOFTWARE;
> +           attr->config = PERF_COUNT_SW_CPU_CLOCK;
> +       }
> +       
>         for (thread_index = 0; thread_index < thread_num; thread_index++) {
>  try_again:
>                 fd[nr_cpu][counter][thread_index] = sys_perf_event_open(attr,
> 
> you should be able to see the problem.  You can also try sysprof; it
> always uses the software counters.


Yep, I can reproduce it; there is indeed something weird happening there. I'll
fix it, thanks for your report!



* [PATCH 1/2 -tip] perf: Keep track of remaining time when enabling/disabling swevent hrtimers
@ 2009-10-22 16:47 Soeren Sandmann
  0 siblings, 0 replies; 11+ messages in thread
From: Soeren Sandmann @ 2009-10-22 16:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Frédéric Weisbecker,
	Thomas Gleixner, Arnaldo Carvalho de Melo, Steven Rostedt

[I don't think this mail got through the first time]

Hi,

These patches against perf/core make the hrtimer based events work for
sysprof.

1/2: Don't restart the timer on every scheduler tick
2/2: If exclude_idle is set, don't report idle events

Thanks,                                                                         
Soren                                                                           


From f3e9630ccee6c148ee9c5bc1632dfd0384969b2b Mon Sep 17 00:00:00 2001
From: Søren Sandmann Pedersen <sandmann@redhat.com>
Date: Thu, 22 Oct 2009 09:51:35 -0400
Subject: [PATCH 1/2] perf: Keep track of remaining time when enabling/disabling swevent hrtimers

Whenever a swevent is scheduled out, the hrtimer is canceled. When it
is scheduled back in, the timer is restarted with the full period. This
happens on every scheduler tick, which means the timer never expires:
it keeps being restarted with the same period before it gets a chance
to fire.

To fix that, save the remaining time when disabling; when reenabling,
use that saved time as the period instead of the user-specified
sampling period.

Also, move the starting and stopping of the hrtimers to helper
functions instead of duplicating the code.

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
---
 include/linux/perf_event.h |    4 +-
 kernel/perf_event.c        |   64 +++++++++++++++++++++++++++++--------------
 2 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2e6d95f..9e70126 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -471,8 +471,8 @@ struct hw_perf_event {
 			unsigned long	event_base;
 			int		idx;
 		};
-		union { /* software */
-			atomic64_t	count;
+		struct { /* software */
+			s64		remaining;
 			struct hrtimer	hrtimer;
 		};
 	};
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 9d0b5c6..b492c55 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3969,6 +3969,43 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 	return ret;
 }
 
+static void perf_swevent_start_hrtimer(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hwc->hrtimer.function = perf_swevent_hrtimer;
+	if (hwc->sample_period) {
+		u64 period;
+
+		if (hwc->remaining) {
+			if (hwc->remaining < 0)
+				period = 10000;
+			else
+				period = hwc->remaining;
+			hwc->remaining = 0;
+		}
+		else {
+			period = max_t(u64, 10000, hwc->sample_period);
+		}
+		__hrtimer_start_range_ns(&hwc->hrtimer,
+				ns_to_ktime(period), 0,
+				HRTIMER_MODE_REL, 0);
+	}
+}
+
+static void perf_swevent_cancel_hrtimer(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->sample_period) {
+		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
+		hwc->remaining = ktime_to_ns(remaining);
+
+		hrtimer_cancel(&hwc->hrtimer);
+	}
+}
+
 /*
  * Software event: cpu wall time clock
  */
@@ -3991,22 +4028,14 @@ static int cpu_clock_perf_event_enable(struct perf_event *event)
 	int cpu = raw_smp_processor_id();
 
 	atomic64_set(&hwc->prev_count, cpu_clock(cpu));
-	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hwc->hrtimer.function = perf_swevent_hrtimer;
-	if (hwc->sample_period) {
-		u64 period = max_t(u64, 10000, hwc->sample_period);
-		__hrtimer_start_range_ns(&hwc->hrtimer,
-				ns_to_ktime(period), 0,
-				HRTIMER_MODE_REL, 0);
-	}
-
+	perf_swevent_start_hrtimer(event);
+	
 	return 0;
 }
 
 static void cpu_clock_perf_event_disable(struct perf_event *event)
 {
-	if (event->hw.sample_period)
-		hrtimer_cancel(&event->hw.hrtimer);
+	perf_swevent_cancel_hrtimer(event);
 	cpu_clock_perf_event_update(event);
 }
 
@@ -4043,22 +4072,15 @@ static int task_clock_perf_event_enable(struct perf_event *event)
 	now = event->ctx->time;
 
 	atomic64_set(&hwc->prev_count, now);
-	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hwc->hrtimer.function = perf_swevent_hrtimer;
-	if (hwc->sample_period) {
-		u64 period = max_t(u64, 10000, hwc->sample_period);
-		__hrtimer_start_range_ns(&hwc->hrtimer,
-				ns_to_ktime(period), 0,
-				HRTIMER_MODE_REL, 0);
-	}
+
+	perf_swevent_start_hrtimer(event);
 
 	return 0;
 }
 
 static void task_clock_perf_event_disable(struct perf_event *event)
 {
-	if (event->hw.sample_period)
-		hrtimer_cancel(&event->hw.hrtimer);
+	perf_swevent_cancel_hrtimer(event);
 	task_clock_perf_event_update(event, event->ctx->time);
 
 }
-- 
1.6.5.rc2




Thread overview: 11+ messages
     [not found] <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
2009-10-22 16:34 ` [PATCH 2/2 -tip] perf: Don't generate events for the idle task when exclude_idle is set Soeren Sandmann
2009-10-22 16:38 ` [PATCH] x86: Get bp from the IRQ regs instead of directly from the CPU Soeren Sandmann
2009-10-23 10:50   ` Ingo Molnar
2009-10-29 12:46     ` Soeren Sandmann
2010-11-05 11:14   ` [PATCH 0/1] x86: Eliminate bp argument from the stack tracing routines Søren Sandmann Pedersen
2010-11-07 21:24     ` Frederic Weisbecker
2010-11-08 11:38       ` Soeren Sandmann
2010-11-18 15:32         ` Frederic Weisbecker
2010-11-05 11:14   ` [PATCH] " Søren Sandmann Pedersen
2009-10-23  7:35 ` [PATCH 1/2 -tip] perf: Keep track of remaining time when enabling/disabling swevent hrtimers Ingo Molnar
2009-10-22 16:47 Soeren Sandmann
