linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Anyone working on ftrace function graph support on ARM?
@ 2009-03-24 19:38 Tim Bird
  2009-03-24 20:25 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Tim Bird @ 2009-03-24 19:38 UTC (permalink / raw)
  To: linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar,
	Abhishek Sagar, Russell King, Uwe Kleine-König

ARM and FTRACE people,

Is anyone working on function graph support for ARM?

I haven't done much ARM assembly, but the Intel mechanism
for the return hook looks fairly straightforward,
and I thought I'd take a shot at implementing it on ARM.

But if someone else is already doing it, I'd rather work
with them.

BTW - I turned on -finstrument-functions on ARM, and it seems
to work OK (with the exception being that I don't see evenly matched
calls and returns).  For this latter reason, I'm going to
start with an implementation that copies the return hook
used by x86, with a fallback to using __cyg_profile_... instead
of mcount, if the hook approach proves too hard for me on ARM.

My ultimate goal is to add function duration filtering, which
is one of the nicer features of KFT (an older tracer I used
to maintain out-of-mainline).  With all the ftrace and ringbuffer
support already in mainline, this shouldn't be too hard, but
I need to start with basic graph support on ARM first.

Any comments or feedback on the approach, or on current plans
to extend ftrace support on ARM, before I get too far along,
are welcome.

Thanks,
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 19:38 Anyone working on ftrace function graph support on ARM? Tim Bird
@ 2009-03-24 20:25 ` Ingo Molnar
  2009-03-24 20:48   ` Tim Bird
  2009-03-24 20:38 ` Uwe Kleine-König
  2009-03-24 21:36 ` Frederic Weisbecker
  2 siblings, 1 reply; 37+ messages in thread
From: Ingo Molnar @ 2009-03-24 20:25 UTC (permalink / raw)
  To: Tim Bird, Frédéric Weisbecker
  Cc: linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar,
	Abhishek Sagar, Russell King, Uwe Kleine-König


* Tim Bird <tim.bird@am.sony.com> wrote:

> My ultimate goal is to add function duration filtering, which is 
> one of the nicer features of KFT (an older tracer I used to 
> maintain out-of-mainline).  With all the ftrace and ringbuffer 
> support already in mainline, this shouldn't be too hard, but I 
> need to start with basic graph support on ARM first.

ah, function duration filtering - is that to only show functions 
which have a duration beyond a certain (runtime configurable) value?

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 19:38 Anyone working on ftrace function graph support on ARM? Tim Bird
  2009-03-24 20:25 ` Ingo Molnar
@ 2009-03-24 20:38 ` Uwe Kleine-König
  2009-03-24 21:36 ` Frederic Weisbecker
  2 siblings, 0 replies; 37+ messages in thread
From: Uwe Kleine-König @ 2009-03-24 20:38 UTC (permalink / raw)
  To: Tim Bird
  Cc: linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar,
	Abhishek Sagar, Russell King

Hello,

On Tue, Mar 24, 2009 at 12:38:50PM -0700, Tim Bird wrote:
> ARM and FTRACE people,
> 
> Is anyone working on function graph support for ARM?
I don't think there is anyone working in that area.

You can Cc: me when your patches are ready, I will happily proof-read
them.

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 20:25 ` Ingo Molnar
@ 2009-03-24 20:48   ` Tim Bird
  0 siblings, 0 replies; 37+ messages in thread
From: Tim Bird @ 2009-03-24 20:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frédéric Weisbecker, linux-arm-kernel, linux kernel,
	Steven Rostedt, Ingo Molnar, Abhishek Sagar, Russell King,
	Uwe Kleine-König

Ingo Molnar wrote:
> * Tim Bird <tim.bird@am.sony.com> wrote:
> 
>> My ultimate goal is to add function duration filtering, which is 
>> one of the nicer features of KFT (an older tracer I used to 
>> maintain out-of-mainline).  With all the ftrace and ringbuffer 
>> support already in mainline, this shouldn't be too hard, but I 
>> need to start with basic graph support on ARM first.
> 
> ah, function duration filtering - is that to only show functions 
> which have a duration beyond a certain (runtime configurable) value?

Exactly.  With KFT, you set a "mintime filter", and it removed from
the trace buffer any functions which were less than the threshhold.
This was not high cost, because you usually only removed the most
recently added function in the trace buffer.

This lets you focus on routines that last a long time.  I used
it mainly to find long-duration routines in early boot.
The initcall tracer serves a similar purpose, but not at the
same granularity.

It greatly extends the length of time you can trace, before the
buffer fills up.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 19:38 Anyone working on ftrace function graph support on ARM? Tim Bird
  2009-03-24 20:25 ` Ingo Molnar
  2009-03-24 20:38 ` Uwe Kleine-König
@ 2009-03-24 21:36 ` Frederic Weisbecker
  2009-03-24 21:40   ` Tim Bird
                     ` (2 more replies)
  2 siblings, 3 replies; 37+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 21:36 UTC (permalink / raw)
  To: Tim Bird
  Cc: linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar,
	Abhishek Sagar, Russell King, Uwe Kleine-König

On Tue, Mar 24, 2009 at 12:38:50PM -0700, Tim Bird wrote:
> ARM and FTRACE people,
> 
> Is anyone working on function graph support for ARM?


Aah, it was on my plans!
But well, if you are about to do it, so don't hesitate.

I don't have any arm board for now, though I have a pending
one which I will probably get in few weeks, but for now
I can't work on it, so knock yourself out.


> I haven't done much ARM assembly, but the Intel mechanism
> for the return hook looks fairly straightforward,
> and I thought I'd take a shot at implementing it on ARM.


Yes, ie:

_Before jumping to the function entry hook, you must save
the arguments for the traced function on the stack.
On x86, its eax, edx and ecx.
On arm, it will be r0-r3.
Then you have to transmit the address of the traced function
(it's on r14) and it's parent (must rely on fp for that).
Then you call the entry hook and restore the old scratch/arg
registers.


_ On return hook it's pretty the same, (saving the scratch
registers, especially the return value which should be on r0
and r1 if I'm not wrong).
But you'll have to get the original return address from the
return handler and then put it on pc.

Well it's a very naive listing, there are sometimes some problems.
For example on x86-64, I had to save even some non-scratch registers
before calling the return hook, I still don't know why.


> But if someone else is already doing it, I'd rather work
> with them.
> 
> BTW - I turned on -finstrument-functions on ARM, and it seems
> to work OK (with the exception being that I don't see evenly matched
> calls and returns).  For this latter reason, I'm going to
> start with an implementation that copies the return hook
> used by x86, with a fallback to using __cyg_profile_... instead
> of mcount, if the hook approach proves too hard for me on ARM.


Be care, -finstrument-functions will add one more handler and then
increase the size of the kernel, may be a lot.

Moreover it will not be compatible with dynamic tracing which is
designed to patch the mcount call sites. It doesn't support patching
of __cyg_profile. And patching __cyg_profile call sites would mean
running twice the time to patch the kernel code.

Function tracing (hooks only on function entries) and dynamic tracing
are already implemented on Arm. Using the mcount way shoudn't really be
a problem because all is already set for that with the function tracer.


Anyway if you need some help, don't hesitate!


> My ultimate goal is to add function duration filtering, which
> is one of the nicer features of KFT (an older tracer I used
> to maintain out-of-mainline).  With all the ftrace and ringbuffer
> support already in mainline, this shouldn't be too hard, but
> I need to start with basic graph support on ARM first.


Yeah, seems a nice idea.

Thanks,
Frederic.


> Any comments or feedback on the approach, or on current plans
> to extend ftrace support on ARM, before I get too far along,
> are welcome.
> 
> Thanks,
>  -- Tim
> 
> =============================
> Tim Bird
> Architecture Group Chair, CE Linux Forum
> Senior Staff Engineer, Sony Corporation of America
> =============================
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 21:36 ` Frederic Weisbecker
@ 2009-03-24 21:40   ` Tim Bird
  2009-03-24 21:48   ` Ingo Molnar
  2009-03-24 22:29   ` Anyone working on ftrace function graph support on ARM? Abhishek Sagar
  2 siblings, 0 replies; 37+ messages in thread
From: Tim Bird @ 2009-03-24 21:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar,
	Abhishek Sagar, Russell King, Uwe Kleine-König

Frederic Weisbecker wrote:
> Yes, ie:
> 
> _Before jumping to the function entry hook, you must save
> the arguments for the traced function on the stack.
> On x86, its eax, edx and ecx.
> On arm, it will be r0-r3.
> Then you have to transmit the address of the traced function
> (it's on r14) and it's parent (must rely on fp for that).
> Then you call the entry hook and restore the old scratch/arg
> registers.
> 
> 
> _ On return hook it's pretty the same, (saving the scratch
> registers, especially the return value which should be on r0
> and r1 if I'm not wrong).
> But you'll have to get the original return address from the
> return handler and then put it on pc.
> 
> Well it's a very naive listing, there are sometimes some problems.
> For example on x86-64, I had to save even some non-scratch registers
> before calling the return hook, I still don't know why.

Thanks - this is very helpful.  I'll send patches around when
I get something going. Hopefully in the next week or two.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 21:36 ` Frederic Weisbecker
  2009-03-24 21:40   ` Tim Bird
@ 2009-03-24 21:48   ` Ingo Molnar
  2009-03-24 21:57     ` Frederic Weisbecker
  2009-03-24 22:29   ` Anyone working on ftrace function graph support on ARM? Abhishek Sagar
  2 siblings, 1 reply; 37+ messages in thread
From: Ingo Molnar @ 2009-03-24 21:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tim Bird, linux-arm-kernel, linux kernel, Steven Rostedt,
	Ingo Molnar, Abhishek Sagar, Russell King, Uwe Kleine-König


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Well it's a very naive listing, there are sometimes some problems. 
> For example on x86-64, I had to save even some non-scratch 
> registers before calling the return hook, I still don't know why.

btw., which are those registers?

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 21:48   ` Ingo Molnar
@ 2009-03-24 21:57     ` Frederic Weisbecker
  2009-03-24 22:14       ` Ingo Molnar
  2009-03-25 16:00       ` Steven Rostedt
  0 siblings, 2 replies; 37+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 21:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Bird, linux-arm-kernel, linux kernel, Steven Rostedt,
	Ingo Molnar, Abhishek Sagar, Russell King, Uwe Kleine-König

On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > Well it's a very naive listing, there are sometimes some problems. 
> > For example on x86-64, I had to save even some non-scratch 
> > registers before calling the return hook, I still don't know why.
> 
> btw., which are those registers?
> 
> 	Ingo


I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which
are used for parameters.
And I had some crashes until I append r10 and r11 which actually are
scratch if I'm not wrong, but since they are scratch and are not used for
arguments, I thought they didn't need to be saved.

Well, I think there were some code flow cases I was missing.


The complete code is:

	movq %rax, (%rsp)
	movq %rcx, 8(%rsp)
	movq %rdx, 16(%rsp)
	movq %rsi, 24(%rsp)
	movq %rdi, 32(%rsp)
	movq %r8, 40(%rsp)
	movq %r9, 48(%rsp)
	movq %r10, 56(%rsp)
	movq %r11, 64(%rsp)

	call ftrace_return_to_handler

	movq %rax, 72(%rsp) <-- get original return value
	movq 64(%rsp), %r11
	movq 56(%rsp), %r10
	movq 48(%rsp), %r9
	movq 40(%rsp), %r8
	movq 32(%rsp), %rdi
	movq 24(%rsp), %rsi
	movq 16(%rsp), %rdx
	movq 8(%rsp), %rcx
	movq (%rsp), %rax
	addq $72, %rsp


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 21:57     ` Frederic Weisbecker
@ 2009-03-24 22:14       ` Ingo Molnar
  2009-03-24 22:54         ` Frederic Weisbecker
  2009-03-25  8:36         ` Russell King - ARM Linux
  2009-03-25 16:00       ` Steven Rostedt
  1 sibling, 2 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-03-24 22:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tim Bird, linux-arm-kernel, linux kernel, Steven Rostedt,
	Ingo Molnar, Abhishek Sagar, Russell King, Uwe Kleine-König


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> > 
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > Well it's a very naive listing, there are sometimes some problems. 
> > > For example on x86-64, I had to save even some non-scratch 
> > > registers before calling the return hook, I still don't know why.
> > 
> > btw., which are those registers?
> > 
> > 	Ingo
> 
> 
> I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which are 
> used for parameters.

> And I had some crashes until I append r10 and r11 which actually 
> are scratch if I'm not wrong, but since they are scratch and are 
> not used for arguments, I thought they didn't need to be saved.
> 
> Well, I think there were some code flow cases I was missing.

Correct, r10 and r11 are clobbered registers too - and you need to 
save them too in mcount methods.

The reason is that mcount has a special calling convention - it's 
not just about not destroying arguments - GCC can keep data in r10 
or r11 scratch registers across function calls as well - for example 
for relatively static functions that are in its local optimization 
scope.

If GCC can prove that the local scope function itself does not 
clobber r10/r11, it does not have to clobber them across the 
function call. But the mcount() callback still gets inserted.

So the rule is: mcount must not destroy _any_ register state. 
(beyond flags)

	ngo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 21:36 ` Frederic Weisbecker
  2009-03-24 21:40   ` Tim Bird
  2009-03-24 21:48   ` Ingo Molnar
@ 2009-03-24 22:29   ` Abhishek Sagar
  2009-03-24 22:48     ` Frederic Weisbecker
  2 siblings, 1 reply; 37+ messages in thread
From: Abhishek Sagar @ 2009-03-24 22:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tim Bird, linux-arm-kernel, linux kernel, Steven Rostedt,
	Ingo Molnar, Russell King, Uwe Kleine-König

Frederic Weisbecker wrote:
> Yes, ie:
>
> _Before jumping to the function entry hook, you must save
> the arguments for the traced function on the stack.
> On x86, its eax, edx and ecx.
> On arm, it will be r0-r3.
> Then you have to transmit the address of the traced function
> (it's on r14) and it's parent (must rely on fp for that).
> Then you call the entry hook and restore the old scratch/arg
> registers.
>   
Instead of just restoring the old backed-up args, lr can be fixed up
inside the entry hook to point to the return hook. So when the traced
function returns, it actually returns to the return hook (where we can
restore the original return address). This means that
-finstrument-functions is not required at all. This is analogous to how
kretprobes work. The only difference here is that instead of planting a
kprobe at the function entry and redirecting the function return to the
profiling exit routine, we can use mcount. This is slightly more
complicated to implement but can be a very efficient alternative to
kretprobes.
--
Abhishek


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 22:29   ` Anyone working on ftrace function graph support on ARM? Abhishek Sagar
@ 2009-03-24 22:48     ` Frederic Weisbecker
  2009-03-25  8:42       ` Russell King - ARM Linux
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 22:48 UTC (permalink / raw)
  To: Abhishek Sagar
  Cc: Tim Bird, linux-arm-kernel, linux kernel, Steven Rostedt,
	Ingo Molnar, Russell King, Uwe Kleine-König

On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> Frederic Weisbecker wrote:
> > Yes, ie:
> >
> > _Before jumping to the function entry hook, you must save
> > the arguments for the traced function on the stack.
> > On x86, its eax, edx and ecx.
> > On arm, it will be r0-r3.
> > Then you have to transmit the address of the traced function
> > (it's on r14) and it's parent (must rely on fp for that).
> > Then you call the entry hook and restore the old scratch/arg
> > registers.
> >   
> Instead of just restoring the old backed-up args, lr can be fixed up
> inside the entry hook to point to the return hook. So when the traced
> function returns, it actually returns to the return hook (where we can
> restore the original return address). This means that
> -finstrument-functions is not required at all. This is analogous to how
> kretprobes work. The only difference here is that instead of planting a
> kprobe at the function entry and redirecting the function return to the
> profiling exit routine, we can use mcount. This is slightly more
> complicated to implement but can be a very efficient alternative to
> kretprobes.
> --
> Abhishek
> 

Indeed, you need to override lr, that even the only solution.
I was still thinking in an x86 way with its on stack return address.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 22:14       ` Ingo Molnar
@ 2009-03-24 22:54         ` Frederic Weisbecker
  2009-03-25  8:36         ` Russell King - ARM Linux
  1 sibling, 0 replies; 37+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 22:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Bird, linux-arm-kernel, linux kernel, Steven Rostedt,
	Ingo Molnar, Abhishek Sagar, Russell King, Uwe Kleine-König

On Tue, Mar 24, 2009 at 11:14:39PM +0100, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> > > 
> > > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > 
> > > > Well it's a very naive listing, there are sometimes some problems. 
> > > > For example on x86-64, I had to save even some non-scratch 
> > > > registers before calling the return hook, I still don't know why.
> > > 
> > > btw., which are those registers?
> > > 
> > > 	Ingo
> > 
> > 
> > I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which are 
> > used for parameters.
> 
> > And I had some crashes until I append r10 and r11 which actually 
> > are scratch if I'm not wrong, but since they are scratch and are 
> > not used for arguments, I thought they didn't need to be saved.
> > 
> > Well, I think there were some code flow cases I was missing.
> 
> Correct, r10 and r11 are clobbered registers too - and you need to 
> save them too in mcount methods.
> 
> The reason is that mcount has a special calling convention - it's 
> not just about not destroying arguments - GCC can keep data in r10 
> or r11 scratch registers across function calls as well - for example 
> for relatively static functions that are in its local optimization 
> scope.
> 
> If GCC can prove that the local scope function itself does not 
> clobber r10/r11, it does not have to clobber them across the 
> function call. But the mcount() callback still gets inserted.
> 
> So the rule is: mcount must not destroy _any_ register state. 
> (beyond flags)
> 
> 	ngo


Aah, ok, understood!
Thanks.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 22:14       ` Ingo Molnar
  2009-03-24 22:54         ` Frederic Weisbecker
@ 2009-03-25  8:36         ` Russell King - ARM Linux
  1 sibling, 0 replies; 37+ messages in thread
From: Russell King - ARM Linux @ 2009-03-25  8:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Tim Bird, linux-arm-kernel, linux kernel,
	Steven Rostedt, Ingo Molnar, Abhishek Sagar,
	Uwe Kleine-König

On Tue, Mar 24, 2009 at 11:14:39PM +0100, Ingo Molnar wrote:
> So the rule is: mcount must not destroy _any_ register state. 
> (beyond flags)

As has already been discussed on the ARM lists, we have a problem with
using mcount along with EABI, or OABI without frame pointers.  We can
not avoid destroying 'lr' - the return address for the function.

OABI works around this by assuming that the parent function saved a
stack frame, which means it's at a known offset from the frame pointer
(provided frame pointers are enabled.)  mcount reloads 'lr' before
returning:

ENTRY(mcount)
        stmdb sp!, {r0-r3, lr}
        mov r0, lr
        sub r0, r0, #MCOUNT_INSN_SIZE

        .globl mcount_call
mcount_call:
        bl ftrace_stub
        ldr lr, [fp, #-4]                       @ restore lr
        ldmia sp!, {r0-r3, pc}

So I think if Tim is expecting to be able to use this feature on EABI
kernels, he's going to struggle.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 22:48     ` Frederic Weisbecker
@ 2009-03-25  8:42       ` Russell King - ARM Linux
  2009-03-25  8:54         ` Ingo Molnar
                           ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Russell King - ARM Linux @ 2009-03-25  8:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Abhishek Sagar, Tim Bird, linux-arm-kernel, linux kernel,
	Steven Rostedt, Ingo Molnar, Uwe Kleine-König

On Tue, Mar 24, 2009 at 11:48:58PM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> > Instead of just restoring the old backed-up args, lr can be fixed up
> > inside the entry hook to point to the return hook. So when the traced
> > function returns, it actually returns to the return hook (where we can
> > restore the original return address). This means that
> > -finstrument-functions is not required at all. This is analogous to how
> > kretprobes work. The only difference here is that instead of planting a
> > kprobe at the function entry and redirecting the function return to the
> > profiling exit routine, we can use mcount. This is slightly more
> > complicated to implement but can be a very efficient alternative to
> > kretprobes.
> > --
> > Abhishek
> > 
> 
> Indeed, you need to override lr, that even the only solution.
> I was still thinking in an x86 way with its on stack return address.

As pointed out in my previous mail, identifying where on the stack the
return address is stored is only possible for OABI with frame pointers.

EABI will probably be possible with the stack unwinding code, but it
probably won't be cheap.  The EABI unwinder is scheduled for merging
during the present now-open merge window.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25  8:42       ` Russell King - ARM Linux
@ 2009-03-25  8:54         ` Ingo Molnar
  2009-03-25  9:57           ` Russell King - ARM Linux
  2009-03-25 16:41           ` Tim Bird
  2009-03-25 11:41         ` Frederic Weisbecker
  2009-03-25 16:34         ` Tim Bird
  2 siblings, 2 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-03-25  8:54 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Frederic Weisbecker, Abhishek Sagar, Tim Bird, linux-arm-kernel,
	linux kernel, Steven Rostedt, Ingo Molnar, Uwe Kleine-König


* Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Tue, Mar 24, 2009 at 11:48:58PM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> > > Instead of just restoring the old backed-up args, lr can be fixed up
> > > inside the entry hook to point to the return hook. So when the traced
> > > function returns, it actually returns to the return hook (where we can
> > > restore the original return address). This means that
> > > -finstrument-functions is not required at all. This is analogous to how
> > > kretprobes work. The only difference here is that instead of planting a
> > > kprobe at the function entry and redirecting the function return to the
> > > profiling exit routine, we can use mcount. This is slightly more
> > > complicated to implement but can be a very efficient alternative to
> > > kretprobes.
> > > --
> > > Abhishek
> > > 
> > 
> > Indeed, you need to override lr, that even the only solution.
> > I was still thinking in an x86 way with its on stack return address.
> 
> As pointed out in my previous mail, identifying where on the stack 
> the return address is stored is only possible for OABI with frame 
> pointers.
> 
> EABI will probably be possible with the stack unwinding code, but 
> it probably won't be cheap.  The EABI unwinder is scheduled for 
> merging during the present now-open merge window.

Unwinding is not realistic or desired for the function tracer - it 
runs in every kernel function so performance is paramount.

So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS 
dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER 
Kconfig rule.

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25  8:54         ` Ingo Molnar
@ 2009-03-25  9:57           ` Russell King - ARM Linux
  2009-03-25 10:45             ` Uwe Kleine-König
  2009-03-25 16:41           ` Tim Bird
  1 sibling, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2009-03-25  9:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Abhishek Sagar, Tim Bird, linux-arm-kernel,
	linux kernel, Steven Rostedt, Ingo Molnar, Uwe Kleine-König

On Wed, Mar 25, 2009 at 09:54:18AM +0100, Ingo Molnar wrote:
> Unwinding is not realistic or desired for the function tracer - it 
> runs in every kernel function so performance is paramount.

Which would also include the unwinder itself as well.

> So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS 
> dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER 
> Kconfig rule.

If we have frame pointers enabled with EABI, then it looks like it will
work as well.  So the dependency should be on FRAME_POINTERS for _every_
feature using the mcount code.

Hmm, and it looks like the ftrace code is rather crap:

ENTRY(mcount)
        stmdb sp!, {r0-r3, lr}
        ldr r0, =ftrace_trace_function
        ldr r2, [r0]
        adr r0, ftrace_stub
        cmp r0, r2
        bne trace
        ldr lr, [fp, #-4]                       @ restore lr
        ldmia sp!, {r0-r3, pc}

trace:
        ldr r1, [fp, #-4]                       @ lr of instrumented routine
        mov r0, lr
        sub r0, r0, #MCOUNT_INSN_SIZE
        mov lr, pc
        mov pc, r2
 XXX calling a C function results in r0-r3,ip,lr being clobbered XXX

        mov lr, r1                              @ restore lr
 XXX not necessarily, r1 might be some other random value

        ldmia sp!, {r0-r3, pc}

In fact, to me the above code looks totally crap, because it's checking
whether the caller is 'ftrace_stub'.  It can never be 'ftrace_stub'
because that is an assembly function:

        .globl ftrace_stub
ftrace_stub:
        mov pc, lr

and therefore gcc has no hand in adding a mcount call to it.

Moreover, the _dynamic_ ftrace code does this:

ENTRY(mcount)
        stmdb sp!, {r0-r3, lr}
        mov r0, lr
        sub r0, r0, #MCOUNT_INSN_SIZE

        .globl mcount_call
mcount_call:
        bl ftrace_stub
        ldr lr, [fp, #-4]                       @ restore lr
        ldmia sp!, {r0-r3, pc}

ENTRY(ftrace_caller)
        stmdb sp!, {r0-r3, lr}
        ldr r1, [fp, #-4]
        mov r0, lr
        sub r0, r0, #MCOUNT_INSN_SIZE

        .globl ftrace_call
ftrace_call:
        bl ftrace_stub
        ldr lr, [fp, #-4]                       @ restore lr
        ldmia sp!, {r0-r3, pc}

In other words, it pushes some words onto the stack, sets r0, calls
an assembly function which does nothing but just returns, reloads lr,
restores the stack and returns.  This ftrace implementation looks like
an exercise in slowing down execution to me with no added value.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25  9:57           ` Russell King - ARM Linux
@ 2009-03-25 10:45             ` Uwe Kleine-König
  2009-03-25 11:21               ` Russell King - ARM Linux
  0 siblings, 1 reply; 37+ messages in thread
From: Uwe Kleine-König @ 2009-03-25 10:45 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Ingo Molnar, Frederic Weisbecker, Abhishek Sagar, Tim Bird,
	linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar

Hi Russell,

On Wed, Mar 25, 2009 at 09:57:51AM +0000, Russell King - ARM Linux wrote:
> On Wed, Mar 25, 2009 at 09:54:18AM +0100, Ingo Molnar wrote:
> > Unwinding is not realistic or desired for the function tracer - it 
> > runs in every kernel function so performance is paramount.
> 
> Which would also include the unwinder itself as well.
> 
> > So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS 
> > dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER 
> > Kconfig rule.
> 
> If we have frame pointers enabled with EABI, then it looks like it will
> work as well.  So the dependency should be on FRAME_POINTERS for _every_
> feature using the mcount code.
> 
> Hmm, and it looks like the ftrace code is rather crap:
> 
> ENTRY(mcount)
>         stmdb sp!, {r0-r3, lr}
>         ldr r0, =ftrace_trace_function
>         ldr r2, [r0]
>         adr r0, ftrace_stub
>         cmp r0, r2
>         bne trace
>         ldr lr, [fp, #-4]                       @ restore lr
>         ldmia sp!, {r0-r3, pc}
> 
> trace:
>         ldr r1, [fp, #-4]                       @ lr of instrumented routine
>         mov r0, lr
>         sub r0, r0, #MCOUNT_INSN_SIZE
>         mov lr, pc
>         mov pc, r2
>  XXX calling a C function results in r0-r3,ip,lr being clobbered XXX
> 
>         mov lr, r1                              @ restore lr
>  XXX not necessarily, r1 might be some other random value
> 
>         ldmia sp!, {r0-r3, pc}
> 
> In fact, to me the above code looks totally crap, because it's checking
> whether the caller is 'ftrace_stub'.  It can never be 'ftrace_stub'
> because that is an assembly function:
> 
>         .globl ftrace_stub
> ftrace_stub:
>         mov pc, lr
> 
> and therefore gcc has no hand in adding a mcount call to it.
Hhhm.  Isn't the equivalent C-Code ~ as follows:

	if (ftrace_trace_function != ftrace_stub)
		trace(some, args);
	return;
?  ftrace_trace_function is initialised to ftrace_stub at compile time
and is changed when a tracerfunction is registered.

> Moreover, the _dynamic_ ftrace code does this:
> 
> ENTRY(mcount)
>         stmdb sp!, {r0-r3, lr}
>         mov r0, lr
>         sub r0, r0, #MCOUNT_INSN_SIZE
> 
>         .globl mcount_call
> mcount_call:
>         bl ftrace_stub
>         ldr lr, [fp, #-4]                       @ restore lr
>         ldmia sp!, {r0-r3, pc}
> 
> ENTRY(ftrace_caller)
>         stmdb sp!, {r0-r3, lr}
>         ldr r1, [fp, #-4]
>         mov r0, lr
>         sub r0, r0, #MCOUNT_INSN_SIZE
> 
>         .globl ftrace_call
> ftrace_call:
>         bl ftrace_stub
>         ldr lr, [fp, #-4]                       @ restore lr
>         ldmia sp!, {r0-r3, pc}
> 
> In other words, it pushes some words onto the stack, sets r0, calls
> an assembly function which does nothing but just returns, reloads lr,
> restores the stack and returns.  This ftrace implementation looks like
> an exercise in slowing down execution to me with no added value.
The idea is that the instruction at address mcount_call (and
ftrace_call IIRC) is rewritten at run time.
Still the code is not active currently (because CONFIG_DYNAMIC_FTRACE
isn't selectable for ARM) and needs some care anyhow on reactivation
because the way how dynamic ftrace works changed somehow.  Didn't look
at it up to now though.

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 10:45             ` Uwe Kleine-König
@ 2009-03-25 11:21               ` Russell King - ARM Linux
  2009-03-25 12:09                 ` Uwe Kleine-König
  0 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2009-03-25 11:21 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Ingo Molnar, Frederic Weisbecker, Abhishek Sagar, Tim Bird,
	linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar

On Wed, Mar 25, 2009 at 11:45:05AM +0100, Uwe Kleine-König wrote:
> Hi Russell,
> 
> On Wed, Mar 25, 2009 at 09:57:51AM +0000, Russell King - ARM Linux wrote:
> > On Wed, Mar 25, 2009 at 09:54:18AM +0100, Ingo Molnar wrote:
> > > Unwinding is not realistic or desired for the function tracer - it 
> > > runs in every kernel function so performance is paramount.
> > 
> > Which would also include the unwinder itself as well.
> > 
> > > So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS 
> > > dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER 
> > > Kconfig rule.
> > 
> > If we have frame pointers enabled with EABI, then it looks like it will
> > work as well.  So the dependency should be on FRAME_POINTERS for _every_
> > feature using the mcount code.
> > 
> > Hmm, and it looks like the ftrace code is rather crap:
> > 
> > ENTRY(mcount)
> >         stmdb sp!, {r0-r3, lr}
> >         ldr r0, =ftrace_trace_function
> >         ldr r2, [r0]
> >         adr r0, ftrace_stub
> >         cmp r0, r2
> >         bne trace
> >         ldr lr, [fp, #-4]                       @ restore lr
> >         ldmia sp!, {r0-r3, pc}
> > 
> > trace:
> >         ldr r1, [fp, #-4]                       @ lr of instrumented routine
> >         mov r0, lr
> >         sub r0, r0, #MCOUNT_INSN_SIZE
> >         mov lr, pc
> >         mov pc, r2
> >  XXX calling a C function results in r0-r3,ip,lr being clobbered XXX
> > 
> >         mov lr, r1                              @ restore lr
> >  XXX not necessarily, r1 might be some other random value
> > 
> >         ldmia sp!, {r0-r3, pc}
> > 
> > In fact, to me the above code looks totally crap, because it's checking
> > whether the caller is 'ftrace_stub'.  It can never be 'ftrace_stub'
> > because that is an assembly function:
> > 
> >         .globl ftrace_stub
> > ftrace_stub:
> >         mov pc, lr
> > 
> > and therefore gcc has no hand in adding a mcount call to it.
> Hhhm.  Isn't the equivalent C-Code ~ as follows:
> 
> 	if (ftrace_trace_function != ftrace_stub)
> 		trace(some, args);
> 	return;
> ?  ftrace_trace_function is initialised to ftrace_stub at compile time
> and is changed when a tracerfunction is registered.

Correct.  But my point is that there's no way for ftrace_stub to ever call
mcount.  So the check there is pointless.

> > Moreover, the _dynamic_ ftrace code does this:
> > 
> > ENTRY(mcount)
> >         stmdb sp!, {r0-r3, lr}
> >         mov r0, lr
> >         sub r0, r0, #MCOUNT_INSN_SIZE
> > 
> >         .globl mcount_call
> > mcount_call:
> >         bl ftrace_stub
> >         ldr lr, [fp, #-4]                       @ restore lr
> >         ldmia sp!, {r0-r3, pc}
> > 
> > ENTRY(ftrace_caller)
> >         stmdb sp!, {r0-r3, lr}
> >         ldr r1, [fp, #-4]
> >         mov r0, lr
> >         sub r0, r0, #MCOUNT_INSN_SIZE
> > 
> >         .globl ftrace_call
> > ftrace_call:
> >         bl ftrace_stub
> >         ldr lr, [fp, #-4]                       @ restore lr
> >         ldmia sp!, {r0-r3, pc}
> > 
> > In other words, it pushes some words onto the stack, sets r0, calls
> > an assembly function which does nothing but just returns, reloads lr,
> > restores the stack and returns.  This ftrace implementation looks like
> > an exercise in slowing down execution to me with no added value.
> The idea is that the instruction at address mcount_call (and
> ftrace_call IIRC) is rewritten at run time.
> Still the code is not active currently (because CONFIG_DYNAMIC_FTRACE
> isn't selectable for ARM) and needs some care anyhow on reactivation
> because the way how dynamic ftrace works changed somehow.  Didn't look
> at it up to now though.

Ok - it would be nice if there was a comment to explain that.

Is someone going to fix the existing ftrace before trying to build stuff
on top of it?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25  8:42       ` Russell King - ARM Linux
  2009-03-25  8:54         ` Ingo Molnar
@ 2009-03-25 11:41         ` Frederic Weisbecker
  2009-03-25 16:34         ` Tim Bird
  2 siblings, 0 replies; 37+ messages in thread
From: Frederic Weisbecker @ 2009-03-25 11:41 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Abhishek Sagar, Tim Bird, linux-arm-kernel, linux kernel,
	Steven Rostedt, Ingo Molnar, Uwe Kleine-König

On Wed, Mar 25, 2009 at 08:42:48AM +0000, Russell King - ARM Linux wrote:
> On Tue, Mar 24, 2009 at 11:48:58PM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> > > Instead of just restoring the old backed-up args, lr can be fixed up
> > > inside the entry hook to point to the return hook. So when the traced
> > > function returns, it actually returns to the return hook (where we can
> > > restore the original return address). This means that
> > > -finstrument-functions is not required at all. This is analogous to how
> > > kretprobes work. The only difference here is that instead of planting a
> > > kprobe at the function entry and redirecting the function return to the
> > > profiling exit routine, we can use mcount. This is slightly more
> > > complicated to implement but can be a very efficient alternative to
> > > kretprobes.
> > > --
> > > Abhishek
> > > 
> > 
> > Indeed, you need to override lr, that even the only solution.
> > I was still thinking in an x86 way with its on stack return address.
> 
> As pointed out in my previous mail, identifying where on the stack the
> return address is stored is only possible for OABI with frame pointers.
> 
> EABI will probably be possible with the stack unwinding code, but it
> probably won't be cheap.  The EABI unwinder is scheduled for merging
> during the present now-open merge window.


Hm, if I'm not wrong, the function tracer already depends on frame pointer,
this is necessary to retrieve the parent of the caller.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 11:21               ` Russell King - ARM Linux
@ 2009-03-25 12:09                 ` Uwe Kleine-König
  0 siblings, 0 replies; 37+ messages in thread
From: Uwe Kleine-König @ 2009-03-25 12:09 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Ingo Molnar, Frederic Weisbecker, Abhishek Sagar, Tim Bird,
	linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar

Hi Russell,

> > > Hmm, and it looks like the ftrace code is rather crap:
> > > 
> > > ENTRY(mcount)
> > >         stmdb sp!, {r0-r3, lr}
> > >         ldr r0, =ftrace_trace_function
> > >         ldr r2, [r0]
> > >         adr r0, ftrace_stub
> > >         cmp r0, r2
> > >         bne trace
> > >         ldr lr, [fp, #-4]                       @ restore lr
> > >         ldmia sp!, {r0-r3, pc}
> > > 
> > > trace:
> > >         ldr r1, [fp, #-4]                       @ lr of instrumented routine
> > >         mov r0, lr
> > >         sub r0, r0, #MCOUNT_INSN_SIZE
> > >         mov lr, pc
> > >         mov pc, r2
> > >  XXX calling a C function results in r0-r3,ip,lr being clobbered XXX
> > > 
> > >         mov lr, r1                              @ restore lr
> > >  XXX not necessarily, r1 might be some other random value
> > > 
> > >         ldmia sp!, {r0-r3, pc}
> > > 
> > > In fact, to me the above code looks totally crap, because it's checking
> > > whether the caller is 'ftrace_stub'.  It can never be 'ftrace_stub'
> > > because that is an assembly function:
> > > 
> > >         .globl ftrace_stub
> > > ftrace_stub:
> > >         mov pc, lr
> > > 
> > > and therefore gcc has no hand in adding a mcount call to it.
> > Hhhm.  Isn't the equivalent C-Code ~ as follows:
> > 
> > 	if (ftrace_trace_function != ftrace_stub)
> > 		trace(some, args);
> > 	return;
> > ?  ftrace_trace_function is initialised to ftrace_stub at compile time
> > and is changed when a tracerfunction is registered.
> 
> Correct.  But my point is that there's no way for ftrace_stub to ever call
> mcount.  So the check there is pointless.
Which check?  ftrace_trace_function isn't the caller of mcount but a
function pointer defined in kernel/trace/ftrace.c.  And if this pointer
changes, the if-condition becomes true.

> Ok - it would be nice if there was a comment to explain that.
Some comments would be nice, that's right.

> Is someone going to fix the existing ftrace before trying to build stuff
> on top of it?
Mmh, I think you misunderstood the code, so I don't see something to fix
here (at least for non-dynamic ftrace).  If the next few mails show,
that it's me who didn't understand the code, I will fix the problems, of
course.

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-24 21:57     ` Frederic Weisbecker
  2009-03-24 22:14       ` Ingo Molnar
@ 2009-03-25 16:00       ` Steven Rostedt
  2009-03-25 17:13         ` Frederic Weisbecker
  2009-03-25 20:27         ` [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64 Steven Rostedt
  1 sibling, 2 replies; 37+ messages in thread
From: Steven Rostedt @ 2009-03-25 16:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Tim Bird, linux-arm-kernel, linux kernel,
	Ingo Molnar, Abhishek Sagar, Russell King, Uwe Kleine-König


On Tue, 24 Mar 2009, Frederic Weisbecker wrote:

> On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> > 
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > Well it's a very naive listing, there are sometimes some problems. 
> > > For example on x86-64, I had to save even some non-scratch 
> > > registers before calling the return hook, I still don't know why.
> > 
> > btw., which are those registers?
> > 
> > 	Ingo
> 
> 
> I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which
> are used for parameters.
> And I had some crashes until I append r10 and r11 which actually are
> scratch if I'm not wrong, but since they are scratch and are not used for
> arguments, I thought they didn't need to be saved.
> 
> Well, I think there were some code flow cases I was missing.
> 
> 
> The complete code is:
> 
> 	movq %rax, (%rsp)
> 	movq %rcx, 8(%rsp)
> 	movq %rdx, 16(%rsp)
> 	movq %rsi, 24(%rsp)
> 	movq %rdi, 32(%rsp)
> 	movq %r8, 40(%rsp)
> 	movq %r9, 48(%rsp)
> 	movq %r10, 56(%rsp)
> 	movq %r11, 64(%rsp)
> 
> 	call ftrace_return_to_handler
> 
> 	movq %rax, 72(%rsp) <-- get original return value
> 	movq 64(%rsp), %r11
> 	movq 56(%rsp), %r10
> 	movq 48(%rsp), %r9
> 	movq 40(%rsp), %r8
> 	movq 32(%rsp), %rdi
> 	movq 24(%rsp), %rsi
> 	movq 16(%rsp), %rdx
> 	movq 8(%rsp), %rcx
> 	movq (%rsp), %rax
> 	addq $72, %rsp

This bothers me. In PowerPC 64, all I have is:

_GLOBAL(return_to_handler)
	/* need to save return values */
	std	r4,  -24(r1)
	std	r3,  -16(r1)
	std	r31, -8(r1)
	mr	r31, r1
	stdu	r1, -112(r1)

	bl	.ftrace_return_to_handler
	nop

	/* return value has real return address */
	mtlr	r3

	ld	r1, 0(r1)
	ld	r4,  -24(r1)
	ld	r3,  -16(r1)
	ld	r31, -8(r1)

	/* Jump back to real return address */
	blr

All I save is the return values (and I'm paranoid with that, by saving 
both r3 and r4 and not just r3) as well as saving the stack. There should 
be no reason to save any other registers.

This is not the same as mcount. mcount varies differently from arch to 
arch. But this is the return of a function. This is not a mcount call, and 
really has nothing to do with mcount.

If you think about it, the return is coming back from a function that 
should have already saved all the registers that it modifies. The caller 
of that function (the one we will return to) should have saved any 
registers that are allowed to be modified by the callee.

When we call our ftrace_return_to_handler function it too will save any 
register that it must for callees and restore it on return.

Perhaps the issue you had with x86_64 was that you did not set up the 
stack frame properly? And by saving all those registers, it just happen to 
do it for you?

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25  8:42       ` Russell King - ARM Linux
  2009-03-25  8:54         ` Ingo Molnar
  2009-03-25 11:41         ` Frederic Weisbecker
@ 2009-03-25 16:34         ` Tim Bird
  2009-03-25 17:05           ` Uwe Kleine-König
  2009-03-27 12:58           ` Catalin Marinas
  2 siblings, 2 replies; 37+ messages in thread
From: Tim Bird @ 2009-03-25 16:34 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Frederic Weisbecker, Abhishek Sagar, linux-arm-kernel,
	linux kernel, Steven Rostedt, Ingo Molnar, Uwe Kleine-König

Russell King - ARM Linux wrote:
> As pointed out in my previous mail, identifying where on the stack the
> return address is stored is only possible for OABI with frame pointers.
> 
> EABI will probably be possible with the stack unwinding code, but it
> probably won't be cheap.  The EABI unwinder is scheduled for merging
> during the present now-open merge window.
> 

-finstrument-functions is looking better and better.  I know it
adds more overhead than the mcount call, and may wreak havoc with
the dynamic ftrace mechanisms, but at least the callouts are
simple, clear, and you get both entry and exit, at fixed
costs.  I'll take a look at the EABI unwinder to see
what kind of variability it introduces (e.g. if it does a stack
scan or something).

Thanks very much for the information!
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25  8:54         ` Ingo Molnar
  2009-03-25  9:57           ` Russell King - ARM Linux
@ 2009-03-25 16:41           ` Tim Bird
  1 sibling, 0 replies; 37+ messages in thread
From: Tim Bird @ 2009-03-25 16:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Russell King - ARM Linux, Frederic Weisbecker, Abhishek Sagar,
	linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar,
	Uwe Kleine-König

Ingo Molnar wrote:
> Unwinding is not realistic or desired for the function tracer - it 
> runs in every kernel function so performance is paramount.
> 
> So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS 
> dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER 
> Kconfig rule.

This, unfortunately, is a problem for me.  All my ARM
projects are now using EABI.  If the unwinding looks like
it has too much overhead, I'll do some research on the
-finstrument-functions (__cyg_profile_func_enter/exit) approach.

I'm not sure, however, if it's possible to integrate this with
the dynamic tracing mechanisms, though.  So something may have
to give to get this supported on ARM.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 16:34         ` Tim Bird
@ 2009-03-25 17:05           ` Uwe Kleine-König
  2009-03-25 17:17             ` Russell King - ARM Linux
  2009-03-27 12:58           ` Catalin Marinas
  1 sibling, 1 reply; 37+ messages in thread
From: Uwe Kleine-König @ 2009-03-25 17:05 UTC (permalink / raw)
  To: Tim Bird, Russell King - ARM Linux
  Cc: Frederic Weisbecker, Abhishek Sagar, linux-arm-kernel,
	linux kernel, Steven Rostedt, Ingo Molnar

On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
> Russell King - ARM Linux wrote:
> > As pointed out in my previous mail, identifying where on the stack the
> > return address is stored is only possible for OABI with frame pointers.
> > 
> > EABI will probably be possible with the stack unwinding code, but it
> > probably won't be cheap.  The EABI unwinder is scheduled for merging
> > during the present now-open merge window.
EABI with frame pointers should work, too, shouldn't it?

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 16:00       ` Steven Rostedt
@ 2009-03-25 17:13         ` Frederic Weisbecker
  2009-03-25 20:27         ` [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64 Steven Rostedt
  1 sibling, 0 replies; 37+ messages in thread
From: Frederic Weisbecker @ 2009-03-25 17:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Tim Bird, linux-arm-kernel, linux kernel,
	Ingo Molnar, Abhishek Sagar, Russell King, Uwe Kleine-König

On Wed, Mar 25, 2009 at 12:00:24PM -0400, Steven Rostedt wrote:
> 
> On Tue, 24 Mar 2009, Frederic Weisbecker wrote:
> 
> > On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> > > 
> > > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > 
> > > > Well it's a very naive listing, there are sometimes some problems. 
> > > > For example on x86-64, I had to save even some non-scratch 
> > > > registers before calling the return hook, I still don't know why.
> > > 
> > > btw., which are those registers?
> > > 
> > > 	Ingo
> > 
> > 
> > I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which
> > are used for parameters.
> > And I had some crashes until I append r10 and r11 which actually are
> > scratch if I'm not wrong, but since they are scratch and are not used for
> > arguments, I thought they didn't need to be saved.
> > 
> > Well, I think there were some code flow cases I was missing.
> > 
> > 
> > The complete code is:
> > 
> > 	movq %rax, (%rsp)
> > 	movq %rcx, 8(%rsp)
> > 	movq %rdx, 16(%rsp)
> > 	movq %rsi, 24(%rsp)
> > 	movq %rdi, 32(%rsp)
> > 	movq %r8, 40(%rsp)
> > 	movq %r9, 48(%rsp)
> > 	movq %r10, 56(%rsp)
> > 	movq %r11, 64(%rsp)
> > 
> > 	call ftrace_return_to_handler
> > 
> > 	movq %rax, 72(%rsp) <-- get original return value
> > 	movq 64(%rsp), %r11
> > 	movq 56(%rsp), %r10
> > 	movq 48(%rsp), %r9
> > 	movq 40(%rsp), %r8
> > 	movq 32(%rsp), %rdi
> > 	movq 24(%rsp), %rsi
> > 	movq 16(%rsp), %rdx
> > 	movq 8(%rsp), %rcx
> > 	movq (%rsp), %rax
> > 	addq $72, %rsp
> 
> This bothers me. In PowerPC 64, all I have is:
> 
> _GLOBAL(return_to_handler)
> 	/* need to save return values */
> 	std	r4,  -24(r1)
> 	std	r3,  -16(r1)
> 	std	r31, -8(r1)
> 	mr	r31, r1
> 	stdu	r1, -112(r1)
> 
> 	bl	.ftrace_return_to_handler
> 	nop
> 
> 	/* return value has real return address */
> 	mtlr	r3
> 
> 	ld	r1, 0(r1)
> 	ld	r4,  -24(r1)
> 	ld	r3,  -16(r1)
> 	ld	r31, -8(r1)
> 
> 	/* Jump back to real return address */
> 	blr
> 
> All I save is the return values (and I'm paranoid with that, by saving 
> both r3 and r4 and not just r3) as well as saving the stack. There should 
> be no reason to save any other registers.
> 
> This is not the same as mcount. mcount varies differently from arch to 
> arch. But this is the return of a function. This is not a mcount call, and 
> really has nothing to do with mcount.
> 
> If you think about it, the return is coming back from a function that 
> should have already saved all the registers that it modifies. The caller 
> of that function (the one we will return to) should have saved any 
> registers that are allowed to be modified by the callee.
> 
> When we call our ftrace_return_to_handler function it too will save any 
> register that it must for callees and restore it on return.
> 
> Perhaps the issue you had with x86_64 was that you did not set up the 
> stack frame properly? And by saving all those registers, it just happen to 
> do it for you?


I don't know. It seems to me that the stack frame is well set.
This is weird.

 
> -- Steve
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 17:05           ` Uwe Kleine-König
@ 2009-03-25 17:17             ` Russell King - ARM Linux
  2009-03-25 18:37               ` Tim Bird
  0 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2009-03-25 17:17 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Tim Bird, Frederic Weisbecker, Abhishek Sagar, linux-arm-kernel,
	linux kernel, Steven Rostedt, Ingo Molnar

On Wed, Mar 25, 2009 at 06:05:33PM +0100, Uwe Kleine-König wrote:
> On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
> > Russell King - ARM Linux wrote:
> > > As pointed out in my previous mail, identifying where on the stack the
> > > return address is stored is only possible for OABI with frame pointers.
> > > 
> > > EABI will probably be possible with the stack unwinding code, but it
> > > probably won't be cheap.  The EABI unwinder is scheduled for merging
> > > during the present now-open merge window.
> EABI with frame pointers should work, too, shouldn't it?

Yes, as I said at the top of my reply dated Wed, 25 Mar 2009 09:57:51 +0000

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 17:17             ` Russell King - ARM Linux
@ 2009-03-25 18:37               ` Tim Bird
  2009-03-25 18:41                 ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: Tim Bird @ 2009-03-25 18:37 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Uwe Kleine-König, Frederic Weisbecker, Abhishek Sagar,
	linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar

Russell King - ARM Linux wrote:
> On Wed, Mar 25, 2009 at 06:05:33PM +0100, Uwe Kleine-K�nig wrote:
>> On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
>>> Russell King - ARM Linux wrote:
>>>> As pointed out in my previous mail, identifying where on the stack the
>>>> return address is stored is only possible for OABI with frame pointers.
>>>>
>>>> EABI will probably be possible with the stack unwinding code, but it
>>>> probably won't be cheap.  The EABI unwinder is scheduled for merging
>>>> during the present now-open merge window.
>> EABI with frame pointers should work, too, shouldn't it?
> 
> Yes, as I said at the top of my reply dated Wed, 25 Mar 2009 09:57:51 +0000

It turns out you can't use -pg and -fomit-frame-pointers at the same time.
At least, my gcc complains about this:
$ make
arm-sony-linux-gnueabi-dev-gcc -g -pg -fomit-frame-pointer -o hello hello.c
arm-sony-linux-gnueabi-dev-gcc: -pg and -fomit-frame-pointer are incompatible
make: *** [hello] Error 1

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 18:37               ` Tim Bird
@ 2009-03-25 18:41                 ` Steven Rostedt
  0 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2009-03-25 18:41 UTC (permalink / raw)
  To: Tim Bird
  Cc: Russell King - ARM Linux, Uwe Kleine-König,
	Frederic Weisbecker, Abhishek Sagar, linux-arm-kernel,
	linux kernel, Ingo Molnar


On Wed, 25 Mar 2009, Tim Bird wrote:

> Russell King - ARM Linux wrote:
> > On Wed, Mar 25, 2009 at 06:05:33PM +0100, Uwe Kleine-K???nig wrote:
> >> On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
> >>> Russell King - ARM Linux wrote:
> >>>> As pointed out in my previous mail, identifying where on the stack the
> >>>> return address is stored is only possible for OABI with frame pointers.
> >>>>
> >>>> EABI will probably be possible with the stack unwinding code, but it
> >>>> probably won't be cheap.  The EABI unwinder is scheduled for merging
> >>>> during the present now-open merge window.
> >> EABI with frame pointers should work, too, shouldn't it?
> > 
> > Yes, as I said at the top of my reply dated Wed, 25 Mar 2009 09:57:51 +0000
> 
> It turns out you can't use -pg and -fomit-frame-pointers at the same time.
> At least, my gcc complains about this:
> $ make
> arm-sony-linux-gnueabi-dev-gcc -g -pg -fomit-frame-pointer -o hello hello.c
> arm-sony-linux-gnueabi-dev-gcc: -pg and -fomit-frame-pointer are incompatible
> make: *** [hello] Error 1

Correct, that is because mcount requires (as Russell earlier pointed out) 
the use of frame pointers.

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64
  2009-03-25 16:00       ` Steven Rostedt
  2009-03-25 17:13         ` Frederic Weisbecker
@ 2009-03-25 20:27         ` Steven Rostedt
  2009-03-25 20:45           ` Jaswinder Singh Rajput
  2009-04-08 16:09           ` Ingo Molnar
  1 sibling, 2 replies; 37+ messages in thread
From: Steven Rostedt @ 2009-03-25 20:27 UTC (permalink / raw)
  To: linux kernel
  Cc: Frederic Weisbecker, Ingo Molnar, Tim Bird, Ingo Molnar,
	Abhishek Sagar, Peter Zijlstra, Thomas Gleixner


Ingo,

Please pull the latest tip/tracing/function-graph tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/function-graph


Steven Rostedt (1):
      x86, function-graph: only save return values on x86_64

----
 arch/x86/kernel/entry_64.S |   19 +++----------------
 1 files changed, 3 insertions(+), 16 deletions(-)
---------------------------
commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
Author: Steven Rostedt <srostedt@redhat.com>
Date:   Wed Mar 25 14:30:04 2009 -0400

    x86, function-graph: only save return values on x86_64
    
    Impact: speed up
    
    The return to handler portion of the function graph tracer should only
    need to save the return values. The caller already saved off the
    registers that the callee can modify. The returning function already
    saved the registers it modified. When we call our own trace function
    it too will save the registers that the callee must restore.
    
    There's no reason to save off anything more that the registers used
    to return the values.
    
    Note, I did a complete kernel build with this modification and the
    function graph tracer running on x86_64.
    
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a331ec3..1ac9986 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -147,27 +147,14 @@ END(ftrace_graph_caller)
 GLOBAL(return_to_handler)
 	subq  $80, %rsp
 
+	/* Save the return values */
 	movq %rax, (%rsp)
-	movq %rcx, 8(%rsp)
-	movq %rdx, 16(%rsp)
-	movq %rsi, 24(%rsp)
-	movq %rdi, 32(%rsp)
-	movq %r8, 40(%rsp)
-	movq %r9, 48(%rsp)
-	movq %r10, 56(%rsp)
-	movq %r11, 64(%rsp)
+	movq %rdx, 8(%rsp)
 
 	call ftrace_return_to_handler
 
 	movq %rax, 72(%rsp)
-	movq 64(%rsp), %r11
-	movq 56(%rsp), %r10
-	movq 48(%rsp), %r9
-	movq 40(%rsp), %r8
-	movq 32(%rsp), %rdi
-	movq 24(%rsp), %rsi
-	movq 16(%rsp), %rdx
-	movq 8(%rsp), %rcx
+	movq 8(%rsp), %rdx
 	movq (%rsp), %rax
 	addq $72, %rsp
 	retq

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64
  2009-03-25 20:27         ` [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64 Steven Rostedt
@ 2009-03-25 20:45           ` Jaswinder Singh Rajput
  2009-03-25 21:26             ` Steven Rostedt
  2009-04-08 16:09           ` Ingo Molnar
  1 sibling, 1 reply; 37+ messages in thread
From: Jaswinder Singh Rajput @ 2009-03-25 20:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux kernel, Frederic Weisbecker, Ingo Molnar, Tim Bird,
	Ingo Molnar, Abhishek Sagar, Peter Zijlstra, Thomas Gleixner

On Wed, 2009-03-25 at 16:27 -0400, Steven Rostedt wrote:
> Ingo,
> 
> Please pull the latest tip/tracing/function-graph tree, which can be found at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/function-graph
> 
> 
> Steven Rostedt (1):
>       x86, function-graph: only save return values on x86_64
> 
> ----
>  arch/x86/kernel/entry_64.S |   19 +++----------------
>  1 files changed, 3 insertions(+), 16 deletions(-)
> ---------------------------
> commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
> Author: Steven Rostedt <srostedt@redhat.com>
> Date:   Wed Mar 25 14:30:04 2009 -0400
> 
>     x86, function-graph: only save return values on x86_64
>     
>     Impact: speed up
>     
>     The return to handler portion of the function graph tracer should only
>     need to save the return values. The caller already saved off the
>     registers that the callee can modify. The returning function already
>     saved the registers it modified. When we call our own trace function
>     it too will save the registers that the callee must restore.
>     
>     There's no reason to save off anything more that the registers used
>     to return the values.
>     
>     Note, I did a complete kernel build with this modification and the
>     function graph tracer running on x86_64.
>     
>     Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index a331ec3..1ac9986 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
>  GLOBAL(return_to_handler)
>  	subq  $80, %rsp
>  
> +	/* Save the return values */
>  	movq %rax, (%rsp)
> -	movq %rcx, 8(%rsp)
> -	movq %rdx, 16(%rsp)
> -	movq %rsi, 24(%rsp)
> -	movq %rdi, 32(%rsp)
> -	movq %r8, 40(%rsp)
> -	movq %r9, 48(%rsp)
> -	movq %r10, 56(%rsp)
> -	movq %r11, 64(%rsp)
> +	movq %rdx, 8(%rsp)
>  
>  	call ftrace_return_to_handler
>  
>  	movq %rax, 72(%rsp)
> -	movq 64(%rsp), %r11
> -	movq 56(%rsp), %r10
> -	movq 48(%rsp), %r9
> -	movq 40(%rsp), %r8
> -	movq 32(%rsp), %rdi
> -	movq 24(%rsp), %rsi
> -	movq 16(%rsp), %rdx
> -	movq 8(%rsp), %rcx
> +	movq 8(%rsp), %rdx
>  	movq (%rsp), %rax
>  	addq $72, %rsp
>  	retq

hmm, is this gonna work:
	movq %rax, 16(%rsp)
	movq 8(%rsp), %rdx
 	movq (%rsp), %rax
 	addq $16, %rsp
 	retq

--
JSR


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64
  2009-03-25 20:45           ` Jaswinder Singh Rajput
@ 2009-03-25 21:26             ` Steven Rostedt
  0 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2009-03-25 21:26 UTC (permalink / raw)
  To: Jaswinder Singh Rajput
  Cc: linux kernel, Frederic Weisbecker, Ingo Molnar, Tim Bird,
	Ingo Molnar, Abhishek Sagar, Peter Zijlstra, Thomas Gleixner


On Thu, 26 Mar 2009, Jaswinder Singh Rajput wrote:
> > 
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index a331ec3..1ac9986 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
> >  GLOBAL(return_to_handler)
> >  	subq  $80, %rsp
> >  
> > +	/* Save the return values */
> >  	movq %rax, (%rsp)
> > -	movq %rcx, 8(%rsp)
> > -	movq %rdx, 16(%rsp)
> > -	movq %rsi, 24(%rsp)
> > -	movq %rdi, 32(%rsp)
> > -	movq %r8, 40(%rsp)
> > -	movq %r9, 48(%rsp)
> > -	movq %r10, 56(%rsp)
> > -	movq %r11, 64(%rsp)
> > +	movq %rdx, 8(%rsp)
> >  
> >  	call ftrace_return_to_handler
> >  
> >  	movq %rax, 72(%rsp)
> > -	movq 64(%rsp), %r11
> > -	movq 56(%rsp), %r10
> > -	movq 48(%rsp), %r9
> > -	movq 40(%rsp), %r8
> > -	movq 32(%rsp), %rdi
> > -	movq 24(%rsp), %rsi
> > -	movq 16(%rsp), %rdx
> > -	movq 8(%rsp), %rcx
> > +	movq 8(%rsp), %rdx
> >  	movq (%rsp), %rax
> >  	addq $72, %rsp
> >  	retq
> 
> hmm, is this gonna work:
> 	movq %rax, 16(%rsp)
> 	movq 8(%rsp), %rdx
>  	movq (%rsp), %rax
>  	addq $16, %rsp
>  	retq

Interesting. I was just talking over IRC with Frederic about this. One 
theory about his earlier failures was not having a proper stack frame when 
calling the function. This is probably a bogus theory and there was 
probably something else that broke the code. But there is no harm in 
allocating 80 bytes (it worked before). Thus, I rather error on the side 
of least change than risk it.

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-25 16:34         ` Tim Bird
  2009-03-25 17:05           ` Uwe Kleine-König
@ 2009-03-27 12:58           ` Catalin Marinas
  2009-04-09 15:29             ` Daniel Jacobowitz
  1 sibling, 1 reply; 37+ messages in thread
From: Catalin Marinas @ 2009-03-27 12:58 UTC (permalink / raw)
  To: Tim Bird
  Cc: Russell King - ARM Linux, Frederic Weisbecker, Abhishek Sagar,
	linux-arm-kernel, linux kernel, Steven Rostedt, Ingo Molnar,
	Uwe Kleine-König

On Wed, 2009-03-25 at 09:34 -0700, Tim Bird wrote:
> Russell King - ARM Linux wrote:
> > As pointed out in my previous mail, identifying where on the stack the
> > return address is stored is only possible for OABI with frame pointers.
> > 
> > EABI will probably be possible with the stack unwinding code, but it
> > probably won't be cheap.  The EABI unwinder is scheduled for merging
> > during the present now-open merge window.
> 
> -finstrument-functions is looking better and better.  I know it
> adds more overhead than the mcount call, and may wreak havoc with
> the dynamic ftrace mechanisms, but at least the callouts are
> simple, clear, and you get both entry and exit, at fixed
> costs.  

This approach would work even with Thumb-2 compiled kernels (not in
mainline) where the frame pointer is useless wrt backtraces. The Thumb-2
kernel cannot currently be compiled with frame pointer because there is
some inline assembly where gcc fail to allocate lower registers (in
Thumb, the frame pointer is r7).

Anyway, the EABI toolchain I have (from CodeSourcery) generates
something like below with -pg for both ARM and Thumb code (so that it
doesn't rely on the frame pointer):

	push	{lr}
	bl	__gnu_mcount_nc

I think this will be (was?) merged into the mainline gcc for ARM. The
-pg option is still incompatible with -fomit-frame-pointer but maybe it
shouldn't be anymore.

> I'll take a look at the EABI unwinder to see
> what kind of variability it introduces (e.g. if it does a stack
> scan or something).

It's not cheap - it looks up the function address in a binary tree and
it reads some bytecodes (either from the tree or from a different table
pointed to by the tree) which it interprets to perform the equivalent of
a return from the current function (reading registers from the stack and
changing SP, LR).

-- 
Catalin


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64
  2009-03-25 20:27         ` [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64 Steven Rostedt
  2009-03-25 20:45           ` Jaswinder Singh Rajput
@ 2009-04-08 16:09           ` Ingo Molnar
  2009-04-08 16:37             ` Steven Rostedt
  2009-04-08 17:40             ` Frederic Weisbecker
  1 sibling, 2 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-04-08 16:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux kernel, Frederic Weisbecker, Tim Bird, Ingo Molnar,
	Abhishek Sagar, Peter Zijlstra, Thomas Gleixner


(replying to old mail - working down my awful email backlog)

do we still need this for -tip?

	Ingo

* Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> Ingo,
> 
> Please pull the latest tip/tracing/function-graph tree, which can be found at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/function-graph
> 
> 
> Steven Rostedt (1):
>       x86, function-graph: only save return values on x86_64
> 
> ----
>  arch/x86/kernel/entry_64.S |   19 +++----------------
>  1 files changed, 3 insertions(+), 16 deletions(-)
> ---------------------------
> commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
> Author: Steven Rostedt <srostedt@redhat.com>
> Date:   Wed Mar 25 14:30:04 2009 -0400
> 
>     x86, function-graph: only save return values on x86_64
>     
>     Impact: speed up
>     
>     The return to handler portion of the function graph tracer should only
>     need to save the return values. The caller already saved off the
>     registers that the callee can modify. The returning function already
>     saved the registers it modified. When we call our own trace function
>     it too will save the registers that the callee must restore.
>     
>     There's no reason to save off anything more that the registers used
>     to return the values.
>     
>     Note, I did a complete kernel build with this modification and the
>     function graph tracer running on x86_64.
>     
>     Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index a331ec3..1ac9986 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
>  GLOBAL(return_to_handler)
>  	subq  $80, %rsp
>  
> +	/* Save the return values */
>  	movq %rax, (%rsp)
> -	movq %rcx, 8(%rsp)
> -	movq %rdx, 16(%rsp)
> -	movq %rsi, 24(%rsp)
> -	movq %rdi, 32(%rsp)
> -	movq %r8, 40(%rsp)
> -	movq %r9, 48(%rsp)
> -	movq %r10, 56(%rsp)
> -	movq %r11, 64(%rsp)
> +	movq %rdx, 8(%rsp)
>  
>  	call ftrace_return_to_handler
>  
>  	movq %rax, 72(%rsp)
> -	movq 64(%rsp), %r11
> -	movq 56(%rsp), %r10
> -	movq 48(%rsp), %r9
> -	movq 40(%rsp), %r8
> -	movq 32(%rsp), %rdi
> -	movq 24(%rsp), %rsi
> -	movq 16(%rsp), %rdx
> -	movq 8(%rsp), %rcx
> +	movq 8(%rsp), %rdx
>  	movq (%rsp), %rax
>  	addq $72, %rsp
>  	retq

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64
  2009-04-08 16:09           ` Ingo Molnar
@ 2009-04-08 16:37             ` Steven Rostedt
  2009-04-08 16:41               ` Ingo Molnar
  2009-04-08 17:40             ` Frederic Weisbecker
  1 sibling, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2009-04-08 16:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux kernel, Frederic Weisbecker, Tim Bird, Ingo Molnar,
	Abhishek Sagar, Peter Zijlstra, Thomas Gleixner



On Wed, 8 Apr 2009, Ingo Molnar wrote:

> 
> (replying to old mail - working down my awful email backlog)
> 
> do we still need this for -tip?
> 

I've ran it without issues. It will help speed up the function graph 
tracer since it removes a bunch of copies at every function call. But I 
would test it in your work suite, since I only tested it on two boxes.

-- Steve


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64
  2009-04-08 16:37             ` Steven Rostedt
@ 2009-04-08 16:41               ` Ingo Molnar
  0 siblings, 0 replies; 37+ messages in thread
From: Ingo Molnar @ 2009-04-08 16:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux kernel, Frederic Weisbecker, Tim Bird, Ingo Molnar,
	Abhishek Sagar, Peter Zijlstra, Thomas Gleixner


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > do we still need this for -tip?
> 
> I've ran it without issues. It will help speed up the function 
> graph tracer since it removes a bunch of copies at every function 
> call. But I would test it in your work suite, since I only tested 
> it on two boxes.

ok, great - i've applied it to tip:tracing/ftrace.

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64
  2009-04-08 16:09           ` Ingo Molnar
  2009-04-08 16:37             ` Steven Rostedt
@ 2009-04-08 17:40             ` Frederic Weisbecker
  1 sibling, 0 replies; 37+ messages in thread
From: Frederic Weisbecker @ 2009-04-08 17:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, linux kernel, Tim Bird, Ingo Molnar,
	Abhishek Sagar, Peter Zijlstra, Thomas Gleixner

On Wed, Apr 08, 2009 at 06:09:15PM +0200, Ingo Molnar wrote:
> 
> (replying to old mail - working down my awful email backlog)
> 
> do we still need this for -tip?
> 
> 	Ingo


Yes. It drops a small overhead on the function graph tracer.

Thanks,
Frederic.

 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > 
> > Ingo,
> > 
> > Please pull the latest tip/tracing/function-graph tree, which can be found at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> > tip/tracing/function-graph
> > 
> > 
> > Steven Rostedt (1):
> >       x86, function-graph: only save return values on x86_64
> > 
> > ----
> >  arch/x86/kernel/entry_64.S |   19 +++----------------
> >  1 files changed, 3 insertions(+), 16 deletions(-)
> > ---------------------------
> > commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
> > Author: Steven Rostedt <srostedt@redhat.com>
> > Date:   Wed Mar 25 14:30:04 2009 -0400
> > 
> >     x86, function-graph: only save return values on x86_64
> >     
> >     Impact: speed up
> >     
> >     The return to handler portion of the function graph tracer should only
> >     need to save the return values. The caller already saved off the
> >     registers that the callee can modify. The returning function already
> >     saved the registers it modified. When we call our own trace function
> >     it too will save the registers that the callee must restore.
> >     
> >     There's no reason to save off anything more that the registers used
> >     to return the values.
> >     
> >     Note, I did a complete kernel build with this modification and the
> >     function graph tracer running on x86_64.
> >     
> >     Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> > 
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index a331ec3..1ac9986 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
> >  GLOBAL(return_to_handler)
> >  	subq  $80, %rsp
> >  
> > +	/* Save the return values */
> >  	movq %rax, (%rsp)
> > -	movq %rcx, 8(%rsp)
> > -	movq %rdx, 16(%rsp)
> > -	movq %rsi, 24(%rsp)
> > -	movq %rdi, 32(%rsp)
> > -	movq %r8, 40(%rsp)
> > -	movq %r9, 48(%rsp)
> > -	movq %r10, 56(%rsp)
> > -	movq %r11, 64(%rsp)
> > +	movq %rdx, 8(%rsp)
> >  
> >  	call ftrace_return_to_handler
> >  
> >  	movq %rax, 72(%rsp)
> > -	movq 64(%rsp), %r11
> > -	movq 56(%rsp), %r10
> > -	movq 48(%rsp), %r9
> > -	movq 40(%rsp), %r8
> > -	movq 32(%rsp), %rdi
> > -	movq 24(%rsp), %rsi
> > -	movq 16(%rsp), %rdx
> > -	movq 8(%rsp), %rcx
> > +	movq 8(%rsp), %rdx
> >  	movq (%rsp), %rax
> >  	addq $72, %rsp
> >  	retq


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Anyone working on ftrace function graph support on ARM?
  2009-03-27 12:58           ` Catalin Marinas
@ 2009-04-09 15:29             ` Daniel Jacobowitz
  0 siblings, 0 replies; 37+ messages in thread
From: Daniel Jacobowitz @ 2009-04-09 15:29 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Tim Bird, Russell King - ARM Linux, Frederic Weisbecker,
	Abhishek Sagar, linux-arm-kernel, linux kernel, Steven Rostedt,
	Ingo Molnar, Uwe Kleine-König

Sorry for the late reply, I'm way behind on list mail.

On Fri, Mar 27, 2009 at 12:58:08PM +0000, Catalin Marinas wrote:
> Anyway, the EABI toolchain I have (from CodeSourcery) generates
> something like below with -pg for both ARM and Thumb code (so that it
> doesn't rely on the frame pointer):
> 
> 	push	{lr}
> 	bl	__gnu_mcount_nc
> 
> I think this will be (was?) merged into the mainline gcc for ARM. The
> -pg option is still incompatible with -fomit-frame-pointer but maybe it
> shouldn't be anymore.

Was merged.  It'll be in GCC 4.4 and is in our current Lite releases;
it was created to solve precisely this problem.  It should be easy to
apply to an older GCC release if desired, but it's not in existing
FSF releases.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2009-04-09 15:54 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-24 19:38 Anyone working on ftrace function graph support on ARM? Tim Bird
2009-03-24 20:25 ` Ingo Molnar
2009-03-24 20:48   ` Tim Bird
2009-03-24 20:38 ` Uwe Kleine-König
2009-03-24 21:36 ` Frederic Weisbecker
2009-03-24 21:40   ` Tim Bird
2009-03-24 21:48   ` Ingo Molnar
2009-03-24 21:57     ` Frederic Weisbecker
2009-03-24 22:14       ` Ingo Molnar
2009-03-24 22:54         ` Frederic Weisbecker
2009-03-25  8:36         ` Russell King - ARM Linux
2009-03-25 16:00       ` Steven Rostedt
2009-03-25 17:13         ` Frederic Weisbecker
2009-03-25 20:27         ` [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64 Steven Rostedt
2009-03-25 20:45           ` Jaswinder Singh Rajput
2009-03-25 21:26             ` Steven Rostedt
2009-04-08 16:09           ` Ingo Molnar
2009-04-08 16:37             ` Steven Rostedt
2009-04-08 16:41               ` Ingo Molnar
2009-04-08 17:40             ` Frederic Weisbecker
2009-03-24 22:29   ` Anyone working on ftrace function graph support on ARM? Abhishek Sagar
2009-03-24 22:48     ` Frederic Weisbecker
2009-03-25  8:42       ` Russell King - ARM Linux
2009-03-25  8:54         ` Ingo Molnar
2009-03-25  9:57           ` Russell King - ARM Linux
2009-03-25 10:45             ` Uwe Kleine-König
2009-03-25 11:21               ` Russell King - ARM Linux
2009-03-25 12:09                 ` Uwe Kleine-König
2009-03-25 16:41           ` Tim Bird
2009-03-25 11:41         ` Frederic Weisbecker
2009-03-25 16:34         ` Tim Bird
2009-03-25 17:05           ` Uwe Kleine-König
2009-03-25 17:17             ` Russell King - ARM Linux
2009-03-25 18:37               ` Tim Bird
2009-03-25 18:41                 ` Steven Rostedt
2009-03-27 12:58           ` Catalin Marinas
2009-04-09 15:29             ` Daniel Jacobowitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).