linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ftrace not showing the process names for all processes on syscall events
@ 2020-03-30 14:28 David Laight
  2020-03-30 15:08 ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2020-03-30 14:28 UTC (permalink / raw)
  To: Linux Kernel Mailing List, 'Steven Rostedt'

I've just updated one of my systems to 5.6.0-rc7+ (end of last week).
ftrace in showing <...>-3179 in the system call events for a couple
of threads of the active processes.
Other threads of the same processes are fine.
The scheduler process switch events also show the full name.

Is this a known regression?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ftrace not showing the process names for all processes on syscall events
  2020-03-30 14:28 ftrace not showing the process names for all processes on syscall events David Laight
@ 2020-03-30 15:08 ` Steven Rostedt
  2020-03-30 15:34   ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2020-03-30 15:08 UTC (permalink / raw)
  To: David Laight; +Cc: Linux Kernel Mailing List

On Mon, 30 Mar 2020 14:28:01 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> I've just updated one of my systems to 5.6.0-rc7+ (end of last week).
> ftrace in showing <...>-3179 in the system call events for a couple
> of threads of the active processes.
> Other threads of the same processes are fine.
> The scheduler process switch events also show the full name.
> 
> Is this a known regression?


Well, that code hasn't changed in years. But can you explain more of what
you did? Was this the "trace" file, or "trace_pipe" file? The command names
are cached in an array (see /sys/kernel/tracing/saved_cmdlines) of the size
that is defined by the saved_cmdlines_size file.

The cmdlines get updated via the sched_switch and sched_waking trace
events. The update is protected by a spinlock, which is only taken with a
"trylock", if the lock fails, then it does not get updated (we don't want
to hold back the running code just to cache the name of an event), but it
will try at the next sched event until it succeeds. This means that under
strong contention, it may fail to cache certain names.

This is not a regression, it's really just the work load that can cause
event names to be missed. I could work on something that if you have the
sched events enabled, that the output side could do its own caching to get
better results.

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: ftrace not showing the process names for all processes on syscall events
  2020-03-30 15:08 ` Steven Rostedt
@ 2020-03-30 15:34   ` David Laight
  2020-03-30 18:07     ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2020-03-30 15:34 UTC (permalink / raw)
  To: 'Steven Rostedt'; +Cc: Linux Kernel Mailing List

From: Steven Rostedt
> Sent: 30 March 2020 16:09
> On Mon, 30 Mar 2020 14:28:01 +0000
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> > I've just updated one of my systems to 5.6.0-rc7+ (end of last week).
> > ftrace in showing <...>-3179 in the system call events for a couple
> > of threads of the active processes.
> > Other threads of the same processes are fine.
> > The scheduler process switch events also show the full name.
> >
> > Is this a known regression?
> 
> 
> Well, that code hasn't changed in years. But can you explain more of what
> you did? Was this the "trace" file, or "trace_pipe" file? The command names
> are cached in an array (see /sys/kernel/tracing/saved_cmdlines) of the size
> that is defined by the saved_cmdlines_size file.

It was the 'trace' file - ie the text formatted by the kernel.
I've looked at a lot of traces over the last few weeks and not seen it before.
The workload is the same, so I assumed there might be a kernel change.
I've just updated from a 5.4-rc7 kernel to a 5.6-rc7 one.

> The cmdlines get updated via the sched_switch and sched_waking trace
> events. The update is protected by a spinlock, which is only taken with a
> "trylock", if the lock fails, then it does not get updated (we don't want
> to hold back the running code just to cache the name of an event), but it
> will try at the next sched event until it succeeds. This means that under
> strong contention, it may fail to cache certain names.

I've not got a silly number of context switches.
But it seemed to have consistently lost some process names even when they
went idle and then resumed.

I would have been tracing the schedule process wakeup and switch events,
system call entry/exit and irq+softirq.

> This is not a regression, it's really just the work load that can cause
> event names to be missed. I could work on something that if you have the
> sched events enabled, that the output side could do its own caching to get
> better results.

Ok if nothing has changed I'll stop worrying and work out what I've broken.

Oh, does the 'function_graph' code ignore tail calls?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ftrace not showing the process names for all processes on syscall events
  2020-03-30 15:34   ` David Laight
@ 2020-03-30 18:07     ` Steven Rostedt
  2020-03-30 18:14       ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2020-03-30 18:07 UTC (permalink / raw)
  To: David Laight; +Cc: Linux Kernel Mailing List

On Mon, 30 Mar 2020 15:34:08 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> Oh, does the 'function_graph' code ignore tail calls?

Yes and no ;-)  It works by dumb luck. As it was a year after function
graph tracing was live (some time in 2010 I believe) that someone brought
up tail calls, and I had to take a look at how it never crashed, and was
surprised that it "just worked". Here's a summary:


The function graph tracer works via the following:

start of function being traced:

  call fgraph trampoline

      Replace return code on stack with jump to return trampoline
      save original return address in shadow stack
      return back to start of function

end of function:

  pop return address
  jump to return address which is now a jump to the return trampoline:

      Return trampoline:
        trace return of function
        pop the shadow stack to get the original return address
        jump to original return address



Now let's take a look at a tail function call:


first_func() {
	some_stuff;
	tail_func();
	return;
}

tail_func() {
	some_other_stuff;
	return;
}


The above would look like in assembly something like this:


first_func:

1:    call fgaph_tramp
2:    asm some_stuff
3:    jmp tail_func


tail_func:

4:   call fgraph_trapm
5:   asm some_other_stuff
6:   pop
7:   ret

At 1 above, we call fgraph_tramp which will replace the return address on
the stack (what called first_func) with the address of the fgraph return
trampoline. And store the original address onto the shadow stack.

The stack looks like this:


  main stack:
  -----------
  fgraph return trampoline


  shadow stack:
  -------------
  original return address


At 3, we just jump directly to tail_func, without doing anything to the
stack.

At 4, fgraph_tramp is called for tail_func this time, and it will replace
its return address with a call to the fgraph return trampoline. HERE IS THE
TRICK!  The return address that was replaced, was actually the address of
the fgraph return trampoline and not the original return address. And this
is now stored on the shadow stack. Thus we have this:

  main stack:
  -----------
  fgraph return trampoline

  shadow stack:
  -------------
  original return address
  fgraph return trampoline

At 6, the fgraph return trampoline is popped off the stack, and called.

The fgraph return trampoline, will trace the end of the function, pop off
the shadow stack, and jump to that. It just happen that it pops off the
address of the fgraph return trampoline and jumps to itself again.

The second call to fgraph return trampline will trace the end of
first_func, pop the shadow stack, which this time is the original return
address, and then jump to that.

Thus, it "just works" and we can all stay happy ;-)

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: ftrace not showing the process names for all processes on syscall events
  2020-03-30 18:07     ` Steven Rostedt
@ 2020-03-30 18:14       ` David Laight
  2020-03-30 18:39         ` Steven Rostedt
  2020-04-01 14:37         ` David Laight
  0 siblings, 2 replies; 7+ messages in thread
From: David Laight @ 2020-03-30 18:14 UTC (permalink / raw)
  To: 'Steven Rostedt'; +Cc: Linux Kernel Mailing List

From: Steven Rostedt
> Sent: 30 March 2020 19:08
> On Mon, 30 Mar 2020 15:34:08 +0000
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> > Oh, does the 'function_graph' code ignore tail calls?
> 
> Yes and no ;-)  It works by dumb luck. As it was a year after function
> graph tracing was live (some time in 2010 I believe) that someone brought
> up tail calls, and I had to take a look at how it never crashed, and was
> surprised that it "just worked". Here's a summary:

'Dumb luck' seems to be failing me :-)
I'll look more closely tomorrow.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ftrace not showing the process names for all processes on syscall events
  2020-03-30 18:14       ` David Laight
@ 2020-03-30 18:39         ` Steven Rostedt
  2020-04-01 14:37         ` David Laight
  1 sibling, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2020-03-30 18:39 UTC (permalink / raw)
  To: David Laight; +Cc: Linux Kernel Mailing List

On Mon, 30 Mar 2020 18:14:37 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Steven Rostedt
> > Sent: 30 March 2020 19:08
> > On Mon, 30 Mar 2020 15:34:08 +0000
> > David Laight <David.Laight@ACULAB.COM> wrote:
> >   
> > > Oh, does the 'function_graph' code ignore tail calls?  
> > 
> > Yes and no ;-)  It works by dumb luck. As it was a year after function
> > graph tracing was live (some time in 2010 I believe) that someone brought
> > up tail calls, and I had to take a look at how it never crashed, and was
> > surprised that it "just worked". Here's a summary:  
> 
> 'Dumb luck' seems to be failing me :-)
> I'll look more closely tomorrow.

The tl;td; version is that the function graph tracer relies on a shadow
stack that it uses to save the original return address, as it replaces the
original return address with the address of the fgraph return trampoline.

Although a tail call causes the real stack to only contain one return
address, the shadow stack will contain a return address for every function,
even if it was a tail call. As that stack gets updated by the entry of the
function not the return side. The difference is, a tail call would cause
the shadow stack to just contain a call to the start of the fgraph return
trampoline, and not an address into the rest of the kernel.

-- Steve


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: ftrace not showing the process names for all processes on syscall events
  2020-03-30 18:14       ` David Laight
  2020-03-30 18:39         ` Steven Rostedt
@ 2020-04-01 14:37         ` David Laight
  1 sibling, 0 replies; 7+ messages in thread
From: David Laight @ 2020-04-01 14:37 UTC (permalink / raw)
  To: 'Steven Rostedt'; +Cc: 'Linux Kernel Mailing List'

From: David Laight
> Sent: 30 March 2020 19:15
> From: Steven Rostedt
> > Sent: 30 March 2020 19:08
> > On Mon, 30 Mar 2020 15:34:08 +0000
> > David Laight <David.Laight@ACULAB.COM> wrote:
> >
> > > Oh, does the 'function_graph' code ignore tail calls?
> >
> > Yes and no ;-)  It works by dumb luck. As it was a year after function
> > graph tracing was live (some time in 2010 I believe) that someone brought
> > up tail calls, and I had to take a look at how it never crashed, and was
> > surprised that it "just worked". Here's a summary:
> 
> 'Dumb luck' seems to be failing me :-)
> I'll look more closely tomorrow.

The 'problem' is that some files like lib/iov_iter.c don't have the
tracepoints compiled in.

So calls I was expecting to see were absent.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-04-01 14:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-30 14:28 ftrace not showing the process names for all processes on syscall events David Laight
2020-03-30 15:08 ` Steven Rostedt
2020-03-30 15:34   ` David Laight
2020-03-30 18:07     ` Steven Rostedt
2020-03-30 18:14       ` David Laight
2020-03-30 18:39         ` Steven Rostedt
2020-04-01 14:37         ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).