linux-trace-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Masami Hiramatsu <mhiramat@kernel.org>
To: "Dorau, Lukasz" <lukasz.dorau@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>,
	"Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	"linux-trace-users@vger.kernel.org"
	<linux-trace-users@vger.kernel.org>,
	"Slusarz, Marcin" <marcin.slusarz@intel.com>,
	"Jelinek, Sarah" <sarah.jelinek@intel.com>,
	"Chernookyi, Vitalii" <vitalii.chernookyi@intel.com>,
	"Buella, Gabor" <gabor.buella@intel.com>
Subject: Re: Why return probes of some syscalls sometimes are not called?
Date: Thu, 16 Mar 2017 19:01:29 +0900	[thread overview]
Message-ID: <20170316190129.4d46fee4ab53fed49a6bf870@kernel.org> (raw)
In-Reply-To: <D9FFE20C522965449E182ACE73889AEB48682689@irsmsx105.ger.corp.intel.com>

On Mon, 13 Mar 2017 09:09:47 +0000
"Dorau, Lukasz" <lukasz.dorau@intel.com> wrote:

> On Friday, March 10, 2017 11:51 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > On Thu, 9 Mar 2017 18:00:54 +0100
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > > On Thu, 9 Mar 2017 13:44:38 +0000
> > > "Dorau, Lukasz" <lukasz.dorau@intel.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Could someone explain me why return probes of some syscalls (for example:
> > futex, poll, epoll_wait) sometimes are not called?
> > > >
> > > > It can be reproduced using the following bash script:
> > > > https://gist.github.com/ldorau/c439d9ec7635409a5016c42e3a9121ec
> > > >
> > > > Here are results gathered from 60 seconds test run on kernel 4.9.12 (Fedora
> > 24):
> > > >
> > > > futex:       p 56904    r 5489     (90% did not return (51415))
> > > > poll:        p 43466    r 7703     (82% did not return (35763))
> > > > epoll_wait:  p 73366    r 23551    (67% did not return (49815))
> > > >
> > > > The whole log is attached below.
> > > >
> > > > Lukasz
> > > >
> > > > ---
> > > > # ./test_kprobes.sh 60
> > > >
> > > > Will trace using following kprobe_events:
> > > > r:kprobes/r_futex sys_futex
> > > > p:kprobes/p_futex sys_futex
> > > > r:kprobes/r_poll sys_poll
> > > > p:kprobes/p_poll sys_poll
> > > > r:kprobes/r_epoll_wait sys_epoll_wait
> > > > p:kprobes/p_epoll_wait sys_epoll_wait
> > > > r:kprobes/r_select sys_select
> > > > p:kprobes/p_select sys_select
> > > > r:kprobes/r_fork sys_fork
> > > > p:kprobes/p_fork sys_fork
> > > > r:kprobes/r_vfork sys_vfork
> > > > p:kprobes/p_vfork sys_vfork
> > > > r:kprobes/r_mmap sys_mmap
> > > > p:kprobes/p_mmap sys_mmap
> > > > r:kprobes/r_open sys_open
> > > > p:kprobes/p_open sys_open
> > > > r:kprobes/r_close sys_close
> > > > p:kprobes/p_close sys_close
> > > > r:kprobes/r_write sys_write
> > > > p:kprobes/p_write sys_write
> > > > r:kprobes/r_read sys_read
> > > > p:kprobes/p_read sys_read
> > > >
> > > > Results (60 sec):
> > > > futex:       p 56904    r 5489     (90% did not return (51415))
> > > > poll:        p 43466    r 7703     (82% did not return (35763))
> > > > epoll_wait:  p 73366    r 23551    (67% did not return (49815))
> > > > select:      p 13355    r 13351    (0% did not return (4))
> > > > fork:        p 0        r 0        (OK)
> > > > vfork:       p 0        r 0        (OK)
> > > > mmap:        p 4328     r 4328     (OK)
> > > > open:        p 4579     r 4579     (OK)
> > > > close:       p 7163     r 7163     (OK)
> > > > write:       p 22769    r 22769    (OK)
> > > > read:        p 40014    r 40014    (OK)
> > > >
> > >
> > > OK, I just dumped kprobe_profile after the test,
> > >
> > > futex:       p 7775     r 698      (91% did not return (7077))
> > > poll:        p 18409    r 3523     (80% did not return (14886))
> > > epoll_wait:  p 5122     r 2389     (53% did not return (2733))
> > > select:      p 5411     r 5286     (2% did not return (125))
> > > fork:        p 0        r 0        (OK)
> > > vfork:       p 0        r 0        (OK)
> > > mmap:        p 179      r 179      (OK)
> > > open:        p 921      r 921      (OK)
> > > close:       p 1005     r 1005     (OK)
> > > write:       p 5632     r 5631     (0% did not return (1))
> > > read:        p 10879    r 10878    (0% did not return (1))
> > >   r_futex                                                 1091               0
> > >   p_futex                                                 8986               0
> > >   r_poll                                                  4276               0
> > >   p_poll                                                 20610               0
> > >   r_epoll_wait                                            2849               0
> > >   p_epoll_wait                                            5905               0
> > >   r_select                                               10172               0
> > >   p_select                                               10312               0
> > >   r_fork                                                     0               0
> > >   p_fork                                                     0               0
> > >   r_vfork                                                    0               0
> > >   p_vfork                                                    0               0
> > >   r_mmap                                                  2987               0
> > >   p_mmap                                                  2884               0
> > >   r_open                                                  4002               0
> > >   p_open                                                  3855               0
> > >   r_close                                                 4262               0
> > >   p_close                                                 4122               0
> > >   r_write                                                14281               0
> > >   p_write                                                14276               0
> > >   r_read                                                 41494               0
> > >   p_read                                                 41363               0
> > >
> > > OK, there are no miss-counted event. So at least kprobe and kretprobe working
> > correctly.
> > > I also tried to use static tracepoints for that.
> > > https://gist.github.com/mhiramat/04d782deaf110a6b3391e2ecb923325e
> > >
> > > ----
> > > $ sudo bash test_kprobes.sh 10
> > > Disabling all current probes ...
> > > Removing old probe points ...
> > > Adding new probe points ...
> > > Enabling new probe points ...
> > > Clearing old trace log ...
> > >
> > > Tracing for 10 seconds ... done
> > > Saving trace log ... done
> > >
> > > Results (10 sec):
> > > futex:       p 9856     r 1258     (87% did not return (8598))
> > > poll:        p 7531     r 4031     (46% did not return (3500))
> > > epoll_wait:  p 1996     r 1617     (18% did not return (379))
> > > select:      p 273      r 257      (5% did not return (16))
> > > Results (10 sec):
> > > futex:       p 9851     r 9851     (OK)
> > > poll:        p 7531     r 7531     (OK)
> > > epoll_wait:  p 1996     r 1996     (OK)
> > > select:      p 273      r 273      (OK)
> > > ----
> > > Of course these events are collected in "different places" so
> > > it could be different. But yeah, it seems sys_* function itself
> > > dosen't return but jumps into ret_from_syscall etc.
> > 
> > Please try attached patch (that is my miss that forgot to accumulate missed
> > count of kretprobe).
> > 
> 
> Thanks for the patch. I ran two tests using the following script:
> https://gist.github.com/ldorau/36d87640ff0659a483342d490b66f034
> The results are following:
> 
> 1) The first test - during 60-seconds test the system was almost idle
>    (I did not touch the keyboard or mouse)
> 
> KProbes results (60 sec):
> futex:       p 1373     r 422      (69% did not return (951))
> poll:        p 3304     r 1575     (52% did not return (1729))
> epoll_wait:  p 1062     r 943      (11% did not return (119))
> select:      p 5978     r 5977     (0% did not return (1))
> mmap:        p 264      r 264      (OK)
> open:        p 1056     r 1056     (OK)
> close:       p 1198     r 1198     (OK)
> write:       p 4786     r 4786     (OK)
> read:        p 8939     r 8939     (OK)
> exit:        p 1        r 0    
> fork:        p 0        r 0        (OK)
> vfork:       p 0        r 0        (OK)
> clone:       p 5        r 5        (OK)
> execve:      p 4        r 4        (OK)
> 
> Tracepoints results (60 sec):
> futex:       p 1372     r 1372     (OK)
> poll:        p 3304     r 3303     (0% did not return (1))
> epoll_wait:  p 1062     r 1062     (OK)
> select:      p 5978     r 5978     (OK)
> mmap:        p 264      r 264      (OK)
> open:        p 1056     r 1056     (OK)
> close:       p 1198     r 1198     (OK)
> write:       p 4786     r 4786     (OK)
> read:        p 8939     r 8940     (OK)
> exit:        p 1        r 0    
> 
> 2) The second test - during 60-seconds test I was normally using the system -
>    - I launched an internet browser, edited a file...
> 
> KProbes results (60 sec):
> futex:       p 35394    r 689      (98% did not return (34705))
> poll:        p 23449    r 896      (96% did not return (22553))
> epoll_wait:  p 43927    r 4973     (88% did not return (38954))
> select:      p 4874     r 4873     (0% did not return (1))
> mmap:        p 3165     r 3165     (OK)
> open:        p 3680     r 3678     (0% did not return (2))
> close:       p 6704     r 6704     (OK)
> write:       p 11769    r 11769    (OK)
> read:        p 22645    r 22647    (OK)
> exit:        p 30       r 0        
> fork:        p 0        r 0        (OK)
> vfork:       p 0        r 0        (OK)
> clone:       p 170      r 170      (OK)
> execve:      p 2        r 2        (OK)
> 
> Tracepoints results (60 sec):
> futex:       p 35359    r 35268    (0% did not return (91))
> poll:        p 23448    r 23444    (0% did not return (4))
> epoll_wait:  p 43926    r 43909    (0% did not return (17))
> select:      p 4874     r 4874     (OK)
> mmap:        p 3165     r 3165     (OK)
> open:        p 3680     r 3827     (OK)
> close:       p 6704     r 6704     (OK)
> write:       p 11769    r 11770    (OK)
> read:        p 22645    r 22648    (OK)
> exit:        p 30       r 0        
> 

I see. The patch was to identify the reason why the return event was
lost. Not for fixing the issue.
To fix this issue, I need some more work.

Thank you,

> Lukasz
> 
> 
> 
> > I've focused on futex and it showed there were some "misses" of kretprobe.
> > 
> > ----
> > Disabling all current probes ...
> > Removing old probe points ...
> > Adding new probe points ...
> > Enabling new probe points ...
> > Clearing old trace log ...
> > 
> > Tracing for 10 seconds ... done
> > Saving trace log ... done
> > 
> > Results (10 sec):
> > futex:       p 190      r 74       (61% did not return (116))
> > Results (10 sec):
> > futex:       p 190      r 190      (OK)
> > 
> > $ cat /sys/kernel/debug/tracing/kprobe_profile
> >   r_futex                                                  176             127
> >   p_futex                                                  298               0
> > -----
> > 
> > Actually, since the kretprobe is not well designed for the usecase
> > probing on the sleep-able functions, it just fails if it invoked
> > more than the number of CPUs in parallel.
> > It is currently a limitation of kretprobe, but I'll solve it.
> > 
> > Thanks,
> > 
> > --
> > Masami Hiramatsu <mhiramat@kernel.org>


-- 
Masami Hiramatsu <mhiramat@kernel.org>

      reply	other threads:[~2017-03-16 10:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-09 13:44 Why return probes of some syscalls sometimes are not called? Dorau, Lukasz
2017-03-09 13:58 ` Mathieu Desnoyers
2017-03-09 14:44   ` Steven Rostedt
2017-03-09 14:57     ` Steven Rostedt
     [not found]     ` <1231722663.1343.1489071698914.JavaMail.zimbra@efficios.com>
2017-03-09 15:20       ` Steven Rostedt
     [not found]         ` <D9FFE20C522965449E182ACE73889AEB486818F2@irsmsx105.ger.corp.intel.com>
2017-03-09 15:38           ` Steven Rostedt
2017-03-09 17:00 ` Masami Hiramatsu
2017-03-09 18:19   ` Chernookyi, Vitalii
2017-03-10 10:51   ` Masami Hiramatsu
2017-03-13  9:09     ` Dorau, Lukasz
2017-03-16 10:01       ` Masami Hiramatsu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170316190129.4d46fee4ab53fed49a6bf870@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=ananth@linux.vnet.ibm.com \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=davem@davemloft.net \
    --cc=gabor.buella@intel.com \
    --cc=linux-trace-users@vger.kernel.org \
    --cc=lukasz.dorau@intel.com \
    --cc=marcin.slusarz@intel.com \
    --cc=sarah.jelinek@intel.com \
    --cc=vitalii.chernookyi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).