linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf_evlist__filter_pollfd() in trace__run()
@ 2015-01-29 23:55 Sukadev Bhattiprolu
  2015-01-30 15:22 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 4+ messages in thread
From: Sukadev Bhattiprolu @ 2015-01-29 23:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel, zhlbj

Arnaldo,

On one of our systems we are seeing an intermittent SIGSEGV with

	perf trace sleep 1

and I have question about the 'draining' flag below:
| 
| From 46fb3c21d20415dd2693570c33d0ea6eb8745e04 Mon Sep 17 00:00:00 2001
| From: Arnaldo Carvalho de Melo <acme@redhat.com>
| Date: Mon, 22 Sep 2014 14:39:48 -0300
| Subject: [PATCH 1/1] perf trace: Filter out POLLHUP'ed file descriptors
| 
| So that we don't continue polling on vanished file descriptors, i.e.
| file descriptors for events monitoring threads that exited.
| 
| I.e. the following 'trace' command now exits as expected, instead
| of staying in an eternal loop:
| 
|       $ sleep 5s &
|       $ trace -p `pidof sleep`
| 
| Reported-by: Jiri Olsa <jolsa@redhat.com>
| Cc: Adrian Hunter <adrian.hunter@intel.com>
| Cc: David Ahern <dsahern@gmail.com>
| Cc: Don Zickus <dzickus@redhat.com>
| Cc: Frederic Weisbecker <fweisbec@gmail.com>
| Cc: Jiri Olsa <jolsa@redhat.com>
| Cc: Mike Galbraith <efault@gmx.de>
| Cc: Namhyung Kim <namhyung@kernel.org>
| Cc: Paul Mackerras <paulus@samba.org>
| Cc: Peter Zijlstra <peterz@infradead.org>
| Cc: Stephane Eranian <eranian@google.com>
| Link: http://lkml.kernel.org/n/tip-6qegv786zbf6i8us6t4rxug9@git.kernel.org
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| ---
|  tools/perf/builtin-trace.c | 7 ++++++-
|  1 file changed, 6 insertions(+), 1 deletion(-)
| 
| diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
| index b8fedf3..fe39dc6 100644
| --- a/tools/perf/builtin-trace.c
| +++ b/tools/perf/builtin-trace.c
| @@ -2044,6 +2044,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
|  	int err = -1, i;
|  	unsigned long before;
|  	const bool forks = argc > 0;
| +	bool draining = false;
|  	char sbuf[STRERR_BUFSIZE];
|  
|  	trace->live = true;
| @@ -2171,8 +2172,12 @@ next_event:
|  	if (trace->nr_events == before) {
|  		int timeout = done ? 100 : -1;
|  
| -		if (perf_evlist__poll(evlist, timeout) > 0)
| +		if (!draining && perf_evlist__poll(evlist, timeout) > 0) {
| +			if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0)
| +				draining = true;
| +

If an fd gets into POLLHUP state, perf_evlist__filter_pollfd() removes
("puts") the mmap for the fd. We are seeing that sometimes (frequently)
_all_ fds are in the POLLHUP state and hence their mmap->base are set
to NULL.


|  			goto again;

Now with this goto, we go back and call perf_evlist__mmap_read() which
tries to access the freed mmaps.

Should there be another check to before reading the mmap again ?

I must add that I don't get the SIGSEGV on recent perf-core and the
system where we get the crash, first runs into the following
errors that we are still looking into (maybe related to "ppc64le"
architecture).

	Problems reading syscall 45 information
	Problems reading syscall 5 information
	Problems reading syscall 5 information
	Problems reading syscall 108 information
	Problems reading syscall 108 information
	Problems reading syscall 90 information
	Problems reading syscall 90 information
	Problems reading syscall 6 information

Unlike the SIGSEGV, these errors occur always.

| +		}
|  	} else {
|  		goto again;
|  	}
| -- 
| 1.8.3.1
| 

Following hack seems to fix the SIGSEGV, but then we completely ignore
'draining' flag.


diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index fb12645..ac25e16 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2173,8 +2173,10 @@ next_event:
                int timeout = done ? 100 : -1;
 
                if (!draining && perf_evlist__poll(evlist, timeout) > 0) {
-                       if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0)
+                       if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0) {
                                draining = true;
+                               goto out_disable;
+                       }
 
                        goto again;
                }


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: perf_evlist__filter_pollfd() in trace__run()
  2015-01-29 23:55 perf_evlist__filter_pollfd() in trace__run() Sukadev Bhattiprolu
@ 2015-01-30 15:22 ` Arnaldo Carvalho de Melo
  2015-01-30 20:16   ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-30 15:22 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Jiri Olsa, linux-kernel, zhlbj

Em Thu, Jan 29, 2015 at 03:55:22PM -0800, Sukadev Bhattiprolu escreveu:
> Arnaldo,
> 
> On one of our systems we are seeing an intermittent SIGSEGV with
> 
> 	perf trace sleep 1
> 
> and I have question about the 'draining' flag below:
> | 
> | From 46fb3c21d20415dd2693570c33d0ea6eb8745e04 Mon Sep 17 00:00:00 2001
> | From: Arnaldo Carvalho de Melo <acme@redhat.com>
> | Date: Mon, 22 Sep 2014 14:39:48 -0300
> | Subject: [PATCH 1/1] perf trace: Filter out POLLHUP'ed file descriptors
> | 
> | So that we don't continue polling on vanished file descriptors, i.e.
> | file descriptors for events monitoring threads that exited.
> | 
> | I.e. the following 'trace' command now exits as expected, instead
> | of staying in an eternal loop:
> | 
> |       $ sleep 5s &
> |       $ trace -p `pidof sleep`
> | 
> | Reported-by: Jiri Olsa <jolsa@redhat.com>
> | Cc: Adrian Hunter <adrian.hunter@intel.com>
> | Cc: David Ahern <dsahern@gmail.com>
> | Cc: Don Zickus <dzickus@redhat.com>
> | Cc: Frederic Weisbecker <fweisbec@gmail.com>
> | Cc: Jiri Olsa <jolsa@redhat.com>
> | Cc: Mike Galbraith <efault@gmx.de>
> | Cc: Namhyung Kim <namhyung@kernel.org>
> | Cc: Paul Mackerras <paulus@samba.org>
> | Cc: Peter Zijlstra <peterz@infradead.org>
> | Cc: Stephane Eranian <eranian@google.com>
> | Link: http://lkml.kernel.org/n/tip-6qegv786zbf6i8us6t4rxug9@git.kernel.org
> | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> | ---
> |  tools/perf/builtin-trace.c | 7 ++++++-
> |  1 file changed, 6 insertions(+), 1 deletion(-)
> | 
> | diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> | index b8fedf3..fe39dc6 100644
> | --- a/tools/perf/builtin-trace.c
> | +++ b/tools/perf/builtin-trace.c
> | @@ -2044,6 +2044,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
> |  	int err = -1, i;
> |  	unsigned long before;
> |  	const bool forks = argc > 0;
> | +	bool draining = false;
> |  	char sbuf[STRERR_BUFSIZE];
> |  
> |  	trace->live = true;
> | @@ -2171,8 +2172,12 @@ next_event:
> |  	if (trace->nr_events == before) {
> |  		int timeout = done ? 100 : -1;
> |  
> | -		if (perf_evlist__poll(evlist, timeout) > 0)
> | +		if (!draining && perf_evlist__poll(evlist, timeout) > 0) {
> | +			if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0)
> | +				draining = true;
> | +
> 
> If an fd gets into POLLHUP state, perf_evlist__filter_pollfd() removes
> ("puts") the mmap for the fd. We are seeing that sometimes (frequently)
> _all_ fds are in the POLLHUP state and hence their mmap->base are set
> to NULL.
> 
> 
> |  			goto again;
> 
> Now with this goto, we go back and call perf_evlist__mmap_read() which
> tries to access the freed mmaps.
> 
> Should there be another check to before reading the mmap again ?

Possibly, checking, but a similar algorithm should be in place for
'record', do you see any problems there? I.e. with 'perf record sleep
1'?

- Arnaldo
 
> I must add that I don't get the SIGSEGV on recent perf-core and the
> system where we get the crash, first runs into the following
> errors that we are still looking into (maybe related to "ppc64le"
> architecture).
> 
> 	Problems reading syscall 45 information
> 	Problems reading syscall 5 information
> 	Problems reading syscall 5 information
> 	Problems reading syscall 108 information
> 	Problems reading syscall 108 information
> 	Problems reading syscall 90 information
> 	Problems reading syscall 90 information
> 	Problems reading syscall 6 information
> 
> Unlike the SIGSEGV, these errors occur always.
> 
> | +		}
> |  	} else {
> |  		goto again;
> |  	}
> | -- 
> | 1.8.3.1
> | 
> 
> Following hack seems to fix the SIGSEGV, but then we completely ignore
> 'draining' flag.
> 
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index fb12645..ac25e16 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -2173,8 +2173,10 @@ next_event:
>                 int timeout = done ? 100 : -1;
>  
>                 if (!draining && perf_evlist__poll(evlist, timeout) > 0) {
> -                       if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0)
> +                       if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0) {
>                                 draining = true;
> +                               goto out_disable;
> +                       }
>  
>                         goto again;
>                 }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: perf_evlist__filter_pollfd() in trace__run()
  2015-01-30 15:22 ` Arnaldo Carvalho de Melo
@ 2015-01-30 20:16   ` Sukadev Bhattiprolu
  2015-02-01 14:10     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 4+ messages in thread
From: Sukadev Bhattiprolu @ 2015-01-30 20:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel, zhlbj

Arnaldo Carvalho de Melo [acme@kernel.org] wrote:
| Em Thu, Jan 29, 2015 at 03:55:22PM -0800, Sukadev Bhattiprolu escreveu:
| > Should there be another check to before reading the mmap again ?
| 
| Possibly, checking, but a similar algorithm should be in place for
| 'record', do you see any problems there? I.e. with 'perf record sleep
| 1'?

No, I don't see the failure with perf record in several hundred attempts.
(both the distro and latest perf-core versions work ok).

| 
| - Arnaldo
| 
| > I must add that I don't get the SIGSEGV on recent perf-core and the
| > system where we get the crash, first runs into the following
| > errors that we are still looking into (maybe related to "ppc64le"
| > architecture).

With recent perf core I did not get the SIGSEGV (or the syscall translation
failures below) on a different system.

But on this system, a power KVM guest, I get both errors even with latest 
perf core (commit c52686f9f88).

| > 
| > 	Problems reading syscall 45 information
| > 	Problems reading syscall 5 information
| > 	Problems reading syscall 5 information
| > 	Problems reading syscall 108 information
| > 	Problems reading syscall 108 information
| > 	Problems reading syscall 90 information
| > 	Problems reading syscall 90 information
| > 	Problems reading syscall 6 information
| > 
| > Unlike the SIGSEGV, these errors occur always.

Most likely this syscall number translation is due to the

https://fedorahosted.org/audit/browser/trunk/lib/machinetab.h 

missing this change set 1013, which added support for ppc64le.

Sukadev


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: perf_evlist__filter_pollfd() in trace__run()
  2015-01-30 20:16   ` Sukadev Bhattiprolu
@ 2015-02-01 14:10     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-02-01 14:10 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: Jiri Olsa, linux-kernel, zhlbj

Em Fri, Jan 30, 2015 at 12:16:11PM -0800, Sukadev Bhattiprolu escreveu:
> Arnaldo Carvalho de Melo [acme@kernel.org] wrote:
> | Em Thu, Jan 29, 2015 at 03:55:22PM -0800, Sukadev Bhattiprolu escreveu:
> | > Should there be another check to before reading the mmap again ?
> | 
> | Possibly, checking, but a similar algorithm should be in place for
> | 'record', do you see any problems there? I.e. with 'perf record sleep
> | 1'?
> 
> No, I don't see the failure with perf record in several hundred attempts.
> (both the distro and latest perf-core versions work ok).

Ok, so its a matter of comparing what both are doing, will try.
 
> | > I must add that I don't get the SIGSEGV on recent perf-core and the
> | > system where we get the crash, first runs into the following
> | > errors that we are still looking into (maybe related to "ppc64le"
> | > architecture).
> 
> With recent perf core I did not get the SIGSEGV (or the syscall translation
> failures below) on a different system.
> 
> But on this system, a power KVM guest, I get both errors even with latest 
> perf core (commit c52686f9f88).
> 
> | > 
> | > 	Problems reading syscall 45 information
> | > 	Problems reading syscall 5 information
> | > 	Problems reading syscall 5 information
> | > 	Problems reading syscall 108 information
> | > 	Problems reading syscall 108 information
> | > 	Problems reading syscall 90 information
> | > 	Problems reading syscall 90 information
> | > 	Problems reading syscall 6 information
> | > 
> | > Unlike the SIGSEGV, these errors occur always.
> 
> Most likely this syscall number translation is due to the
> 
> https://fedorahosted.org/audit/browser/trunk/lib/machinetab.h 
> 
> missing this change set 1013, which added support for ppc64le.

We have to improve that message to mention that perhaps a newer
audit-libs package is required.

- Arnaldo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-02-01 14:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-29 23:55 perf_evlist__filter_pollfd() in trace__run() Sukadev Bhattiprolu
2015-01-30 15:22 ` Arnaldo Carvalho de Melo
2015-01-30 20:16   ` Sukadev Bhattiprolu
2015-02-01 14:10     ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).