* [PATCH] perf record: Allow poll timeout to be specified
@ 2015-03-24 16:09 David Ahern
  2015-03-24 16:12 ` Ingo Molnar
  2015-03-25 12:38 ` Jiri Olsa
  0 siblings, 2 replies; 10+ messages in thread
From: David Ahern @ 2015-03-24 16:09 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, David Ahern, Ingo Molnar, Frederic Weisbecker,
	Peter Zijlstra, Jiri Olsa, Namhyung Kim, Stephane Eranian,
	Adrian Hunter

Record currently wakes up based on watermarks to read events from the mmaps and
write them out to the file. The result is a file that can have large blocks of
events per mmap before a finished round event is added to the stream.  This in
turn affects the quantity of events that have to be passed through the ordered
events queue before results can be displayed to the user. For commands like
perf-script this can lead to unnecessarily long delays before a user gets
output. Large systems (e.g., 1024 cpus) further compound this effect. I have seen
instances where I have to wait 45 minutes for perf-script to process a 5GB file
before any events are shown.

This patch adds an option to perf-record to allow a user to specify the
poll timeout in msec. For example, using 100 msec timeouts, similar to perf-top,
means the mmaps are traversed much more frequently, leading to smoother
processing on the analysis side.

Signed-off-by: David Ahern <david.ahern@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt | 6 ++++++
 tools/perf/builtin-record.c              | 5 ++++-
 tools/perf/perf.h                        | 1 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 355c4f5569b5..7010c363fdd1 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -250,6 +250,12 @@ is off by default.
 --running-time::
 Record running and enabled time for read events (:S)
 
+--poll=::
+Polling interval in msec. Defaults to infinite which means record relies on
+watermarks to wake up and read events from each mmap. Setting poll helps smooth
+the event collection across mmaps and the subsequent processing of the data
+file. For example perf-top uses a 100 msec polling interval.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5a2ff510b75b..091868288d29 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -485,7 +485,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (hits == rec->samples) {
 			if (done || draining)
 				break;
-			err = perf_evlist__poll(rec->evlist, -1);
+			err = perf_evlist__poll(rec->evlist, opts->poll_timeout);
 			/*
 			 * Propagate error, only if there's any. Ignore positive
 			 * number of returned events and interrupt error.
@@ -734,6 +734,7 @@ static struct record record = {
 		.user_freq	     = UINT_MAX,
 		.user_interval	     = ULLONG_MAX,
 		.freq		     = 4000,
+		.poll_timeout	     = -1,
 		.target		     = {
 			.uses_mmap   = true,
 			.default_per_cpu = true,
@@ -841,6 +842,8 @@ struct option __record_options[] = {
 		    "Sample machine registers on interrupt"),
 	OPT_BOOLEAN(0, "running-time", &record.opts.running_time,
 		    "Record running/enabled time of read (:S) events"),
+	OPT_INTEGER(0, "poll", &record.opts.poll_timeout,
+		  "poll interval in ms (defaults to infinite)"),
 	OPT_END()
 };
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 1caa70a4a9e1..ee847c8af668 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -62,6 +62,7 @@ struct record_opts {
 	u64	     user_interval;
 	bool	     sample_transaction;
 	unsigned     initial_delay;
+	int	     poll_timeout;
 };
 
 struct option;
-- 
2.2.1



* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-24 16:09 [PATCH] perf record: Allow poll timeout to be specified David Ahern
@ 2015-03-24 16:12 ` Ingo Molnar
  2015-03-24 16:18   ` David Ahern
  2015-03-24 21:21   ` Arnaldo Carvalho de Melo
  2015-03-25 12:38 ` Jiri Olsa
  1 sibling, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2015-03-24 16:12 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, linux-kernel, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, Adrian Hunter


* David Ahern <david.ahern@oracle.com> wrote:

> Record currently wakes up based on watermarks to read events from 
> the mmaps and write them out to the file. The result is a file that 
> can have large blocks of events per mmap before a finished round 
> event is added to the stream.  This in turn affects the quantity of 
> events that have to be passed through the ordered events queue 
> before results can be displayed to the user. For commands like 
> perf-script this can lead to unnecessarily long delays before a
> user gets output. Large systems (e.g., 1024 cpus) further compound
> this effect. I have seen instances where I have to wait 45 minutes 
> for perf-script to process a 5GB file before any events are shown.
> 
> This patch adds an option to perf-record to allow a user to specify 
> the poll timeout in msec. For example using 100 msec timeouts 
> similar to perf-top means the mmaps are traversed much more 
> frequently leading to a smoother analysis side.

Please tune the default value (perhaps influenced by N_PROC?) so that 
users will get sane behavior without having to specify this option!

Thanks,

	Ingo


* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-24 16:12 ` Ingo Molnar
@ 2015-03-24 16:18   ` David Ahern
  2015-03-24 21:21   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 10+ messages in thread
From: David Ahern @ 2015-03-24 16:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: acme, linux-kernel, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, Adrian Hunter

On 3/24/15 10:12 AM, Ingo Molnar wrote:
>
> * David Ahern <david.ahern@oracle.com> wrote:
>
>> Record currently wakes up based on watermarks to read events from
>> the mmaps and write them out to the file. The result is a file that
>> can have large blocks of events per mmap before a finished round
>> event is added to the stream.  This in turn affects the quantity of
>> events that have to be passed through the ordered events queue
>> before results can be displayed to the user. For commands like
>> perf-script this can lead to unnecessarily long delays before a
>> user gets output. Large systems (e.g., 1024 cpus) further compound
>> this effect. I have seen instances where I have to wait 45 minutes
>> for perf-script to process a 5GB file before any events are shown.
>>
>> This patch adds an option to perf-record to allow a user to specify
>> the poll timeout in msec. For example using 100 msec timeouts
>> similar to perf-top means the mmaps are traversed much more
>> frequently leading to a smoother analysis side.
>
> Please tune the default value (perhaps influenced by N_PROC?) so that
> users will get sane behavior without having to specify this option!

I knew you were going to say that! ;-)

It's really a function of the rate of events coming in, not the number of
CPUs. The number of CPUs just compounds the problem.

I thought about making perf-record use a 100msec timeout like perf-top, 
but that can lead to unnecessary FINISHED_ROUND events in the file and 
unnecessary noise/overhead in the record side. On the other hand looking 
at scheduler tracepoints, kvm tracepoints, etc -- those can flood in to 
the point that even 100msec is too long.
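
To make the trade-off concrete: the knob under discussion is the timeout
argument of poll(2). Below is a minimal sketch, assuming a hypothetical
helper wait_for_data() (it is not a perf function). With a timeout of -1
the call blocks until some fd is ready, which is the watermark-driven
behavior; with a finite timeout it also returns when the timeout expires,
so the caller gets a chance to make another pass over the mmaps.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of perf.
 *
 * timeout_ms == -1: block until at least one fd is ready.
 * timeout_ms >=  0: also return once the timeout expires.
 *
 * Returns 1 if at least one fd reported an event, 0 on timeout or
 * when interrupted by a signal, -1 on any other error.
 */
static int wait_for_data(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int ret;

	assert(fds != NULL || nfds == 0);

	ret = poll(fds, nfds, timeout_ms);
	if (ret < 0)
		return errno == EINTR ? 0 : -1;

	return ret > 0 ? 1 : 0;
}
```

A shorter timeout means more wakeups on the record side in exchange for
smaller batches per mmap; the right value depends on the event rate.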


* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-24 16:12 ` Ingo Molnar
  2015-03-24 16:18   ` David Ahern
@ 2015-03-24 21:21   ` Arnaldo Carvalho de Melo
  2015-03-25  9:11     ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-24 21:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, linux-kernel, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, Adrian Hunter

Em Tue, Mar 24, 2015 at 05:12:11PM +0100, Ingo Molnar escreveu:
> * David Ahern <david.ahern@oracle.com> wrote:
> > Record currently wakes up based on watermarks to read events from 
> > the mmaps and write them out to the file. The result is a file that 
> > can have large blocks of events per mmap before a finished round 
> > event is added to the stream.  This in turn affects the quantity of 
> > events that have to be passed through the ordered events queue 
> > before results can be displayed to the user. For commands like 
> > perf-script this can lead to unnecessarily long delays before a
> > user gets output. Large systems (e.g., 1024 cpus) further compound
> > this effect. I have seen instances where I have to wait 45 minutes 
> > for perf-script to process a 5GB file before any events are shown.
> > 
> > This patch adds an option to perf-record to allow a user to specify 
> > the poll timeout in msec. For example using 100 msec timeouts 
> > similar to perf-top means the mmaps are traversed much more 
> > frequently leading to a smoother analysis side.
> 
> Please tune the default value (perhaps influenced by N_PROC?) so that 
> users will get sane behavior without having to specify this option!

Isn't this a followup patch? I.e. changing the default from infinity to
some sane value?

Applying it now.

- Arnaldo


* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-24 21:21   ` Arnaldo Carvalho de Melo
@ 2015-03-25  9:11     ` Ingo Molnar
  2015-03-25 12:14       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2015-03-25  9:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David Ahern, linux-kernel, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, Adrian Hunter


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Em Tue, Mar 24, 2015 at 05:12:11PM +0100, Ingo Molnar escreveu:
> > * David Ahern <david.ahern@oracle.com> wrote:
> >
> > > Record currently wakes up based on watermarks to read events 
> > > from the mmaps and write them out to the file. The result is a 
> > > file that can have large blocks of events per mmap before a 
> > > finished round event is added to the stream.  This in turn 
> > > affects the quantity of events that have to be passed through 
> > > the ordered events queue before results can be displayed to the 
> > > user. For commands like perf-script this can lead to
> > > unnecessarily long delays before a user gets output. Large
> > > systems (e.g., 1024 cpus) further compound this effect. I have
> > > seen instances where I have to wait 45 minutes for perf-script 
> > > to process a 5GB file before any events are shown.
> > > 
> > > This patch adds an option to perf-record to allow a user to 
> > > specify the poll timeout in msec. For example using 100 msec 
> > > timeouts similar to perf-top means the mmaps are traversed much 
> > > more frequently leading to a smoother analysis side.
> > 
> > Please tune the default value (perhaps influenced by N_PROC?) so 
> > that users will get sane behavior without having to specify this 
> > option!
> 
> Isn't this a followup patch? [...]

Will a followup patch be written?

Thanks,

	Ingo


* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-25  9:11     ` Ingo Molnar
@ 2015-03-25 12:14       ` Arnaldo Carvalho de Melo
  2015-03-25 14:41         ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-25 12:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, linux-kernel, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, Adrian Hunter

Em Wed, Mar 25, 2015 at 10:11:47AM +0100, Ingo Molnar escreveu:
> * Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Em Tue, Mar 24, 2015 at 05:12:11PM +0100, Ingo Molnar escreveu:
> > > * David Ahern <david.ahern@oracle.com> wrote:

> > > > Record currently wakes up based on watermarks to read events
> > > > from the mmaps and write them out to the file. The result is a
> > > > file that can have large blocks of events per mmap before a
> > > > finished round event is added to the stream.  This in turn
> > > > affects the quantity of events that have to be passed through
> > > > the ordered events queue before results can be displayed to the
> > > > user. For commands like perf-script this can lead to
> > > > unnecessarily long delays before a user gets output. Large
> > > > systems (e.g., 1024 cpus) further compound this effect. I have
> > > > seen instances where I have to wait 45 minutes for perf-script
> > > > to process a 5GB file before any events are shown.

> > > > This patch adds an option to perf-record to allow a user to
> > > > specify the poll timeout in msec. For example using 100 msec
> > > > timeouts similar to perf-top means the mmaps are traversed much
> > > > more frequently leading to a smoother analysis side.

> > > Please tune the default value (perhaps influenced by N_PROC?) so 
> > > that users will get sane behavior without having to specify this 
> > > option!

> > Isn't this a followup patch? [...]
 
> Will a followup patch be written?

Hope so :-)

If David doesn't come up with something I probably will, as making
'trace' use the ordered_samples, like 'perf top' does (initially with
some arbitrary reasonable poll timeout value), is a low hanging fruit to
get those multi-CPU tracepoints sorted until I get something better in
place...

But what I said is independent of whether a followup patch comes or not:
right now we don't have that possibility; with his patch, we do.

Turning it from not possible to possible looks like an improvement before we
get it done automatically, and even by then allowing someone to tweak
that value may be useful, no?

- Arnaldo


* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-24 16:09 [PATCH] perf record: Allow poll timeout to be specified David Ahern
  2015-03-24 16:12 ` Ingo Molnar
@ 2015-03-25 12:38 ` Jiri Olsa
  2015-03-25 14:37   ` David Ahern
  1 sibling, 1 reply; 10+ messages in thread
From: Jiri Olsa @ 2015-03-25 12:38 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, linux-kernel, Ingo Molnar, Frederic Weisbecker,
	Peter Zijlstra, Jiri Olsa, Namhyung Kim, Stephane Eranian,
	Adrian Hunter

On Tue, Mar 24, 2015 at 12:09:48PM -0400, David Ahern wrote:
> Record currently wakes up based on watermarks to read events from the mmaps and
> write them out to the file. The result is a file that can have large blocks of
> events per mmap before a finished round event is added to the stream.  This in
> turn affects the quantity of events that have to be passed through the ordered
> events queue before results can be displayed to the user. For commands like
> perf-script this can lead to unnecessarily long delays before a user gets
> output. Large systems (e.g., 1024 cpus) further compound this effect. I have seen
> instances where I have to wait 45 minutes for perf-script to process a 5GB file
> before any events are shown.

so you have pipe to perf script, right?

> 
> This patch adds an option to perf-record to allow a user to specify the
> poll timeout in msec. For example using 100 msec timeouts similar to perf-top
> means the mmaps are traversed much more frequently leading to a smoother
> analysis side.

there's also the '--no-buffering' option that sets:

                attr->watermark = 0;
                attr->wakeup_events = 1;

but that's just the other edge, which is not what you'd want
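
For reference, those two fields belong to struct perf_event_attr from
<linux/perf_event.h>. Here is a sketch contrasting the two wakeup modes
the uapi offers. set_wakeup_policy() is a hypothetical helper, not perf
code, and the byte threshold is whatever the caller picks:

```c
#include <assert.h>
#include <stddef.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper, not taken from perf.
 *
 * no_buffering != 0: wake the reader after every single event,
 *                    i.e. the settings quoted above.
 * no_buffering == 0: wake the reader once watermark_bytes of data
 *                    have accumulated in the ring buffer.
 */
static void set_wakeup_policy(struct perf_event_attr *attr,
			      int no_buffering,
			      unsigned int watermark_bytes)
{
	assert(attr != NULL);

	if (no_buffering) {
		attr->watermark = 0;
		attr->wakeup_events = 1;
	} else {
		attr->watermark = 1;
		/* wakeup_watermark shares a union with wakeup_events */
		attr->wakeup_watermark = watermark_bytes;
	}
}
```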

I think it's good to have user side configurable as well

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka


* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-25 12:38 ` Jiri Olsa
@ 2015-03-25 14:37   ` David Ahern
  0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2015-03-25 14:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, linux-kernel, Ingo Molnar, Frederic Weisbecker,
	Peter Zijlstra, Jiri Olsa, Namhyung Kim, Stephane Eranian,
	Adrian Hunter

On 3/25/15 6:38 AM, Jiri Olsa wrote:
> On Tue, Mar 24, 2015 at 12:09:48PM -0400, David Ahern wrote:
>> Record currently wakes up based on watermarks to read events from the mmaps and
>> write them out to the file. The result is a file that can have large blocks of
>> events per mmap before a finished round event is added to the stream.  This in
>> turn affects the quantity of events that have to be passed through the ordered
>> events queue before results can be displayed to the user. For commands like
>> perf-script this can lead to unnecessarily long delays before a user gets
>> output. Large systems (e.g., 1024 cpus) further compound this effect. I have seen
>> instances where I have to wait 45 minutes for perf-script to process a 5GB file
>> before any events are shown.
>
> so you have pipe to perf script, right?

$ perf record ....
$ perf script ...
<wait an eternity>
data

>
>>
>> This patch adds an option to perf-record to allow a user to specify the
>> poll timeout in msec. For example using 100 msec timeouts similar to perf-top
>> means the mmaps are traversed much more frequently leading to a smoother
>> analysis side.
>
> there's also the '--no-buffering' option that sets:
>
>                  attr->watermark = 0;
>                  attr->wakeup_events = 1;
>
> but that's just the other edge, which is not what you'd want

right, that is the other extreme. record would never go to sleep.

>
> I think it's good to have user side configurable as well
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
>
> thanks,
> jirka
>



* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-25 12:14       ` Arnaldo Carvalho de Melo
@ 2015-03-25 14:41         ` David Ahern
  2015-03-25 18:20           ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2015-03-25 14:41 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar
  Cc: linux-kernel, Frederic Weisbecker, Peter Zijlstra, Jiri Olsa,
	Namhyung Kim, Stephane Eranian, Adrian Hunter

On 3/25/15 6:14 AM, Arnaldo Carvalho de Melo wrote:
> If David doesn't come up with something I probably will, as making
> 'trace' use the ordered_samples, like 'perf top' does (initially with
> some arbitrary reasonable poll timeout value), is a low hanging fruit to
> get those multi-CPU tracepoints sorted until I get something better in
> place...

I have thought about it. It needs to be an adaptive algorithm:
1. start at 100 msec.
2. Read the maps. How much data are there (not events, but data size)? 
3. Adjust poll timeout up or down with some heuristic -- maybe something 
similar to the algorithm perf-top uses for removing entries from the 
histograms.
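
The three steps above can be sketched as follows. This is only an
illustration: adapt_poll_timeout() is a made-up name, every threshold and
bound is an arbitrary assumption, and plain halving/doubling stands in
for the heuristic, which is left unspecified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds and thresholds; none of these come from perf. */
#define POLL_MIN_MS	10
#define POLL_MAX_MS	1000
#define POLL_START_MS	100		/* step 1: start at 100 msec */
#define BYTES_HIGH	(4u << 20)	/* more than this per pass: poll sooner */
#define BYTES_LOW	(64u << 10)	/* less than this per pass: back off */

/*
 * Steps 2 and 3: given the current timeout and the number of bytes
 * drained from the mmaps on the last pass, return the next timeout.
 */
static int adapt_poll_timeout(int cur_ms, size_t bytes_read)
{
	int next = cur_ms;

	assert(cur_ms >= POLL_MIN_MS && cur_ms <= POLL_MAX_MS);

	if (bytes_read > BYTES_HIGH)
		next = cur_ms / 2;	/* data is flooding in */
	else if (bytes_read < BYTES_LOW)
		next = cur_ms * 2;	/* mostly idle */

	/* Clamp so the timeout never becomes a busy loop or a stall. */
	if (next < POLL_MIN_MS)
		next = POLL_MIN_MS;
	if (next > POLL_MAX_MS)
		next = POLL_MAX_MS;

	return next;
}
```

The record loop would call this after each pass over the mmaps and feed
the result into the next poll.
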

That said, I still think giving the user control is not a crazy idea.

David


* Re: [PATCH] perf record: Allow poll timeout to be specified
  2015-03-25 14:41         ` David Ahern
@ 2015-03-25 18:20           ` Ingo Molnar
  0 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2015-03-25 18:20 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Frederic Weisbecker,
	Peter Zijlstra, Jiri Olsa, Namhyung Kim, Stephane Eranian,
	Adrian Hunter


* David Ahern <david.ahern@oracle.com> wrote:

> On 3/25/15 6:14 AM, Arnaldo Carvalho de Melo wrote:
> >If David doesn't come up with something I probably will, as making
> >'trace' use the ordered_samples, like 'perf top' does (initially with
> >some arbitrary reasonable poll timeout value), is a low hanging fruit to
> >get those multi-CPU tracepoints sorted until I get something better in
> >place...
> 
> I have thought about it. It needs to be an adaptive algorithm:
> 1. start at 100 msec.
> 2. Read the maps. How much data are there (not events, but data size)?
> 3. Adjust poll timeout up or down with some heuristic -- maybe something
> similar to the algorithm perf-top uses for removing entries from the
> histograms.
> 
> That said, I still think giving the user control is not a crazy idea.

The user has absolutely zero control over the rate of data and about 
what the ideal refresh rate would be! So this really should be solved 
in software and automatically.

'try to tweak this parameter up and down until the tool is usable' is 
a really, really poor user interface concept ...

Thanks,

	Ingo
