All of lore.kernel.org
 help / color / mirror / Atom feed
* perf sampling frequency drops after some record rounds?
@ 2017-04-11 13:49 Milian Wolff
  2017-04-11 14:49 ` Arnaldo Carvalho de Melo
  2017-04-12  3:07 ` Andi Kleen
  0 siblings, 2 replies; 3+ messages in thread
From: Milian Wolff @ 2017-04-11 13:49 UTC (permalink / raw)
  To: Perf Users; +Cc: Arnaldo Carvalho de Melo, Andi Kleen, Nate Rogers

Hello all,

a colleague of mine (CC'ed) is encountering a strange issue with perf from 
Ubuntu 16.04 running on a Thinkpad P50 with Intel(R) Core(TM) i7-6700HQ CPU @ 
2.60GHz on 4.4.0-72-generic with perf version 4.4.49.

For him, the sampling frequency drops dramatically after some successful 
records, making perf record essentially unusable afterwards:

First, everything is nice and peachy:

Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
6.66491e+16
[ perf record: Woken up 20 times to write data ]
[ perf record: Captured and wrote 4.926 MB perf.data (613 samples) ]
Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
6.66491e+16
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 4.734 MB perf.data (591 samples) ]
Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
6.66491e+16
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 4.750 MB perf.data (591 samples) ]
Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
6.66491e+16
[ perf record: Woken up 20 times to write data ]
[ perf record: Captured and wrote 4.766 MB perf.data (593 samples) ]
Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
6.66491e+16
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 4.750 MB perf.data (591 samples) ]

But then, suddenly:

Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
6.66491e+16
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.082 MB perf.data (10 samples) ]
Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
6.66491e+16
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.082 MB perf.data (10 samples) ]

dmesg only shows:

[28449.132447] perf interrupt took too long (2510 > 2500), lowering 
kernel.perf_event_max_sample_rate to 50000

But that should still be large enough to encompass the above. Also, increasing 
that value to, say, 100000 does not help this situation. The only known 
workaround is to restart the machine to make it work again.

Is this a known kernel bug in that version? Anything he can try out to fix 
this situation? Anything he could do, other than restarting this machine, to 
bring back the usable sampling frequency?

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: perf sampling frequency drops after some record rounds?
  2017-04-11 13:49 perf sampling frequency drops after some record rounds? Milian Wolff
@ 2017-04-11 14:49 ` Arnaldo Carvalho de Melo
  2017-04-12  3:07 ` Andi Kleen
  1 sibling, 0 replies; 3+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-04-11 14:49 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Perf Users, Andi Kleen, Nate Rogers

Em Tue, Apr 11, 2017 at 03:49:06PM +0200, Milian Wolff escreveu:
> Hello all,
> 
> a colleague of mine (CC'ed) is encountering a strange issue with perf from 
> Ubuntu 16.04 running on a Thinkpad P50 with Intel(R) Core(TM) i7-6700HQ CPU @ 
> 2.60GHz on 4.4.0-72-generic with perf version 4.4.49.
> 
> For him, the sampling frequency drops dramatically after some successful 
> records, making perf record essentially unusable afterwards:
> 
> First, everything is nice and peachy:
> 
> Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
> 6.66491e+16
> [ perf record: Woken up 20 times to write data ]
> [ perf record: Captured and wrote 4.926 MB perf.data (613 samples) ]
> Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
> 6.66491e+16
> [ perf record: Woken up 19 times to write data ]
> [ perf record: Captured and wrote 4.734 MB perf.data (591 samples) ]
> Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
> 6.66491e+16
> [ perf record: Woken up 19 times to write data ]
> [ perf record: Captured and wrote 4.750 MB perf.data (591 samples) ]
> Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
> 6.66491e+16
> [ perf record: Woken up 20 times to write data ]
> [ perf record: Captured and wrote 4.766 MB perf.data (593 samples) ]
> Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
> 6.66491e+16
> [ perf record: Woken up 19 times to write data ]
> [ perf record: Captured and wrote 4.750 MB perf.data (591 samples) ]
> 
> But then, suddenly:
> 
> Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
> 6.66491e+16
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.082 MB perf.data (10 samples) ]
> Tr0g@PC:~$ perf record --call-graph dwarf -F 999 ./cpp-inlining
> 6.66491e+16
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.082 MB perf.data (10 samples) ]
> 
> dmesg only shows:
> 
> [28449.132447] perf interrupt took too long (2510 > 2500), lowering 
> kernel.perf_event_max_sample_rate to 50000
> 
> But that should still be large enough to encompass the above. Also, increasing 
> that value to, say, 100000 does not help this situation. The only known 
> workaround is to restart the machine to make it work again.
> 
> Is this a known kernel bug in that version? Anything he can try out to fix 
> this situation? Anything he could do, other than restarting this machine, to 
> bring back the usable sampling frequency?

I don't recall any specific patch, but a quick look showed a few patches
fixing the auto throttling of that max_sample_rate knob.

In such cases you guys could help by trying to build the latest kernel
using the config you have, and see if the problem persists. A bisection
would then point what is the problem, else that would help in pointing
out a bug in the current code.

Or you could try to workaround that a bit by reducing the size of the
stack dump collected via --call-graph dwarf, to reduce the time spent in
NMI that leads to the auto-throttling of the sample rate.

Also there are some other knobs that could be tweaked somehow to see if
they help:

[acme@jouet linux]$ ls -la /proc/sys/kernel/*perf*
-rw-r--r--. 1 root root 0 Apr 11 11:49 /proc/sys/kernel/perf_cpu_time_max_percent
-rw-r--r--. 1 root root 0 Apr 10 12:18 /proc/sys/kernel/perf_event_max_contexts_per_stack
-rw-r--r--. 1 root root 0 Apr 11 10:09 /proc/sys/kernel/perf_event_max_sample_rate
-rw-r--r--. 1 root root 0 Apr 10 12:18 /proc/sys/kernel/perf_event_max_stack
-rw-r--r--. 1 root root 0 Apr 11 08:43 /proc/sys/kernel/perf_event_mlock_kb
-rw-r--r--. 1 root root 0 Apr 10 12:18 /proc/sys/kernel/perf_event_paranoid
[acme@jouet linux]$

- Arnaldo

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: perf sampling frequency drops after some record rounds?
  2017-04-11 13:49 perf sampling frequency drops after some record rounds? Milian Wolff
  2017-04-11 14:49 ` Arnaldo Carvalho de Melo
@ 2017-04-12  3:07 ` Andi Kleen
  1 sibling, 0 replies; 3+ messages in thread
From: Andi Kleen @ 2017-04-12  3:07 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Perf Users, Arnaldo Carvalho de Melo, Nate Rogers

Milian Wolff <milian.wolff@kdab.com> writes:

> a colleague of mine (CC'ed) is encountering a strange issue with perf from 
> Ubuntu 16.04 running on a Thinkpad P50 with Intel(R) Core(TM) i7-6700HQ CPU @ 
> 2.60GHz on 4.4.0-72-generic with perf version 4.4.49.
>
> For him, the sampling frequency drops dramatically after some successful 
> records, making perf record essentially unusable afterwards:

I see that regularly too, especially on larger systems.

You can see how long the perf nmis take by enabling the nmi trace
tracepoint

echo 1 > /sys/kernel/debug/tracing/events/nmi/enable
... run perf ...
cat /sys/kernel/debug/trace

I debugged a few
cases some time ago, and the two most common causes were:
- there was an old case where there was a lot of cache line contention
on struct page reference counts hurting larger systems. That was fixed
eventually.
- too many page faults while stack walking. The old atomic copy user
did multiple page faults even for a single page fault, just to find out
where exactly the page fault was occurring (even though nothing cares
about the exact location). This was fixed a few kernels ago, but it will
still do a single page fault, which may be slow.
- sometimes when the perf ring buffer is full there is a very long
delay. I have seen it in traces, but no fixes so far.
- there may be more problems.

Usually the way to debug it is using ftrace function tracer, but you
have to enable ftrace for perf first (remove the lines from the makefile
that disable it). Then

T=/sys/kernel/debug/tracing
echo default_do_nmi > $T/set_graph_function
echo function_graph > $T/current_tracer
cat $T/events/nmi/nmi_handler/trigger
echo 'traceoff if delta_ns > 20000' > $T/events/nmi/nmi_handler/trigger
echo 1 > $T/events/nmi/enable

... run perf ...

and look at what the long NMI did in the $T/trace

Of course you can also disable the limiter sysctl
(echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent) 
but then there is no protection of the perf NMI using up all your CPU
and hanging the system.

-Andi

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-04-12  3:07 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 13:49 perf sampling frequency drops after some record rounds? Milian Wolff
2017-04-11 14:49 ` Arnaldo Carvalho de Melo
2017-04-12  3:07 ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.