All of lore.kernel.org
 help / color / mirror / Atom feed
* kernel/perf: Sample data being lost
@ 2020-04-21 15:54 Thomas Richter
  2020-04-21 15:59 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Richter @ 2020-04-21 15:54 UTC (permalink / raw)
  To: linux-perf-use., Linux Kernel Mailing List, Arnaldo Carvalho de Melo
  Cc: Heiko Carstens, Sumanth Korikkar, Vasily Gorbik

Since a couple of days I see this warning popping up very often:

[root@m35lp76 perf]# ./perf record --call-graph dwarf -e rb0000 -- find /
[ perf record: Woken up 282 times to write data ]
Warning:
Processed 16999 events and lost 382 chunks!

Check IO/CPU overload!

[ perf record: Captured and wrote 125.730 MB perf.data (16219 samples) ]
[root@m35lp76 perf]#

The machine is idle, its my development system, so not much going on.
It also happens using a software event, for example cycles. It shows
up more often, the larger the sample size is. So for example:

[root@m35lp76 perf]# pwd
/root/linux/tools/perf
[root@m35lp76 perf]#  ./perf record  --call-graph dwarf -- find
[ perf record: Woken up 2 times to write data ]
Warning:
Processed 231 events and lost 7 chunks!

Check IO/CPU overload!

[ perf record: Captured and wrote 1.000 MB perf.data (130 samples) ]
[root@m35lp76 perf]#

I have very seldom observed this before, only in extremely rare cases with
a heavily loaded machine. I am wondering what has changed, I haven't
changed anything in the s390 PMU device drivers.
It could be
 - common kernel code when writing into the ringbuffer.
 - the perf tool too slow to read data from the mapped buffer.
   However I have not come across changes in this area.

Has anybody observed similar issue?

PS: I have added some printk messages into my PMU devices drivers.
I have seen messages that the 16384 pages for auxilary buffers are full
and that samples have been dropped.


Thanks a lot.                                 
-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: kernel/perf: Sample data being lost
  2020-04-21 15:54 kernel/perf: Sample data being lost Thomas Richter
@ 2020-04-21 15:59 ` Arnaldo Carvalho de Melo
  2020-04-22 14:42   ` Thomas Richter
  0 siblings, 1 reply; 3+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-21 15:59 UTC (permalink / raw)
  To: Thomas Richter
  Cc: linux-perf-use.,
	Linux Kernel Mailing List, Heiko Carstens, Sumanth Korikkar,
	Vasily Gorbik

Em Tue, Apr 21, 2020 at 05:54:29PM +0200, Thomas Richter escreveu:
> Since a couple of days I see this warning popping up very often:
> 
> [root@m35lp76 perf]# ./perf record --call-graph dwarf -e rb0000 -- find /
> [ perf record: Woken up 282 times to write data ]
> Warning:
> Processed 16999 events and lost 382 chunks!
> 
> Check IO/CPU overload!
> 
> [ perf record: Captured and wrote 125.730 MB perf.data (16219 samples) ]
> [root@m35lp76 perf]#
> 
> The machine is idle, its my development system, so not much going on.
> It also happens using a software event, for example cycles. It shows
> up more often, the larger the sample size is. So for example:
> 
> [root@m35lp76 perf]# pwd
> /root/linux/tools/perf
> [root@m35lp76 perf]#  ./perf record  --call-graph dwarf -- find
> [ perf record: Woken up 2 times to write data ]
> Warning:
> Processed 231 events and lost 7 chunks!
> 
> Check IO/CPU overload!
> 
> [ perf record: Captured and wrote 1.000 MB perf.data (130 samples) ]
> [root@m35lp76 perf]#
> 
> I have very seldom observed this before, only in extremely rare cases with
> a heavily loaded machine. I am wondering what has changed, I haven't
> changed anything in the s390 PMU device drivers.
> It could be
>  - common kernel code when writing into the ringbuffer.
>  - the perf tool too slow to read data from the mapped buffer.
>    However I have not come across changes in this area.
> 
> Has anybody observed similar issue?
> 
> PS: I have added some printk messages into my PMU devices drivers.
> I have seen messages that the 16384 pages for auxilary buffers are full
> and that samples have been dropped.

Can you try to bisect tools/perf?

Something like:

git checkout v5.4

build it, no problems? Use it as the 'git bisect good' starting point.

- Arnaldo

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: kernel/perf: Sample data being lost
  2020-04-21 15:59 ` Arnaldo Carvalho de Melo
@ 2020-04-22 14:42   ` Thomas Richter
  0 siblings, 0 replies; 3+ messages in thread
From: Thomas Richter @ 2020-04-22 14:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-perf-use.,
	Linux Kernel Mailing List, Heiko Carstens, Sumanth Korikkar,
	Vasily Gorbik

On 4/21/20 5:59 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Apr 21, 2020 at 05:54:29PM +0200, Thomas Richter escreveu:
>> Since a couple of days I see this warning popping up very often:
>>
>> [root@m35lp76 perf]# ./perf record --call-graph dwarf -e rb0000 -- find /
>> [ perf record: Woken up 282 times to write data ]
>> Warning:
>> Processed 16999 events and lost 382 chunks!
>>
>> Check IO/CPU overload!
>>
>> [ perf record: Captured and wrote 125.730 MB perf.data (16219 samples) ]
>> [root@m35lp76 perf]#
>>
>> The machine is idle, its my development system, so not much going on.
>> It also happens using a software event, for example cycles. It shows
>> up more often, the larger the sample size is. So for example:
>>
>> [root@m35lp76 perf]# pwd
>> /root/linux/tools/perf
>> [root@m35lp76 perf]#  ./perf record  --call-graph dwarf -- find
>> [ perf record: Woken up 2 times to write data ]
>> Warning:
>> Processed 231 events and lost 7 chunks!
>>
>> Check IO/CPU overload!
>>
>> [ perf record: Captured and wrote 1.000 MB perf.data (130 samples) ]
>> [root@m35lp76 perf]#
>>
>> I have very seldom observed this before, only in extremely rare cases with
>> a heavily loaded machine. I am wondering what has changed, I haven't
>> changed anything in the s390 PMU device drivers.
>> It could be
>>  - common kernel code when writing into the ringbuffer.
>>  - the perf tool too slow to read data from the mapped buffer
>>    However I have not come across changes in this area.
>>
>> Has anybody observed similar issue?
>>
>> PS: I have added some printk messages into my PMU devices drivers.
>> I have seen messages that the 16384 pages for auxilary buffers are full
>> and that samples have been dropped.
> 
> Can you try to bisect tools/perf?
> 
> Something like:
> 
> git checkout v5.4
> 
> build it, no problems? Use it as the 'git bisect good' starting point.
> 
> - Arnaldo
> 

I think this was my fault. The samples with call-graphs are large and I forgot
to increase the mapped ringbuffer with the -m option.
So the buffer default size is 128 pages (516kb) which fills quite quickly.

Sorry for the noise.

I also realized this status line running perf top

Too slow to read ring buffer (change period (-c/-F) or limit CPUs (-C)

It shows up after ca 1 minutes on an idle system.
I guess this is the sane root cause?
-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-22 14:42 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-21 15:54 kernel/perf: Sample data being lost Thomas Richter
2020-04-21 15:59 ` Arnaldo Carvalho de Melo
2020-04-22 14:42   ` Thomas Richter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.