[RFC] perf/x86: PMU IRQ handler issues

* [RFC] perf/x86: PMU IRQ handler issues
@ 2014-05-28 19:48 Stephane Eranian
  2014-05-28 19:55 ` Andi Kleen
  2014-05-28 20:20 ` Dave Hansen
  0 siblings, 2 replies; 6+ messages in thread
From: Stephane Eranian @ 2014-05-28 19:48 UTC (permalink / raw)
  To: LKML, Peter Zijlstra, mingo, ak, Dave Hansen, Yan, Zheng

Hi,

Some days ago, I was alerted that under important network load, something
is going wrong with perf_event sampling in frequency mode (such as perf top).
The number of samples was way too low given the cycle count (via perf stat).
Looking at the syslog, I noticed that the perf irq latency throttler
had kicked in
several times. There may have been several reasons for this.

Maybe the workload had changing phases and the frequency adjustments
was not working properly and dropping to very small period and then generated
flood of interrupts.

Another explanation is that because we ACK the NMI early, we leave the
door open to other interrupts, incl. NIC, and we are interrupting the execution
of the PMU IRQ handler, yet that detour is measured in the PMU handler
latency, causing more throttling than needed. Is that a plausible scenario too?
And if so, I think we need to narrow the window for timing errors, by
acking late
on all processors and not just HSW.

I still suspect there is something wrong with the frequency mode.

Any better explanation for the problem?

^ permalink raw reply	[flat|nested] 6+ messages in thread