* [RFC] perf/x86: PMU IRQ handler issues
@ 2014-05-28 19:48 Stephane Eranian
  2014-05-28 19:55 ` Andi Kleen
  2014-05-28 20:20 ` Dave Hansen
  0 siblings, 2 replies; 6+ messages in thread
From: Stephane Eranian @ 2014-05-28 19:48 UTC (permalink / raw)
  To: LKML, Peter Zijlstra, mingo, ak, Dave Hansen, Yan, Zheng

Hi,

A few days ago, I was alerted that under heavy network load, something
was going wrong with perf_event sampling in frequency mode (such as perf top).
The number of samples was way too low given the cycle count (via perf stat).
Looking at the syslog, I noticed that the perf irq latency throttler had
kicked in several times. There may have been several reasons for this.

Maybe the workload had changing phases and the frequency adjustment was
not working properly, dropping to a very small period and then generating
a flood of interrupts.
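
For reference, here is a simplified standalone model of that feedback loop,
in which the next period is re-derived from the event rate observed over the
previous interval (a sketch with an illustrative helper name, not the
kernel's actual period-adjustment code):

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Pick the next period so that roughly 'freq' samples/sec are taken at
 * the event rate observed over the last interval ('count' events in
 * 'nsec' nanoseconds).  A phase change that collapses the rate also
 * collapses the period, and the next busy phase then floods us with PMIs. */
static uint64_t next_period(uint64_t count, uint64_t nsec, uint64_t freq)
{
	uint64_t rate;

	if (nsec == 0 || freq == 0)
		return 1;

	rate = count * NSEC_PER_SEC / nsec;	/* events per second */
	return rate / freq ? rate / freq : 1;	/* events per sample, >= 1 */
}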

Another explanation is that because we ACK the NMI early, we leave the
door open to other interrupts, incl. the NIC's, which then interrupt the execution
of the PMU IRQ handler, yet that detour is still measured in the PMU handler
latency, causing more throttling than needed. Is that a plausible scenario too?
And if so, I think we need to narrow the window for timing errors by
acking late on all processors and not just HSW.
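
To make the timing window concrete, here is a rough sketch of the ordering
in question, with placeholder helpers standing in for sched_clock(), the
LVTPC ack and the counter processing (an illustration, not the actual
arch/x86 handler): the latency fed to the throttler covers everything
between the two clock reads, and early versus Haswell-style late ack only
moves the LVTPC write within that window.

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t read_clock_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
static void ack_apic_lvtpc(void)        { /* re-arm the PMI source */ }
static void handle_counters(void)       { /* drain counters, emit samples */ }
static void record_latency(uint64_t ns) { (void)ns; /* feeds the throttler */ }

static void pmu_nmi_handler(bool late_ack)
{
	uint64_t start = read_clock_ns();       /* latency window opens */

	if (!late_ack)
		ack_apic_lvtpc();               /* early ack, before the work */

	handle_counters();

	if (late_ack)
		ack_apic_lvtpc();               /* HSW-style ack, after the work */

	record_latency(read_clock_ns() - start); /* what the throttler sees */
}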

I still suspect there is something wrong with the frequency mode.

Any better explanation for the problem?


* Re: [RFC] perf/x86: PMU IRQ handler issues
  2014-05-28 19:48 [RFC] perf/x86: PMU IRQ handler issues Stephane Eranian
@ 2014-05-28 19:55 ` Andi Kleen
  2014-05-28 19:56   ` Stephane Eranian
  2014-05-28 20:20 ` Dave Hansen
  1 sibling, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2014-05-28 19:55 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, Peter Zijlstra, mingo, Dave Hansen, Yan, Zheng


> Another explanation is that because we ACK the NMI early, we leave the
> door open to other interrupts, incl. the NIC's, which then interrupt the execution

PMI executes with interrupts off.

> of the PMU IRQ handler, yet that detour is still measured in the PMU handler
> latency, causing more throttling than needed. Is that a plausible scenario too?
> And if so, I think we need to narrow the window for timing errors by
> acking late on all processors and not just HSW.

If you think there's a concrete problem please show an ftrace.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [RFC] perf/x86: PMU IRQ handler issues
  2014-05-28 19:55 ` Andi Kleen
@ 2014-05-28 19:56   ` Stephane Eranian
  2014-05-28 20:06     ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Stephane Eranian @ 2014-05-28 19:56 UTC (permalink / raw)
  To: Andi Kleen; +Cc: LKML, Peter Zijlstra, mingo, Dave Hansen, Yan, Zheng

On Wed, May 28, 2014 at 9:55 PM, Andi Kleen <ak@linux.intel.com> wrote:
>
>> Another explanation is that because we ACK the NMI early, we leave the
>> door open to other interrupts, incl. the NIC's, which then interrupt the execution
>
> PMI executes with interrupts off.
>
And that's coming from where?

>> of the PMU IRQ handler, yet that detour is still measured in the PMU handler
>> latency, causing more throttling than needed. Is that a plausible scenario too?
>> And if so, I think we need to narrow the window for timing errors by
>> acking late on all processors and not just HSW.
>
> If you think there's a concrete problem please show an ftrace.
>
> -Andi
> --
> ak@linux.intel.com -- Speaking for myself only


* Re: [RFC] perf/x86: PMU IRQ handler issues
  2014-05-28 19:56   ` Stephane Eranian
@ 2014-05-28 20:06     ` Andi Kleen
  0 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2014-05-28 20:06 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, Peter Zijlstra, mingo, Dave Hansen, Yan, Zheng

On Wed, May 28, 2014 at 09:56:35PM +0200, Stephane Eranian wrote:
> On Wed, May 28, 2014 at 9:55 PM, Andi Kleen <ak@linux.intel.com> wrote:
> >
> >> Another explanation is that because we ACK the NMI early, we leave the
> >> door open to other interrupts, incl. the NIC's, which then interrupt the execution
> >
> > PMI executes with interrupts off.
> >
> And that's coming from where?

The interrupt gate.
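
That is, a handler entered through an x86 interrupt gate starts with
EFLAGS.IF cleared by the CPU, so maskable interrupts (the NIC's included)
cannot preempt it until it returns. A minimal user-space sketch of the IF
test this boils down to (x86-64 with GCC-style inline asm assumed;
illustrative name, not kernel code):

#include <stdbool.h>

#define X86_EFLAGS_IF (1UL << 9)        /* interrupt-enable flag */

static inline bool my_irqs_disabled(void)
{
	unsigned long flags;

	__asm__ volatile("pushfq; popq %0" : "=r"(flags));
	return !(flags & X86_EFLAGS_IF);
}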

-Andi



* Re: [RFC] perf/x86: PMU IRQ handler issues
  2014-05-28 19:48 [RFC] perf/x86: PMU IRQ handler issues Stephane Eranian
  2014-05-28 19:55 ` Andi Kleen
@ 2014-05-28 20:20 ` Dave Hansen
  2014-05-29  2:56   ` Andi Kleen
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2014-05-28 20:20 UTC (permalink / raw)
  To: Stephane Eranian, LKML, Peter Zijlstra, mingo, ak, Yan, Zheng

On 05/28/2014 12:48 PM, Stephane Eranian wrote:
> A few days ago, I was alerted that under heavy network load, something
> was going wrong with perf_event sampling in frequency mode (such as perf top).
> The number of samples was way too low given the cycle count (via perf stat).
> Looking at the syslog, I noticed that the perf irq latency throttler had
> kicked in several times. There may have been several reasons for this.
> 
> Maybe the workload had changing phases and the frequency adjustment was
> not working properly, dropping to a very small period and then generating
> a flood of interrupts.

The problem description here is pretty fuzzy.  Could you give some
actual numbers describing the issues that you're seeing, including the
ftrace that Andi was asking for?  There are also some handy tracepoints
for NMI lengths that I stuck in.

The reason that the throttling code is there is that the CPU can get
into a state where it is doing *NOTHING* other than processing NMIs (the
biggest of which are the perf-driven ones).  It doesn't start throttling
until 128 samples end up averaging more than the limit.
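
The 128-sample figure corresponds to a decayed running sum: each new
handler length is folded in, one average sample's worth is decayed out,
and throttling starts only once the resulting average exceeds the
per-sample budget. A simplified model (illustrative names, not the kernel
code):

#include <stdbool.h>
#include <stdint.h>

#define NR_ACCUMULATED_SAMPLES 128

static uint64_t running_len;    /* decayed sum of recent handler lengths */

static bool over_budget(uint64_t sample_len_ns, uint64_t allowed_ns)
{
	running_len -= running_len / NR_ACCUMULATED_SAMPLES;    /* decay */
	running_len += sample_len_ns;

	return running_len / NR_ACCUMULATED_SAMPLES > allowed_ns;
}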

How large of a system is this, btw?  I had the worst issues on a
160-logical-cpu system.  It was much harder to get it into trouble on
smaller systems.


* Re: [RFC] perf/x86: PMU IRQ handler issues
  2014-05-28 20:20 ` Dave Hansen
@ 2014-05-29  2:56   ` Andi Kleen
  0 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2014-05-29  2:56 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Stephane Eranian, LKML, Peter Zijlstra, mingo, Yan, Zheng

> actual numbers describing the issues that you're seeing, including the
> ftrace that Andi was asking for?  There are also some handy tracepoints
> for NMI lengths that I stuck in.

Another good thing would be to plot the periods (from perf report -D)

-Andi

