* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-09 19:48 ` [PATCH] perf, x86: try to handle unknown nmis with running perfctrs Robert Richter
@ 2010-08-09 20:02 ` Cyrill Gorcunov
2010-08-10 7:42 ` Robert Richter
2010-08-10 20:48 ` Don Zickus
` (2 subsequent siblings)
3 siblings, 1 reply; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-09 20:02 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Mon, Aug 09, 2010 at 09:48:29PM +0200, Robert Richter wrote:
> On 06.08.10 10:21:31, Don Zickus wrote:
> > On Fri, Aug 06, 2010 at 08:52:03AM +0200, Robert Richter wrote:
>
> > > I was playing around with it yesterday trying to fix this. My idea is
> > > to skip an unkown nmi if the privious nmi was a *handled* perfctr
> >
> > You might want to add a little more logic that says *handled* _and_ had
> > more than one perfctr trigger. Most of the time only one perfctr is
> > probably triggering, so you might be eating unknown_nmi's needlessly.
> >
> > Just a thought.
>
> Yes, that's true. It could be implemented on top of the patch below.
>
> >
> > > nmi. I will probably post an rfc patch early next week.
>
> Here it comes:
>
Thanks Robert! Looks good to me, one nit below.
> From d2739578199d881ae6a9537c1b96a0efd1cdea43 Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
...
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index f2da20f..c3cd159 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1200,12 +1200,16 @@ void perf_events_lapic_init(void)
> apic_write(APIC_LVTPC, APIC_DM_NMI);
> }
>
> +static DEFINE_PER_CPU(unsigned int, perfctr_handled);
> +
> static int __kprobes
> perf_event_nmi_handler(struct notifier_block *self,
> unsigned long cmd, void *__args)
> {
> struct die_args *args = __args;
> struct pt_regs *regs;
> + unsigned int this_nmi;
> + unsigned int prev_nmi;
>
> if (!atomic_read(&active_events))
> return NOTIFY_DONE;
> @@ -1214,7 +1218,26 @@ perf_event_nmi_handler(struct notifier_block *self,
> case DIE_NMI:
> case DIE_NMI_IPI:
> break;
> -
> + case DIE_NMIUNKNOWN:
> + /*
> + * This one could be our NMI, two events could trigger
> + * 'simultaneously' raising two back-to-back NMIs. If
> + * the first NMI handles both, the latter will be
> + * empty and daze the CPU.
> + *
> + * So, we drop this unknown NMI if the previous NMI
> + * was handling a perfctr. Otherwise we pass it and
> + * let the kernel handle the unknown nmi.
> + *
> + * Note: this could be improved if we drop unknown
> + * NMIs only if we handled more than one perfctr in
> + * the previous NMI.
> + */
> + this_nmi = percpu_read(irq_stat.__nmi_count);
> + prev_nmi = __get_cpu_var(perfctr_handled);
> + if (this_nmi == prev_nmi + 1)
> + return NOTIFY_STOP;
> + return NOTIFY_DONE;
> default:
> return NOTIFY_DONE;
> }
> @@ -1222,14 +1245,12 @@ perf_event_nmi_handler(struct notifier_block *self,
> regs = args->regs;
>
> apic_write(APIC_LVTPC, APIC_DM_NMI);
If only I'm not missing something this apic_write should go up to
"case DIE_NMIUNKNOWN" site, no?
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-09 20:02 ` Cyrill Gorcunov
@ 2010-08-10 7:42 ` Robert Richter
2010-08-10 16:16 ` Cyrill Gorcunov
0 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-10 7:42 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 09.08.10 16:02:45, Cyrill Gorcunov wrote:
> > @@ -1222,14 +1245,12 @@ perf_event_nmi_handler(struct notifier_block *self,
> > regs = args->regs;
> >
> > apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> If only I'm not missing something this apic_write should go up to
> "case DIE_NMIUNKNOWN" site, no?
This seems to be code from the non-nmi implementation and can be
removed at all, which should be a separate patch. The vector is
already set up.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 7:42 ` Robert Richter
@ 2010-08-10 16:16 ` Cyrill Gorcunov
2010-08-10 16:41 ` Robert Richter
0 siblings, 1 reply; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-10 16:16 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Tue, Aug 10, 2010 at 09:42:00AM +0200, Robert Richter wrote:
> On 09.08.10 16:02:45, Cyrill Gorcunov wrote:
>
> > > @@ -1222,14 +1245,12 @@ perf_event_nmi_handler(struct notifier_block *self,
> > > regs = args->regs;
> > >
> > > apic_write(APIC_LVTPC, APIC_DM_NMI);
> >
> > If only I'm not missing something this apic_write should go up to
> > "case DIE_NMIUNKNOWN" site, no?
>
> This seems to be code from the non-nmi implementation and can be
> removed at all, which should be a separate patch. The vector is
> already set up.
>
> -Robert
>
No, this is just a short way to unmask LVTPC (which is required for
cpus). Actually lookig into this snippet I found that in p4 pmu
I've made one redundant unmaksing operation. will update as only
this area settle down.
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 16:16 ` Cyrill Gorcunov
@ 2010-08-10 16:41 ` Robert Richter
2010-08-10 17:24 ` Cyrill Gorcunov
0 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-10 16:41 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 10.08.10 12:16:27, Cyrill Gorcunov wrote:
> On Tue, Aug 10, 2010 at 09:42:00AM +0200, Robert Richter wrote:
> > On 09.08.10 16:02:45, Cyrill Gorcunov wrote:
> >
> > > > @@ -1222,14 +1245,12 @@ perf_event_nmi_handler(struct notifier_block *self,
> > > > regs = args->regs;
> > > >
> > > > apic_write(APIC_LVTPC, APIC_DM_NMI);
> > >
> > > If only I'm not missing something this apic_write should go up to
> > > "case DIE_NMIUNKNOWN" site, no?
> >
> > This seems to be code from the non-nmi implementation and can be
> > removed at all, which should be a separate patch. The vector is
> > already set up.
> >
> > -Robert
> >
>
> No, this is just a short way to unmask LVTPC (which is required for
> cpus). Actually lookig into this snippet I found that in p4 pmu
> I've made one redundant unmaksing operation. will update as only
> this area settle down.
The vector is setup in hw_perf_enable() and then never masked. The
perfctrs nmi is alwayes enabled since then. I still see no reason for
unmasking it again with every nmi. Once you handle the nmi it is also
enabled.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 16:41 ` Robert Richter
@ 2010-08-10 17:24 ` Cyrill Gorcunov
2010-08-10 19:05 ` Robert Richter
0 siblings, 1 reply; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-10 17:24 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Tue, Aug 10, 2010 at 06:41:24PM +0200, Robert Richter wrote:
> On 10.08.10 12:16:27, Cyrill Gorcunov wrote:
> > On Tue, Aug 10, 2010 at 09:42:00AM +0200, Robert Richter wrote:
> > > On 09.08.10 16:02:45, Cyrill Gorcunov wrote:
> > >
> > > > > @@ -1222,14 +1245,12 @@ perf_event_nmi_handler(struct notifier_block *self,
> > > > > regs = args->regs;
> > > > >
> > > > > apic_write(APIC_LVTPC, APIC_DM_NMI);
> > > >
> > > > If only I'm not missing something this apic_write should go up to
> > > > "case DIE_NMIUNKNOWN" site, no?
> > >
> > > This seems to be code from the non-nmi implementation and can be
> > > removed at all, which should be a separate patch. The vector is
> > > already set up.
> > >
> > > -Robert
> > >
> >
> > No, this is just a short way to unmask LVTPC (which is required for
> > cpus). Actually lookig into this snippet I found that in p4 pmu
> > I've made one redundant unmaksing operation. will update as only
> > this area settle down.
>
> The vector is setup in hw_perf_enable() and then never masked. The
> perfctrs nmi is alwayes enabled since then. I still see no reason for
> unmasking it again with every nmi. Once you handle the nmi it is also
> enabled.
>
It gets masked on NMI arrival, at least for some models (Core Duo, P4,
P6 M and I suspect more theh that, that was the reason oprofile has
it, also there is a note in SDM V3a page 643).
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 17:24 ` Cyrill Gorcunov
@ 2010-08-10 19:05 ` Robert Richter
2010-08-10 19:24 ` Cyrill Gorcunov
0 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-10 19:05 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 10.08.10 13:24:51, Cyrill Gorcunov wrote:
> It gets masked on NMI arrival, at least for some models (Core Duo, P4,
> P6 M and I suspect more theh that, that was the reason oprofile has
> it, also there is a note in SDM V3a page 643).
Yes, that's right, I never noticed that. Maybe it is better to
implement the apic write it in model specific code then.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 19:05 ` Robert Richter
@ 2010-08-10 19:24 ` Cyrill Gorcunov
2010-08-12 13:24 ` Robert Richter
0 siblings, 1 reply; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-10 19:24 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Tue, Aug 10, 2010 at 09:05:41PM +0200, Robert Richter wrote:
> On 10.08.10 13:24:51, Cyrill Gorcunov wrote:
>
> > It gets masked on NMI arrival, at least for some models (Core Duo, P4,
> > P6 M and I suspect more theh that, that was the reason oprofile has
> > it, also there is a note in SDM V3a page 643).
>
> Yes, that's right, I never noticed that. Maybe it is better to
> implement the apic write it in model specific code then.
>
Perhaps we can make it simplier I think, ie like it was before -- we just
move it under your new DIE_NMIUNKNOWN, in separate patch of course. Though
I'm fine with either way.
(actually it's interesting to know wouldn't we leave lvt masked when
we hit 'second delayed nmi has arrived' situation, I guess we didn't
hit it before in real yet :-)
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 19:24 ` Cyrill Gorcunov
@ 2010-08-12 13:24 ` Robert Richter
2010-08-12 14:31 ` Cyrill Gorcunov
0 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-12 13:24 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 10.08.10 15:24:28, Cyrill Gorcunov wrote:
> On Tue, Aug 10, 2010 at 09:05:41PM +0200, Robert Richter wrote:
> > On 10.08.10 13:24:51, Cyrill Gorcunov wrote:
> >
> > > It gets masked on NMI arrival, at least for some models (Core Duo, P4,
> > > P6 M and I suspect more theh that, that was the reason oprofile has
> > > it, also there is a note in SDM V3a page 643).
> >
> > Yes, that's right, I never noticed that. Maybe it is better to
> > implement the apic write it in model specific code then.
> >
>
> Perhaps we can make it simplier I think, ie like it was before -- we just
> move it under your new DIE_NMIUNKNOWN, in separate patch of course. Though
> I'm fine with either way.
I do not understand why you want to put this in the 'unknown'
path. Isn't it necessary to unmask the vector with every call of the
nmi handler?
-Robert
>
> (actually it's interesting to know wouldn't we leave lvt masked when
> we hit 'second delayed nmi has arrived' situation, I guess we didn't
> hit it before in real yet :-)
>
> > -Robert
> >
> > --
> > Advanced Micro Devices, Inc.
> > Operating System Research Center
> >
> -- Cyrill
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-12 13:24 ` Robert Richter
@ 2010-08-12 14:31 ` Cyrill Gorcunov
0 siblings, 0 replies; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-12 14:31 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 12, 2010 at 03:24:19PM +0200, Robert Richter wrote:
> On 10.08.10 15:24:28, Cyrill Gorcunov wrote:
> > On Tue, Aug 10, 2010 at 09:05:41PM +0200, Robert Richter wrote:
> > > On 10.08.10 13:24:51, Cyrill Gorcunov wrote:
> > >
> > > > It gets masked on NMI arrival, at least for some models (Core Duo, P4,
> > > > P6 M and I suspect more theh that, that was the reason oprofile has
> > > > it, also there is a note in SDM V3a page 643).
> > >
> > > Yes, that's right, I never noticed that. Maybe it is better to
> > > implement the apic write it in model specific code then.
> > >
> >
> > Perhaps we can make it simplier I think, ie like it was before -- we just
> > move it under your new DIE_NMIUNKNOWN, in separate patch of course. Though
> > I'm fine with either way.
>
> I do not understand why you want to put this in the 'unknown'
> path. Isn't it necessary to unmask the vector with every call of the
> nmi handler?
>
> -Robert
>
Heh, it's simple - I'm screwed. Robert you're right, of course it should NOT
be under every 'unknown' nmi. I thought about small optimization here, but
I think all this should be done only _after_ your patch is merged.
Sorry for confuse ;)
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-09 19:48 ` [PATCH] perf, x86: try to handle unknown nmis with running perfctrs Robert Richter
2010-08-09 20:02 ` Cyrill Gorcunov
@ 2010-08-10 20:48 ` Don Zickus
2010-08-11 2:44 ` Frederic Weisbecker
2010-08-11 3:19 ` Huang Ying
2010-08-11 22:00 ` [PATCH -v2] " Robert Richter
2010-08-17 15:22 ` [PATCH -v3] " Robert Richter
3 siblings, 2 replies; 85+ messages in thread
From: Don Zickus @ 2010-08-10 20:48 UTC (permalink / raw)
To: Robert Richter
Cc: Cyrill Gorcunov, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Mon, Aug 09, 2010 at 09:48:29PM +0200, Robert Richter wrote:
> On 06.08.10 10:21:31, Don Zickus wrote:
> > On Fri, Aug 06, 2010 at 08:52:03AM +0200, Robert Richter wrote:
>
> > > I was playing around with it yesterday trying to fix this. My idea is
> > > to skip an unkown nmi if the privious nmi was a *handled* perfctr
> >
> > You might want to add a little more logic that says *handled* _and_ had
> > more than one perfctr trigger. Most of the time only one perfctr is
> > probably triggering, so you might be eating unknown_nmi's needlessly.
> >
> > Just a thought.
>
> Yes, that's true. It could be implemented on top of the patch below.
I did, but the changes basically revert the bulk of your patch.
>
> >
> > > nmi. I will probably post an rfc patch early next week.
>
> Here it comes:
>
> From d2739578199d881ae6a9537c1b96a0efd1cdea43 Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
On top of Robert's patch:
(compiled tested only because I don't have a fancy button to trigger
unknown nmis)
>From 548cf5148f47618854a0eff22b1d55db71b6f8fc Mon Sep 17 00:00:00 2001
From: Don Zickus <dzickus@redhat.com>
Date: Tue, 10 Aug 2010 16:40:03 -0400
Subject: [PATCH] perf, x86: only skip NMIs when multiple perfctrs trigger
A small optimization on top of Robert's patch that limits the
skipping of NMI's to cases where we detect multiple perfctr events
have happened.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/kernel/cpu/perf_event.c | 34 ++++++++++++++++++++--------------
1 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index c3cd159..066046d 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1154,7 +1154,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
/*
* event overflow
*/
- handled = 1;
+ handled += 1;
data.period = event->hw.last_period;
if (!x86_perf_event_set_period(event))
@@ -1200,7 +1200,7 @@ void perf_events_lapic_init(void)
apic_write(APIC_LVTPC, APIC_DM_NMI);
}
-static DEFINE_PER_CPU(unsigned int, perfctr_handled);
+static DEFINE_PER_CPU(unsigned int, perfctr_skip);
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
@@ -1208,8 +1208,7 @@ perf_event_nmi_handler(struct notifier_block *self,
{
struct die_args *args = __args;
struct pt_regs *regs;
- unsigned int this_nmi;
- unsigned int prev_nmi;
+ int handled = 0;
if (!atomic_read(&active_events))
return NOTIFY_DONE;
@@ -1229,14 +1228,11 @@ perf_event_nmi_handler(struct notifier_block *self,
* was handling a perfctr. Otherwise we pass it and
* let the kernel handle the unknown nmi.
*
- * Note: this could be improved if we drop unknown
- * NMIs only if we handled more than one perfctr in
- * the previous NMI.
*/
- this_nmi = percpu_read(irq_stat.__nmi_count);
- prev_nmi = __get_cpu_var(perfctr_handled);
- if (this_nmi == prev_nmi + 1)
+ if (__get_cpu_var(perfctr_skip)){
+ __get_cpu_var(perfctr_skip) -=1;
return NOTIFY_STOP;
+ }
return NOTIFY_DONE;
default:
return NOTIFY_DONE;
@@ -1246,11 +1242,21 @@ perf_event_nmi_handler(struct notifier_block *self,
apic_write(APIC_LVTPC, APIC_DM_NMI);
- if (!x86_pmu.handle_irq(regs))
+ handled = x86_pmu.handle_irq(regs);
+ if (!handled)
+ /* not our NMI */
return NOTIFY_DONE;
-
- /* handled */
- __get_cpu_var(perfctr_handled) = percpu_read(irq_stat.__nmi_count);
+ else if (handled > 1)
+ /*
+ * More than one perfctr triggered. This could have
+ * caused a second NMI that we must now skip because
+ * we have already handled it. Remember it.
+ *
+ * NOTE: We have no way of knowing if a second NMI was
+ * actually triggered, so we may accidentally skip a valid
+ * unknown nmi later.
+ */
+ __get_cpu_var(perfctr_skip) +=1;
return NOTIFY_STOP;
}
--
1.7.2
^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 20:48 ` Don Zickus
@ 2010-08-11 2:44 ` Frederic Weisbecker
2010-08-11 11:10 ` Robert Richter
2010-08-11 12:39 ` Don Zickus
2010-08-11 3:19 ` Huang Ying
1 sibling, 2 replies; 85+ messages in thread
From: Frederic Weisbecker @ 2010-08-11 2:44 UTC (permalink / raw)
To: Don Zickus
Cc: Robert Richter, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Tue, Aug 10, 2010 at 04:48:56PM -0400, Don Zickus wrote:
> On Mon, Aug 09, 2010 at 09:48:29PM +0200, Robert Richter wrote:
> > On 06.08.10 10:21:31, Don Zickus wrote:
> > > On Fri, Aug 06, 2010 at 08:52:03AM +0200, Robert Richter wrote:
> >
> > > > I was playing around with it yesterday trying to fix this. My idea is
> > > > to skip an unkown nmi if the privious nmi was a *handled* perfctr
> > >
> > > You might want to add a little more logic that says *handled* _and_ had
> > > more than one perfctr trigger. Most of the time only one perfctr is
> > > probably triggering, so you might be eating unknown_nmi's needlessly.
> > >
> > > Just a thought.
> >
> > Yes, that's true. It could be implemented on top of the patch below.
>
> I did, but the changes basically revert the bulk of your patch.
>
> >
> > >
> > > > nmi. I will probably post an rfc patch early next week.
> >
> > Here it comes:
> >
> > From d2739578199d881ae6a9537c1b96a0efd1cdea43 Mon Sep 17 00:00:00 2001
> > From: Robert Richter <robert.richter@amd.com>
> > Date: Thu, 5 Aug 2010 16:19:59 +0200
> > Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
> On top of Robert's patch:
> (compiled tested only because I don't have a fancy button to trigger
> unknown nmis)
>
> From 548cf5148f47618854a0eff22b1d55db71b6f8fc Mon Sep 17 00:00:00 2001
> From: Don Zickus <dzickus@redhat.com>
> Date: Tue, 10 Aug 2010 16:40:03 -0400
> Subject: [PATCH] perf, x86: only skip NMIs when multiple perfctrs trigger
>
> A small optimization on top of Robert's patch that limits the
> skipping of NMI's to cases where we detect multiple perfctr events
> have happened.
Yeah, I think that's more reasonable. This lowers even more the chances of
losing important hardware errors.
One comment though:
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
>
> ---
> arch/x86/kernel/cpu/perf_event.c | 34 ++++++++++++++++++++--------------
> 1 files changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index c3cd159..066046d 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1154,7 +1154,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
> /*
> * event overflow
> */
> - handled = 1;
> + handled += 1;
> data.period = event->hw.last_period;
>
> if (!x86_perf_event_set_period(event))
> @@ -1200,7 +1200,7 @@ void perf_events_lapic_init(void)
> apic_write(APIC_LVTPC, APIC_DM_NMI);
> }
>
> -static DEFINE_PER_CPU(unsigned int, perfctr_handled);
> +static DEFINE_PER_CPU(unsigned int, perfctr_skip);
>
> static int __kprobes
> perf_event_nmi_handler(struct notifier_block *self,
> @@ -1208,8 +1208,7 @@ perf_event_nmi_handler(struct notifier_block *self,
> {
> struct die_args *args = __args;
> struct pt_regs *regs;
> - unsigned int this_nmi;
> - unsigned int prev_nmi;
> + int handled = 0;
>
> if (!atomic_read(&active_events))
> return NOTIFY_DONE;
> @@ -1229,14 +1228,11 @@ perf_event_nmi_handler(struct notifier_block *self,
> * was handling a perfctr. Otherwise we pass it and
> * let the kernel handle the unknown nmi.
> *
> - * Note: this could be improved if we drop unknown
> - * NMIs only if we handled more than one perfctr in
> - * the previous NMI.
> */
> - this_nmi = percpu_read(irq_stat.__nmi_count);
> - prev_nmi = __get_cpu_var(perfctr_handled);
> - if (this_nmi == prev_nmi + 1)
> + if (__get_cpu_var(perfctr_skip)){
> + __get_cpu_var(perfctr_skip) -=1;
> return NOTIFY_STOP;
> + }
> return NOTIFY_DONE;
> default:
> return NOTIFY_DONE;
> @@ -1246,11 +1242,21 @@ perf_event_nmi_handler(struct notifier_block *self,
>
> apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> - if (!x86_pmu.handle_irq(regs))
> + handled = x86_pmu.handle_irq(regs);
> + if (!handled)
> + /* not our NMI */
> return NOTIFY_DONE;
> -
> - /* handled */
> - __get_cpu_var(perfctr_handled) = percpu_read(irq_stat.__nmi_count);
> + else if (handled > 1)
> + /*
> + * More than one perfctr triggered. This could have
> + * caused a second NMI that we must now skip because
> + * we have already handled it. Remember it.
> + *
> + * NOTE: We have no way of knowing if a second NMI was
> + * actually triggered, so we may accidentally skip a valid
> + * unknown nmi later.
> + */
> + __get_cpu_var(perfctr_skip) +=1;
May be make it just a pending bit. I mean not something that can
go further 1, because you can't have more than 1 pending anyway. I don't
know how that could happen you get accidental perctr_skip > 1, may be
expected pending NMIs that don't happen somehow, but better be paranoid with
that, as it's about trying not to miss hardware errors.
Thanks.
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 2:44 ` Frederic Weisbecker
@ 2010-08-11 11:10 ` Robert Richter
2010-08-11 12:44 ` Don Zickus
2010-08-13 4:37 ` Frederic Weisbecker
2010-08-11 12:39 ` Don Zickus
1 sibling, 2 replies; 85+ messages in thread
From: Robert Richter @ 2010-08-11 11:10 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Don Zickus, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 10.08.10 22:44:55, Frederic Weisbecker wrote:
> On Tue, Aug 10, 2010 at 04:48:56PM -0400, Don Zickus wrote:
> > @@ -1200,7 +1200,7 @@ void perf_events_lapic_init(void)
> > apic_write(APIC_LVTPC, APIC_DM_NMI);
> > }
> >
> > -static DEFINE_PER_CPU(unsigned int, perfctr_handled);
> > +static DEFINE_PER_CPU(unsigned int, perfctr_skip);
Yes, using perfctr_skip is better to understand ...
> > @@ -1229,14 +1228,11 @@ perf_event_nmi_handler(struct notifier_block *self,
> > * was handling a perfctr. Otherwise we pass it and
> > * let the kernel handle the unknown nmi.
> > *
> > - * Note: this could be improved if we drop unknown
> > - * NMIs only if we handled more than one perfctr in
> > - * the previous NMI.
> > */
> > - this_nmi = percpu_read(irq_stat.__nmi_count);
> > - prev_nmi = __get_cpu_var(perfctr_handled);
> > - if (this_nmi == prev_nmi + 1)
> > + if (__get_cpu_var(perfctr_skip)){
> > + __get_cpu_var(perfctr_skip) -=1;
> > return NOTIFY_STOP;
> > + }
> > return NOTIFY_DONE;
> > default:
> > return NOTIFY_DONE;
> > @@ -1246,11 +1242,21 @@ perf_event_nmi_handler(struct notifier_block *self,
> >
> > apic_write(APIC_LVTPC, APIC_DM_NMI);
> >
> > - if (!x86_pmu.handle_irq(regs))
> > + handled = x86_pmu.handle_irq(regs);
> > + if (!handled)
> > + /* not our NMI */
> > return NOTIFY_DONE;
> > -
> > - /* handled */
> > - __get_cpu_var(perfctr_handled) = percpu_read(irq_stat.__nmi_count);
> > + else if (handled > 1)
> > + /*
> > + * More than one perfctr triggered. This could have
> > + * caused a second NMI that we must now skip because
> > + * we have already handled it. Remember it.
> > + *
> > + * NOTE: We have no way of knowing if a second NMI was
> > + * actually triggered, so we may accidentally skip a valid
> > + * unknown nmi later.
> > + */
> > + __get_cpu_var(perfctr_skip) +=1;
... but this will not work. You have to mark the *absolute* nmi number
here. If you only raise a flag, the next unknown nmi will be dropped,
every. Because, in between there could have been other nmis that
stopped the chain and thus the 'unknown' path is not executed. The
trick in my patch is that you *know*, which nmi you want to skip.
I will send an updated version of my patch.
-Robert
>
>
>
> May be make it just a pending bit. I mean not something that can
> go further 1, because you can't have more than 1 pending anyway. I don't
> know how that could happen you get accidental perctr_skip > 1, may be
> expected pending NMIs that don't happen somehow, but better be paranoid with
> that, as it's about trying not to miss hardware errors.
>
> Thanks.
>
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 11:10 ` Robert Richter
@ 2010-08-11 12:44 ` Don Zickus
2010-08-11 14:03 ` Robert Richter
2010-08-13 4:37 ` Frederic Weisbecker
1 sibling, 1 reply; 85+ messages in thread
From: Don Zickus @ 2010-08-11 12:44 UTC (permalink / raw)
To: Robert Richter
Cc: Frederic Weisbecker, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Wed, Aug 11, 2010 at 01:10:46PM +0200, Robert Richter wrote:
> > > + *
> > > + * NOTE: We have no way of knowing if a second NMI was
> > > + * actually triggered, so we may accidentally skip a valid
> > > + * unknown nmi later.
> > > + */
> > > + __get_cpu_var(perfctr_skip) +=1;
>
> ... but this will not work. You have to mark the *absolute* nmi number
> here. If you only raise a flag, the next unknown nmi will be dropped,
> every. Because, in between there could have been other nmis that
> stopped the chain and thus the 'unknown' path is not executed. The
> trick in my patch is that you *know*, which nmi you want to skip.
I guess I am confused. The way I read your patch was that you assumed the
next NMI would be the one you skip and if there was another NMI in between
the handled one and the one to skip, you would not skip it (nmi count !=
prev + 1) and it would produce an accidental unknown nmi.
I tried to change that with my patch by setting the skip flag which would
be drained on the next unknown nmi, independent of where it is in the NMI
backlog of NMIs.
Did I misread something?
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 12:44 ` Don Zickus
@ 2010-08-11 14:03 ` Robert Richter
2010-08-11 14:32 ` Don Zickus
0 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-11 14:03 UTC (permalink / raw)
To: Don Zickus
Cc: Frederic Weisbecker, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 11.08.10 08:44:43, Don Zickus wrote:
> On Wed, Aug 11, 2010 at 01:10:46PM +0200, Robert Richter wrote:
> > > > + *
> > > > + * NOTE: We have no way of knowing if a second NMI was
> > > > + * actually triggered, so we may accidentally skip a valid
> > > > + * unknown nmi later.
> > > > + */
> > > > + __get_cpu_var(perfctr_skip) +=1;
> >
> > ... but this will not work. You have to mark the *absolute* nmi number
> > here. If you only raise a flag, the next unknown nmi will be dropped,
> > every. Because, in between there could have been other nmis that
> > stopped the chain and thus the 'unknown' path is not executed. The
> > trick in my patch is that you *know*, which nmi you want to skip.
>
> I guess I am confused. The way I read your patch was that you assumed the
> next NMI would be the one you skip and if there was another NMI in between
> the handled one and the one to skip, you would not skip it (nmi count !=
> prev + 1) and it would produce an accidental unknown nmi.
That's how it works.
> I tried to change that with my patch by setting the skip flag which would
> be drained on the next unknown nmi, independent of where it is in the NMI
> backlog of NMIs.
"setting the skip flag which would be drained on the next unknown nmi"
That's what is wrong, it drops every unknown nmi, no matter when it is
detected. In between there could be 1000's of valid other nmis
handled. You even could have been returned from nmi mode. But still,
the next unknown nmi will be dropped. Your patch could accumulate also
the number of unknown nmis to skip, and then, if 'real' unknown nmis
happen, all of them will be dropped.
-Robert
>
> Did I misread something?
>
> Cheers,
> Don
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 14:03 ` Robert Richter
@ 2010-08-11 14:32 ` Don Zickus
0 siblings, 0 replies; 85+ messages in thread
From: Don Zickus @ 2010-08-11 14:32 UTC (permalink / raw)
To: Robert Richter
Cc: Frederic Weisbecker, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Wed, Aug 11, 2010 at 04:03:07PM +0200, Robert Richter wrote:
> On 11.08.10 08:44:43, Don Zickus wrote:
> > On Wed, Aug 11, 2010 at 01:10:46PM +0200, Robert Richter wrote:
> > > > > + *
> > > > > + * NOTE: We have no way of knowing if a second NMI was
> > > > > + * actually triggered, so we may accidentally skip a valid
> > > > > + * unknown nmi later.
> > > > > + */
> > > > > + __get_cpu_var(perfctr_skip) +=1;
> > >
> > > ... but this will not work. You have to mark the *absolute* nmi number
> > > here. If you only raise a flag, the next unknown nmi will be dropped,
> > > every. Because, in between there could have been other nmis that
> > > stopped the chain and thus the 'unknown' path is not executed. The
> > > trick in my patch is that you *know*, which nmi you want to skip.
> >
> > I guess I am confused. The way I read your patch was that you assumed the
> > next NMI would be the one you skip and if there was another NMI in between
> > the handled one and the one to skip, you would not skip it (nmi count !=
> > prev + 1) and it would produce an accidental unknown nmi.
>
> That's how it works.
>
> > I tried to change that with my patch by setting the skip flag which would
> > be drained on the next unknown nmi, independent of where it is in the NMI
> > backlog of NMIs.
>
> "setting the skip flag which would be drained on the next unknown nmi"
>
> That's what is wrong, it drops every unknown nmi, no matter when it is
Well as Frederic pointed out the skip variable will never go past one, so
it will only drop at most one unknown nmi.
> detected. In between there could be 1000's of valid other nmis
> handled. You even could have been returned from nmi mode. But still,
> the next unknown nmi will be dropped. Your patch could accumulate also
That was the intent. Can we guarantee that in the rare cases where the
perfctr is generating two nmis, that they will be back-to-back?
I think Huang tried to cap my approach even further my creating a time
window in which the two nmis had to happen. That gives us the flexibility
to handle nmis that are not back to back, but yet deal with the case where
two perfctrs fired but we are unsure if it generated a second nmi and we
falsely set the skip flag.
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 11:10 ` Robert Richter
2010-08-11 12:44 ` Don Zickus
@ 2010-08-13 4:37 ` Frederic Weisbecker
2010-08-13 8:22 ` Robert Richter
1 sibling, 1 reply; 85+ messages in thread
From: Frederic Weisbecker @ 2010-08-13 4:37 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Wed, Aug 11, 2010 at 01:10:46PM +0200, Robert Richter wrote:
> On 10.08.10 22:44:55, Frederic Weisbecker wrote:
> > On Tue, Aug 10, 2010 at 04:48:56PM -0400, Don Zickus wrote:
> > > @@ -1200,7 +1200,7 @@ void perf_events_lapic_init(void)
> > > apic_write(APIC_LVTPC, APIC_DM_NMI);
> > > }
> > >
> > > -static DEFINE_PER_CPU(unsigned int, perfctr_handled);
> > > +static DEFINE_PER_CPU(unsigned int, perfctr_skip);
>
> Yes, using perfctr_skip is better to understand ...
>
> > > @@ -1229,14 +1228,11 @@ perf_event_nmi_handler(struct notifier_block *self,
> > > * was handling a perfctr. Otherwise we pass it and
> > > * let the kernel handle the unknown nmi.
> > > *
> > > - * Note: this could be improved if we drop unknown
> > > - * NMIs only if we handled more than one perfctr in
> > > - * the previous NMI.
> > > */
> > > - this_nmi = percpu_read(irq_stat.__nmi_count);
> > > - prev_nmi = __get_cpu_var(perfctr_handled);
> > > - if (this_nmi == prev_nmi + 1)
> > > + if (__get_cpu_var(perfctr_skip)){
> > > + __get_cpu_var(perfctr_skip) -=1;
> > > return NOTIFY_STOP;
> > > + }
> > > return NOTIFY_DONE;
> > > default:
> > > return NOTIFY_DONE;
> > > @@ -1246,11 +1242,21 @@ perf_event_nmi_handler(struct notifier_block *self,
> > >
> > > apic_write(APIC_LVTPC, APIC_DM_NMI);
> > >
> > > - if (!x86_pmu.handle_irq(regs))
> > > + handled = x86_pmu.handle_irq(regs);
> > > + if (!handled)
> > > + /* not our NMI */
> > > return NOTIFY_DONE;
> > > -
> > > - /* handled */
> > > - __get_cpu_var(perfctr_handled) = percpu_read(irq_stat.__nmi_count);
> > > + else if (handled > 1)
> > > + /*
> > > + * More than one perfctr triggered. This could have
> > > + * caused a second NMI that we must now skip because
> > > + * we have already handled it. Remember it.
> > > + *
> > > + * NOTE: We have no way of knowing if a second NMI was
> > > + * actually triggered, so we may accidentally skip a valid
> > > + * unknown nmi later.
> > > + */
> > > + __get_cpu_var(perfctr_skip) +=1;
>
> ... but this will not work. You have to mark the *absolute* nmi number
> here. If you only raise a flag, the next unknown nmi will be dropped,
> every.
Isn't it what we want? Only the next unknown nmi gets dropped.
> Because, in between there could have been other nmis that
> stopped the chain and thus the 'unknown' path is not executed.
I'm not sure what you mean here. Are you thinking about a third
NMI source that triggers while we are still handling the first
NMI in the back to back sequence?
> The trick in my patch is that you *know*, which nmi you want to skip.
Well with the flag you also know which nmi you want to skip.
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-13 4:37 ` Frederic Weisbecker
@ 2010-08-13 8:22 ` Robert Richter
2010-08-14 1:28 ` Frederic Weisbecker
0 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-13 8:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Don Zickus, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 13.08.10 00:37:48, Frederic Weisbecker wrote:
> >
> > ... but this will not work. You have to mark the *absolute* nmi number
> > here. If you only raise a flag, the next unknown nmi will be dropped,
> > every.
>
>
>
> Isn't it what we want? Only the next unknown nmi gets dropped.
>
>
>
>
> > Because, in between there could have been other nmis that
> > stopped the chain and thus the 'unknown' path is not executed.
>
>
>
> I'm not sure what you mean here. Are you thinking about a third
> NMI source that triggers while we are still handling the first
> NMI in the back to back sequence?
>
>
>
> > The trick in my patch is that you *know*, which nmi you want to skip.
>
>
> Well with the flag you also know which nmi you want to skip.
We cannot assume that all cpus have the same behavior here. Imagine a
cpu that handles 2 counters and *does not* trigger a back-to-back
nmi. With flags only implemented, the next unknown nmi will be dropped
anyway, no matter when it fires. We have to check the nmi sequence.
The next thing you have to be aware is, a registered nmi handler is
not called with every nmi. If there was another nmi source and a
handler with higher priority was returning a stop, when all other
subsequent handlers are not called. Esp. 'unknown nmi' is called only
in rare cases. So, a handler might not get noticed of an nmi. This
means, if a handler gets called a 2nd time, it must not necessarily
the next (2nd) nmi.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-13 8:22 ` Robert Richter
@ 2010-08-14 1:28 ` Frederic Weisbecker
2010-08-14 2:29 ` Robert Richter
0 siblings, 1 reply; 85+ messages in thread
From: Frederic Weisbecker @ 2010-08-14 1:28 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Fri, Aug 13, 2010 at 10:22:30AM +0200, Robert Richter wrote:
> On 13.08.10 00:37:48, Frederic Weisbecker wrote:
>
> > >
> > > ... but this will not work. You have to mark the *absolute* nmi number
> > > here. If you only raise a flag, the next unknown nmi will be dropped,
> > > every.
> >
> >
> >
> > Isn't it what we want? Only the next unknown nmi gets dropped.
> >
> >
> >
> >
> > > Because, in between there could have been other nmis that
> > > stopped the chain and thus the 'unknown' path is not executed.
> >
> >
> >
> > I'm not sure what you mean here. Are you thinking about a third
> > NMI source that triggers while we are still handling the first
> > NMI in the back to back sequence?
> >
> >
> >
> > > The trick in my patch is that you *know*, which nmi you want to skip.
> >
> >
> > Well with the flag you also know which nmi you want to skip.
>
> We cannot assume that all cpus have the same behavior here. Imagine a
> cpu that handles 2 counters and *does not* trigger a back-to-back
> nmi. With flags only implemented, the next unknown nmi will be dropped
> anyway, no matter when it fires. We have to check the nmi sequence.
I'd expect it to be an ABI. NMIs can't nest, but if one triggers while
servicing another, it should trigger right after (once we "iret", which
reenables NMIs).
But I haven't checked intel or amd documentation about that.
> The next thing you have to be aware is, a registered nmi handler is
> not called with every nmi. If there was another nmi source and a
> handler with higher priority was returning a stop, when all other
> subsequent handlers are not called. Esp. 'unknown nmi' is called only
> in rare cases. So, a handler might not get noticed of an nmi. This
> means, if a handler gets called a 2nd time, it must not necessarily
> the next (2nd) nmi.
Yeah, in this case we can just clear the __ger_cpu_var(next_nmi_skip)
when another handler service the next NMI.
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-14 1:28 ` Frederic Weisbecker
@ 2010-08-14 2:29 ` Robert Richter
0 siblings, 0 replies; 85+ messages in thread
From: Robert Richter @ 2010-08-14 2:29 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Don Zickus, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 13.08.10 21:28:07, Frederic Weisbecker wrote:
> > We cannot assume that all cpus have the same behavior here. Imagine a
> > cpu that handles 2 counters and *does not* trigger a back-to-back
> > nmi. With flags only implemented, the next unknown nmi will be dropped
> > anyway, no matter when it fires. We have to check the nmi sequence.
>
>
>
> I'd expect it to be an ABI. NMIs can't nest, but if one triggers while
> servicing another, it should trigger right after (once we "iret", which
> reenables NMIs).
>
> But I haven't checked intel or amd documentation about that.
Yes, nmis are nested and if there is another source firing it will be
retriggered. But the question is, if multiple counters trigger, does
this mean multiple nmis are fired, esp. if all counters were served in
the first run? This very much depends on the cpu implementation and it
will be hard to find documentation about this detail.
> > The next thing you have to be aware is, a registered nmi handler is
> > not called with every nmi. If there was another nmi source and a
> > handler with higher priority was returning a stop, when all other
> > subsequent handlers are not called. Esp. 'unknown nmi' is called only
> > in rare cases. So, a handler might not get noticed of an nmi. This
> > means, if a handler gets called a 2nd time, it must not necessarily
> > the next (2nd) nmi.
>
>
> Yeah, in this case we can just clear the __ger_cpu_var(next_nmi_skip)
> when another handler service the next NMI.
Yes, this might work too. But then you end up at the same complexity.
Even worse, you have to check and unset the flag with each perf nmi.
If you store the nmi number, it will only be read in then 'unknown'
nmi path again. And, you can't unset the flag for nmis by other
sources what do not pass the perf nmi handler.
So, overall, I don't see advantages in using a flag. The
implementation of storing the nmi number is simple and straight
forward with no or less impact on performance or memory usage.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 2:44 ` Frederic Weisbecker
2010-08-11 11:10 ` Robert Richter
@ 2010-08-11 12:39 ` Don Zickus
1 sibling, 0 replies; 85+ messages in thread
From: Don Zickus @ 2010-08-11 12:39 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Robert Richter, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Wed, Aug 11, 2010 at 04:44:55AM +0200, Frederic Weisbecker wrote:
> May be make it just a pending bit. I mean not something that can
> go further 1, because you can't have more than 1 pending anyway. I don't
> know how that could happen you get accidental perctr_skip > 1, may be
> expected pending NMIs that don't happen somehow, but better be paranoid with
> that, as it's about trying not to miss hardware errors.
I guess I was thinking about the SMI case where it drains the perfctr(s)
and then retriggers them but I guess even in that case the most you can
have is one extra NMI. So yeah, you are probably right, I should have
used a flag instead of incrementing.
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-10 20:48 ` Don Zickus
2010-08-11 2:44 ` Frederic Weisbecker
@ 2010-08-11 3:19 ` Huang Ying
2010-08-11 12:36 ` Don Zickus
1 sibling, 1 reply; 85+ messages in thread
From: Huang Ying @ 2010-08-11 3:19 UTC (permalink / raw)
To: Don Zickus
Cc: Robert Richter, Cyrill Gorcunov, Peter Zijlstra, Lin, Ming M,
Ingo Molnar, fweisbec, linux-kernel, Yinghai Lu, Andi Kleen
On Wed, 2010-08-11 at 04:48 +0800, Don Zickus wrote:
> > From d2739578199d881ae6a9537c1b96a0efd1cdea43 Mon Sep 17 00:00:00 2001
> > From: Robert Richter <robert.richter@amd.com>
> > Date: Thu, 5 Aug 2010 16:19:59 +0200
> > Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
> On top of Robert's patch:
> (compiled tested only because I don't have a fancy button to trigger
> unknown nmis)
You can trigger unknown NMIs via apic->send_IPI_mask(cpu_mask,
NMI_VECTOR).
How about the algorithm as follow:
int perf_event_nmi_handler()
{
...
switch (cmd) {
case DIE_NMIUNKNOWN:
if (per_cpu(perfctr_prev_handled) > 1
&& rdtsc() - per_cpu(perfctr_handled_timestamp) < 1000)
return NOTIFY_STOP;
else
return NOTIFY_DONE;
}
...
handled = x86_pmu.handle_irq(regs);
per_cpu(perfctr_prev_handled) = per_cpu(perfctr_handled);
per_cpu(perfctr_handled) = handled;
if (handled) {
per_cpu(perfctr_handled_timestamp) = rdtsc();
return NOTIFY_STOP;
} else
return NOTIFY_DONE;
}
Best Regards,
Huang Ying
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 3:19 ` Huang Ying
@ 2010-08-11 12:36 ` Don Zickus
2010-08-16 14:37 ` Peter Zijlstra
0 siblings, 1 reply; 85+ messages in thread
From: Don Zickus @ 2010-08-11 12:36 UTC (permalink / raw)
To: Huang Ying
Cc: Robert Richter, Cyrill Gorcunov, Peter Zijlstra, Lin, Ming M,
Ingo Molnar, fweisbec, linux-kernel, Yinghai Lu, Andi Kleen
On Wed, Aug 11, 2010 at 11:19:39AM +0800, Huang Ying wrote:
> On Wed, 2010-08-11 at 04:48 +0800, Don Zickus wrote:
> > > From d2739578199d881ae6a9537c1b96a0efd1cdea43 Mon Sep 17 00:00:00 2001
> > > From: Robert Richter <robert.richter@amd.com>
> > > Date: Thu, 5 Aug 2010 16:19:59 +0200
> > > Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
> >
> > On top of Robert's patch:
> > (compiled tested only because I don't have a fancy button to trigger
> > unknown nmis)
>
> You can trigger unknown NMIs via apic->send_IPI_mask(cpu_mask,
> NMI_VECTOR).
>
> How about the algorithm as follow:
Heh, I thought about the following too, just couldn't figure out an easy
way to timestamp. I forgot about the rdtsc(). :-)
The only thing that might screw this up would be an SMI which takes longer
than 1000 but that should be rare and would probably be related to the
unknown NMI anyway. Also under virt that would probably break (due to
time jumping) but until they emulate the perfctr, it won't matter. :-)
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 12:36 ` Don Zickus
@ 2010-08-16 14:37 ` Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-16 14:37 UTC (permalink / raw)
To: Don Zickus
Cc: Huang Ying, Robert Richter, Cyrill Gorcunov, Lin, Ming M,
Ingo Molnar, fweisbec, linux-kernel, Yinghai Lu, Andi Kleen
On Wed, 2010-08-11 at 08:36 -0400, Don Zickus wrote:
> The only thing that might screw this up would be an SMI which takes longer
> than 1000 but that should be rare and would probably be related to the
> unknown NMI anyway.
Long running SMIs aren't nearly as rare as you'd want them to be.
Hitting one in exactly the right spot will be, but given the numbers its
going to happen and make us scratch our heads..
^ permalink raw reply [flat|nested] 85+ messages in thread
* [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-09 19:48 ` [PATCH] perf, x86: try to handle unknown nmis with running perfctrs Robert Richter
2010-08-09 20:02 ` Cyrill Gorcunov
2010-08-10 20:48 ` Don Zickus
@ 2010-08-11 22:00 ` Robert Richter
2010-08-12 13:10 ` Robert Richter
` (3 more replies)
2010-08-17 15:22 ` [PATCH -v3] " Robert Richter
3 siblings, 4 replies; 85+ messages in thread
From: Robert Richter @ 2010-08-11 22:00 UTC (permalink / raw)
To: Don Zickus
Cc: Cyrill Gorcunov, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
I was debuging this a little more, see version 2 below.
-Robert
--
>From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001
From: Robert Richter <robert.richter@amd.com>
Date: Thu, 5 Aug 2010 16:19:59 +0200
Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
When perfctrs are running it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty and daze
the CPU.
The solution to avoid an 'unknown nmi' massage in this case was simply
to stop the nmi handler chain when perfctrs are runnning by stating
the nmi was handled. This has the drawback that a) we can not detect
unknown nmis anymore, and b) subsequent nmi handlers are not called.
This patch addresses this. Now, we check this unknown NMI if it could
be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
handle the unknown nmi.
This is a debug log:
cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
Uhhuh. NMI received for unknown reason 00 on CPU 6.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Deltas:
nmi #32334 340186
nmi #32335 1327704
nmi #32336 1819 <<<< back-to-back nmi [1]
nmi #32337 85961
nmi #32338 284507
nmi #32339 1578809
nmi #32340 217616
nmi #32341 1798 <<<< back-to-back nmi [2]
nmi #32342 240913
nmi #32343 1512809
nmi #32344 116672
nmi #32345 412453
nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]
For back-to-back nmi detection there are the following rules:
The perfctr nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2] above).
There is another case if there are two subsequent back-to-back nmis
[3]. In this case we measure the time between the first and the
2nd. The 2nd is detected as back-to-back because the first handled
more than one counter. The time between the 1st and the 2nd is used to
calculate a range for which we assume a back-to-back nmi. Now, the 3rd
nmi triggers, we measure again the time delta and compare it with the
first delta from which we know it was a back-to-back nmi. If the 3rd
nmi is within the range, it is also a back-to-back nmi and we drop it.
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
arch/x86/kernel/cpu/perf_event.c | 96 +++++++++++++++++++++++++++++++++----
1 files changed, 85 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..b79a235 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1154,7 +1154,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
/*
* event overflow
*/
- handled = 1;
+ handled++;
data.period = event->hw.last_period;
if (!x86_perf_event_set_period(event))
@@ -1200,12 +1200,24 @@ void perf_events_lapic_init(void)
apic_write(APIC_LVTPC, APIC_DM_NMI);
}
+struct perfctr_nmi {
+ u64 timestamp;
+ u64 range;
+ unsigned int marked;
+ int handled;
+};
+
+static DEFINE_PER_CPU(struct perfctr_nmi, nmi);
+
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
unsigned long cmd, void *__args)
{
struct die_args *args = __args;
- struct pt_regs *regs;
+ unsigned int this_nmi;
+ int handled;
+ u64 timestamp;
+ u64 delta;
if (!atomic_read(&active_events))
return NOTIFY_DONE;
@@ -1214,22 +1226,84 @@ perf_event_nmi_handler(struct notifier_block *self,
case DIE_NMI:
case DIE_NMI_IPI:
break;
+ case DIE_NMIUNKNOWN:
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if (this_nmi != __get_cpu_var(nmi).marked)
+ return NOTIFY_DONE;
+
+ /*
+ * This one could be our NMI, two events could trigger
+ * 'simultaneously' raising two back-to-back NMIs. If
+ * the first NMI handles both, the latter will be
+ * empty and daze the CPU.
+ *
+ * So, we check this unknown NMI and drop it if it is
+ * a perfctr back-to-back nmi. Otherwise we pass it
+ * and let the kernel handle the unknown nmi.
+ */
+
+ if (!cpu_has_tsc)
+ /*
+ * no timestamps available the cannot detect
+ * back-to-back nmis, drop it
+ */
+ return NOTIFY_STOP;
+
+ if (__get_cpu_var(nmi).handled > 1)
+ /*
+ * we have handled more than one counter,
+ * this must be a back-to-back nmi
+ */
+ return NOTIFY_STOP;
+
+ rdtscll(delta);
+ delta -= __get_cpu_var(nmi).timestamp;
+ if (delta < __get_cpu_var(nmi).range)
+ /* the delta is small, it is a back-to-back nmi */
+ return NOTIFY_STOP;
+
+ /* not a perfctr back-to-back nmi, let it pass */
+ return NOTIFY_DONE;
default:
return NOTIFY_DONE;
}
- regs = args->regs;
+ this_nmi = percpu_read(irq_stat.__nmi_count);
apic_write(APIC_LVTPC, APIC_DM_NMI);
- /*
- * Can't rely on the handled return value to say it was our NMI, two
- * events could trigger 'simultaneously' raising two back-to-back NMIs.
- *
- * If the first NMI handles both, the latter will be empty and daze
- * the CPU.
- */
- x86_pmu.handle_irq(regs);
+
+ handled = x86_pmu.handle_irq(args->regs);
+
+ if (!handled)
+ return NOTIFY_DONE;
+
+ if ((handled == 1) && (__get_cpu_var(nmi).marked != this_nmi))
+ /* this may not trigger back-to-back nmis */
+ return NOTIFY_STOP;
+
+ if (!cpu_has_tsc)
+ goto mark_next_nmi;
+
+ rdtscll(timestamp);
+ if (__get_cpu_var(nmi).marked == this_nmi) {
+ /*
+ * this was a back-to-back nmi, calculate back-to-back
+ * time delta and define the back-to-back range that
+ * is twice the delta
+ */
+ delta = timestamp;
+ delta -= __get_cpu_var(nmi).timestamp;
+
+ __get_cpu_var(nmi).range = delta << 1;
+ }
+
+ __get_cpu_var(nmi).timestamp = timestamp;
+ __get_cpu_var(nmi).handled = handled;
+
+ mark_next_nmi:
+ /* the next nmi could be a back-to-back nmi */
+ __get_cpu_var(nmi).marked = this_nmi + 1;
return NOTIFY_STOP;
}
--
1.7.1.1
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 22:00 ` [PATCH -v2] " Robert Richter
@ 2010-08-12 13:10 ` Robert Richter
2010-08-12 18:21 ` Don Zickus
2010-08-12 13:52 ` Don Zickus
` (2 subsequent siblings)
3 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-12 13:10 UTC (permalink / raw)
To: Don Zickus
Cc: Cyrill Gorcunov, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 12.08.10 00:00:58, Robert Richter wrote:
> I was debuging this a little more, see version 2 below.
I was testing the patch further, it properly filters perfctr
back-to-back nmis. I was able to reliable detect unknown nmis
triggered by the nmi button during high load perf sessions with
multiple counters, no false positives.
>
> -Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-12 13:10 ` Robert Richter
@ 2010-08-12 18:21 ` Don Zickus
2010-08-16 7:37 ` Robert Richter
0 siblings, 1 reply; 85+ messages in thread
From: Don Zickus @ 2010-08-12 18:21 UTC (permalink / raw)
To: Robert Richter
Cc: Cyrill Gorcunov, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 12, 2010 at 03:10:07PM +0200, Robert Richter wrote:
> On 12.08.10 00:00:58, Robert Richter wrote:
> > I was debuging this a little more, see version 2 below.
>
> I was testing the patch further, it properly filters perfctr
> back-to-back nmis. I was able to reliable detect unknown nmis
> triggered by the nmi button during high load perf sessions with
> multiple counters, no false positives.
For my own curiousity, what type of high load perf sessions are you using
to test this. I don't know perf well enough to have it generate events
across multiple perfctrs.
Thanks,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-12 18:21 ` Don Zickus
@ 2010-08-16 7:37 ` Robert Richter
0 siblings, 0 replies; 85+ messages in thread
From: Robert Richter @ 2010-08-16 7:37 UTC (permalink / raw)
To: Don Zickus
Cc: Cyrill Gorcunov, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 12.08.10 14:21:26, Don Zickus wrote:
> > I was testing the patch further, it properly filters perfctr
> > back-to-back nmis. I was able to reliable detect unknown nmis
> > triggered by the nmi button during high load perf sessions with
> > multiple counters, no false positives.
>
> For my own curiousity, what type of high load perf sessions are you using
> to test this. I don't know perf well enough to have it generate events
> across multiple perfctrs.
You put load on all cpus and then start something like the following:
perf record -e cycles -e instructions -e cache-references \
-e cache-misses -e branch-misses -a -- sleep 10
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 22:00 ` [PATCH -v2] " Robert Richter
2010-08-12 13:10 ` Robert Richter
@ 2010-08-12 13:52 ` Don Zickus
2010-08-13 4:25 ` Frederic Weisbecker
2010-08-16 14:48 ` Peter Zijlstra
3 siblings, 0 replies; 85+ messages in thread
From: Don Zickus @ 2010-08-12 13:52 UTC (permalink / raw)
To: Robert Richter
Cc: Cyrill Gorcunov, Peter Zijlstra, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 12, 2010 at 12:00:58AM +0200, Robert Richter wrote:
> I was debuging this a little more, see version 2 below.
>
> -Robert
>
> --
>
> From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
> When perfctrs are running it is valid to have unhandled nmis, two
> events could trigger 'simultaneously' raising two back-to-back
> NMIs. If the first NMI handles both, the latter will be empty and daze
> the CPU.
>
> The solution to avoid an 'unknown nmi' massage in this case was simply
> to stop the nmi handler chain when perfctrs are runnning by stating
> the nmi was handled. This has the drawback that a) we can not detect
> unknown nmis anymore, and b) subsequent nmi handlers are not called.
>
> This patch addresses this. Now, we check this unknown NMI if it could
> be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
> handle the unknown nmi.
Seems to cover my concerns. Great work!
ACK
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 22:00 ` [PATCH -v2] " Robert Richter
2010-08-12 13:10 ` Robert Richter
2010-08-12 13:52 ` Don Zickus
@ 2010-08-13 4:25 ` Frederic Weisbecker
2010-08-16 14:48 ` Peter Zijlstra
3 siblings, 0 replies; 85+ messages in thread
From: Frederic Weisbecker @ 2010-08-13 4:25 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Cyrill Gorcunov, Peter Zijlstra, Lin Ming,
Ingo Molnar, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 12, 2010 at 12:00:58AM +0200, Robert Richter wrote:
> I was debuging this a little more, see version 2 below.
>
> -Robert
>
> --
>
> From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
> When perfctrs are running it is valid to have unhandled nmis, two
> events could trigger 'simultaneously' raising two back-to-back
> NMIs. If the first NMI handles both, the latter will be empty and daze
> the CPU.
>
> The solution to avoid an 'unknown nmi' massage in this case was simply
> to stop the nmi handler chain when perfctrs are runnning by stating
> the nmi was handled. This has the drawback that a) we can not detect
> unknown nmis anymore, and b) subsequent nmi handlers are not called.
>
> This patch addresses this. Now, we check this unknown NMI if it could
> be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
> handle the unknown nmi.
>
> This is a debug log:
>
> cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
> cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
> cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
> cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
> cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
> cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
> cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
> cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
> cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
> cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
> cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
> cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
> cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
> cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
> cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
> cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
> Uhhuh. NMI received for unknown reason 00 on CPU 6.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
>
> Deltas:
>
> nmi #32334 340186
> nmi #32335 1327704
> nmi #32336 1819 <<<< back-to-back nmi [1]
> nmi #32337 85961
> nmi #32338 284507
> nmi #32339 1578809
> nmi #32340 217616
> nmi #32341 1798 <<<< back-to-back nmi [2]
> nmi #32342 240913
> nmi #32343 1512809
> nmi #32344 116672
> nmi #32345 412453
> nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
> nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
> nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]
>
> For back-to-back nmi detection there are the following rules:
>
> The perfctr nmi handler was handling more than one counter and no
> counter was handled in the subsequent nmi (see [1] and [2] above).
>
> There is another case if there are two subsequent back-to-back nmis
> [3]. In this case we measure the time between the first and the
> 2nd. The 2nd is detected as back-to-back because the first handled
> more than one counter. The time between the 1st and the 2nd is used to
> calculate a range for which we assume a back-to-back nmi. Now, the 3rd
> nmi triggers, we measure again the time delta and compare it with the
> first delta from which we know it was a back-to-back nmi. If the 3rd
> nmi is within the range, it is also a back-to-back nmi and we drop it.
>
> Signed-off-by: Robert Richter <robert.richter@amd.com>
> ---
That time based thing looks a bit complicated.
I'm still not sure why you don't want to use a simple flag:
After handled a perf NMI:
if (handled more than one counter)
__get_cpu_var(skip_unknown) = 1;
While handling an unknown NMI:
if (__get_cpu_var(skip_unknown)) {
__get_cpu_var(skip_unknow) = 0;
return NOTIFY_STOP;
}
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-11 22:00 ` [PATCH -v2] " Robert Richter
` (2 preceding siblings ...)
2010-08-13 4:25 ` Frederic Weisbecker
@ 2010-08-16 14:48 ` Peter Zijlstra
2010-08-16 16:27 ` Cyrill Gorcunov
3 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-16 14:48 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, 2010-08-12 at 00:00 +0200, Robert Richter wrote:
> From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
> When perfctrs are running it is valid to have unhandled nmis, two
> events could trigger 'simultaneously' raising two back-to-back
> NMIs. If the first NMI handles both, the latter will be empty and daze
> the CPU.
>
> The solution to avoid an 'unknown nmi' massage in this case was simply
> to stop the nmi handler chain when perfctrs are runnning by stating
> the nmi was handled. This has the drawback that a) we can not detect
> unknown nmis anymore, and b) subsequent nmi handlers are not called.
>
> This patch addresses this. Now, we check this unknown NMI if it could
> be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
> handle the unknown nmi.
>
> This is a debug log:
>
> Deltas:
>
> nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
> nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
> nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]
>
> For back-to-back nmi detection there are the following rules:
>
> The perfctr nmi handler was handling more than one counter and no
> counter was handled in the subsequent nmi (see [1] and [2] above).
>
> There is another case if there are two subsequent back-to-back nmis
> [3]. In this case we measure the time between the first and the
> 2nd. The 2nd is detected as back-to-back because the first handled
> more than one counter. The time between the 1st and the 2nd is used to
> calculate a range for which we assume a back-to-back nmi. Now, the 3rd
> nmi triggers, we measure again the time delta and compare it with the
> first delta from which we know it was a back-to-back nmi. If the 3rd
> nmi is within the range, it is also a back-to-back nmi and we drop it.
I liked the one without funny timestamps in better, the whole timestamps
thing just feels too fragile.
Relying on handled > 1 to arm the back-to-back filter seems doable.
(Also, you didn't deal with the TSC going backwards..)
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-16 14:48 ` Peter Zijlstra
@ 2010-08-16 16:27 ` Cyrill Gorcunov
2010-08-16 17:16 ` Robert Richter
0 siblings, 1 reply; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-16 16:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Robert Richter, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Mon, Aug 16, 2010 at 04:48:36PM +0200, Peter Zijlstra wrote:
...
> > There is another case if there are two subsequent back-to-back nmis
> > [3]. In this case we measure the time between the first and the
> > 2nd. The 2nd is detected as back-to-back because the first handled
> > more than one counter. The time between the 1st and the 2nd is used to
> > calculate a range for which we assume a back-to-back nmi. Now, the 3rd
> > nmi triggers, we measure again the time delta and compare it with the
> > first delta from which we know it was a back-to-back nmi. If the 3rd
> > nmi is within the range, it is also a back-to-back nmi and we drop it.
>
> I liked the one without funny timestamps in better, the whole timestamps
> thing just feels too fragile.
>
Me too, the former Roberts patch (if I'm not missing something) looks good
to me.
>
> Relying on handled > 1 to arm the back-to-back filter seems doable.
>
It's doable _but_ I think there is nothing we can do, there is no
way (at least I known of) to check if there is latched nmi from
perf counters. We only can assume that if there multiple counters
overflowed most probably the next unknown nmi has the same nature,
ie it came from perf. Yes, we can loose real unknown nmi in this
case but I think this is justified trade off. If an user need
a precise counting of unknown nmis he should not arm perf events
at all, if there an user with nmi button (guys where did you get this
magic buttuns? i need one ;) he better to not arm perf events too
otherwise he might have to click twice
(and of course we should keep in mind Andi's proposal but it
is a next step I think).
> (Also, you didn't deal with the TSC going backwards..)
>
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-16 16:27 ` Cyrill Gorcunov
@ 2010-08-16 17:16 ` Robert Richter
2010-08-16 19:06 ` Cyrill Gorcunov
0 siblings, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-16 17:16 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Peter Zijlstra, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 16.08.10 12:27:06, Cyrill Gorcunov wrote:
> On Mon, Aug 16, 2010 at 04:48:36PM +0200, Peter Zijlstra wrote:
> > I liked the one without funny timestamps in better, the whole timestamps
> > thing just feels too fragile.
> >
>
> Me too, the former Roberts patch (if I'm not missing something) looks good
> to me.
>
> >
> > Relying on handled > 1 to arm the back-to-back filter seems doable.
Peter, I will rip out the timestamp code from the -v2 patch. My first
patch does not deal with a 2-1-0 sequence, so it has false positives.
We do not necessarily need the timestamps if back-to-back nmis are
rare. Without using timestamps the statistically lost ratio for
unknown nmis will be as the ratio for back-to-back nmis, with
timestamps we could catch almost every unknown nmi. So if we encounter
problems we could still implement timestamp code on top.
> It's doable _but_ I think there is nothing we can do, there is no
> way (at least I known of) to check if there is latched nmi from
> perf counters. We only can assume that if there multiple counters
> overflowed most probably the next unknown nmi has the same nature,
> ie it came from perf.
As said, I think with timestamps we could be able to detect 100% of
the unknown nmis. I guess we get now more than 90% with mutliple
counters, and 100% with a single counter running. So, this is already
more than a simple improvement.
> Yes, we can loose real unknown nmi in this
> case but I think this is justified trade off. If an user need
> a precise counting of unknown nmis he should not arm perf events
> at all, if there an user with nmi button (guys where did you get this
> magic buttuns? i need one ;) he better to not arm perf events too
> otherwise he might have to click twice
>
> (and of course we should keep in mind Andi's proposal but it
> is a next step I think).
Yes, this patch is the first step, now we can change the nmi handler
priority. The perf handler must not have the lowest priority anymore.
> > (Also, you didn't deal with the TSC going backwards..)
Does this also happen in the case of a back-to-back nmi? I don't know
the conditions for a backward running TSC. Maybe, if an nmi is
retriggered the TSC wont be adjusted by a negative offset, I don't
know...
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-16 17:16 ` Robert Richter
@ 2010-08-16 19:06 ` Cyrill Gorcunov
2010-08-16 19:13 ` Peter Zijlstra
2010-08-16 22:55 ` Robert Richter
0 siblings, 2 replies; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-16 19:06 UTC (permalink / raw)
To: Robert Richter
Cc: Peter Zijlstra, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Mon, Aug 16, 2010 at 07:16:10PM +0200, Robert Richter wrote:
> On 16.08.10 12:27:06, Cyrill Gorcunov wrote:
> > On Mon, Aug 16, 2010 at 04:48:36PM +0200, Peter Zijlstra wrote:
> > > I liked the one without funny timestamps in better, the whole timestamps
> > > thing just feels too fragile.
> > >
> >
> > Me too, the former Roberts patch (if I'm not missing something) looks good
> > to me.
> >
> > >
> > > Relying on handled > 1 to arm the back-to-back filter seems doable.
>
I suspect Peter was supposed to be in To: field ;)
> Peter, I will rip out the timestamp code from the -v2 patch. My first
> patch does not deal with a 2-1-0 sequence, so it has false positives.
> We do not necessarily need the timestamps if back-to-back nmis are
> rare. Without using timestamps the statistically lost ratio for
> unknown nmis will be as the ratio for back-to-back nmis, with
> timestamps we could catch almost every unknown nmi. So if we encounter
> problems we could still implement timestamp code on top.
>
> > It's doable _but_ I think there is nothing we can do, there is no
> > way (at least I known of) to check if there is latched nmi from
> > perf counters. We only can assume that if there multiple counters
> > overflowed most probably the next unknown nmi has the same nature,
> > ie it came from perf.
>
> As said, I think with timestamps we could be able to detect 100% of
> the unknown nmis. I guess we get now more than 90% with mutliple
> counters, and 100% with a single counter running. So, this is already
> more than a simple improvement.
Robert, I think we still may miss unknown irq, consider the case when
unknown nmi is latched while you handle nmi from perf and what is
more interesting several counters may be overflowed. So you set
delta small enough and second (unknown nmi) will be in range and
treated as being perf back-to-back, or I miss something from patch?
>
> > Yes, we can loose real unknown nmi in this
> > case but I think this is justified trade off. If an user need
> > a precise counting of unknown nmis he should not arm perf events
> > at all, if there an user with nmi button (guys where did you get this
> > magic buttuns? i need one ;) he better to not arm perf events too
> > otherwise he might have to click twice
> >
> > (and of course we should keep in mind Andi's proposal but it
> > is a next step I think).
>
> Yes, this patch is the first step, now we can change the nmi handler
> priority. The perf handler must not have the lowest priority anymore.
>
> > > (Also, you didn't deal with the TSC going backwards..)
>
> Does this also happen in the case of a back-to-back nmi? I don't know
> the conditions for a backward running TSC. Maybe, if an nmi is
> retriggered the TSC wont be adjusted by a negative offset, I don't
> know...
I never heard of backward running tsc, though tsc is a strange beast.
>
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-16 19:06 ` Cyrill Gorcunov
@ 2010-08-16 19:13 ` Peter Zijlstra
2010-08-16 19:18 ` Cyrill Gorcunov
2010-08-16 22:55 ` Robert Richter
1 sibling, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-16 19:13 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Robert Richter, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Mon, 2010-08-16 at 23:06 +0400, Cyrill Gorcunov wrote:
> I never heard of backward running tsc, though tsc is a strange beast.
>
Its not supposed to happen, but then there's BIOS failure-add that frobs
the TSC from SMIs and fun TSC artifacts around CPU frequency changes and
people resetting TSC in S-states etc..
In short, never trust the TSC to be even remotely sane.
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-16 19:13 ` Peter Zijlstra
@ 2010-08-16 19:18 ` Cyrill Gorcunov
0 siblings, 0 replies; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-16 19:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Robert Richter, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Mon, Aug 16, 2010 at 09:13:16PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-08-16 at 23:06 +0400, Cyrill Gorcunov wrote:
> > I never heard of backward running tsc, though tsc is a strange beast.
> >
> Its not supposed to happen, but then there's BIOS failure-add that frobs
> the TSC from SMIs and fun TSC artifacts around CPU frequency changes and
> people resetting TSC in S-states etc..
grr :/
>
> In short, never trust the TSC to be even remotely sane.
>
ok, good to know, thanks!
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-16 19:06 ` Cyrill Gorcunov
2010-08-16 19:13 ` Peter Zijlstra
@ 2010-08-16 22:55 ` Robert Richter
2010-08-17 15:23 ` Cyrill Gorcunov
1 sibling, 1 reply; 85+ messages in thread
From: Robert Richter @ 2010-08-16 22:55 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Peter Zijlstra, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 16.08.10 15:06:59, Cyrill Gorcunov wrote:
> I suspect Peter was supposed to be in To: field ;)
It was the attempt to answer 2 mails with one. :)
> Robert, I think we still may miss unknown irq, consider the case when
> unknown nmi is latched while you handle nmi from perf and what is
> more interesting several counters may be overflowed. So you set
> delta small enough and second (unknown nmi) will be in range and
> treated as being perf back-to-back, or I miss something from patch?
Yes, that's true, but before you have to enable your Infinite
Improbability Drive.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-16 22:55 ` Robert Richter
@ 2010-08-17 15:23 ` Cyrill Gorcunov
0 siblings, 0 replies; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-17 15:23 UTC (permalink / raw)
To: Robert Richter
Cc: Peter Zijlstra, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Tue, Aug 17, 2010 at 12:55:25AM +0200, Robert Richter wrote:
> On 16.08.10 15:06:59, Cyrill Gorcunov wrote:
> > I suspect Peter was supposed to be in To: field ;)
>
> It was the attempt to answer 2 mails with one. :)
>
ah, I see ;)
> > Robert, I think we still may miss unknown irq, consider the case when
> > unknown nmi is latched while you handle nmi from perf and what is
> > more interesting several counters may be overflowed. So you set
> > delta small enough and second (unknown nmi) will be in range and
> > treated as being perf back-to-back, or I miss something from patch?
>
> Yes, that's true, but before you have to enable your Infinite
> Improbability Drive.
>
not at all, it's absolutely legit to happen ;)
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-09 19:48 ` [PATCH] perf, x86: try to handle unknown nmis with running perfctrs Robert Richter
` (2 preceding siblings ...)
2010-08-11 22:00 ` [PATCH -v2] " Robert Richter
@ 2010-08-17 15:22 ` Robert Richter
2010-08-17 16:17 ` Cyrill Gorcunov
` (2 more replies)
3 siblings, 3 replies; 85+ messages in thread
From: Robert Richter @ 2010-08-17 15:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Don Zickus, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
Peter,
this is version 3 without timestamp code. Compared to -v1 it
implements the handling for 2-1-0 nmi sequences.
-Robert
>From 71a206394d8a3536b033247ec14ad037fef77236 Mon Sep 17 00:00:00 2001
From: Robert Richter <robert.richter@amd.com>
Date: Tue, 17 Aug 2010 16:42:03 +0200
Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
When perfctrs are running it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty and daze
the CPU.
The solution to avoid an 'unknown nmi' massage in this case was simply
to stop the nmi handler chain when perfctrs are runnning by stating
the nmi was handled. This has the drawback that a) we can not detect
unknown nmis anymore, and b) subsequent nmi handlers are not called.
This patch addresses this. Now, we check this unknown NMI if it could
be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
handle the unknown nmi.
This is a debug log:
cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
Uhhuh. NMI received for unknown reason 00 on CPU 6.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Deltas:
nmi #32334 340186
nmi #32335 1327704
nmi #32336 1819 <<<< back-to-back nmi [1]
nmi #32337 85961
nmi #32338 284507
nmi #32339 1578809
nmi #32340 217616
nmi #32341 1798 <<<< back-to-back nmi [2]
nmi #32342 240913
nmi #32343 1512809
nmi #32344 116672
nmi #32345 412453
nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]
For back-to-back nmi detection there are the following rules:
The perfctr nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2] above).
There is another case if there are two subsequent back-to-back nmis
[3]. The 2nd is detected as back-to-back because the first handled
more than one counter. If the second handles one counter and the 3rd
handles nothing, we drop the 3rd nmi because it could be a
back-to-back nmi.
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
arch/x86/kernel/cpu/perf_event.c | 65 ++++++++++++++++++++++++++++++-------
1 files changed, 52 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..bbcc89e 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1154,7 +1154,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
/*
* event overflow
*/
- handled = 1;
+ handled++;
data.period = event->hw.last_period;
if (!x86_perf_event_set_period(event))
@@ -1200,12 +1200,20 @@ void perf_events_lapic_init(void)
apic_write(APIC_LVTPC, APIC_DM_NMI);
}
+struct perfctr_nmi {
+ unsigned int marked;
+ int handled;
+};
+
+static DEFINE_PER_CPU(struct perfctr_nmi, nmi);
+
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
unsigned long cmd, void *__args)
{
struct die_args *args = __args;
- struct pt_regs *regs;
+ unsigned int this_nmi;
+ int handled;
if (!atomic_read(&active_events))
return NOTIFY_DONE;
@@ -1214,22 +1222,53 @@ perf_event_nmi_handler(struct notifier_block *self,
case DIE_NMI:
case DIE_NMI_IPI:
break;
-
+ case DIE_NMIUNKNOWN:
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if (this_nmi != __get_cpu_var(nmi).marked)
+ /* let the kernel handle the unknown nmi */
+ return NOTIFY_DONE;
+ /*
+ * This one is a perfctr back-to-back nmi. Two events
+ * trigger 'simultaneously' raising two back-to-back
+ * NMIs. If the first NMI handles both, the latter
+ * will be empty and daze the CPU. So, we drop it to
+ * avoid false-positive 'unknown nmi' messages.
+ */
+ return NOTIFY_STOP;
default:
return NOTIFY_DONE;
}
- regs = args->regs;
-
apic_write(APIC_LVTPC, APIC_DM_NMI);
- /*
- * Can't rely on the handled return value to say it was our NMI, two
- * events could trigger 'simultaneously' raising two back-to-back NMIs.
- *
- * If the first NMI handles both, the latter will be empty and daze
- * the CPU.
- */
- x86_pmu.handle_irq(regs);
+
+ handled = x86_pmu.handle_irq(args->regs);
+ if (!handled)
+ return NOTIFY_DONE;
+
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if (handled > 1)
+ goto mark_nmi;
+ if ((__get_cpu_var(nmi).marked == this_nmi)
+ && (__get_cpu_var(nmi).handled > 1))
+ /*
+ * We could have two subsequent back-to-back nmis: The
+ * first handles more than one counter, the 2nd
+ * handles only one counter and the 3rd handles no
+ * counter.
+ *
+ * This is the 2nd nmi because the previous was
+ * handling more than one counter. We will mark the
+ * next (3rd) and then drop it if unhandled.
+ */
+ goto mark_nmi;
+
+ /* this may not trigger back-to-back nmis */
+ return NOTIFY_STOP;
+
+ mark_nmi:
+ /* the next nmi could be a back-to-back nmi */
+ __get_cpu_var(nmi).marked = this_nmi + 1;
+ __get_cpu_var(nmi).handled = handled;
return NOTIFY_STOP;
}
--
1.7.1.1
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-17 15:22 ` [PATCH -v3] " Robert Richter
@ 2010-08-17 16:17 ` Cyrill Gorcunov
2010-08-19 10:45 ` Peter Zijlstra
2010-08-20 14:17 ` [tip:perf/urgent] perf, x86: Try to handle unknown nmis with an enabled PMU tip-bot for Robert Richter
2 siblings, 0 replies; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-17 16:17 UTC (permalink / raw)
To: Robert Richter
Cc: Peter Zijlstra, Don Zickus, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Tue, Aug 17, 2010 at 05:22:25PM +0200, Robert Richter wrote:
> Peter,
>
> this is version 3 without timestamp code. Compared to -v1 it
> implements the handling for 2-1-0 nmi sequences.
>
> -Robert
>
> From 71a206394d8a3536b033247ec14ad037fef77236 Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Tue, 17 Aug 2010 16:42:03 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
...
> Signed-off-by: Robert Richter <robert.richter@amd.com>
> ---
...
Looks good to me, thanks a lot Robert!
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-17 15:22 ` [PATCH -v3] " Robert Richter
2010-08-17 16:17 ` Cyrill Gorcunov
@ 2010-08-19 10:45 ` Peter Zijlstra
2010-08-19 12:39 ` Robert Richter
2010-08-19 14:12 ` Don Zickus
2010-08-20 14:17 ` [tip:perf/urgent] perf, x86: Try to handle unknown nmis with an enabled PMU tip-bot for Robert Richter
2 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-19 10:45 UTC (permalink / raw)
To: Robert Richter
Cc: Don Zickus, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Tue, 2010-08-17 at 17:22 +0200, Robert Richter wrote:
> + this_nmi = percpu_read(irq_stat.__nmi_count);
> + if (handled > 1)
> + goto mark_nmi;
> + if ((__get_cpu_var(nmi).marked == this_nmi)
> + && (__get_cpu_var(nmi).handled > 1))
> + /*
> + * We could have two subsequent back-to-back nmis: The
> + * first handles more than one counter, the 2nd
> + * handles only one counter and the 3rd handles no
> + * counter.
> + *
> + * This is the 2nd nmi because the previous was
> + * handling more than one counter. We will mark the
> + * next (3rd) and then drop it if unhandled.
> + */
> + goto mark_nmi;
> +
> + /* this may not trigger back-to-back nmis */
> + return NOTIFY_STOP;
> +
> + mark_nmi:
> + /* the next nmi could be a back-to-back nmi */
> + __get_cpu_var(nmi).marked = this_nmi + 1;
> + __get_cpu_var(nmi).handled = handled;
>
> return NOTIFY_STOP;
> }
I queued it with that part changed to:
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if ((handled > 1) ||
+ /* the next nmi could be a back-to-back nmi */
+ ((__get_cpu_var(nmi).marked == this_nmi) &&
+ (__get_cpu_var(nmi).handled > 1))) {
+ /*
+ * We could have two subsequent back-to-back nmis: The
+ * first handles more than one counter, the 2nd
+ * handles only one counter and the 3rd handles no
+ * counter.
+ *
+ * This is the 2nd nmi because the previous was
+ * handling more than one counter. We will mark the
+ * next (3rd) and then drop it if unhandled.
+ */
+ __get_cpu_var(nmi).marked = this_nmi + 1;
+ __get_cpu_var(nmi).handled = handled;
+ }
return NOTIFY_STOP;
}
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 10:45 ` Peter Zijlstra
@ 2010-08-19 12:39 ` Robert Richter
2010-08-19 14:12 ` Don Zickus
1 sibling, 0 replies; 85+ messages in thread
From: Robert Richter @ 2010-08-19 12:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Don Zickus, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 19.08.10 06:45:53, Peter Zijlstra wrote:
> I queued it with that part changed to:
>
> + this_nmi = percpu_read(irq_stat.__nmi_count);
> + if ((handled > 1) ||
> + /* the next nmi could be a back-to-back nmi */
> + ((__get_cpu_var(nmi).marked == this_nmi) &&
> + (__get_cpu_var(nmi).handled > 1))) {
> + /*
> + * We could have two subsequent back-to-back nmis: The
> + * first handles more than one counter, the 2nd
> + * handles only one counter and the 3rd handles no
> + * counter.
> + *
> + * This is the 2nd nmi because the previous was
> + * handling more than one counter. We will mark the
> + * next (3rd) and then drop it if unhandled.
> + */
> + __get_cpu_var(nmi).marked = this_nmi + 1;
> + __get_cpu_var(nmi).handled = handled;
> + }
>
> return NOTIFY_STOP;
> }
I am fine with this. Thanks Peter.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 10:45 ` Peter Zijlstra
2010-08-19 12:39 ` Robert Richter
@ 2010-08-19 14:12 ` Don Zickus
2010-08-19 14:27 ` Peter Zijlstra
1 sibling, 1 reply; 85+ messages in thread
From: Don Zickus @ 2010-08-19 14:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Robert Richter, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 19, 2010 at 12:45:53PM +0200, Peter Zijlstra wrote:
>
> I queued it with that part changed to:
I realized the other day this change doesn't cover the nehalem, core and p4
cases which use
intel_pmu_handle_irq
p4_pmu_handle_irq
as their handlers. Though that patch can go on top of Robert's.
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 14:12 ` Don Zickus
@ 2010-08-19 14:27 ` Peter Zijlstra
2010-08-19 15:20 ` Don Zickus
` (4 more replies)
0 siblings, 5 replies; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-19 14:27 UTC (permalink / raw)
To: Don Zickus
Cc: Robert Richter, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, 2010-08-19 at 10:12 -0400, Don Zickus wrote:
> On Thu, Aug 19, 2010 at 12:45:53PM +0200, Peter Zijlstra wrote:
> >
> > I queued it with that part changed to:
>
> I realized the other day this change doesn't cover the nehalem, core and p4
> cases which use
>
> intel_pmu_handle_irq
> p4_pmu_handle_irq
>
> as their handlers. Though that patch can go on top of Robert's.
Something like this?
---
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -713,6 +713,7 @@ static int intel_pmu_handle_irq(struct p
struct cpu_hw_events *cpuc;
int bit, loops;
u64 ack, status;
+ int handled = 0;
perf_sample_data_init(&data, 0);
@@ -743,12 +744,16 @@ again:
/*
* PEBS overflow sets bit 62 in the global status register
*/
- if (__test_and_clear_bit(62, (unsigned long *)&status))
+ if (__test_and_clear_bit(62, (unsigned long *)&status)) {
+ handled++;
x86_pmu.drain_pebs(regs);
+ }
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
+ handled++;
+
if (!test_bit(bit, cpuc->active_mask))
continue;
@@ -772,7 +777,7 @@ again:
done:
intel_pmu_enable_all(0);
- return 1;
+ return handled;
}
static struct event_constraint *
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_p4.c
@@ -673,7 +673,7 @@ static int p4_pmu_handle_irq(struct pt_r
if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
continue;
- handled += overflow;
+ handled += !!overflow;
/* event overflow for sure */
data.period = event->hw.last_period;
@@ -690,7 +690,7 @@ static int p4_pmu_handle_irq(struct pt_r
inc_irq_stat(apic_perf_irqs);
}
- return handled > 0;
+ return handled;
}
/*
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 14:27 ` Peter Zijlstra
@ 2010-08-19 15:20 ` Don Zickus
2010-08-19 17:43 ` Cyrill Gorcunov
` (3 subsequent siblings)
4 siblings, 0 replies; 85+ messages in thread
From: Don Zickus @ 2010-08-19 15:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Robert Richter, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 19, 2010 at 04:27:13PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-08-19 at 10:12 -0400, Don Zickus wrote:
> > On Thu, Aug 19, 2010 at 12:45:53PM +0200, Peter Zijlstra wrote:
> > >
> > > I queued it with that part changed to:
> >
> > I realized the other day this change doesn't cover the nehalem, core and p4
> > cases which use
> >
> > intel_pmu_handle_irq
> > p4_pmu_handle_irq
> >
> > as their handlers. Though that patch can go on top of Robert's.
>
> Something like this?
Looks correct. I'll try to test the perf_event_intel.c path though I
haven't had much luck getting multiple events on the nehalem box I have
(the amd box was easy).
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 14:27 ` Peter Zijlstra
2010-08-19 15:20 ` Don Zickus
@ 2010-08-19 17:43 ` Cyrill Gorcunov
2010-08-19 17:53 ` Peter Zijlstra
2010-08-19 21:58 ` Don Zickus
` (2 subsequent siblings)
4 siblings, 1 reply; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-19 17:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Don Zickus, Robert Richter, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 19, 2010 at 04:27:13PM +0200, Peter Zijlstra wrote:
...
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event_p4.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_p4.c
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -673,7 +673,7 @@ static int p4_pmu_handle_irq(struct pt_r
> if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
> continue;
>
> - handled += overflow;
> + handled += !!overflow;
No need to !! here, overflowed returns [0;1] though a small nit
and could be updated later :)
>
> /* event overflow for sure */
> data.period = event->hw.last_period;
> @@ -690,7 +690,7 @@ static int p4_pmu_handle_irq(struct pt_r
> inc_irq_stat(apic_perf_irqs);
> }
>
> - return handled > 0;
> + return handled;
> }
>
> /*
>
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 17:43 ` Cyrill Gorcunov
@ 2010-08-19 17:53 ` Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-19 17:53 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Don Zickus, Robert Richter, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, 2010-08-19 at 21:43 +0400, Cyrill Gorcunov wrote:
> > @@ -673,7 +673,7 @@ static int p4_pmu_handle_irq(struct pt_r
> > if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits
> - 1))))
> > continue;
> >
> > - handled += overflow;
> > + handled += !!overflow;
>
> No need to !! here, overflowed returns [0;1] though a small nit
> and could be updated later :)
done.
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 14:27 ` Peter Zijlstra
2010-08-19 15:20 ` Don Zickus
2010-08-19 17:43 ` Cyrill Gorcunov
@ 2010-08-19 21:58 ` Don Zickus
2010-08-20 8:50 ` Peter Zijlstra
2010-08-20 1:50 ` Don Zickus
2010-08-20 14:17 ` [tip:perf/urgent] perf, x86: Fix handle_irq return values tip-bot for Peter Zijlstra
4 siblings, 1 reply; 85+ messages in thread
From: Don Zickus @ 2010-08-19 21:58 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Robert Richter, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 19, 2010 at 04:27:13PM +0200, Peter Zijlstra wrote:
> x86_pmu.drain_pebs(regs);
> + }
>
> for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
> struct perf_event *event = cpuc->events[bit];
>
> + handled++;
> +
> if (!test_bit(bit, cpuc->active_mask))
> continue;
Sorry I didn't notice this earlier, but I think you want that handled++
after the if(..) continue pieces. Otherwise you will always have a number
>1. :-)
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 21:58 ` Don Zickus
@ 2010-08-20 8:50 ` Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-20 8:50 UTC (permalink / raw)
To: Don Zickus
Cc: Robert Richter, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, 2010-08-19 at 17:58 -0400, Don Zickus wrote:
> On Thu, Aug 19, 2010 at 04:27:13PM +0200, Peter Zijlstra wrote:
> > x86_pmu.drain_pebs(regs);
> > + }
> >
> > for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
> > struct perf_event *event = cpuc->events[bit];
> >
> > + handled++;
> > +
> > if (!test_bit(bit, cpuc->active_mask))
> > continue;
>
> Sorry I didn't notice this earlier, but I think you want that handled++
> after the if(..) continue pieces. Otherwise you will always have a number
> >1. :-)
Only if there's any remaining set bits in the overflow status reg,
right?
But if we want to be paranoid, we should also check if the event that
generated the overflow actually had the INT flag enabled, I guess ;-)
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-19 14:27 ` Peter Zijlstra
` (2 preceding siblings ...)
2010-08-19 21:58 ` Don Zickus
@ 2010-08-20 1:50 ` Don Zickus
2010-08-20 8:16 ` Ingo Molnar
2010-08-20 8:36 ` Robert Richter
2010-08-20 14:17 ` [tip:perf/urgent] perf, x86: Fix handle_irq return values tip-bot for Peter Zijlstra
4 siblings, 2 replies; 85+ messages in thread
From: Don Zickus @ 2010-08-20 1:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Robert Richter, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Thu, Aug 19, 2010 at 04:27:13PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-08-19 at 10:12 -0400, Don Zickus wrote:
> > On Thu, Aug 19, 2010 at 12:45:53PM +0200, Peter Zijlstra wrote:
> > >
> > > I queued it with that part changed to:
> >
> > I realized the other day this change doesn't cover the nehalem, core and p4
> > cases which use
> >
> > intel_pmu_handle_irq
> > p4_pmu_handle_irq
> >
> > as their handlers. Though that patch can go on top of Robert's.
>
> Something like this?
I tested this patch and Robert's on an AMD box and Nehalem box. Both
worked as intended. However I did notice that whenever the AMD box
detected handled >1, it was shortly followed by an unknown_nmi that was
properly eaten with Robert's logic. Whereas on the Nehalem box I saw a
lot of 'handled > 1' messages but very very few of them were followed by
an unknown_nmi message (and those messages that did come were properly
eaten).
Maybe that is just the differences in the cpu designs.
Of course I had to make the one change I mentioned previously for the
perf_event_intel.c file (moving the handled++ logic down a few lines).
I didn't run the test on a P4 box.
Looks great, thanks guys!
Cheers,
Don
>
> ---
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -713,6 +713,7 @@ static int intel_pmu_handle_irq(struct p
> struct cpu_hw_events *cpuc;
> int bit, loops;
> u64 ack, status;
> + int handled = 0;
>
> perf_sample_data_init(&data, 0);
>
> @@ -743,12 +744,16 @@ again:
> /*
> * PEBS overflow sets bit 62 in the global status register
> */
> - if (__test_and_clear_bit(62, (unsigned long *)&status))
> + if (__test_and_clear_bit(62, (unsigned long *)&status)) {
> + handled++;
> x86_pmu.drain_pebs(regs);
> + }
>
> for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
> struct perf_event *event = cpuc->events[bit];
>
> + handled++;
> +
> if (!test_bit(bit, cpuc->active_mask))
> continue;
>
> @@ -772,7 +777,7 @@ again:
>
> done:
> intel_pmu_enable_all(0);
> - return 1;
> + return handled;
> }
>
> static struct event_constraint *
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event_p4.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_p4.c
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -673,7 +673,7 @@ static int p4_pmu_handle_irq(struct pt_r
> if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
> continue;
>
> - handled += overflow;
> + handled += !!overflow;
>
> /* event overflow for sure */
> data.period = event->hw.last_period;
> @@ -690,7 +690,7 @@ static int p4_pmu_handle_irq(struct pt_r
> inc_irq_stat(apic_perf_irqs);
> }
>
> - return handled > 0;
> + return handled;
> }
>
> /*
>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 1:50 ` Don Zickus
@ 2010-08-20 8:16 ` Ingo Molnar
2010-08-20 10:04 ` Peter Zijlstra
2010-08-20 8:36 ` Robert Richter
1 sibling, 1 reply; 85+ messages in thread
From: Ingo Molnar @ 2010-08-20 8:16 UTC (permalink / raw)
To: Don Zickus
Cc: Peter Zijlstra, Robert Richter, Cyrill Gorcunov, Lin Ming,
fweisbec, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
* Don Zickus <dzickus@redhat.com> wrote:
> On Thu, Aug 19, 2010 at 04:27:13PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-08-19 at 10:12 -0400, Don Zickus wrote:
> > > On Thu, Aug 19, 2010 at 12:45:53PM +0200, Peter Zijlstra wrote:
> > > >
> > > > I queued it with that part changed to:
> > >
> > > I realized the other day this change doesn't cover the nehalem, core and p4
> > > cases which use
> > >
> > > intel_pmu_handle_irq
> > > p4_pmu_handle_irq
> > >
> > > as their handlers. Though that patch can go on top of Robert's.
> >
> > Something like this?
>
> I tested this patch and Robert's on an AMD box and Nehalem box. Both
> worked as intended. However I did notice that whenever the AMD box
> detected handled >1, it was shortly followed by an unknown_nmi that was
> properly eaten with Robert's logic. Whereas on the Nehalem box I saw a
> lot of 'handled > 1' messages but very very few of them were followed by
> an unknown_nmi message (and those messages that did come were properly
> eaten).
>
> Maybe that is just the differences in the cpu designs.
>
> Of course I had to make the one change I mentioned previously for the
> perf_event_intel.c file (moving the handled++ logic down a few lines).
>
> I didn't run the test on a P4 box.
>
> Looks great, thanks guys!
Please someone send the final version with a changelog, with all the acks and
tested-by's added, so that i can send it Linuswards.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 8:16 ` Ingo Molnar
@ 2010-08-20 10:04 ` Peter Zijlstra
2010-08-20 10:30 ` Cyrill Gorcunov
2010-08-20 12:39 ` Don Zickus
0 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2010-08-20 10:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Don Zickus, Robert Richter, Cyrill Gorcunov, Lin Ming, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Fri, 2010-08-20 at 10:16 +0200, Ingo Molnar wrote:
>
>
> Please someone send the final version with a changelog, with all the acks and
> tested-by's added, so that i can send it Linuswards.
I've got two patches queued, one from Robert (with a slight
modification, which removes a goto) and one from myself converting the
intel and p4 irq handler.
They're available at:
programming.kicks-ass.net/sekrit/patches.tar.bz2
I did add Tested-by tags, although I suspect its not the exact same bits
Don tested..
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 10:04 ` Peter Zijlstra
@ 2010-08-20 10:30 ` Cyrill Gorcunov
2010-08-20 12:39 ` Don Zickus
1 sibling, 0 replies; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-20 10:30 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Don Zickus, Robert Richter, Lin Ming, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 8/20/10, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2010-08-20 at 10:16 +0200, Ingo Molnar wrote:
>>
>>
>> Please someone send the final version with a changelog, with all the acks
>> and
>> tested-by's added, so that i can send it Linuswards.
>
> I've got two patches queued, one from Robert (with a slight
> modification, which removes a goto) and one from myself converting the
> intel and p4 irq handler.
>
> They're available at:
>
> programming.kicks-ass.net/sekrit/patches.tar.bz2
>
> I did add Tested-by tags, although I suspect its not the exact same bits
> Don tested..
>
My Ack if needed
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 10:04 ` Peter Zijlstra
2010-08-20 10:30 ` Cyrill Gorcunov
@ 2010-08-20 12:39 ` Don Zickus
2010-08-20 13:27 ` Ingo Molnar
1 sibling, 1 reply; 85+ messages in thread
From: Don Zickus @ 2010-08-20 12:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Robert Richter, Cyrill Gorcunov, Lin Ming, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Fri, Aug 20, 2010 at 12:04:11PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-08-20 at 10:16 +0200, Ingo Molnar wrote:
> >
> >
> > Please someone send the final version with a changelog, with all the acks and
> > tested-by's added, so that i can send it Linuswards.
>
> I've got two patches queued, one from Robert (with a slight
> modification, which removes a goto) and one from myself converting the
> intel and p4 irq handler.
>
> They're available at:
>
> programming.kicks-ass.net/sekrit/patches.tar.bz2
>
> I did add Tested-by tags, although I suspect its not the exact same bits
> Don tested..
I can re-test those exact patches later today if needed.
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 12:39 ` Don Zickus
@ 2010-08-20 13:27 ` Ingo Molnar
2010-08-20 13:51 ` Don Zickus
0 siblings, 1 reply; 85+ messages in thread
From: Ingo Molnar @ 2010-08-20 13:27 UTC (permalink / raw)
To: Don Zickus
Cc: Peter Zijlstra, Robert Richter, Cyrill Gorcunov, Lin Ming,
fweisbec, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
* Don Zickus <dzickus@redhat.com> wrote:
> On Fri, Aug 20, 2010 at 12:04:11PM +0200, Peter Zijlstra wrote:
> > On Fri, 2010-08-20 at 10:16 +0200, Ingo Molnar wrote:
> > >
> > >
> > > Please someone send the final version with a changelog, with all the acks and
> > > tested-by's added, so that i can send it Linuswards.
> >
> > I've got two patches queued, one from Robert (with a slight
> > modification, which removes a goto) and one from myself converting the
> > intel and p4 irq handler.
> >
> > They're available at:
> >
> > programming.kicks-ass.net/sekrit/patches.tar.bz2
> >
> > I did add Tested-by tags, although I suspect its not the exact same bits
> > Don tested..
>
> I can re-test those exact patches later today if needed.
Please pick up -tip in an hour or two and holler if something looks wrong.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 13:27 ` Ingo Molnar
@ 2010-08-20 13:51 ` Don Zickus
2010-08-20 14:17 ` Ingo Molnar
0 siblings, 1 reply; 85+ messages in thread
From: Don Zickus @ 2010-08-20 13:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Robert Richter, Cyrill Gorcunov, Lin Ming,
fweisbec, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Fri, Aug 20, 2010 at 03:27:37PM +0200, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@redhat.com> wrote:
>
> > On Fri, Aug 20, 2010 at 12:04:11PM +0200, Peter Zijlstra wrote:
> > > On Fri, 2010-08-20 at 10:16 +0200, Ingo Molnar wrote:
> > > >
> > > >
> > > > Please someone send the final version with a changelog, with all the acks and
> > > > tested-by's added, so that i can send it Linuswards.
> > >
> > > I've got two patches queued, one from Robert (with a slight
> > > modification, which removes a goto) and one from myself converting the
> > > intel and p4 irq handler.
> > >
> > > They're available at:
> > >
> > > programming.kicks-ass.net/sekrit/patches.tar.bz2
> > >
> > > I did add Tested-by tags, although I suspect its not the exact same bits
> > > Don tested..
> >
> > I can re-test those exact patches later today if needed.
>
> Please pick up -tip in an hour or two and holler if something looks wrong.
Easy enough.
Cheers,
Don
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 13:51 ` Don Zickus
@ 2010-08-20 14:17 ` Ingo Molnar
2010-08-20 20:45 ` Cyrill Gorcunov
2010-08-24 21:48 ` Don Zickus
0 siblings, 2 replies; 85+ messages in thread
From: Ingo Molnar @ 2010-08-20 14:17 UTC (permalink / raw)
To: Don Zickus
Cc: Peter Zijlstra, Robert Richter, Cyrill Gorcunov, Lin Ming,
fweisbec, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
it's not working so well, i'm getting:
Uhhuh. NMI received for unknown reason 00 on CPU 9.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
on a nehalem box, after a perf top and perf stat run.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 14:17 ` Ingo Molnar
@ 2010-08-20 20:45 ` Cyrill Gorcunov
2010-08-24 21:48 ` Don Zickus
1 sibling, 0 replies; 85+ messages in thread
From: Cyrill Gorcunov @ 2010-08-20 20:45 UTC (permalink / raw)
To: Ingo Molnar
Cc: Don Zickus, Peter Zijlstra, Robert Richter, Lin Ming, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Fri, Aug 20, 2010 at 04:17:03PM +0200, Ingo Molnar wrote:
>
> it's not working so well, i'm getting:
>
> Uhhuh. NMI received for unknown reason 00 on CPU 9.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
>
> on a nehalem box, after a perf top and perf stat run.
>
> Thanks,
>
> Ingo
>
Might not this be the case of AAN93 erratum, which says the following
| The processor can be configured to issue a PMI (performance monitor interrupt)
| upon overflow of the IA32_FIXED_CTR0 MSR (309H). A single PMI should be observed on
| overflow of IA32_FIXED_CTR0, however multiple PMIs are observed when this
| erratum occurs.
Just a thought.
-- Cyrill
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 14:17 ` Ingo Molnar
2010-08-20 20:45 ` Cyrill Gorcunov
@ 2010-08-24 21:48 ` Don Zickus
1 sibling, 0 replies; 85+ messages in thread
From: Don Zickus @ 2010-08-24 21:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Robert Richter, Cyrill Gorcunov, Lin Ming,
fweisbec, linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On Fri, Aug 20, 2010 at 04:17:03PM +0200, Ingo Molnar wrote:
>
> it's not working so well, i'm getting:
>
> Uhhuh. NMI received for unknown reason 00 on CPU 9.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
>
> on a nehalem box, after a perf top and perf stat run.
After applying the patch below, I ran the following commands on my nehalem
box without reproducing what you are seeing. Any thing else I can run
that might trigger it? (I also ran them on an amd phenom quad-core box).
I used 2.6.32-rc2 plus Robert's and 2 of PeterZ's patches.
perf top
perf stat -a -e cycles -e instructions -e cache-references -e cache-misses -e branch-misses -- sleep 5
perf record -f -a -e cycles -e instructions -e cache-references -e cache-misses -e branch-misses -- sleep 5
Cheers,
Don
>From 198be1044fa603bc9582a5c19134fdf9a433fff0 Mon Sep 17 00:00:00 2001
From: Don Zickus <dzickus@redhat.com>
Date: Tue, 24 Aug 2010 17:43:17 -0400
Subject: [PATCH] [x86] perf: rename nmi variable to avoid clash with entry point
There is already an entry point named .nmi in entry.S and that seems to clash
with the per_cpu variable nmi defined in commit f3a860d8. Renaming this
variable avoids the namespace collision.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/kernel/cpu/perf_event.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index dd2fceb..2a05ea4 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1205,7 +1205,7 @@ struct pmu_nmi_state {
int handled;
};
-static DEFINE_PER_CPU(struct pmu_nmi_state, nmi);
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct pmu_nmi_state, pmu_nmi);
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
@@ -1224,7 +1224,7 @@ perf_event_nmi_handler(struct notifier_block *self,
break;
case DIE_NMIUNKNOWN:
this_nmi = percpu_read(irq_stat.__nmi_count);
- if (this_nmi != __get_cpu_var(nmi).marked)
+ if (this_nmi != __get_cpu_var(pmu_nmi).marked)
/* let the kernel handle the unknown nmi */
return NOTIFY_DONE;
/*
@@ -1248,8 +1248,8 @@ perf_event_nmi_handler(struct notifier_block *self,
this_nmi = percpu_read(irq_stat.__nmi_count);
if ((handled > 1) ||
/* the next nmi could be a back-to-back nmi */
- ((__get_cpu_var(nmi).marked == this_nmi) &&
- (__get_cpu_var(nmi).handled > 1))) {
+ ((__get_cpu_var(pmu_nmi).marked == this_nmi) &&
+ (__get_cpu_var(pmu_nmi).handled > 1))) {
/*
* We could have two subsequent back-to-back nmis: The
* first handles more than one counter, the 2nd
@@ -1260,8 +1260,8 @@ perf_event_nmi_handler(struct notifier_block *self,
* handling more than one counter. We will mark the
* next (3rd) and then drop it if unhandled.
*/
- __get_cpu_var(nmi).marked = this_nmi + 1;
- __get_cpu_var(nmi).handled = handled;
+ __get_cpu_var(pmu_nmi).marked = this_nmi + 1;
+ __get_cpu_var(pmu_nmi).handled = handled;
}
return NOTIFY_STOP;
--
1.7.2.1
^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
2010-08-20 1:50 ` Don Zickus
2010-08-20 8:16 ` Ingo Molnar
@ 2010-08-20 8:36 ` Robert Richter
1 sibling, 0 replies; 85+ messages in thread
From: Robert Richter @ 2010-08-20 8:36 UTC (permalink / raw)
To: Don Zickus
Cc: Peter Zijlstra, Cyrill Gorcunov, Lin Ming, Ingo Molnar, fweisbec,
linux-kernel, Huang, Ying, Yinghai Lu, Andi Kleen
On 19.08.10 21:50:17, Don Zickus wrote:
> I tested this patch and Robert's on an AMD box and Nehalem box. Both
> worked as intended. However I did notice that whenever the AMD box
> detected handled >1, it was shortly followed by an unknown_nmi that was
> properly eaten with Robert's logic. Whereas on the Nehalem box I saw a
> lot of 'handled > 1' messages but very very few of them were followed by
> an unknown_nmi message (and those messages that did come were properly
> eaten).
Don, thanks for testing this.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip:perf/urgent] perf, x86: Fix handle_irq return values
2010-08-19 14:27 ` Peter Zijlstra
` (3 preceding siblings ...)
2010-08-20 1:50 ` Don Zickus
@ 2010-08-20 14:17 ` tip-bot for Peter Zijlstra
4 siblings, 0 replies; 85+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-08-20 14:17 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, a.p.zijlstra, tglx, dzickus
Commit-ID: 4a31bebe71ab2e4f6ecd6e5f9f2ac9f0ff38ff76
Gitweb: http://git.kernel.org/tip/4a31bebe71ab2e4f6ecd6e5f9f2ac9f0ff38ff76
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Thu, 19 Aug 2010 16:28:00 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 15:00:06 +0200
perf, x86: Fix handle_irq return values
Now that we rely on the number of handled overflows, ensure all handle_irq
implementations actually return the right number.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1282228033.2605.204.camel@laptop>
---
arch/x86/kernel/cpu/perf_event_intel.c | 9 +++++++--
arch/x86/kernel/cpu/perf_event_p4.c | 2 +-
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index d8d86d0..4539b4b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -713,6 +713,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
struct cpu_hw_events *cpuc;
int bit, loops;
u64 ack, status;
+ int handled = 0;
perf_sample_data_init(&data, 0);
@@ -743,12 +744,16 @@ again:
/*
* PEBS overflow sets bit 62 in the global status register
*/
- if (__test_and_clear_bit(62, (unsigned long *)&status))
+ if (__test_and_clear_bit(62, (unsigned long *)&status)) {
+ handled++;
x86_pmu.drain_pebs(regs);
+ }
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
+ handled++;
+
if (!test_bit(bit, cpuc->active_mask))
continue;
@@ -772,7 +777,7 @@ again:
done:
intel_pmu_enable_all(0);
- return 1;
+ return handled;
}
static struct event_constraint *
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index febb12c..d470c91 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -690,7 +690,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
inc_irq_stat(apic_perf_irqs);
}
- return handled > 0;
+ return handled;
}
/*
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip:perf/urgent] perf, x86: Try to handle unknown nmis with an enabled PMU
2010-08-17 15:22 ` [PATCH -v3] " Robert Richter
2010-08-17 16:17 ` Cyrill Gorcunov
2010-08-19 10:45 ` Peter Zijlstra
@ 2010-08-20 14:17 ` tip-bot for Robert Richter
2 siblings, 0 replies; 85+ messages in thread
From: tip-bot for Robert Richter @ 2010-08-20 14:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, robert.richter, tglx,
dzickus, mingo
Commit-ID: 8e3e42b4f88602bb6e1f0b74afd80ff4703f79ce
Gitweb: http://git.kernel.org/tip/8e3e42b4f88602bb6e1f0b74afd80ff4703f79ce
Author: Robert Richter <robert.richter@amd.com>
AuthorDate: Tue, 17 Aug 2010 16:42:03 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 15:00:05 +0200
perf, x86: Try to handle unknown nmis with an enabled PMU
When the PMU is enabled it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty and daze
the CPU.
The solution to avoid an 'unknown nmi' massage in this case was simply
to stop the nmi handler chain when the PMU is enabled by stating
the nmi was handled. This has the drawback that a) we can not detect
unknown nmis anymore, and b) subsequent nmi handlers are not called.
This patch addresses this. Now, we check this unknown NMI if it could
be a PMU back-to-back NMI. Otherwise we pass it and let the kernel
handle the unknown nmi.
This is a debug log:
cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
Uhhuh. NMI received for unknown reason 00 on CPU 6.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Deltas:
nmi #32334 340186
nmi #32335 1327704
nmi #32336 1819 <<<< back-to-back nmi [1]
nmi #32337 85961
nmi #32338 284507
nmi #32339 1578809
nmi #32340 217616
nmi #32341 1798 <<<< back-to-back nmi [2]
nmi #32342 240913
nmi #32343 1512809
nmi #32344 116672
nmi #32345 412453
nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]
For back-to-back nmi detection there are the following rules:
The PMU nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2] above).
There is another case if there are two subsequent back-to-back nmis
[3]. The 2nd is detected as back-to-back because the first handled
more than one counter. If the second handles one counter and the 3rd
handles nothing, we drop the 3rd nmi because it could be a
back-to-back nmi.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <20100817152225.GQ26154@erda.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/perf_event.c | 59 +++++++++++++++++++++++++++++--------
1 files changed, 46 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..dd2fceb 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1154,7 +1154,7 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
/*
* event overflow
*/
- handled = 1;
+ handled++;
data.period = event->hw.last_period;
if (!x86_perf_event_set_period(event))
@@ -1200,12 +1200,20 @@ void perf_events_lapic_init(void)
apic_write(APIC_LVTPC, APIC_DM_NMI);
}
+struct pmu_nmi_state {
+ unsigned int marked;
+ int handled;
+};
+
+static DEFINE_PER_CPU(struct pmu_nmi_state, nmi);
+
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
unsigned long cmd, void *__args)
{
struct die_args *args = __args;
- struct pt_regs *regs;
+ unsigned int this_nmi;
+ int handled;
if (!atomic_read(&active_events))
return NOTIFY_DONE;
@@ -1214,22 +1222,47 @@ perf_event_nmi_handler(struct notifier_block *self,
case DIE_NMI:
case DIE_NMI_IPI:
break;
-
+ case DIE_NMIUNKNOWN:
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if (this_nmi != __get_cpu_var(nmi).marked)
+ /* let the kernel handle the unknown nmi */
+ return NOTIFY_DONE;
+ /*
+ * This one is a PMU back-to-back nmi. Two events
+ * trigger 'simultaneously' raising two back-to-back
+ * NMIs. If the first NMI handles both, the latter
+ * will be empty and daze the CPU. So, we drop it to
+ * avoid false-positive 'unknown nmi' messages.
+ */
+ return NOTIFY_STOP;
default:
return NOTIFY_DONE;
}
- regs = args->regs;
-
apic_write(APIC_LVTPC, APIC_DM_NMI);
- /*
- * Can't rely on the handled return value to say it was our NMI, two
- * events could trigger 'simultaneously' raising two back-to-back NMIs.
- *
- * If the first NMI handles both, the latter will be empty and daze
- * the CPU.
- */
- x86_pmu.handle_irq(regs);
+
+ handled = x86_pmu.handle_irq(args->regs);
+ if (!handled)
+ return NOTIFY_DONE;
+
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if ((handled > 1) ||
+ /* the next nmi could be a back-to-back nmi */
+ ((__get_cpu_var(nmi).marked == this_nmi) &&
+ (__get_cpu_var(nmi).handled > 1))) {
+ /*
+ * We could have two subsequent back-to-back nmis: The
+ * first handles more than one counter, the 2nd
+ * handles only one counter and the 3rd handles no
+ * counter.
+ *
+ * This is the 2nd nmi because the previous was
+ * handling more than one counter. We will mark the
+ * next (3rd) and then drop it if unhandled.
+ */
+ __get_cpu_var(nmi).marked = this_nmi + 1;
+ __get_cpu_var(nmi).handled = handled;
+ }
return NOTIFY_STOP;
}
^ permalink raw reply related [flat|nested] 85+ messages in thread