* [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
@ 2011-02-04 12:17 Cyrill Gorcunov
  2011-02-04 16:59 ` Don Zickus
  2011-02-05  2:28 ` George Spelvin
  0 siblings, 2 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2011-02-04 12:17 UTC (permalink / raw)
  To: Ingo Molnar, Don Zickus
  Cc: George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml

[-- Attachment #1: Type: text/plain, Size: 215 bytes --]

Please apply it; sorry for the non-inlined patch (I have only web access at the moment).

Note that I've tested the patch on a non-HT machine, so if someone has an HT'ed one
-- it would be great to test the patch there.

Cyrill

[-- Attachment #2: x86-perf-unflagged-nmi --]
[-- Type: application/octet-stream, Size: 2023 bytes --]

From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

A couple of people have reported an unknown NMI issue on p4 pmu.
This patch should fix it.

Reported-by: George Spelvin <linux@horizon.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Don Zickus <dzickus@redhat.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/include/asm/perf_event_p4.h |    1 +
 arch/x86/kernel/cpu/perf_event_p4.c  |   11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
===================================================================
--- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
@@ -22,6 +22,7 @@
 
 #define ARCH_P4_CNTRVAL_BITS	(40)
 #define ARCH_P4_CNTRVAL_MASK	((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
+#define ARCH_P4_UNFLAGGED_BIT	((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))
 
 #define P4_ESCR_EVENT_MASK	0x7e000000U
 #define P4_ESCR_EVENT_SHIFT	25
Index: linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.tip.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
@@ -770,9 +770,14 @@ static inline int p4_pmu_clear_cccr_ovf(
 		return 1;
 	}
 
-	/* it might be unflagged overflow */
-	rdmsrl(hwc->event_base + hwc->idx, v);
-	if (!(v & ARCH_P4_CNTRVAL_MASK))
+	/*
+	 * in some circumstances an overflow might issue an NMI without
+	 * setting the P4_CCCR_OVF bit. Since a counter holds a negative value
+	 * we simply check for the high bit being set; if it's cleared it means
+	 * the counter has reached zero and continued counting before the
+	 * real NMI signal was received
+	 */
+	if (!(v & ARCH_P4_UNFLAGGED_BIT))
 		return 1;
 
 	return 0;


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-04 12:17 [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test Cyrill Gorcunov
@ 2011-02-04 16:59 ` Don Zickus
  2011-02-04 17:32   ` Cyrill Gorcunov
  2011-02-06 19:21   ` Cyrill Gorcunov
  2011-02-05  2:28 ` George Spelvin
  1 sibling, 2 replies; 11+ messages in thread
From: Don Zickus @ 2011-02-04 16:59 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml

On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
> Please apply it; sorry for the non-inlined patch (I have only web access at the moment).
> 
> Note that I've tested the patch on a non-HT machine, so if someone has an HT'ed one
> -- it would be great to test the patch there.

Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
boot with hardlockup issues.  It seems like the code is swallowing the
NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
Xeon box (p4 w/HT).

Cheers,
Don


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-04 16:59 ` Don Zickus
@ 2011-02-04 17:32   ` Cyrill Gorcunov
  2011-02-06 19:21   ` Cyrill Gorcunov
  1 sibling, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2011-02-04 17:32 UTC (permalink / raw)
  To: Don Zickus
  Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming,
	Peter Zijlstra, lkml, Jason Wessel

On 02/04/2011 07:59 PM, Don Zickus wrote:
> On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
>> Please apply it; sorry for the non-inlined patch (I have only web access at the moment).
>>
>> Note that I've tested the patch on a non-HT machine, so if someone has an HT'ed one
>> -- it would be great to test the patch there.
> 
> Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
> boot with hardlockup issues.  It seems like the code is swallowing the
> NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
> Xeon box (p4 w/HT).
> 
> Cheers,
> Don

  Interesting, it seems the old kgdb issue is back. The former unknown NMI problem
is due to commit 047a3772feaae8e43d81d790f3d3f80dae8ae676, which assumed that
the counter stays at zero when an unflagged overflow happens, but it seems this is
not what happens at the hw level. I noted that at the moment of the NMI the counter
had reached some positive value, so the new patch simply checks whether the
negative (high) bit is set.
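
To make the difference between the two tests concrete, here is a minimal user-space sketch (not kernel code). It assumes the counter is loaded with the negated sample period truncated to 40 bits, and that the period is smaller than 2^39 so the high bit acts as a sign bit. The three macros are copied from the patch; the helper names p4_counter_after, overflowed_old and overflowed_new are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define ARCH_P4_CNTRVAL_BITS	(40)
#define ARCH_P4_CNTRVAL_MASK	((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
#define ARCH_P4_UNFLAGGED_BIT	((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))

/*
 * Hypothetical model of the raw counter: start at -period (40-bit
 * two's complement) and count `events` events upwards.
 */
static uint64_t p4_counter_after(uint64_t period, uint64_t events)
{
	uint64_t start = (0 - period) & ARCH_P4_CNTRVAL_MASK;

	return (start + events) & ARCH_P4_CNTRVAL_MASK;
}

/* Old test: overflow is assumed only if the counter is exactly zero. */
static int overflowed_old(uint64_t v)
{
	return !(v & ARCH_P4_CNTRVAL_MASK);
}

/* New test: overflow is assumed once the high ("negative") bit is clear. */
static int overflowed_new(uint64_t v)
{
	return !(v & ARCH_P4_UNFLAGGED_BIT);
}
```

With a period of 1000, a counter that has seen 1005 events holds the value 5: the old test misses this overflow because the value is no longer zero, while the new test still catches it because bit 39 is clear.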

  I must admit I forgot to test with the kgdb testsuite at bootup time, and I'll
be able to test this on Monday at best. I'll try to figure out what might be
happening by reading the code for a while (the only idea that comes to mind is
that an NMI from kgdb gets mixed up with one issued by perf).

-- 
    Cyrill


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-04 12:17 [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test Cyrill Gorcunov
  2011-02-04 16:59 ` Don Zickus
@ 2011-02-05  2:28 ` George Spelvin
  2011-02-05  8:40   ` Cyrill Gorcunov
  1 sibling, 1 reply; 11+ messages in thread
From: George Spelvin @ 2011-02-05  2:28 UTC (permalink / raw)
  To: dzickus, gorcunov, mingo
  Cc: a.p.zijlstra, linux-kernel, linux, ming.m.lin, mroos

The earlier patch didn't fix things after all.  I'm rebooting with this one now.

(2.6.38-rc3 with x86_pmu_start patch.)
Feb  2 01:00:03: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  2 01:00:03: Do you have a strange power saving mode enabled?
Feb  2 01:00:03: Dazed and confused, but trying to continue
Feb  2 01:01:04: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  2 01:01:04: Do you have a strange power saving mode enabled?
Feb  2 01:01:04: Dazed and confused, but trying to continue
Feb  2 04:22:20: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  2 04:22:20: Do you have a strange power saving mode enabled?
Feb  2 04:22:20: Dazed and confused, but trying to continue
Feb  2 06:28:17: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  2 06:28:17: Do you have a strange power saving mode enabled?
Feb  2 06:28:17: Dazed and confused, but trying to continue
Feb  2 08:39:01: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  2 08:39:01: Do you have a strange power saving mode enabled?
Feb  2 08:39:01: Dazed and confused, but trying to continue
Feb  2 16:00:04: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  2 16:00:04: Do you have a strange power saving mode enabled?
Feb  2 16:00:04: Dazed and confused, but trying to continue
Feb  2 20:04:41: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  2 20:04:41: Do you have a strange power saving mode enabled?
Feb  2 20:04:41: Dazed and confused, but trying to continue
Feb  2 22:21:27: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  2 22:21:27: Do you have a strange power saving mode enabled?
Feb  2 22:21:27: Dazed and confused, but trying to continue
Feb  3 00:09:11: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  3 00:09:11: Do you have a strange power saving mode enabled?
Feb  3 00:09:11: Dazed and confused, but trying to continue
Feb  3 00:24:02: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 00:24:02: Do you have a strange power saving mode enabled?
Feb  3 00:24:02: Dazed and confused, but trying to continue
Feb  3 01:00:12: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  3 01:00:12: Do you have a strange power saving mode enabled?
Feb  3 01:00:12: Dazed and confused, but trying to continue
Feb  3 01:27:50: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 01:27:50: Do you have a strange power saving mode enabled?
Feb  3 01:27:50: Dazed and confused, but trying to continue
Feb  3 06:27:55: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 06:27:55: Do you have a strange power saving mode enabled?
Feb  3 06:27:55: Dazed and confused, but trying to continue
Feb  3 09:21:06: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 09:21:06: Do you have a strange power saving mode enabled?
Feb  3 09:21:06: Dazed and confused, but trying to continue
Feb  3 11:35:23: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  3 11:35:23: Do you have a strange power saving mode enabled?
Feb  3 11:35:23: Dazed and confused, but trying to continue
Feb  3 17:52:05: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 17:52:05: Do you have a strange power saving mode enabled?
Feb  3 17:52:05: Dazed and confused, but trying to continue
Feb  3 18:01:44: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 18:01:44: Do you have a strange power saving mode enabled?
Feb  3 18:01:44: Dazed and confused, but trying to continue
Feb  3 18:11:23: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 18:11:23: Do you have a strange power saving mode enabled?
Feb  3 18:11:23: Dazed and confused, but trying to continue
Feb  3 20:02:43: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 20:02:43: Do you have a strange power saving mode enabled?
Feb  3 20:02:43: Dazed and confused, but trying to continue
Feb  3 22:38:43: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  3 22:38:43: Do you have a strange power saving mode enabled?
Feb  3 22:38:43: Dazed and confused, but trying to continue
Feb  4 01:00:41: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  4 01:00:41: Do you have a strange power saving mode enabled?
Feb  4 01:00:41: Dazed and confused, but trying to continue
Feb  4 05:00:02: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  4 05:00:02: Do you have a strange power saving mode enabled?
Feb  4 05:00:02: Dazed and confused, but trying to continue
Feb  4 06:28:42: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  4 06:28:42: Do you have a strange power saving mode enabled?
Feb  4 06:28:42: Dazed and confused, but trying to continue
Feb  4 13:35:29: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  4 13:35:29: Do you have a strange power saving mode enabled?
Feb  4 13:35:29: Dazed and confused, but trying to continue
Feb  4 20:00:32: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb  4 20:00:32: Do you have a strange power saving mode enabled?
Feb  4 20:00:32: Dazed and confused, but trying to continue
Feb  4 21:12:52: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb  4 21:12:52: Do you have a strange power saving mode enabled?
Feb  4 21:12:52: Dazed and confused, but trying to continue
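
For readers decoding the "reason" byte in the log above, a small sketch follows. It assumes the byte is the hexadecimal value read from the NMI status port (0x61), and that an NMI is reported as "unknown" when neither the SERR bit (0x80) nor the IOCHK bit (0x40) is set. The macro names mirror the kernel's naming as far as I recall it, and nmi_reason_is_unknown is a hypothetical helper, so treat both as assumptions.

```c
#include <stdint.h>

/* Assumed layout of the NMI status byte (port 0x61), top two bits only. */
#define NMI_REASON_SERR		0x80
#define NMI_REASON_IOCHK	0x40
#define NMI_REASON_MASK		(NMI_REASON_SERR | NMI_REASON_IOCHK)

/* Returns 1 when neither known hardware error source is flagged. */
static int nmi_reason_is_unknown(uint8_t reason)
{
	return !(reason & NMI_REASON_MASK);
}
```

Under these assumptions both values seen in the log, 0x2d and 0x3d, have the two top bits clear, which is consistent with the NMIs being reported as having no known source.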



* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-05  2:28 ` George Spelvin
@ 2011-02-05  8:40   ` Cyrill Gorcunov
  2011-02-05  9:15     ` George Spelvin
  0 siblings, 1 reply; 11+ messages in thread
From: Cyrill Gorcunov @ 2011-02-05  8:40 UTC (permalink / raw)
  To: George Spelvin
  Cc: dzickus, mingo, a.p.zijlstra, linux-kernel, ming.m.lin, mroos

On 02/05/2011 05:28 AM, George Spelvin wrote:
> The earlier patch didn't fix things after all.  I'm rebooting with this one now.
> 
> (2.6.38-rc3 with x86_pmu_start patch.)
> Feb  2 01:00:03: Uhhuh. NMI received for unknown reason 2d on CPU 0.
> Feb  2 01:00:03: Do you have a strange power saving mode enabled?
...

Hi George, is this log from when the unflagged overflow patch was applied, or the previous one?

-- 
    Cyrill


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-05  8:40   ` Cyrill Gorcunov
@ 2011-02-05  9:15     ` George Spelvin
  2011-02-05  9:22       ` Cyrill Gorcunov
  0 siblings, 1 reply; 11+ messages in thread
From: George Spelvin @ 2011-02-05  9:15 UTC (permalink / raw)
  To: gorcunov, linux
  Cc: a.p.zijlstra, dzickus, linux-kernel, ming.m.lin, mingo, mroos

> Hi George, is this log from when the unflagged overflow patch was applied, or the previous one?

The earlier one.  As I said, the patch to x86_pmu_start.  The later one (no
complaints in 6 hours of uptime so far) was to p4_pmu_clear_cccr_ovf.

Thank you very much!


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-05  9:15     ` George Spelvin
@ 2011-02-05  9:22       ` Cyrill Gorcunov
  0 siblings, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2011-02-05  9:22 UTC (permalink / raw)
  To: George Spelvin
  Cc: a.p.zijlstra, dzickus, linux-kernel, ming.m.lin, mingo, mroos

On 02/05/2011 12:15 PM, George Spelvin wrote:
>> Hi George, is this log from when the unflagged overflow patch was applied, or the previous one?
> 
> The earlier one.  As I said, the patch to x86_pmu_start.  The later one (no
> complaints in 6 hours of uptime so far) was to p4_pmu_clear_cccr_ovf.
> 
> Thank you very much!

Thanks for testing, George! There is still a problem with the kgdb bootup tests,
which I haven't resolved yet. Will ping as soon as I get any news ;)

-- 
    Cyrill


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-04 16:59 ` Don Zickus
  2011-02-04 17:32   ` Cyrill Gorcunov
@ 2011-02-06 19:21   ` Cyrill Gorcunov
  2011-02-07 17:22     ` Cyrill Gorcunov
  1 sibling, 1 reply; 11+ messages in thread
From: Cyrill Gorcunov @ 2011-02-06 19:21 UTC (permalink / raw)
  To: Don Zickus
  Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml

On 02/04/2011 07:59 PM, Don Zickus wrote:
> On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
>> Please apply it; sorry for the non-inlined patch (I have only web access at the moment).
>>
>> Note that I've tested the patch on a non-HT machine, so if someone has an HT'ed one
>> -- it would be great to test the patch there.
> 
> Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
> boot with hardlockup issues.  It seems like the code is swallowing the
> NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
> Xeon box (p4 w/HT).
> 
> Cheers,
> Don

  Don, I hope to get access to a p4 machine tomorrow and investigate this issue
(I didn't manage to read the kgdb code this weekend). Sorry for the delay.

-- 
    Cyrill


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-06 19:21   ` Cyrill Gorcunov
@ 2011-02-07 17:22     ` Cyrill Gorcunov
  2011-02-08 14:26       ` George Spelvin
  0 siblings, 1 reply; 11+ messages in thread
From: Cyrill Gorcunov @ 2011-02-07 17:22 UTC (permalink / raw)
  To: Don Zickus
  Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming,
	Peter Zijlstra, lkml, Jason Wessel

On 02/06/2011 10:21 PM, Cyrill Gorcunov wrote:
> On 02/04/2011 07:59 PM, Don Zickus wrote:
>> On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
>>> Please apply it; sorry for the non-inlined patch (I have only web access at the moment).
>>>
>>> Note that I've tested the patch on a non-HT machine, so if someone has an HT'ed one
>>> -- it would be great to test the patch there.
>>
>> Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
>> boot with hardlockup issues.  It seems like the code is swallowing the
>> NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
>> Xeon box (p4 w/HT).
>>
>> Cheers,
>> Don
> 
>   Don, I hope to get access to a p4 machine tomorrow and investigate this issue
> (I didn't manage to read the kgdb code this weekend). Sorry for the delay.
> 

  Just for info -- I've tested the patch on a p4 machine with the kgdb bootup
tests and the results are somewhat strange. If I disable nmi-watchdog the tests
pass fine and I'm able to run perf top (or anything related). At the same time,
if I leave nmi-watchdog enabled by default, the borrowed event is reported and
the kgdb tests pass, but nmi-watchdog never fires (i.e. I see the nmi irq counter
remain zero). So I've added debug prints and found that the counter reaches
positive values and doesn't issue an NMI at all. All in all -- I'm still
investigating this issue; unfortunately the kernel build procedure sometimes
takes hours on this machine (even with ccache enabled), so it goes way slower
than I expected :(
-- 
    Cyrill


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-07 17:22     ` Cyrill Gorcunov
@ 2011-02-08 14:26       ` George Spelvin
  2011-02-08 14:38         ` Cyrill Gorcunov
  0 siblings, 1 reply; 11+ messages in thread
From: George Spelvin @ 2011-02-08 14:26 UTC (permalink / raw)
  To: dzickus, gorcunov
  Cc: a.p.zijlstra, jason.wessel, linux-kernel, linux, ming.m.lin,
	mingo, mroos

I don't use kgdb, so I can't comment on that, but patch #2
("perf, x86: P4 PMU -- Fix unflagged overflows test")
has given me no problems in 3.5 days of uptime.

Thank you very much!


* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
  2011-02-08 14:26       ` George Spelvin
@ 2011-02-08 14:38         ` Cyrill Gorcunov
  0 siblings, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2011-02-08 14:38 UTC (permalink / raw)
  To: George Spelvin
  Cc: dzickus, a.p.zijlstra, jason.wessel, linux-kernel, ming.m.lin,
	mingo, mroos

On 02/08/2011 05:26 PM, George Spelvin wrote:
> I don't use kgdb, so I can't comment on that, but patch #2
> ("perf, x86: P4 PMU -- Fix unflagged overflows test")
> has given me no problems in 3.5 days of uptime.
> 
> Thank you very much!

Thanks for testing, George! I'm still investigating this issue
(I didn't manage to spend more time on it today, but there are
 some results I need to analyze first).

-- 
    Cyrill


