All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
@ 2014-06-13  8:44 HATAYAMA Daisuke
  2014-06-16 15:30 ` Don Zickus
  0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA Daisuke @ 2014-06-13  8:44 UTC (permalink / raw)
  To: matt, peterz, acme, d.hatayama, mingo, paulus, hpa, tglx
  Cc: x86, linux-kernel

Currently, a NMI handler for NMI watchdog may falsely handle any NMI
signaled for different purpose if CondChgd bit in
MSR_CORE_PERF_GLOBAL_STATUS MSR is set.

This commit deals with the issue simply by ignoring CondChgd bit.

Here is explanation in detail.

On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.

intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.

The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's. As a result, if CondChgd bit is set, any NMI is
falsely handled by the NMI handler of NMI watchdog. Also, if type of
the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or
NMI_IO_CHECK, the corresponding action is never performed until
CondChgd bit is cleared.

I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.

On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.

I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:

  In 18.9.1:

  "The MSR_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
   indicate changes to the state of performancmonitoring hardware"

  In Table 35-2 IA-32 Architectural MSRs

  63 CondChg: status bits of this register has changed.

These are different from the bahviour I see on the actual system as I
explained above.

At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index adb02aa..07846d7 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1382,6 +1382,15 @@ again:
 	intel_pmu_lbr_read();
 
 	/*
+	 * CondChgd bit 63 doesn't mean any overflow status. Ignore
+	 * and clear the bit.
+	 */
+	if (__test_and_clear_bit(63, (unsigned long *)&status)) {
+		if (!status)
+			goto done;
+	}
+
+	/*
 	 * PEBS overflow sets bit 62 in the global status register
 	 */
 	if (__test_and_clear_bit(62, (unsigned long *)&status)) {


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  2014-06-13  8:44 [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling HATAYAMA Daisuke
@ 2014-06-16 15:30 ` Don Zickus
  2014-06-30  9:24   ` HATAYAMA, Daisuke
  0 siblings, 1 reply; 7+ messages in thread
From: Don Zickus @ 2014-06-16 15:30 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: matt, peterz, acme, mingo, paulus, hpa, tglx, x86, linux-kernel

On Fri, Jun 13, 2014 at 05:44:37PM +0900, HATAYAMA Daisuke wrote:
> Currently, a NMI handler for NMI watchdog may falsely handle any NMI
> signaled for different purpose if CondChgd bit in
> MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
> 
> This commit deals with the issue simply by ignoring CondChgd bit.
> 
> Here is explanation in detail.
> 
> On x86 NMI watchdog uses performance monitoring feature to
> periodically signal NMI each time performance counter gets overflowed.
> 
> intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
> handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
> owner of a given NMI by looking at overflow status bits in
> MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
> handles the given NMI as its own NMI.
> 
> The problem is that the intel_pmu_handle_irq() doesn't distinguish
> CondChgd bit from other bits. Unlike the other status bits, CondChgd
> bit doesn't represent overflow status for performance counters. Thus,
> CondChgd bit cannot be thought of as a mark indicating a given NMI is
> NMI watchdog's. As a result, if CondChgd bit is set, any NMI is
> falsely handled by the NMI handler of NMI watchdog. Also, if type of
> the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or
> NMI_IO_CHECK, the corresponding action is never performed until
> CondChgd bit is cleared.
> 
> I noticed this behavior on systems with Ivy Bridge processors: Intel
> Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
> CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
> in the beginning at boot. Then the CondChgd bit is immediately cleared
> by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
> 0.
> 
> On the other hand, on older processors such as Nehalem, Xeon E7540,
> CondChgd bit is not set in the beginning at boot.
> 
> I'm not sure about exact behavior of CondChgd bit, in particular when
> this bit is set. Although I read Intel System Programmer's Manual to
> figure out that, the descriptions I found are:
> 
>   In 18.9.1:
> 
>   "The MSR_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
>    indicate changes to the state of performancmonitoring hardware"
> 
>   In Table 35-2 IA-32 Architectural MSRs
> 
>   63 CondChg: status bits of this register has changed.
> 
> These are different from the bahviour I see on the actual system as I
> explained above.
> 
> At least, I think ignoring CondChgd bit should be enough for NMI
> watchdog perspective.

As I said in a previous email, I ran into a similar problem and was going
to solve it by zeroing out all the registers on init (which probably would
have upset Peter :-) ).  This is a smaller solution and seems ok.  The
only downside is it is called in the nmi handler.


I am working with our customer to try and talk with Intel why this bit is
set to begin with.  Our customer says their BIOS doesn't use the PMU
during boot so it wasn't clear why this is now set on IVBs (though I don't
see them on Intel whitebox IVBs).

I am for this patch as it solves our problem too.  But it makes me wonder
if this is just yet another workaround for something broken deeper.

Acked-by: Don Zickus <dzickus@redhat.com>


> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> ---
>  arch/x86/kernel/cpu/perf_event_intel.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index adb02aa..07846d7 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -1382,6 +1382,15 @@ again:
>  	intel_pmu_lbr_read();
>  
>  	/*
> +	 * CondChgd bit 63 doesn't mean any overflow status. Ignore
> +	 * and clear the bit.
> +	 */
> +	if (__test_and_clear_bit(63, (unsigned long *)&status)) {
> +		if (!status)
> +			goto done;
> +	}
> +
> +	/*
>  	 * PEBS overflow sets bit 62 in the global status register
>  	 */
>  	if (__test_and_clear_bit(62, (unsigned long *)&status)) {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  2014-06-16 15:30 ` Don Zickus
@ 2014-06-30  9:24   ` HATAYAMA, Daisuke
  2014-06-30 22:22     ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA, Daisuke @ 2014-06-30  9:24 UTC (permalink / raw)
  To: hpa, ak
  Cc: Don Zickus, matt, peterz, acme, mingo, paulus, tglx, x86, linux-kernel

Hello,

(2014/06/17 0:30), Don Zickus wrote:
> On Fri, Jun 13, 2014 at 05:44:37PM +0900, HATAYAMA Daisuke wrote:
>> Currently, a NMI handler for NMI watchdog may falsely handle any NMI
>> signaled for different purpose if CondChgd bit in
>> MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
>>
>> This commit deals with the issue simply by ignoring CondChgd bit.
>>
>> Here is explanation in detail.
>>
>> On x86 NMI watchdog uses performance monitoring feature to
>> periodically signal NMI each time performance counter gets overflowed.
>>
>> intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
>> handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
>> owner of a given NMI by looking at overflow status bits in
>> MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
>> handles the given NMI as its own NMI.
>>
>> The problem is that the intel_pmu_handle_irq() doesn't distinguish
>> CondChgd bit from other bits. Unlike the other status bits, CondChgd
>> bit doesn't represent overflow status for performance counters. Thus,
>> CondChgd bit cannot be thought of as a mark indicating a given NMI is
>> NMI watchdog's. As a result, if CondChgd bit is set, any NMI is
>> falsely handled by the NMI handler of NMI watchdog. Also, if type of
>> the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or
>> NMI_IO_CHECK, the corresponding action is never performed until
>> CondChgd bit is cleared.
>>
>> I noticed this behavior on systems with Ivy Bridge processors: Intel
>> Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
>> CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
>> in the beginning at boot. Then the CondChgd bit is immediately cleared
>> by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
>> 0.
>>
>> On the other hand, on older processors such as Nehalem, Xeon E7540,
>> CondChgd bit is not set in the beginning at boot.
>>
>> I'm not sure about exact behavior of CondChgd bit, in particular when
>> this bit is set. Although I read Intel System Programmer's Manual to
>> figure out that, the descriptions I found are:
>>
>>    In 18.9.1:
>>
>>    "The MSR_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
>>     indicate changes to the state of performancmonitoring hardware"
>>
>>    In Table 35-2 IA-32 Architectural MSRs
>>
>>    63 CondChg: status bits of this register has changed.
>>
>> These are different from the bahviour I see on the actual system as I
>> explained above.
>>
>> At least, I think ignoring CondChgd bit should be enough for NMI
>> watchdog perspective.
>
> As I said in a previous email, I ran into a similar problem and was going
> to solve it by zeroing out all the registers on init (which probably would
> have upset Peter :-) ).  This is a smaller solution and seems ok.  The
> only downside is it is called in the nmi handler.
>
>
> I am working with our customer to try and talk with Intel why this bit is
> set to begin with.  Our customer says their BIOS doesn't use the PMU
> during boot so it wasn't clear why this is now set on IVBs (though I don't
> see them on Intel whitebox IVBs).
>

I'm also interested in the behaviour of CondChgd bit on Ivy Bridge processors.

Do you know something about this behaviour?

-- 
Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  2014-06-30  9:24   ` HATAYAMA, Daisuke
@ 2014-06-30 22:22     ` Andi Kleen
  2014-07-01  8:14       ` Peter Zijlstra
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Andi Kleen @ 2014-06-30 22:22 UTC (permalink / raw)
  To: HATAYAMA, Daisuke
  Cc: hpa, Don Zickus, matt, peterz, acme, mingo, paulus, tglx, x86,
	linux-kernel

> 
> I'm also interested in the behaviour of CondChgd bit on Ivy Bridge processors.

The intended meaning of CondChgd is that a hardware debugger has taken over the
PMU. It shouldn't really be set in other circumstances.

I think right now for perf it would be best to just ignore it.

In theory could stop using the PMU, but if some BIOS set it it would
completely disable perf there. So better to just ignore it.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  2014-06-30 22:22     ` Andi Kleen
@ 2014-07-01  8:14       ` Peter Zijlstra
  2014-07-01 14:22       ` Don Zickus
  2014-07-02  1:07       ` HATAYAMA Daisuke
  2 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2014-07-01  8:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: HATAYAMA, Daisuke, hpa, Don Zickus, matt, acme, mingo, paulus,
	tglx, x86, linux-kernel

On Mon, Jun 30, 2014 at 03:22:24PM -0700, Andi Kleen wrote:
> > 
> > I'm also interested in the behaviour of CondChgd bit on Ivy Bridge processors.
> 
> The intended meaning of CondChgd is that a hardware debugger has taken over the
> PMU. It shouldn't really be set in other circumstances.

The SDM utterly and totally fails to mention this, please fix that.

> I think right now for perf it would be best to just ignore it.
> 
> In theory could stop using the PMU, but if some BIOS set it it would
> completely disable perf there. So better to just ignore it.

The even better option would be to kill off SMM mode, all that ever does
is create problems.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  2014-06-30 22:22     ` Andi Kleen
  2014-07-01  8:14       ` Peter Zijlstra
@ 2014-07-01 14:22       ` Don Zickus
  2014-07-02  1:07       ` HATAYAMA Daisuke
  2 siblings, 0 replies; 7+ messages in thread
From: Don Zickus @ 2014-07-01 14:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: HATAYAMA, Daisuke, hpa, matt, peterz, acme, mingo, paulus, tglx,
	x86, linux-kernel

On Mon, Jun 30, 2014 at 03:22:24PM -0700, Andi Kleen wrote:
> > 
> > I'm also interested in the behaviour of CondChgd bit on Ivy Bridge processors.
> 
> The intended meaning of CondChgd is that a hardware debugger has taken over the
> PMU. It shouldn't really be set in other circumstances.

Interesting.  My concern is we have customers that can soft reboot a
machine and have this bit accidentally be enabled.  I am pretty sure they
do not have a hardware debugger attached (I would have to double check).
Is there other ways this can be set?

Cheers,
Don

> 
> I think right now for perf it would be best to just ignore it.
> 
> In theory could stop using the PMU, but if some BIOS set it it would
> completely disable perf there. So better to just ignore it.
> 
> -Andi
> 
> -- 
> ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  2014-06-30 22:22     ` Andi Kleen
  2014-07-01  8:14       ` Peter Zijlstra
  2014-07-01 14:22       ` Don Zickus
@ 2014-07-02  1:07       ` HATAYAMA Daisuke
  2 siblings, 0 replies; 7+ messages in thread
From: HATAYAMA Daisuke @ 2014-07-02  1:07 UTC (permalink / raw)
  To: ak
  Cc: hpa, dzickus, matt, peterz, acme, mingo, paulus, tglx, x86, linux-kernel

From: Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
Date: Mon, 30 Jun 2014 15:22:24 -0700

>> 
>> I'm also interested in the behaviour of CondChgd bit on Ivy Bridge processors.
> 
> The intended meaning of CondChgd is that a hardware debugger has taken over the
> PMU. It shouldn't really be set in other circumstances.
> 

Thanks for your explanation.

The hardware debugger you mean is a kind of ITP?

> I think right now for perf it would be best to just ignore it.
> 
> In theory could stop using the PMU, but if some BIOS set it it would
> completely disable perf there. So better to just ignore it.
> 

I'll reflect this information in the patch description.

--
Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-07-02  1:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-13  8:44 [PATCH v2] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling HATAYAMA Daisuke
2014-06-16 15:30 ` Don Zickus
2014-06-30  9:24   ` HATAYAMA, Daisuke
2014-06-30 22:22     ` Andi Kleen
2014-07-01  8:14       ` Peter Zijlstra
2014-07-01 14:22       ` Don Zickus
2014-07-02  1:07       ` HATAYAMA Daisuke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.