All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
@ 2018-06-03 18:23 David Arcari
  2018-06-04  8:24 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: David Arcari @ 2018-06-03 18:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Arcari, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Andi Kleen, Kan Liang, Jiri Olsa, Donald Zickus,
	Prarit Bhargava, Jerry Hoemann

On some systems pressing the external NMI button is now failing to inject
an NMI 5-10% of the time.  This causes confusion for a user that expects
the NMI to dump the system.

Commit 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
does not read the firmware setting of the FREEZE_WHILE_SMM bit and will
always clear it when the PMU is initialized.  As a result the performance
counters will always run and that greatly expands the race in which
external NMI will not be processed if a local NMI is already being
processed.

One option is to change default_do_nmi().  The code snippet below shows the
relevant portion of a patch that resolves the issue, but it is problematic
from a performance perspective and was dismissed.

-345,7 +345,17 @@ static void default_do_nmi(struct pt_regs *regs)
 		 */
 		if (handled > 1)
 			__this_cpu_write(swallow_nmi, true);
-		return;
+
+		/*
+		 * Unfortunately, there is a race condition which can
+		 * result in a missing an external NMI.  Typically, an
+		 * external NMI is processed on cpu 0.  Therefore, on
+		 * cpu 0 check for an external NMI before returning.
+		 */
+		if (smp_processor_id() ||
+		    (x86_platform.get_nmi_reason() & NMI_REASON_MASK) == 0) {
+			return;
+		}
 	}

Ultimately, the issue can be resolved by storing the default firmware
setting of FREEZE_WHILE_SMM before initializing the PMU.

Fixes: 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")

Signed-off-by: David Arcari <darcari@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Donald Zickus <dzickus@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
---
 arch/x86/events/intel/core.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 707b2a9..fce98df 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3325,6 +3325,18 @@ static void flip_smm_bit(void *data)
 	}
 }
 
+static int read_smm_bit(void)
+{
+	u64 val;
+
+	if (!rdmsrl_safe(MSR_IA32_DEBUGCTLMSR, &val)) {
+		if (val & DEBUGCTLMSR_FREEZE_IN_SMM)
+			return 1;
+	}
+
+	return 0;
+}
+
 static void intel_pmu_cpu_starting(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
@@ -4423,6 +4435,8 @@ __init int intel_pmu_init(void)
 		pr_cont("full-width counters, ");
 	}
 
+	x86_pmu.attr_freeze_on_smi = read_smm_bit();
+
 	kfree(to_free);
 	return 0;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
  2018-06-03 18:23 [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot David Arcari
@ 2018-06-04  8:24 ` Peter Zijlstra
  2018-06-04 14:12   ` David Arcari
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2018-06-04  8:24 UTC (permalink / raw)
  To: David Arcari
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Kan Liang, Jiri Olsa, Donald Zickus, Prarit Bhargava,
	Jerry Hoemann

On Sun, Jun 03, 2018 at 02:23:43PM -0400, David Arcari wrote:
> On some systems pressing the external NMI button is now failing to inject
> an NMI 5-10% of the time.  This causes confusion for a user that expects
> the NMI to dump the system.
> 
> Commit 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
> does not read the firmware setting of the FREEZE_WHILE_SMM bit and will
> always clear it when the PMU is initialized.  As a result the performance
> counters will always run and that greatly expands the race in which
> external NMI will not be processed if a local NMI is already being
> processed.
> 
> One option is to change default_do_nmi().  The code snippet below shows the
> relevant portion of a patch that resolves the issue, but it is problematic
> from a performance perspective and was dismissed.
> 
> -345,7 +345,17 @@ static void default_do_nmi(struct pt_regs *regs)
>  		 */
>  		if (handled > 1)
>  			__this_cpu_write(swallow_nmi, true);
> -		return;
> +
> +		/*
> +		 * Unfortunately, there is a race condition which can
> +		 * result in a missing an external NMI.  Typically, an
> +		 * external NMI is processed on cpu 0.  Therefore, on
> +		 * cpu 0 check for an external NMI before returning.
> +		 */
> +		if (smp_processor_id() ||
> +		    (x86_platform.get_nmi_reason() & NMI_REASON_MASK) == 0) {
> +			return;
> +		}
>  	}
> 
> Ultimately, the issue can be resolved by storing the default firmware
> setting of FREEZE_WHILE_SMM before initializing the PMU.

I'm sorry, I know it's Monday morning, but what?! I really don't
understand anything you write there.

Maybe if you explain the race and how your proposed fix closes it things
will make sense. The above refers to too many things not here.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
  2018-06-04  8:24 ` Peter Zijlstra
@ 2018-06-04 14:12   ` David Arcari
  2018-06-11 17:57     ` David Arcari
  2018-06-12 16:56     ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: David Arcari @ 2018-06-04 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Kan Liang, Jiri Olsa, Donald Zickus, Prarit Bhargava,
	Jerry Hoemann

On 06/04/2018 04:24 AM, Peter Zijlstra wrote:
> On Sun, Jun 03, 2018 at 02:23:43PM -0400, David Arcari wrote:
>> On some systems pressing the external NMI button is now failing to inject
>> an NMI 5-10% of the time.  This causes confusion for a user that expects
>> the NMI to dump the system.
>>
>> Commit 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
>> does not read the firmware setting of the FREEZE_WHILE_SMM bit and will
>> always clear it when the PMU is initialized.  As a result the performance
>> counters will always run and that greatly expands the race in which
>> external NMI will not be processed if a local NMI is already being
>> processed.
>>
>> One option is to change default_do_nmi().  The code snippet below shows the
>> relevant portion of a patch that resolves the issue, but it is problematic
>> from a performance perspective and was dismissed.
>>
>> -345,7 +345,17 @@ static void default_do_nmi(struct pt_regs *regs)
>>  		 */
>>  		if (handled > 1)
>>  			__this_cpu_write(swallow_nmi, true);
>> -		return;
>> +
>> +		/*
>> +		 * Unfortunately, there is a race condition which can
>> +		 * result in a missing an external NMI.  Typically, an
>> +		 * external NMI is processed on cpu 0.  Therefore, on
>> +		 * cpu 0 check for an external NMI before returning.
>> +		 */
>> +		if (smp_processor_id() ||
>> +		    (x86_platform.get_nmi_reason() & NMI_REASON_MASK) == 0) {
>> +			return;
>> +		}
>>  	}
>>
>> Ultimately, the issue can be resolved by storing the default firmware
>> setting of FREEZE_WHILE_SMM before initializing the PMU.
> 
> I'm sorry, I know it's Monday morning, but what?! I really don't
> understand anything you write there.
> 
> Maybe if you explain the race and how your proposed fix closes it things
> will make sense. The above refers to too many things not here.
> 

Sorry.

default_do_nmi() will process both perf events (local interrupts) as well as
external interrupts (such as the NMI button).  The handler is coded such that
if a local interrupt occurs, no check is made for an external interrupt.
Therefore, if the two interrupts occur simultaneously, the external interrupt
is lost.

The code above, which was ultimately discounted, attempts to avoid this
scenario with as little performance impact as possible by reading the register
without the spinlock for cpu 0 only (currently only cpu 0 can handle an
external NMI, I verified this on my system by testing the NMI button with
cpu 0 offline).

The code above is problematic for a number of reasons not the least of which
is performance.  Furthermore, I don't see a less intrusive solution wrt
do_default_nmi().

Upstream 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
appears to have made it relatively easy to hit this race condition.  On some
systems, this commit has resulted in a change to the default firmware setting
of DEBUGCTLMSR_FREEZE_IN_SMM_BIT (it is now cleared by the OS by default).

With this bit cleared, the following situation occurs:

1) external NMI - due to io check
2) long duration SMI (counters do not freeze)
3) NMI handler runs and misattributes interrupt to perf event

Ultimately, my solution was to restore the previous behavior by reading and
storing the firmware setting of the bit rather than to always clear it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
  2018-06-04 14:12   ` David Arcari
@ 2018-06-11 17:57     ` David Arcari
  2018-06-12 16:56     ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: David Arcari @ 2018-06-11 17:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Kan Liang, Jiri Olsa, Donald Zickus, Prarit Bhargava,
	Jerry Hoemann

On 06/04/2018 10:12 AM, David Arcari wrote:
> On 06/04/2018 04:24 AM, Peter Zijlstra wrote:
>> On Sun, Jun 03, 2018 at 02:23:43PM -0400, David Arcari wrote:
>>> On some systems pressing the external NMI button is now failing to inject
>>> an NMI 5-10% of the time.  This causes confusion for a user that expects
>>> the NMI to dump the system.
>>>
>>> Commit 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
>>> does not read the firmware setting of the FREEZE_WHILE_SMM bit and will
>>> always clear it when the PMU is initialized.  As a result the performance
>>> counters will always run and that greatly expands the race in which
>>> external NMI will not be processed if a local NMI is already being
>>> processed.
>>>
>>> One option is to change default_do_nmi().  The code snippet below shows the
>>> relevant portion of a patch that resolves the issue, but it is problematic
>>> from a performance perspective and was dismissed.
>>>
>>> -345,7 +345,17 @@ static void default_do_nmi(struct pt_regs *regs)
>>>  		 */
>>>  		if (handled > 1)
>>>  			__this_cpu_write(swallow_nmi, true);
>>> -		return;
>>> +
>>> +		/*
>>> +		 * Unfortunately, there is a race condition which can
>>> +		 * result in a missing an external NMI.  Typically, an
>>> +		 * external NMI is processed on cpu 0.  Therefore, on
>>> +		 * cpu 0 check for an external NMI before returning.
>>> +		 */
>>> +		if (smp_processor_id() ||
>>> +		    (x86_platform.get_nmi_reason() & NMI_REASON_MASK) == 0) {
>>> +			return;
>>> +		}
>>>  	}
>>>
>>> Ultimately, the issue can be resolved by storing the default firmware
>>> setting of FREEZE_WHILE_SMM before initializing the PMU.
>>
>> I'm sorry, I know it's Monday morning, but what?! I really don't
>> understand anything you write there.
>>
>> Maybe if you explain the race and how your proposed fix closes it things
>> will make sense. The above refers to too many things not here.
>>
> 
> Sorry.
> 
> default_do_nmi() will process both perf events (local interrupts) as well as
> external interrupts (such as the NMI button).  The handler is coded such that
> if a local interrupt occurs, no check is made for an external interrupt.
> Therefore, if the two interrupts occur simultaneously, the external interrupt
> is lost.
> 
> The code above, which was ultimately discounted, attempts to avoid this
> scenario with as little performance impact as possible by reading the register
> without the spinlock for cpu 0 only (currently only cpu 0 can handle an
> external NMI, I verified this on my system by testing the NMI button with
> cpu 0 offline).
> 
> The code above is problematic for a number of reasons not the least of which
> is performance.  Furthermore, I don't see a less intrusive solution wrt
> do_default_nmi().
> 
> Upstream 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
> appears to have made it relatively easy to hit this race condition.  On some
> systems, this commit has resulted in a change to the default firmware setting
> of DEBUGCTLMSR_FREEZE_IN_SMM_BIT (it is now cleared by the OS by default).
> 
> With this bit cleared, the following situation occurs:
> 
> 1) external NMI - due to io check
> 2) long duration SMI (counters do not freeze)
> 3) NMI handler runs and misattributes interrupt to perf event
> 
> Ultimately, my solution was to restore the previous behavior by reading and
> storing the firmware setting of the bit rather than to always clear it.
> 

Hi Peter,

Have you had a chance to take a look at this?

Thanks,

-Dave

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
  2018-06-04 14:12   ` David Arcari
  2018-06-11 17:57     ` David Arcari
@ 2018-06-12 16:56     ` Peter Zijlstra
  2018-06-18 19:14       ` David Arcari
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2018-06-12 16:56 UTC (permalink / raw)
  To: David Arcari
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Kan Liang, Jiri Olsa, Donald Zickus, Prarit Bhargava,
	Jerry Hoemann

On Mon, Jun 04, 2018 at 10:12:20AM -0400, David Arcari wrote:
> default_do_nmi() will process both perf events (local interrupts) as well as
> external interrupts (such as the NMI button).  The handler is coded such that
> if a local interrupt occurs, no check is made for an external interrupt.
> Therefore, if the two interrupts occur simultaneously, the external interrupt
> is lost.

ACK, NMI handling on x86 is less than ideal.

> The code above, which was ultimately discounted, attempts to avoid this
> scenario with as little performance impact as possible by reading the register
> without the spinlock for cpu 0 only (currently only cpu 0 can handle an
> external NMI, I verified this on my system by testing the NMI button with
> cpu 0 offline).
> 
> The code above is problematic for a number of reasons not the least of which
> is performance.  Furthermore, I don't see a less intrusive solution wrt
> do_default_nmi().

Right, because reading the register itself is dog slow IIRC.

> Upstream 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
> appears to have made it relatively easy to hit this race condition.  On some
> systems, this commit has resulted in a change to the default firmware setting
> of DEBUGCTLMSR_FREEZE_IN_SMM_BIT (it is now cleared by the OS by default).
> 
> With this bit cleared, the following situation occurs:
> 
> 1) external NMI - due to io check
> 2) long duration SMI (counters do not freeze)
> 3) NMI handler runs and misattributes interrupt to perf event

I think 3 is wrong, because 2 will in fact have triggered a PMI (due to
long running) so 3 will observe a PMI and claim the NMI. No
misattribution what so ever.

Because if this wasn't the case, flipping FREEZE_IN_SMM wouldn't have
made a difference.

> Ultimately, my solution was to restore the previous behavior by reading and
> storing the firmware setting of the bit rather than to always clear it.

Ah, urgh.. what a mess. So the OS setting the bit to a known and
consistent value is 'good' IMO. The firmware magically frobbing things
is 'bad'.

Now, explain to me why an IO-check results in an external NMI, and why
there are long running SMI handlers around? Why can't the IO error not
be propagated through the regular device interrupt/state? Why are long
running SMIs required at all, ever? Why doesn't the OS handler whatever
it is the SMM does?

Are you not solving the wrong problem here?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot
  2018-06-12 16:56     ` Peter Zijlstra
@ 2018-06-18 19:14       ` David Arcari
  0 siblings, 0 replies; 6+ messages in thread
From: David Arcari @ 2018-06-18 19:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andi Kleen, Kan Liang, Jiri Olsa, Donald Zickus, Prarit Bhargava,
	Jerry Hoemann

On 06/12/2018 12:56 PM, Peter Zijlstra wrote:

> 
>> Ultimately, my solution was to restore the previous behavior by reading and
>> storing the firmware setting of the bit rather than to always clear it.
> 
> Ah, urgh.. what a mess. So the OS setting the bit to a known and
> consistent value is 'good' IMO. The firmware magically frobbing things
> is 'bad'.

I had actually considered changing the code to enable the FREEZE_WHILE_SMM by
default, but decided against this approach as I was concerned that setting the
bit on a system where it is initially cleared by the firmware could also have
negative side effects.

> 
> Now, explain to me why an IO-check results in an external NMI, and why
> there are long running SMI handlers around? Why can't the IO error not
> be propagated through the regular device interrupt/state? Why are long
> running SMIs required at all, ever? Why doesn't the OS handler whatever
> it is the SMM does?
> 
> Are you not solving the wrong problem here?
> 

I didn't think so.

1) This functionality was working reasonably well before this commit was
   introduced into the OS.

2) As discussed, the problem cannot be addressed in the NMI handler.

3) The work around that I proposed is quite unobtrusive. You are correct in
   this is not an actual "fix" since the problem will be present if the user
   decides to change the setting of FREEZE_WHILE_SMM via sysfs, but at least
   external NMIs are functional by default.

IIUC, you are proposing a complete rewrite of the external NMI infrastructure
along with modification to system firmware.  I also believe this would be
somewhat problematic as an external NMI would not function when interrupts are
disabled.

Is there an alternate solution that could provide relief in the short term?  I
think that what I have proposed accomplishes this, but perhaps there is a better
more palatable alternative.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-06-18 19:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-03 18:23 [PATCH] perf/x86: read the FREEZE_WHILE_SMM bit during boot David Arcari
2018-06-04  8:24 ` Peter Zijlstra
2018-06-04 14:12   ` David Arcari
2018-06-11 17:57     ` David Arcari
2018-06-12 16:56     ` Peter Zijlstra
2018-06-18 19:14       ` David Arcari

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.