linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf/x86/amd: Do not WARN on every IRQ
@ 2023-06-16 11:53 Breno Leitao
  2023-06-16 13:29 ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Breno Leitao @ 2023-06-16 11:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Sandipan Das
  Cc: leit, dcostantino, open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On some systems, the Performance Counter Global Status Register is
coming with reserved bits set, which causes the system to be unusable
if a simple `perf top` runs. The system hits the WARN() thousands times
while perf runs.

WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0

This happens because the "Performance Counter Global Status Register"
(PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
Manual, Volume 2: System Programming, 24593"[1]

WARN_ONCE if any reserved bit is set, and sanitize the value to what the
code is handling, so the overflow events continue to be handled for the
number of events that are known to be sane.

Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling")

[1] Link: https://www.amd.com/system/files/TechDocs/24593.pdf
---
 arch/x86/events/amd/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index bccea57dee81..809ddb15c122 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -909,6 +909,10 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 		status &= ~GLOBAL_STATUS_LBRS_FROZEN;
 	}
 
+	amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1;
+	WARN_ON_ONCE(status & ~amd_pmu_global_cntr_mask);
+	status &= amd_pmu_global_cntr_mask;
+
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask))
 			continue;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-06-16 11:53 [PATCH] perf/x86/amd: Do not WARN on every IRQ Breno Leitao
@ 2023-06-16 13:29 ` Peter Zijlstra
  2023-06-16 14:03   ` Breno Leitao
  2023-09-13 16:24   ` Breno Leitao
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-16 13:29 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Sandipan Das, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Fri, Jun 16, 2023 at 04:53:15AM -0700, Breno Leitao wrote:
> On some systems, the Performance Counter Global Status Register is
> coming with reserved bits set, which causes the system to be unusable
> if a simple `perf top` runs. The system hits the WARN() thousands times
> while perf runs.
> 
> WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
> 
> This happens because the "Performance Counter Global Status Register"
> (PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
> to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
> Manual, Volume 2: System Programming, 24593"[1]

Would it then not make more sense to mask out bit7 before:

+	status &= ~AMD_PMU_V2_GLOBAL_STATUS_RESERVED;
	if (!status)
		goto done;

?

Aside from being reserved, why are these bits magically set all of a
sudden?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-06-16 13:29 ` Peter Zijlstra
@ 2023-06-16 14:03   ` Breno Leitao
  2023-06-16 14:43     ` Sandipan Das (AMD)
  2023-09-13 16:24   ` Breno Leitao
  1 sibling, 1 reply; 22+ messages in thread
From: Breno Leitao @ 2023-06-16 14:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Sandipan Das, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Fri, Jun 16, 2023 at 03:29:54PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 16, 2023 at 04:53:15AM -0700, Breno Leitao wrote:
> > On some systems, the Performance Counter Global Status Register is
> > coming with reserved bits set, which causes the system to be unusable
> > if a simple `perf top` runs. The system hits the WARN() thousands times
> > while perf runs.
> > 
> > WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
> > 
> > This happens because the "Performance Counter Global Status Register"
> > (PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
> > to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
> > Manual, Volume 2: System Programming, 24593"[1]
> 
> Would it then not make more sense to mask out bit7 before:

It is more than bit 7. This is the register structure according to the document
above:

Bits 		Mnemonic		Description		 		Access type
63:60	        Reserved RO
59 		PMCF			Performance Counter Freeze		RO
58 		LBRSF			Last Branch Record Stack Freeze 	RO
57:6 		Reserved				 			RO
5:0 		CNT_OF 			Counter overflow for PerfCnt[5:0] 	RO

In the code, bit GLOBAL_STATUS_LBRS_FROZEN is handled and cleared before
we reach my changes

That said, your approach is almost similar to what I did, and I will be happy
to change in order to make the code clearer.

> +	status &= ~AMD_PMU_V2_GLOBAL_STATUS_RESERVED;
> 	if (!status)
> 		goto done;
> 
> ?
> 
> Aside from being reserved, why are these bits magically set all of a
> sudden?

That is probably a question to AMD.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-06-16 14:03   ` Breno Leitao
@ 2023-06-16 14:43     ` Sandipan Das (AMD)
  2023-06-16 15:32       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Sandipan Das (AMD) @ 2023-06-16 14:43 UTC (permalink / raw)
  To: Breno Leitao, Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Sandipan Das, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan

On 16/06/23 7:33 pm, Breno Leitao wrote:
> On Fri, Jun 16, 2023 at 03:29:54PM +0200, Peter Zijlstra wrote:
>> On Fri, Jun 16, 2023 at 04:53:15AM -0700, Breno Leitao wrote:
>>> On some systems, the Performance Counter Global Status Register is
>>> coming with reserved bits set, which causes the system to be unusable
>>> if a simple `perf top` runs. The system hits the WARN() thousands times
>>> while perf runs.
>>>
>>> WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
>>>
>>> This happens because the "Performance Counter Global Status Register"
>>> (PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
>>> to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
>>> Manual, Volume 2: System Programming, 24593"[1]
>>
>> Would it then not make more sense to mask out bit7 before:
> 
> It is more than bit 7. This is the register structure according to the document
> above:
> 
> Bits 		Mnemonic		Description		 		Access type
> 63:60	        Reserved RO
> 59 		PMCF			Performance Counter Freeze		RO
> 58 		LBRSF			Last Branch Record Stack Freeze 	RO
> 57:6 		Reserved				 			RO
> 5:0 		CNT_OF 			Counter overflow for PerfCnt[5:0] 	RO
> 
> In the code, bit GLOBAL_STATUS_LBRS_FROZEN is handled and cleared before
> we reach my changes
> 
> That said, your approach is almost similar to what I did, and I will be happy
> to change in order to make the code clearer.
> 
>> +	status &= ~AMD_PMU_V2_GLOBAL_STATUS_RESERVED;
>> 	if (!status)
>> 		goto done;
>>
>> ?
>>
>> Aside from being reserved, why are these bits magically set all of a
>> sudden?
> 
> That is probably a question to AMD.
> 

The reserved bits should never be set. The purpose of the WARN_ON() is to catch such anomalies.
I am working out the details with Breno and will report back with a resolution.

- Sandipan

[sending from my personal email as I currently don't have access to my work laptop]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-06-16 14:43     ` Sandipan Das (AMD)
@ 2023-06-16 15:32       ` Peter Zijlstra
  2023-09-13 14:30         ` Jirka Hladky
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-16 15:32 UTC (permalink / raw)
  To: Sandipan Das (AMD)
  Cc: Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Sandipan Das, leit,
	dcostantino, open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan,
	Ravi Bangoria

On Fri, Jun 16, 2023 at 08:13:22PM +0530, Sandipan Das (AMD) wrote:
> The reserved bits should never be set. The purpose of the WARN_ON() is to catch such anomalies.
> I am working out the details with Breno and will report back with a resolution.

Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-06-16 15:32       ` Peter Zijlstra
@ 2023-09-13 14:30         ` Jirka Hladky
  2023-09-13 15:03           ` Breno Leitao
  2023-09-13 16:18           ` Sandipan Das
  0 siblings, 2 replies; 22+ messages in thread
From: Jirka Hladky @ 2023-09-13 14:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sandipan Das (AMD),
	Breno Leitao, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Sandipan Das, leit,
	dcostantino, open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan,
	Ravi Bangoria, Michael Petlan

Hi Sandipan,

Is there any update on this issue? We have hit the issue, and it makes
the server pretty unusable due to the thousands of Call Traces being
logged.

Thanks a lot!
Jirka


On Fri, Jun 16, 2023 at 5:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 16, 2023 at 08:13:22PM +0530, Sandipan Das (AMD) wrote:
> > The reserved bits should never be set. The purpose of the WARN_ON() is to catch such anomalies.
> > I am working out the details with Breno and will report back with a resolution.
>
> Thanks!
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-13 14:30         ` Jirka Hladky
@ 2023-09-13 15:03           ` Breno Leitao
  2023-09-13 15:23             ` Jirka Hladky
  2023-09-13 16:18           ` Sandipan Das
  1 sibling, 1 reply; 22+ messages in thread
From: Breno Leitao @ 2023-09-13 15:03 UTC (permalink / raw)
  To: Jirka Hladky
  Cc: Peter Zijlstra, Sandipan Das (AMD),
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Sandipan Das, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan,
	Ravi Bangoria, Michael Petlan

On Wed, Sep 13, 2023 at 04:30:44PM +0200, Jirka Hladky wrote:
> Hi Sandipan,
> 
> Is there any update on this issue?

I still think we want to have this patch in Linux, so, we protect the
kernel for whatever the hardware/firmware is doing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-13 15:03           ` Breno Leitao
@ 2023-09-13 15:23             ` Jirka Hladky
  0 siblings, 0 replies; 22+ messages in thread
From: Jirka Hladky @ 2023-09-13 15:23 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Peter Zijlstra, Sandipan Das (AMD),
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Sandipan Das, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan,
	Ravi Bangoria, Michael Petlan

I agree.

Are you planning v2 of the patch, addressing the points raised by
Peter Zijlstra?

On Wed, Sep 13, 2023 at 5:03 PM Breno Leitao <leitao@debian.org> wrote:
>
> On Wed, Sep 13, 2023 at 04:30:44PM +0200, Jirka Hladky wrote:
> > Hi Sandipan,
> >
> > Is there any update on this issue?
>
> I still think we want to have this patch in Linux, so, we protect the
> kernel for whatever the hardware/firmware is doing.
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-13 14:30         ` Jirka Hladky
  2023-09-13 15:03           ` Breno Leitao
@ 2023-09-13 16:18           ` Sandipan Das
  2023-09-14  8:44             ` Jirka Hladky
       [not found]             ` <CAE4VaGAcWk0PYygNGcguRA2V2qK03entkv6BUsnxhS-ftdfywg@mail.gmail.com>
  1 sibling, 2 replies; 22+ messages in thread
From: Sandipan Das @ 2023-09-13 16:18 UTC (permalink / raw)
  To: Jirka Hladky
  Cc: Peter Zijlstra, Breno Leitao, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan,
	Ravi Bangoria, Michael Petlan

Hi Jirka,

Can you please share the following?

1. Family, Model and Stepping of the processor
2. Microcode patch level
On 9/13/2023 8:00 PM, Jirka Hladky wrote:
> Hi Sandipan,
> 
> Is there any update on this issue? We have hit the issue, and it makes
> the server pretty unusable due to the thousands of Call Traces being
> logged.
> 
> Thanks a lot!
> Jirka
> 
> 
> On Fri, Jun 16, 2023 at 5:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Fri, Jun 16, 2023 at 08:13:22PM +0530, Sandipan Das (AMD) wrote:
>>> The reserved bits should never be set. The purpose of the WARN_ON() is to catch such anomalies.
>>> I am working out the details with Breno and will report back with a resolution.
>>
>> Thanks!
>>
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-06-16 13:29 ` Peter Zijlstra
  2023-06-16 14:03   ` Breno Leitao
@ 2023-09-13 16:24   ` Breno Leitao
  2023-09-14  8:45     ` Jirka Hladky
  1 sibling, 1 reply; 22+ messages in thread
From: Breno Leitao @ 2023-09-13 16:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Sandipan Das, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, Jirka Hladky

Hi Peter,

On Fri, Jun 16, 2023 at 03:29:54PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 16, 2023 at 04:53:15AM -0700, Breno Leitao wrote:
> > On some systems, the Performance Counter Global Status Register is
> > coming with reserved bits set, which causes the system to be unusable
> > if a simple `perf top` runs. The system hits the WARN() thousands times
> > while perf runs.
> > 
> > WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
> > 
> > This happens because the "Performance Counter Global Status Register"
> > (PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
> > to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
> > Manual, Volume 2: System Programming, 24593"[1]
> 
> Would it then not make more sense to mask out bit7 before:
> 
> +	status &= ~AMD_PMU_V2_GLOBAL_STATUS_RESERVED;
> 	if (!status)
> 		goto done;

Instead of masking `status` against AMD_PMU_V2_GLOBAL_STATUS_RESERVED
(AMD64_NUM_COUNTERS?), I opted for using the `amd_pmu_global_cntr_mask`
global variable because it seems to represent what the loop below is
iterating over:

	/* PMC Enable and Overflow bits for PerfCntrGlobal* registers */
	static u64 amd_pmu_global_cntr_mask __read_mostly;

Also, I think we want to WARN_ON_ONCE() if we see this problem. Right
now, it warns at every time we call this function, which makes the
machine unusable, but, warning it once could be helpful to figure out
there is something wrong with the machine/firmware.

Anyway, please let me know whatever is your preferred way and I will
submit a v2.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-13 16:18           ` Sandipan Das
@ 2023-09-14  8:44             ` Jirka Hladky
       [not found]             ` <CAE4VaGAcWk0PYygNGcguRA2V2qK03entkv6BUsnxhS-ftdfywg@mail.gmail.com>
  1 sibling, 0 replies; 22+ messages in thread
From: Jirka Hladky @ 2023-09-14  8:44 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Peter Zijlstra, Breno Leitao, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan,
	Ravi Bangoria, Michael Petlan

Hi Sandipan,

here is the info from /proc/cpuinfo

vendor_id       : AuthenticAMD
cpu family      : 25
model           : 160
model name      : AMD EPYC 9754 128-Core Processor
stepping        : 2
microcode       : 0xaa0020f

>2. Microcode patch level
Is it the microcode string from cpuinfo provided above, or are you
looking for something else?

Thanks!
Jirka


On Wed, Sep 13, 2023 at 6:19 PM Sandipan Das <sandipan.das@amd.com> wrote:
>
> Hi Jirka,
>
> Can you please share the following?
>
> 1. Family, Model and Stepping of the processor
> 2. Microcode patch level
> On 9/13/2023 8:00 PM, Jirka Hladky wrote:
> > Hi Sandipan,
> >
> > Is there any update on this issue? We have hit the issue, and it makes
> > the server pretty unusable due to the thousands of Call Traces being
> > logged.
> >
> > Thanks a lot!
> > Jirka
> >
> >
> > On Fri, Jun 16, 2023 at 5:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> On Fri, Jun 16, 2023 at 08:13:22PM +0530, Sandipan Das (AMD) wrote:
> >>> The reserved bits should never be set. The purpose of the WARN_ON() is to catch such anomalies.
> >>> I am working out the details with Breno and will report back with a resolution.
> >>
> >> Thanks!
> >>
> >
> >
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-13 16:24   ` Breno Leitao
@ 2023-09-14  8:45     ` Jirka Hladky
  2023-09-14  8:55       ` Sandipan Das
  0 siblings, 1 reply; 22+ messages in thread
From: Jirka Hladky @ 2023-09-14  8:45 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Sandipan Das, leit,
	dcostantino, open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

Hi Breno,

I'm definitively voting for using WARN_ON_ONCE - in the current
implementation, we are getting thousands of the same warnings and Call
Traces, causing the system to become unusable.

>Anyway, please let me know whatever is your preferred way and I will submit a v2.
@Peter Zijlstra and @Sandipan - could you please comment on the
preferred implementation of the patch?

THANK YOU
Jirka

On Wed, Sep 13, 2023 at 6:24 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hi Peter,
>
> On Fri, Jun 16, 2023 at 03:29:54PM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 16, 2023 at 04:53:15AM -0700, Breno Leitao wrote:
> > > On some systems, the Performance Counter Global Status Register is
> > > coming with reserved bits set, which causes the system to be unusable
> > > if a simple `perf top` runs. The system hits the WARN() thousands times
> > > while perf runs.
> > >
> > > WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
> > >
> > > This happens because the "Performance Counter Global Status Register"
> > > (PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
> > > to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
> > > Manual, Volume 2: System Programming, 24593"[1]
> >
> > Would it then not make more sense to mask out bit7 before:
> >
> > +     status &= ~AMD_PMU_V2_GLOBAL_STATUS_RESERVED;
> >       if (!status)
> >               goto done;
>
> Instead of masking `status` against AMD_PMU_V2_GLOBAL_STATUS_RESERVED
> (AMD64_NUM_COUNTERS?), I opted for using the `amd_pmu_global_cntr_mask`
> global variable because it seems to represent what the loop below is
> iterating over:
>
>         /* PMC Enable and Overflow bits for PerfCntrGlobal* registers */
>         static u64 amd_pmu_global_cntr_mask __read_mostly;
>
> Also, I think we want to WARN_ON_ONCE() if we see this problem. Right
> now, it warns at every time we call this function, which makes the
> machine unusable, but, warning it once could be helpful to figure out
> there is something wrong with the machine/firmware.
>
> Anyway, please let me know whatever is your preferred way and I will
> submit a v2.
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
       [not found]             ` <CAE4VaGAcWk0PYygNGcguRA2V2qK03entkv6BUsnxhS-ftdfywg@mail.gmail.com>
@ 2023-09-14  8:49               ` Sandipan Das
  0 siblings, 0 replies; 22+ messages in thread
From: Sandipan Das @ 2023-09-14  8:49 UTC (permalink / raw)
  To: Jirka Hladky
  Cc: Peter Zijlstra, Breno Leitao, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, ananth.narayan,
	Ravi Bangoria, Michael Petlan

Hi Jirka,

Thanks for reporting back. Moving to the latest Family 19h microcode (link below) will fix the problem.
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/microcode_amd_fam19h.bin


On 9/14/2023 2:03 PM, Jirka Hladky wrote:
> Hi Sandipan,
> 
> here is the info from /proc/cpuinfo
>                                                                                                                                                                                                                      
> vendor_id       : AuthenticAMD                                                                                                                                                                                                              
> cpu family      : 25                                                                                                                                                                                                                        
> model           : 160                                                                                                                                                                                                                      
> model name      : AMD EPYC 9754 128-Core Processor                                                                                                                                                                                          
> stepping        : 2                                                                                                                                                                                                                        
> microcode       : 0xaa0020f
> 
>>2. Microcode patch level
> Is it the microcode string from cpuinfo provided above, or are you looking for something else? 
> 
> Thanks!
> Jirka
> 
> 
> On Wed, Sep 13, 2023 at 6:19 PM Sandipan Das <sandipan.das@amd.com <mailto:sandipan.das@amd.com>> wrote:
> 
>     Hi Jirka,
> 
>     Can you please share the following?
> 
>     1. Family, Model and Stepping of the processor
>     2. Microcode patch level
>     On 9/13/2023 8:00 PM, Jirka Hladky wrote:
>     > Hi Sandipan,
>     >
>     > Is there any update on this issue? We have hit the issue, and it makes
>     > the server pretty unusable due to the thousands of Call Traces being
>     > logged.
>     >
>     > Thanks a lot!
>     > Jirka
>     >
>     >
>     > On Fri, Jun 16, 2023 at 5:34 PM Peter Zijlstra <peterz@infradead.org <mailto:peterz@infradead.org>> wrote:
>     >>
>     >> On Fri, Jun 16, 2023 at 08:13:22PM +0530, Sandipan Das (AMD) wrote:
>     >>> The reserved bits should never be set. The purpose of the WARN_ON() is to catch such anomalies.
>     >>> I am working out the details with Breno and will report back with a resolution.
>     >>
>     >> Thanks!
>     >>
>     >
>     >
> 
> 
> 
> -- 
> -Jirka


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14  8:45     ` Jirka Hladky
@ 2023-09-14  8:55       ` Sandipan Das
  2023-09-14  9:12         ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Sandipan Das @ 2023-09-14  8:55 UTC (permalink / raw)
  To: Jirka Hladky, Breno Leitao
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

Hi Breno, Jirka,

On 9/14/2023 2:15 PM, Jirka Hladky wrote:
> Hi Breno,
> 
> I'm definitively voting for using WARN_ON_ONCE - in the current
> implementation, we are getting thousands of the same warnings and Call
> Traces, causing the system to become unusable.
> 
>> Anyway, please let me know whatever is your preferred way and I will submit a v2.
> @Peter Zijlstra and @Sandipan - could you please comment on the
> preferred implementation of the patch?
> 

I agree with using WARN_ON_ONCE() to make this less intrusive.

> 
> On Wed, Sep 13, 2023 at 6:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Hi Peter,
>>
>> On Fri, Jun 16, 2023 at 03:29:54PM +0200, Peter Zijlstra wrote:
>>> On Fri, Jun 16, 2023 at 04:53:15AM -0700, Breno Leitao wrote:
>>>> On some systems, the Performance Counter Global Status Register is
>>>> coming with reserved bits set, which causes the system to be unusable
>>>> if a simple `perf top` runs. The system hits the WARN() thousands times
>>>> while perf runs.
>>>>
>>>> WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
>>>>
>>>> This happens because the "Performance Counter Global Status Register"
>>>> (PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
>>>> to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
>>>> Manual, Volume 2: System Programming, 24593"[1]
>>>
>>> Would it then not make more sense to mask out bit7 before:
>>>
>>> +     status &= ~AMD_PMU_V2_GLOBAL_STATUS_RESERVED;
>>>       if (!status)
>>>               goto done;
>>
>> Instead of masking `status` against AMD_PMU_V2_GLOBAL_STATUS_RESERVED
>> (AMD64_NUM_COUNTERS?), I opted for using the `amd_pmu_global_cntr_mask`
>> global variable because it seems to represent what the loop below is
>> iterating over:
>>
>>         /* PMC Enable and Overflow bits for PerfCntrGlobal* registers */
>>         static u64 amd_pmu_global_cntr_mask __read_mostly;
>>
>> Also, I think we want to WARN_ON_ONCE() if we see this problem. Right
>> now, it warns at every time we call this function, which makes the
>> machine unusable, but, warning it once could be helpful to figure out
>> there is something wrong with the machine/firmware.
>>
>> Anyway, please let me know whatever is your preferred way and I will
>> submit a v2.
>>
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14  8:55       ` Sandipan Das
@ 2023-09-14  9:12         ` Peter Zijlstra
  2023-09-14  9:14           ` Sandipan Das
  2023-09-14  9:30           ` Breno Leitao
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-09-14  9:12 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Jirka Hladky, Breno Leitao, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:
> Hi Breno, Jirka,
> 
> On 9/14/2023 2:15 PM, Jirka Hladky wrote:
> > Hi Breno,
> > 
> > I'm definitively voting for using WARN_ON_ONCE - in the current
> > implementation, we are getting thousands of the same warnings and Call
> > Traces, causing the system to become unusable.
> > 
> >> Anyway, please let me know whatever is your preferred way and I will submit a v2.
> > @Peter Zijlstra and @Sandipan - could you please comment on the
> > preferred implementation of the patch?
> > 
> 
> I agree with using WARN_ON_ONCE() to make this less intrusive.

Could you send a patch that AMD is happy with? I think you wrote this
was a firmware bug and is sorted with a new firmware, so perhaps make it
WARN_ONCE() and tell the users to upgrade their firmware to $ver ?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14  9:12         ` Peter Zijlstra
@ 2023-09-14  9:14           ` Sandipan Das
  2023-09-14  9:30           ` Breno Leitao
  1 sibling, 0 replies; 22+ messages in thread
From: Sandipan Das @ 2023-09-14  9:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jirka Hladky, Breno Leitao, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM


On 9/14/2023 2:42 PM, Peter Zijlstra wrote:
> On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:
>> Hi Breno, Jirka,
>>
>> On 9/14/2023 2:15 PM, Jirka Hladky wrote:
>>> Hi Breno,
>>>
>>> I'm definitively voting for using WARN_ON_ONCE - in the current
>>> implementation, we are getting thousands of the same warnings and Call
>>> Traces, causing the system to become unusable.
>>>
>>>> Anyway, please let me know whatever is your preferred way and I will submit a v2.
>>> @Peter Zijlstra and @Sandipan - could you please comment on the
>>> preferred implementation of the patch?
>>>
>>
>> I agree with using WARN_ON_ONCE() to make this less intrusive.
> 
> Could you send a patch that AMD is happy with? I think you wrote this
> was a firmware bug and is sorted with a new firmware, so perhaps make it
> WARN_ONCE() and tell the users to upgrade their firmware to $ver ?

Sure. Will do.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14  9:12         ` Peter Zijlstra
  2023-09-14  9:14           ` Sandipan Das
@ 2023-09-14  9:30           ` Breno Leitao
  2023-09-14 11:18             ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Breno Leitao @ 2023-09-14  9:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sandipan Das, Jirka Hladky, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Sep 14, 2023 at 11:12:34AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:

> > I agree with using WARN_ON_ONCE() to make this less intrusive.
> 
> Could you send a patch that AMD is happy with?

Why the current patch is not good enough?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14  9:30           ` Breno Leitao
@ 2023-09-14 11:18             ` Peter Zijlstra
  2023-09-14 11:22               ` Sandipan Das
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-09-14 11:18 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Sandipan Das, Jirka Hladky, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Sep 14, 2023 at 02:30:43AM -0700, Breno Leitao wrote:
> On Thu, Sep 14, 2023 at 11:12:34AM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:
> 
> > > I agree with using WARN_ON_ONCE() to make this less intrusive.
> > 
> > Could you send a patch that AMD is happy with?
> 
> Why the current patch is not good enough?

Sandipan, can you answer this? I don't tihnk I'm qualified to speak for
the AMD pmu and certainly I don't have insight into their design future.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14 11:18             ` Peter Zijlstra
@ 2023-09-14 11:22               ` Sandipan Das
  2023-09-14 11:27                 ` Peter Zijlstra
  2023-09-14 12:21                 ` Breno Leitao
  0 siblings, 2 replies; 22+ messages in thread
From: Sandipan Das @ 2023-09-14 11:22 UTC (permalink / raw)
  To: Peter Zijlstra, Breno Leitao
  Cc: Jirka Hladky, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Thomas Gleixner, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On 9/14/2023 4:48 PM, Peter Zijlstra wrote:
> On Thu, Sep 14, 2023 at 02:30:43AM -0700, Breno Leitao wrote:
>> On Thu, Sep 14, 2023 at 11:12:34AM +0200, Peter Zijlstra wrote:
>>> On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:
>>
>>>> I agree with using WARN_ON_ONCE() to make this less intrusive.
>>>
>>> Could you send a patch that AMD is happy with?
>>
>> Why the current patch is not good enough?
> 
> Sandipan, can you answer this? I don't tihnk I'm qualified to speak for
> the AMD pmu and certainly I don't have insight into their design future.

Hi Breno,

Functionally, the patch looks good to me and I will be reusing it
without any change to the authorship. However, as Peter suggested, I
wanted to add a message to prompt users to update the microcode and
also call out the required patch levels in the commit message since
different Zen 4 variants and steppings use different microcode.

Here's what I plan to send.

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index abadd5f23425..186a124bb3c0 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -909,6 +909,13 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
                status &= ~GLOBAL_STATUS_LBRS_FROZEN;
        }

+       if (status & ~amd_pmu_global_cntr_mask)
+               pr_warn_once("Unknown status bits are set (0x%llx), please consider updating microcode\n",
+                            status);
+
+       /* Clear any reserved bits set by buggy microcode */
+       status &= amd_pmu_global_cntr_mask;
+
        for (idx = 0; idx < x86_pmu.num_counters; idx++) {
                if (!test_bit(idx, cpuc->active_mask))
                        continue;

--

Hi Peter,

There is another case where users will see warnings but the patch
to fix it (link below) is yet to be reviewed. May I rebase and
resend it along with the above?

https://lore.kernel.org/all/20230613105809.524535-1-sandipan.das@amd.com/


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14 11:22               ` Sandipan Das
@ 2023-09-14 11:27                 ` Peter Zijlstra
  2023-09-14 12:21                 ` Breno Leitao
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-09-14 11:27 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Breno Leitao, Jirka Hladky, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Sep 14, 2023 at 04:52:13PM +0530, Sandipan Das wrote:
> On 9/14/2023 4:48 PM, Peter Zijlstra wrote:
> > On Thu, Sep 14, 2023 at 02:30:43AM -0700, Breno Leitao wrote:
> >> On Thu, Sep 14, 2023 at 11:12:34AM +0200, Peter Zijlstra wrote:
> >>> On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:
> >>
> >>>> I agree with using WARN_ON_ONCE() to make this less intrusive.
> >>>
> >>> Could you send a patch that AMD is happy with?
> >>
> >> Why the current patch is not good enough?
> > 
> > Sandipan, can you answer this? I don't tihnk I'm qualified to speak for
> > the AMD pmu and certainly I don't have insight into their design future.
> 
> Hi Breno,
> 
> Functionally, the patch looks good to me and I will be reusing it
> without any change to the authorship. However, as Peter suggested, I
> wanted to add a message to prompt users to update the microcode and
> also call out the required patch levels in the commit message since
> different Zen 4 variants and steppings use different microcode.
> 
> Here's what I plan to send.
> 
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index abadd5f23425..186a124bb3c0 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -909,6 +909,13 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
>                 status &= ~GLOBAL_STATUS_LBRS_FROZEN;
>         }
> 
> +       if (status & ~amd_pmu_global_cntr_mask)
> +               pr_warn_once("Unknown status bits are set (0x%llx), please consider updating microcode\n",
> +                            status);
> +
> +       /* Clear any reserved bits set by buggy microcode */
> +       status &= amd_pmu_global_cntr_mask;
> +
>         for (idx = 0; idx < x86_pmu.num_counters; idx++) {
>                 if (!test_bit(idx, cpuc->active_mask))
>                         continue;
> 
> --
> 
> Hi Peter,
> 
> There is another case where users will see warnings but the patch
> to fix it (link below) is yet to be reviewed. May I rebase and
> resend it along with the above?
> 
> https://lore.kernel.org/all/20230613105809.524535-1-sandipan.das@amd.com/
> 

Sure, sorry I seem to have missed that :-(

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14 11:22               ` Sandipan Das
  2023-09-14 11:27                 ` Peter Zijlstra
@ 2023-09-14 12:21                 ` Breno Leitao
  2023-09-14 13:50                   ` Jirka Hladky
  1 sibling, 1 reply; 22+ messages in thread
From: Breno Leitao @ 2023-09-14 12:21 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Peter Zijlstra, Jirka Hladky, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Sep 14, 2023 at 04:52:13PM +0530, Sandipan Das wrote:
> On 9/14/2023 4:48 PM, Peter Zijlstra wrote:
> > On Thu, Sep 14, 2023 at 02:30:43AM -0700, Breno Leitao wrote:
> >> On Thu, Sep 14, 2023 at 11:12:34AM +0200, Peter Zijlstra wrote:
> >>> On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:
> >>
> >>>> I agree with using WARN_ON_ONCE() to make this less intrusive.
> >>>
> >>> Could you send a patch that AMD is happy with?
> >>
> >> Why the current patch is not good enough?
> > 
> > Sandipan, can you answer this? I don't tihnk I'm qualified to speak for
> > the AMD pmu and certainly I don't have insight into their design future.
> 
> Hi Breno,
> 
> Functionally, the patch looks good to me and I will be reusing it
> without any change to the authorship. However, as Peter suggested, I
> wanted to add a message to prompt users to update the microcode and
> also call out the required patch levels in the commit message since
> different Zen 4 variants and steppings use different microcode.
> 
> Here's what I plan to send.

Awesome. Thanks for addressing it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ
  2023-09-14 12:21                 ` Breno Leitao
@ 2023-09-14 13:50                   ` Jirka Hladky
  0 siblings, 0 replies; 22+ messages in thread
From: Jirka Hladky @ 2023-09-14 13:50 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Sandipan Das, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, leit, dcostantino,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

Thank you, all!

I have confirmed that the latest microcode fixes the issue.

=============================================
cat /proc/cpuinfo
microcode       : 0xaa00212

perf top runs fine, without any kernel warnings
=============================================

Jirka


On Thu, Sep 14, 2023 at 2:22 PM Breno Leitao <leitao@debian.org> wrote:
>
> On Thu, Sep 14, 2023 at 04:52:13PM +0530, Sandipan Das wrote:
> > On 9/14/2023 4:48 PM, Peter Zijlstra wrote:
> > > On Thu, Sep 14, 2023 at 02:30:43AM -0700, Breno Leitao wrote:
> > >> On Thu, Sep 14, 2023 at 11:12:34AM +0200, Peter Zijlstra wrote:
> > >>> On Thu, Sep 14, 2023 at 02:25:40PM +0530, Sandipan Das wrote:
> > >>
> > >>>> I agree with using WARN_ON_ONCE() to make this less intrusive.
> > >>>
> > >>> Could you send a patch that AMD is happy with?
> > >>
> > >> Why the current patch is not good enough?
> > >
> > > Sandipan, can you answer this? I don't tihnk I'm qualified to speak for
> > > the AMD pmu and certainly I don't have insight into their design future.
> >
> > Hi Breno,
> >
> > Functionally, the patch looks good to me and I will be reusing it
> > without any change to the authorship. However, as Peter suggested, I
> > wanted to add a message to prompt users to update the microcode and
> > also call out the required patch levels in the commit message since
> > different Zen 4 variants and steppings use different microcode.
> >
> > Here's what I plan to send.
>
> Awesome. Thanks for addressing it.
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-09-14 13:52 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-16 11:53 [PATCH] perf/x86/amd: Do not WARN on every IRQ Breno Leitao
2023-06-16 13:29 ` Peter Zijlstra
2023-06-16 14:03   ` Breno Leitao
2023-06-16 14:43     ` Sandipan Das (AMD)
2023-06-16 15:32       ` Peter Zijlstra
2023-09-13 14:30         ` Jirka Hladky
2023-09-13 15:03           ` Breno Leitao
2023-09-13 15:23             ` Jirka Hladky
2023-09-13 16:18           ` Sandipan Das
2023-09-14  8:44             ` Jirka Hladky
     [not found]             ` <CAE4VaGAcWk0PYygNGcguRA2V2qK03entkv6BUsnxhS-ftdfywg@mail.gmail.com>
2023-09-14  8:49               ` Sandipan Das
2023-09-13 16:24   ` Breno Leitao
2023-09-14  8:45     ` Jirka Hladky
2023-09-14  8:55       ` Sandipan Das
2023-09-14  9:12         ` Peter Zijlstra
2023-09-14  9:14           ` Sandipan Das
2023-09-14  9:30           ` Breno Leitao
2023-09-14 11:18             ` Peter Zijlstra
2023-09-14 11:22               ` Sandipan Das
2023-09-14 11:27                 ` Peter Zijlstra
2023-09-14 12:21                 ` Breno Leitao
2023-09-14 13:50                   ` Jirka Hladky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).