* [PATCH V2] x86, mce, amd: Enable interrupts by default if HW capable
@ 2015-02-02 17:02 Aravind Gopalakrishnan
From: Aravind Gopalakrishnan @ 2015-02-02 17:02 UTC
  To: tony.luck, bp, mingo, tglx, hpa, x86, linux-kernel, linux-edac
  Cc: Aravind Gopalakrishnan

We set up APIC vectors for threshold errors if the block is
interrupt_capable. However, we don't set interrupt_enable by default.
Rework this so that the first time threshold_restart_bank() sets up
lvt_offset, it also sets IntType to APIC.

Users can still disable interrupts through the interrupt_enable sysfs
attribute.

While at it, check that the status is valid before logging the error
with mce_log(). On multi-node platforms, only the NBC (Node Base Core)
has valid status information, so decoding the status values on the
other cores floods the kernel log with noise like this:

[  440.509744] EDAC DEBUG: amd64_inject_write_store: section=0x80000000 word_bits=0x10020001
[  466.570925] [Hardware Error]: Corrected error, no action required.
[  466.570935] [Hardware Error]: CPU:25 (15:2:0) MC4_STATUS[-|CE|-|-|-
[  466.570936] [Hardware Error]: Corrected error, no action required.
[  466.570959] [Hardware Error]: CPU:26 (15:2:0) MC4_STATUS[-|CE|-|-|-
<...>
[  466.571293] WARNING: CPU: 25 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
[  466.571301] WARNING: CPU: 26 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
[  466.571303] Something is rotten in the state of Denmark.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
---
Changes in V2:
 - The earlier changes that removed the bank == 4 check and the
   'interrupt_enable' attribute caused regressions. Fixed that.
 - The threshold_limit move and the comment-style fixes are not
   directly related to this patch, so they are dropped to avoid
   distractions.
 - Add a fix for the garbled dmesg output on multi-node platforms and
   update the commit message to reflect the change.

 arch/x86/kernel/cpu/mcheck/mce_amd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f1c3769..82c5144 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -250,6 +250,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 			if (!b.interrupt_capable)
 				goto init;
 
+			b.interrupt_enable = 1;
 			new	= (high & MASK_LVTOFF_HI) >> 20;
 			offset  = setup_APIC_mce(offset, new);
 
@@ -322,6 +323,8 @@ static void amd_threshold_interrupt(void)
 log:
 	mce_setup(&m);
 	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+	if (!(m.status & MCI_STATUS_VAL))
+		return;
 	m.misc = ((u64)high << 32) | low;
 	m.bank = bank;
 	mce_log(&m);
@@ -497,10 +500,12 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
 	b->interrupt_capable	= lvt_interrupt_supported(bank, high);
 	b->threshold_limit	= THRESHOLD_MAX;
 
-	if (b->interrupt_capable)
+	if (b->interrupt_capable) {
 		threshold_ktype.default_attrs[2] = &interrupt_enable.attr;
-	else
+		b->interrupt_enable = 1;
+	} else {
 		threshold_ktype.default_attrs[2] = NULL;
+	}
 
 	INIT_LIST_HEAD(&b->miscj);
 
-- 
2.1.0



* Re: [PATCH V2] x86, mce, amd: Enable interrupts by default if HW capable
@ 2015-02-06 16:38 Borislav Petkov
From: Borislav Petkov @ 2015-02-06 16:38 UTC
  To: Aravind Gopalakrishnan
  Cc: tony.luck, mingo, tglx, hpa, x86, linux-kernel, linux-edac

On Mon, Feb 02, 2015 at 11:02:41AM -0600, Aravind Gopalakrishnan wrote:
> [...]

Queued for 3.21, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
