linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
@ 2016-05-27 21:11 Tony Luck
  2016-05-28  7:34 ` Borislav Petkov
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Luck @ 2016-05-27 21:11 UTC
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, Aristeu Rozanski

BIOS can report a memory error to Linux using the ACPI/APEI mechanism.
When it does this, we create a fictitious machine check error record
and feed it into the standard mce_log() function. The error record
needs a machine check bank number, and for some reason we chose "1"
for this.

But "1" is a valid bank number, and this causes confusion and heartburn
among h/w folks who are concerned that a memory error signature was
somehow logged in bank 1.

Change to use "mca_cfg.banks" (one higher than the largest bank number
supported on the platform) so that it will be clearer that this error
did not originate in a machine check bank.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce-apei.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index 34c89a3e8260..9d2c02337713 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -46,7 +46,7 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 		return;
 
 	mce_setup(&m);
-	m.bank = 1;
+	m.bank = mca_cfg.banks;
 	/* Fake a memory read error with unknown channel */
 	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
 
-- 
2.5.0
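
A minimal sketch of what the change means for a consumer of these
records, not the kernel's actual code. It assumes a cut-down stand-in
for struct mce (fake_mce) and hypothetical helpers make_apei_record(),
bank_is_synthetic(), mem_err_op() and mem_err_channel(); the nbanks
argument stands in for mca_cfg.banks. The status bit positions and the
memory-controller error code layout (0000_0000_1MMM_CCCC) follow the
architectural IA32_MCi_STATUS definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_MCi_STATUS flag bits. */
#define MCI_STATUS_VAL   (1ULL << 63)  /* record is valid */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting was enabled */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* address field is valid */

/* Hypothetical, cut-down stand-in for struct mce. */
struct fake_mce {
	uint64_t status;
	uint8_t  bank;
};

/* Fill in a record the way the patched line does. */
static struct fake_mce make_apei_record(uint8_t nbanks)
{
	struct fake_mce m = { 0 };

	m.bank = nbanks;  /* one past the highest real bank number */
	/* 0x9f: memory read error, channel not specified */
	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
	return m;
}

/* Real banks are numbered 0 .. nbanks - 1; anything else was synthesized. */
static bool bank_is_synthetic(const struct fake_mce *m, uint8_t nbanks)
{
	return m->bank >= nbanks;
}

/* MMM field of the memory-controller error code (1 means read). */
static unsigned int mem_err_op(uint64_t status)
{
	return (status >> 4) & 0x7;
}

/* CCCC field of the memory-controller error code (0xf means unspecified). */
static unsigned int mem_err_channel(uint64_t status)
{
	return status & 0xf;
}
```

With the old value of 1, bank_is_synthetic() could not tell such a
record apart from a genuine bank 1 log, which is the confusion the
commit message describes.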


* Re: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
  2016-05-27 21:11 [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs Tony Luck
@ 2016-05-28  7:34 ` Borislav Petkov
  2016-05-31 17:11   ` Luck, Tony
  0 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2016-05-28  7:34 UTC
  To: Tony Luck; +Cc: linux-edac, linux-kernel, Aristeu Rozanski

On Fri, May 27, 2016 at 02:11:06PM -0700, Tony Luck wrote:
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> index 34c89a3e8260..9d2c02337713 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> @@ -46,7 +46,7 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
>  		return;
>  
>  	mce_setup(&m);
> -	m.bank = 1;
> +	m.bank = mca_cfg.banks;

There's struct cper_sec_mem_err.bank. Why aren't we copying that?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* RE: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
  2016-05-28  7:34 ` Borislav Petkov
@ 2016-05-31 17:11   ` Luck, Tony
  2016-05-31 18:09     ` Borislav Petkov
  0 siblings, 1 reply; 6+ messages in thread
From: Luck, Tony @ 2016-05-31 17:11 UTC
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, Aristeu Rozanski

>> -	m.bank = 1;
>> +	m.bank = mca_cfg.banks;
>
> There's struct cper_sec_mem_err.bank. Why aren't we copying that?

Because that is DDR3/DDR4 "bank" (internal DIMM detail) as opposed to machine check "bank"
(CPU microarchitecture detail).  We need the latter here.

-Tony


* Re: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
  2016-05-31 17:11   ` Luck, Tony
@ 2016-05-31 18:09     ` Borislav Petkov
  2016-05-31 18:18       ` Luck, Tony
  0 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2016-05-31 18:09 UTC
  To: Luck, Tony; +Cc: linux-edac, linux-kernel, Aristeu Rozanski

On Tue, May 31, 2016 at 05:11:45PM +0000, Luck, Tony wrote:
> Because that is DDR3/DDR4 "bank" (internal DIMM detail) as opposed
> to machine check "bank" (CPU microarchitecture detail). We need the
> latter here.

Ok, I see.

Btw, would it have any benefit of writing a "magic" value in m.bank
to denote the error comes from APEI instead of number of banks which
differs between generations?

Something like

	m.bank = -1;

or so?

255 banks will never happen anyway! (Famous last words ... :-)))
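
A minimal sketch of why -1 and 255 are the same thing here, assuming
the bank field is an 8-bit unsigned type as the "255" remark implies.
The names fake_mce, APEI_BANK_MAGIC and store_bank() are hypothetical
and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical name for the proposed marker value. */
#define APEI_BANK_MAGIC ((uint8_t)-1)

/* Hypothetical, cut-down stand-in for struct mce. */
struct fake_mce {
	uint8_t bank;
};

/* Store a signed value in the 8-bit bank field and read it back. */
static uint8_t store_bank(int value)
{
	struct fake_mce m;

	m.bank = (uint8_t)value;  /* conversion is modulo 256, so -1 becomes 255 */
	return m.bank;
}
```

Unlike "number of banks", this marker does not change from one CPU
generation to the next, so a log reader could test for it without
knowing the platform.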

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.


* RE: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
  2016-05-31 18:09     ` Borislav Petkov
@ 2016-05-31 18:18       ` Luck, Tony
  2016-06-03  8:20         ` Borislav Petkov
  0 siblings, 1 reply; 6+ messages in thread
From: Luck, Tony @ 2016-05-31 18:18 UTC
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, Aristeu Rozanski

> Btw, would it have any benefit of writing a "magic" value in m.bank
> to denote the error comes from APEI instead of number of banks which
> differs between generations?
>
> Something like
>
>	m.bank = -1;
>
> or so?

That might be a bit more obvious than my subtle "one more than possible
on this platform" magic number.

> 255 banks will never happen anyway! (Famous last words ... :-)))

Intel is stuck at 32 unless we come up with a new mechanism and change
all the code that generates MSR numbers with "base + 4*i". There are
some virtualization MSRs allocated at what would be bank32.
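
The "base + 4*i" arithmetic can be sketched as follows. The MSR
numbers are the architectural ones (IA32_MC0_CTL at 0x400,
IA32_VMX_BASIC at 0x480); the helpers mc_ctl_msr() and mc_misc_msr()
are hypothetical names for this illustration, not kernel functions.

```c
#include <stdint.h>

#define MSR_IA32_MC0_CTL    0x400u  /* first register of bank 0 */
#define MSR_IA32_VMX_BASIC  0x480u  /* first of the VMX capability MSRs */

/* Each bank owns four consecutive MSRs: CTL, STATUS, ADDR, MISC. */
static uint32_t mc_ctl_msr(unsigned int bank)
{
	return MSR_IA32_MC0_CTL + 4 * bank;
}

/* MISC is the last of the four registers in a bank. */
static uint32_t mc_misc_msr(unsigned int bank)
{
	return mc_ctl_msr(bank) + 3;
}
```

Bank 31 ends at 0x47f, and mc_ctl_msr(32) would be 0x480, which is
already IA32_VMX_BASIC. That collision is the limit of 32 banks
mentioned above.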

-Tony


* Re: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
  2016-05-31 18:18       ` Luck, Tony
@ 2016-06-03  8:20         ` Borislav Petkov
  0 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2016-06-03  8:20 UTC
  To: Luck, Tony; +Cc: linux-edac, linux-kernel, Aristeu Rozanski

On Tue, May 31, 2016 at 06:18:42PM +0000, Luck, Tony wrote:
> Intel is stuck at 32 unless we come up with a new mechanism and change
> all the code that generates MSR numbers with "base + 4*i". There are
> some virtualization MSRs allocated at what would be bank32.

Just when I was hoping that 32 banks should be more than enough and hw
people would restrain themselves. Looks like a "natural" restraint has
presented itself ... :-))

Anyway, v2 applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

