* [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
@ 2016-05-27 21:11 Tony Luck
2016-05-28 7:34 ` Borislav Petkov
0 siblings, 1 reply; 6+ messages in thread
From: Tony Luck @ 2016-05-27 21:11 UTC (permalink / raw)
To: Borislav Petkov; +Cc: linux-edac, linux-kernel, Aristeu Rozanski
BIOS can report a memory error to Linux using ACPI/APEI mechanism.
When it does this, we create a fictitious machine check error record
and feed it into the standard mce_log() function. The error record
needs a machine check bank number, and for some reason we chose "1"
for this.
But "1" is a valid bank number, and this causes confusion and heartburn
among h/w folks who are concerned that a memory error signature was
somehow logged in bank 1.
Change to use "mca_cfg.banks" (one higher than the largest bank number
supported on the platform) so that it will be clearer that this error
did not originate in a machine check bank.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/mcheck/mce-apei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index 34c89a3e8260..9d2c02337713 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -46,7 +46,7 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 		return;
 
 	mce_setup(&m);
-	m.bank = 1;
+	m.bank = mca_cfg.banks;
 	/* Fake a memory read error with unknown channel */
 	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
--
2.5.0
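As an illustration of the idea in the patch above, here is a minimal user-space sketch, not the kernel code. The names `struct fake_mce`, `fake_apei_record()` and `is_real_bank()` are hypothetical, and `nr_banks` stands in for `mca_cfg.banks`. The status bit positions are assumed to follow the architectural IA32_MCi_STATUS layout that the kernel's MCI_STATUS_* macros encode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the architectural IA32_MCi_STATUS layout. */
#define MCI_STATUS_VAL   (1ULL << 63)  /* record is valid */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* address field is valid */

/* Hypothetical, trimmed-down stand-in for struct mce. */
struct fake_mce {
	uint8_t  bank;
	uint64_t status;
	uint64_t addr;
};

/*
 * Build a fictitious record for an APEI-reported memory error.
 * nr_banks plays the role of mca_cfg.banks; real banks are 0..nr_banks-1.
 */
static struct fake_mce fake_apei_record(uint8_t nr_banks, uint64_t phys_addr)
{
	struct fake_mce m = {0};

	m.bank = nr_banks;	/* first number that is not a real bank */
	/* 0x9f: memory read error with unknown channel, as in the patch */
	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
	m.addr = phys_addr;
	return m;
}

/* True only if the record could have come from a hardware bank. */
static bool is_real_bank(const struct fake_mce *m, uint8_t nr_banks)
{
	return m->bank < nr_banks;
}
```

With this layout, a consumer that knows the platform's bank count can tell a synthesized record from a hardware one by the bank number alone.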
* Re: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
2016-05-27 21:11 [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs Tony Luck
@ 2016-05-28 7:34 ` Borislav Petkov
2016-05-31 17:11 ` Luck, Tony
0 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2016-05-28 7:34 UTC (permalink / raw)
To: Tony Luck; +Cc: linux-edac, linux-kernel, Aristeu Rozanski
On Fri, May 27, 2016 at 02:11:06PM -0700, Tony Luck wrote:
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> index 34c89a3e8260..9d2c02337713 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
> @@ -46,7 +46,7 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
> return;
>
> mce_setup(&m);
> - m.bank = 1;
> + m.bank = mca_cfg.banks;
There's struct cper_sec_mem_err.bank. Why aren't we copying that?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* RE: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
2016-05-28 7:34 ` Borislav Petkov
@ 2016-05-31 17:11 ` Luck, Tony
2016-05-31 18:09 ` Borislav Petkov
0 siblings, 1 reply; 6+ messages in thread
From: Luck, Tony @ 2016-05-31 17:11 UTC (permalink / raw)
To: Borislav Petkov; +Cc: linux-edac, linux-kernel, Aristeu Rozanski
>> - m.bank = 1;
>> + m.bank = mca_cfg.banks;
>
> There's struct cper_sec_mem_err.bank. Why aren't we copying that?
Because that is DDR3/DDR4 "bank" (internal DIMM detail) as opposed to machine check "bank"
(CPU microarchitecture detail). We need the latter here.
-Tony
* Re: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
2016-05-31 17:11 ` Luck, Tony
@ 2016-05-31 18:09 ` Borislav Petkov
2016-05-31 18:18 ` Luck, Tony
0 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2016-05-31 18:09 UTC (permalink / raw)
To: Luck, Tony; +Cc: linux-edac, linux-kernel, Aristeu Rozanski
On Tue, May 31, 2016 at 05:11:45PM +0000, Luck, Tony wrote:
> Because that is DDR3/DDR4 "bank" (internal DIMM detail) as opposed
> to machine check "bank" (CPU microarchitecture detail). We need the
> latter here.
Ok, I see.
Btw, would it have any benefit of writing a "magic" value in m.bank
to denote the error comes from APEI instead of number of banks which
differs between generations?
Something like
m.bank = -1;
or so?
255 banks will never happen anyway! (Famous last words ... :-)))
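For context on the 255 remark: this assumes the bank field of struct mce is an 8-bit unsigned value, in which case assigning -1 stores 255. A small user-space sketch of that conversion, where `struct bank_only`, `APEI_MAGIC_BANK` and `store_magic_bank()` are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-in for an 8-bit unsigned bank field. */
struct bank_only {
	uint8_t bank;
};

/* Hypothetical name for the proposed magic value. */
#define APEI_MAGIC_BANK ((uint8_t)-1)

/* Store -1 into the 8-bit field and return what was actually kept. */
static uint8_t store_magic_bank(void)
{
	struct bank_only m;

	m.bank = -1;	/* converted modulo 256, so this stores 255 */
	return m.bank;
}
```

Under that assumption, a consumer of the record would compare the bank against 255 rather than against -1.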
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
* RE: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
2016-05-31 18:09 ` Borislav Petkov
@ 2016-05-31 18:18 ` Luck, Tony
2016-06-03 8:20 ` Borislav Petkov
0 siblings, 1 reply; 6+ messages in thread
From: Luck, Tony @ 2016-05-31 18:18 UTC (permalink / raw)
To: Borislav Petkov; +Cc: linux-edac, linux-kernel, Aristeu Rozanski
> Btw, would it have any benefit of writing a "magic" value in m.bank
> to denote the error comes from APEI instead of number of banks which
> differs between generations?
>
> Something like
>
> m.bank = -1;
>
> or so?
That might be a bit more obvious than my subtle "one more than possible
on this platform" magic number.
> 255 banks will never happen anyway! (Famous last words ... :-)))
Intel is stuck at 32 unless we come up with a new mechanism and change
all the code that generates MSR numbers with "base + 4*i". There are
some virtualization MSRs allocated at what would be bank32.
-Tony
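The "base + 4*i" numbering can be sketched as follows. This is a user-space illustration, not kernel code: the helper names are hypothetical, and the constants assume the architectural addresses 0x400 for IA32_MC0_CTL and 0x480 for IA32_VMX_BASIC.

```c
#include <stdint.h>

#define MSR_IA32_MC0_CTL   0x400u  /* assumed: bank 0 control register */
#define MSR_IA32_VMX_BASIC 0x480u  /* assumed: first VMX capability MSR */

/* Each bank owns four consecutive MSRs: CTL, STATUS, ADDR, MISC. */
static uint32_t mc_ctl_msr(unsigned int bank)
{
	return MSR_IA32_MC0_CTL + 4 * bank;
}

static uint32_t mc_status_msr(unsigned int bank)
{
	return mc_ctl_msr(bank) + 1;
}

static uint32_t mc_addr_msr(unsigned int bank)
{
	return mc_ctl_msr(bank) + 2;
}

static uint32_t mc_misc_msr(unsigned int bank)
{
	return mc_ctl_msr(bank) + 3;
}
```

With these assumptions, bank 31 ends at 0x47f, and a hypothetical bank 32 would start at 0x480, which is where the VMX capability MSRs begin.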
* Re: [PATCH] x86/mce: Do not use bank 1 for APEI generated error logs.
2016-05-31 18:18 ` Luck, Tony
@ 2016-06-03 8:20 ` Borislav Petkov
0 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2016-06-03 8:20 UTC (permalink / raw)
To: Luck, Tony; +Cc: linux-edac, linux-kernel, Aristeu Rozanski
On Tue, May 31, 2016 at 06:18:42PM +0000, Luck, Tony wrote:
> Intel is stuck at 32 unless we come up with a new mechanism and change
> all the code that generates MSR numbers with "base + 4*i". There are
> some virtualization MSRs allocated at what would be bank32.
Just when I was hoping that 32 banks should be more than enough and hw
people would restrain themselves. Looks like a "natural" restraint has
presented itself ... :-))
Anyway, v2 applied, thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.