linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] x86/mce: Try printing all machine check banks known before panic
@ 2014-11-19  9:22 ruiv.wang
  2014-11-19 10:29 ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: ruiv.wang @ 2014-11-19  9:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: tony.luck, gong.chen, bp, rui.y.wang

From: Rui Wang <rui.y.wang@intel.com>

There are cases when a machine check panics without giving any information
about the error:

[  177.806166] Kernel panic - not syncing: Machine check from unknown source

No information besides that it is a machine check. This happens in two cases:
1) The CPU logs the error with the MCi_STATUS.EN bit set to zero, and Linux
   ignores EN=0 entries (as it should).
2) In normal processing the MCE handler ignores banks that do not contain fatal
   or unrecoverable errors (these would later be found and logged by the CMCI
   handler). If we panic, these will never be logged, but could be important
   to diagnose the problem.

This patch aims to record and print all known machine check banks if we
decide to panic.

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   53 +++++++++++++++++++++++++++++++++++--
 1 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 61a9668ce..97cf0b1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -85,6 +85,16 @@ static char			*mce_helper_argv[2] = { mce_helper, NULL };
 static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
 
 static DEFINE_PER_CPU(struct mce, mces_seen);
+
+/*
+ * All valid error banks seen during MCE are temporarily saved here.
+ * There are multiple components which can report an error. For now
+ * 8 might be enough but it's subject to change in the future.
+ */
+#define MAX_ERRORS	8
+static struct mce banks_saved[MAX_ERRORS];
+int	banks_idx;
+
 static int			cpu_missing;
 
 /* CMCI storm detection filter */
@@ -363,6 +373,16 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
 		pr_emerg(HW_ERR "Some CPUs didn't answer in synchronization\n");
 	if (exp)
 		pr_emerg(HW_ERR "Machine check: %s\n", exp);
+
+	/* Sometimes a CPU signals MCEs with MCi_STATUS.EN bit set to zero;
+	 * sometimes there are CMCI errors not consumed yet. We print them
+	 * here as they could be important to diagnose the problem.
+	 */
+	for (i = 0; i < banks_idx; i++) {
+		pr_emerg_once(HW_ERR "Possibly missed machine check banks:\n");
+		print_mce(&banks_saved[i]);
+	}
+
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
@@ -837,6 +857,8 @@ static int mce_start(int *no_way_out)
 		 * Monarch: Starts executing now, the others wait.
 		 */
 		atomic_set(&mce_executing, 1);
+		memset(banks_saved, 0, sizeof(banks_saved));
+		banks_idx = 0;
 	} else {
 		/*
 		 * Subject: Now start the scanning loop one by one in
@@ -1016,7 +1038,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 {
 	struct mca_config *cfg = &mca_cfg;
 	struct mce m, *final;
-	int i;
+	int i, k;
 	int worst = 0;
 	int severity;
 	/*
@@ -1082,6 +1104,33 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		if ((m.status & MCI_STATUS_VAL) == 0)
 			continue;
 
+		mce_read_aux(&m, i);
+
+		/*
+		 * Temporarily save all valid banks including those handled
+		 * by CMCIs. If we panic, we can print them. If we don't panic,
+		 * then we can forget them (because we will print them from
+		 * machine_check_poll() sooner  or later). There  are many
+		 * duplications because banks are  shared, only  save each
+		 * distinct error once.
+		 *
+		 * Everything is serialized so no locking/atomics needed.
+		 */
+		for (k = 0; k < banks_idx; k++) {
+			if (banks_saved[k].status == m.status &&
+				banks_saved[k].addr == m.addr &&
+				banks_saved[k].misc == m.misc)
+
+				goto mce_skip;
+		}
+
+		if (banks_idx < MAX_ERRORS)
+			memcpy(&banks_saved[banks_idx++], &m,
+				sizeof(struct mce));
+		else
+			pr_warn_once("mce: MAX_ERRORS too low\n");
+mce_skip:
+
 		/*
 		 * Non uncorrected or non signaled errors are handled by
 		 * machine_check_poll. Leave them alone, unless this panics.
@@ -1112,8 +1161,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 			continue;
 		}
 
-		mce_read_aux(&m, i);
-
 		/*
 		 * Action optional error. Queue address for later processing.
 		 * When the ring overflows we just ignore the AO error.
-- 
1.7.5.4
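
As a rough illustration of the save-and-deduplicate step in the patch above, here is a minimal userspace sketch. It is not the kernel code: struct mce_rec, reset_saved(), save_bank() and dump_saved() are hypothetical names, and the record carries only the three fields the patch compares (status, addr, misc).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERRORS 8	/* same limit as in the patch */

/* Hypothetical stand-in for the kernel's struct mce: compared fields only. */
struct mce_rec {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
};

static struct mce_rec banks_saved[MAX_ERRORS];
static int banks_idx;

/* Forget everything saved during a previous round (the Monarch's job). */
static void reset_saved(void)
{
	memset(banks_saved, 0, sizeof(banks_saved));
	banks_idx = 0;
}

/*
 * Save one record unless an identical (status, addr, misc) triple is
 * already stored. Returns 1 if saved, 0 if duplicate, -1 if the array
 * is full.
 */
static int save_bank(const struct mce_rec *m)
{
	int k;

	assert(m != NULL);

	for (k = 0; k < banks_idx; k++) {
		if (banks_saved[k].status == m->status &&
		    banks_saved[k].addr == m->addr &&
		    banks_saved[k].misc == m->misc)
			return 0;
	}

	if (banks_idx >= MAX_ERRORS)
		return -1;

	banks_saved[banks_idx++] = *m;
	return 1;
}

/* Print every saved record, as a panic path might. Returns the count. */
static int dump_saved(void)
{
	int i;

	for (i = 0; i < banks_idx; i++)
		printf("saved[%d]: status=%#llx addr=%#llx misc=%#llx\n", i,
		       (unsigned long long)banks_saved[i].status,
		       (unsigned long long)banks_saved[i].addr,
		       (unsigned long long)banks_saved[i].misc);
	return banks_idx;
}
```

Because shared banks are read by several CPUs, the same triple is likely to be offered more than once; the linear scan is cheap at this array size and keeps each distinct error to one line of output.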


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-19  9:22 [PATCH v3] x86/mce: Try printing all machine check banks known before panic ruiv.wang
@ 2014-11-19 10:29 ` Borislav Petkov
  2014-11-19 23:34   ` Luck, Tony
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-11-19 10:29 UTC (permalink / raw)
  To: ruiv.wang; +Cc: linux-kernel, tony.luck, gong.chen, rui.y.wang

On Wed, Nov 19, 2014 at 05:22:41PM +0800, ruiv.wang@gmail.com wrote:
> From: Rui Wang <rui.y.wang@intel.com>
> 
> There are cases when a machine check panics without giving any information
> about the error:
> 
> [  177.806166] Kernel panic - not syncing: Machine check from unknown source
> 
> No information besides that it is a machine check. This happens in two cases:
> 1) The CPU logs the error with the MCi_STATUS.EN bit set to zero, and Linux
>    ignores EN=0 entries (as it should).

Well, I guess we shouldn't anymore. Apparently hw forgets to set the
bit when raising an MCE so then we should ignore it too in mce-severity
and delete that piece or grade it as higher severity based on, I dunno,
b0rked hardware family/model/stepping or whatever bit we set...

        MCESEV(
                NO, "Not enabled",
                BITCLR(MCI_STATUS_EN)
                ),

> 2) In normal processing the MCE handler ignores banks that do not contain fatal
>    or unrecoverable errors (these would later be found and logged by the CMCI
>    handler). If we panic, these will never be logged, but could be important
>    to diagnose the problem.

Well, we do this:

                /*
                 * Non uncorrected or non signaled errors are handled by
                 * machine_check_poll. Leave them alone, unless this panics.
                 */
                if (!(m.status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
                        !no_way_out)
                        continue;

so no_way_out gets indirectly controlled by mce-severity too. So I guess
mce-severity would need adjusting instead of adding more stuff to the #MC
handler.

Btw, the panic message comes from

        /*
         * No machine check event found. Must be some external
         * source or one CPU is hung. Panic.
         */
        if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
                mce_panic("Machine check from unknown source", NULL, NULL);

so fixing mce_severity is what should happen here instead, IMO.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-19 10:29 ` Borislav Petkov
@ 2014-11-19 23:34   ` Luck, Tony
  2014-11-20 10:15     ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2014-11-19 23:34 UTC (permalink / raw)
  To: Borislav Petkov, ruiv.wang; +Cc: linux-kernel, gong.chen, Wang, Rui Y

>> No information besides that it is a machine check. This happens in two cases:
>> 1) The CPU logs the error with the MCi_STATUS.EN bit set to zero, and Linux
>>    ignores EN=0 entries (as it should).

> Well, I guess we shouldn't anymore. Apparently hw forgets to set the
> bit when raising an MCE so then we should ignore it too in mce-severity
> and delete that piece or grade it as higher severity based on, I dunno,
> b0rked hardware family/model/stepping or whatever bit we set...
>
>        MCESEV(
>                NO, "Not enabled",
>                BITCLR(MCI_STATUS_EN)
>                ),

The SDM has this to say about EN=0 (in section 15.10.4.1 of volume 3B):

   When the EN flag is zero but the VAL and UC flags are one in
   the IA32_MCi_STATUS register, the reported uncorrected error
   in this bank is not enabled. As uncorrected errors with the
   EN flag = 0 are not the source of machine check exceptions,
   the MCE handler should log and clear non-enabled errors when
   the S bit is set and should continue searching for enabled
   errors from the other IA32_MCi_STATUS registers. Note that
   when IA32_MCG_CAP [24] is 0, any uncorrected error condition
   (VAL =1 and UC=1) including the one with the EN flag cleared
   are fatal and the handler must signal the operating system to
   reset the system. For the errors that do not generate machine
   check exceptions, the EN flag has no meaning.

Note the "should log and clear".  We just clear ... just need to shuffle some code
in mce.c to add the logging.

But we still need something like Rui's patch - calling mcelog() doesn't ensure that
we see something on the console about possible cause of the problem.
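
The SDM paragraph quoted above can be read as a small decision rule per bank. The following is only a sketch of that reading, in standalone C: the bit positions are the architectural ones for IA32_MCi_STATUS and IA32_MCG_CAP, but enum bank_action and classify_bank() are hypothetical names, and the BANK_EVALUATE result simply stands for "hand the bank to the normal severity grading", which the quoted text does not describe.

```c
#include <stdint.h>

#define MCI_STATUS_VAL	(1ULL << 63)	/* register contents are valid */
#define MCI_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_EN	(1ULL << 60)	/* error reporting was enabled */
#define MCI_STATUS_S	(1ULL << 56)	/* signaled via machine check */
#define MCG_SER_P	(1ULL << 24)	/* IA32_MCG_CAP[24] */

/* Hypothetical outcome of looking at a single bank. */
enum bank_action {
	BANK_SKIP,		/* nothing valid logged */
	BANK_LOG_AND_CLEAR,	/* not the #MC source: log, clear, keep looking */
	BANK_FATAL,		/* handler must ask for a reset */
	BANK_EVALUATE		/* leave to the normal severity grading */
};

static enum bank_action classify_bank(uint64_t mcg_cap, uint64_t status)
{
	if (!(status & MCI_STATUS_VAL))
		return BANK_SKIP;

	/* IA32_MCG_CAP[24] == 0: any valid UC error is fatal, EN or not. */
	if (!(mcg_cap & MCG_SER_P))
		return (status & MCI_STATUS_UC) ? BANK_FATAL : BANK_EVALUATE;

	/* UC=1, EN=0, S=1: log and clear, then continue the search. */
	if ((status & MCI_STATUS_UC) &&
	    !(status & MCI_STATUS_EN) &&
	    (status & MCI_STATUS_S))
		return BANK_LOG_AND_CLEAR;

	return BANK_EVALUATE;
}
```

The point of interest is the BANK_LOG_AND_CLEAR case: the quoted text asks for both a log and a clear there, so a handler that only clears would lose that record.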

-Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-19 23:34   ` Luck, Tony
@ 2014-11-20 10:15     ` Borislav Petkov
  2014-11-21  1:20       ` rui wang
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-11-20 10:15 UTC (permalink / raw)
  To: Luck, Tony; +Cc: ruiv.wang, linux-kernel, gong.chen, Wang, Rui Y

On Wed, Nov 19, 2014 at 11:34:10PM +0000, Luck, Tony wrote:
> The SDM has this to say about EN=0 (in section 15.10.4.1 of volume 3B):
> 
>    When the EN flag is zero but the VAL and UC flags are one in
>    the IA32_MCi_STATUS register, the reported uncorrected error
>    in this bank is not enabled. As uncorrected errors with the
>    EN flag = 0 are not the source of machine check exceptions,
>    the MCE handler should log and clear non-enabled errors when
>    the S bit is set and should continue searching for enabled
>    errors from the other IA32_MCi_STATUS registers. Note that
>    when IA32_MCG_CAP [24] is 0, any uncorrected error condition
>    (VAL =1 and UC=1) including the one with the EN flag cleared
>    are fatal and the handler must signal the operating system to
>    reset the system. For the errors that do not generate machine
>    check exceptions, the EN flag has no meaning.
> 
> Note the "should log and clear".  We just clear ... just need to shuffle some code
> in mce.c to add the logging.

Sure, we can log those.

> But we still need something like Rui's patch - calling mcelog()
> doesn't ensure that we see something on the console about possible
> cause of the problem.

So you're saying we should drain the mcelog buffer to the console in
such situations before we panic? If so, there's drain_mcelog_buffer()
which could be changed to call print_mce() instead of going to the
x86_mce_decoder_chain.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-20 10:15     ` Borislav Petkov
@ 2014-11-21  1:20       ` rui wang
  2014-11-21 16:41         ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: rui wang @ 2014-11-21  1:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, linux-kernel, gong.chen, Wang, Rui Y

On 11/20/14, Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Nov 19, 2014 at 11:34:10PM +0000, Luck, Tony wrote:
>> The SDM has this to say about EN=0 (in section 15.10.4.1 of volume 3B):
>>
>>    When the EN flag is zero but the VAL and UC flags are one in
>>    the IA32_MCi_STATUS register, the reported uncorrected error
>>    in this bank is not enabled. As uncorrected errors with the
>>    EN flag = 0 are not the source of machine check exceptions,
>>    the MCE handler should log and clear non-enabled errors when
>>    the S bit is set and should continue searching for enabled
>>    errors from the other IA32_MCi_STATUS registers. Note that
>>    when IA32_MCG_CAP [24] is 0, any uncorrected error condition
>>    (VAL =1 and UC=1) including the one with the EN flag cleared
>>    are fatal and the handler must signal the operating system to
>>    reset the system. For the errors that do not generate machine
>>    check exceptions, the EN flag has no meaning.
>>
>> Note the "should log and clear".  We just clear ... just need to shuffle
>> some code
>> in mce.c to add the logging.
>
> Sure, we can log those.
>
>> But we still need something like Rui's patch - calling mcelog()
>> doesn't ensure that we see something on the console about possible
>> cause of the problem.
>
> So you're saying we should drain the mcelog buffer to the console in
> such situations before we panic? If so, there's drain_mcelog_buffer()
> which could be changed to call print_mce() instead of going to the
> x86_mce_decoder_chain.
>

Hi Boris,

We've found cases where, after mce_log() has been called, we decide to
panic, but print_mce() can't find anything in the mcelog buffer. I think
the mcelog buffer can be consumed by the user space daemon (possibly on
a different CPU). We may end up seeing the "panic from unknown source"
message without printing any MCA banks, which is one of the cases where
this bug originated.

The current logging mechanism is not as reliable as it looks. When
some log entries have been copied to user space, but haven't been
logged on the disk, and we panic, then we permanently lose those log
entries.

Thanks
Rui

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21  1:20       ` rui wang
@ 2014-11-21 16:41         ` Borislav Petkov
  2014-11-21 17:20           ` Luck, Tony
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-11-21 16:41 UTC (permalink / raw)
  To: rui wang; +Cc: Luck, Tony, linux-kernel, gong.chen, Wang, Rui Y

On Fri, Nov 21, 2014 at 09:20:59AM +0800, rui wang wrote:
> We've found cases where, after mce_log() has been called, we decide to
> panic, but print_mce() can't find anything in the mcelog buffer. I think
> the mcelog buffer can be consumed by the user space daemon (possibly on
> a different CPU). We may end up seeing the "panic from unknown source"
> message without printing any MCA banks, which is one of the cases where
> this bug originated.

Ok, so modify the mcelog buffer to not zero out its entries when they're
being read out in userspace through mce_chrdev_read() but simply to
leave them in. Then you can read them out again on panic time. The mce
log buffer will have to become a circular buffer or something like that.
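
To make the trade-off concrete, here is a small userspace sketch of the two overflow policies: a fill-and-stop log that keeps the first records, and a circular log that keeps the most recent ones and is never zeroed on read. It is an assumption-laden illustration, not the mcelog implementation: struct err_log and the log_*() helpers are hypothetical names, and a record is reduced to a single 64-bit value.

```c
#include <stdint.h>

#define LOG_LEN 32	/* same length as the 32-entry mcelog buffer */

/* Hypothetical log; a real record would be a struct mce. */
struct err_log {
	uint64_t entry[LOG_LEN];
	unsigned int next;	/* number of records accepted so far */
};

/* Fill-and-stop: once full, new records are dropped. Returns 1 if stored. */
static int log_keep_first(struct err_log *log, uint64_t rec)
{
	if (log->next >= LOG_LEN)
		return 0;
	log->entry[log->next++] = rec;
	return 1;
}

/* Circular: always accepts, overwriting the oldest record when full. */
static void log_keep_last(struct err_log *log, uint64_t rec)
{
	log->entry[log->next % LOG_LEN] = rec;
	log->next++;
}

/* Oldest record a circular log still holds; needs at least one write. */
static uint64_t log_oldest(const struct err_log *log)
{
	unsigned int start = log->next > LOG_LEN ? log->next - LOG_LEN : 0;

	return log->entry[start % LOG_LEN];
}

/* Newest record in a circular log; needs at least one write. */
static uint64_t log_newest(const struct err_log *log)
{
	return log->entry[(log->next - 1) % LOG_LEN];
}
```

Neither accessor modifies the log, so a reader in user space and a dump at panic time could both walk the same entries. The cost is the one raised later in the thread: after 40 records the circular log holds records 8 through 39, and the first eight are gone.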

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21 16:41         ` Borislav Petkov
@ 2014-11-21 17:20           ` Luck, Tony
  2014-11-21 18:13             ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2014-11-21 17:20 UTC (permalink / raw)
  To: Borislav Petkov, rui wang; +Cc: linux-kernel, gong.chen, Wang, Rui Y

> leave them in. Then you can read them out again on panic time. The mce
> log buffer will have to become a circular buffer or something like that.

This is a mixed bag.  If there are a bunch of errors so that we overflow the buffer,
then general wisdom says that people want to see the first errors, as the later
ones may just be secondary effects from the earlier ones.

But - lots of systems don't run mcelog(8) daemon.  So the buffer just fills
up with the first 32 errors (perhaps all relatively harmless corrected errors)
and then when some serious stuff happens we have no place to log :-)

Perhaps we need separate buffers for UC=0 and UC=1?  Or something else??

-Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21 17:20           ` Luck, Tony
@ 2014-11-21 18:13             ` Borislav Petkov
  2014-11-21 21:31               ` Luck, Tony
  2014-11-22  2:16               ` rui wang
  0 siblings, 2 replies; 16+ messages in thread
From: Borislav Petkov @ 2014-11-21 18:13 UTC (permalink / raw)
  To: Luck, Tony; +Cc: rui wang, linux-kernel, gong.chen, Wang, Rui Y

On Fri, Nov 21, 2014 at 05:20:53PM +0000, Luck, Tony wrote:
> > leave them in. Then you can read them out again on panic time. The mce
> > log buffer will have to become a circular buffer or something like that.
> 
> This is a mixed bag.  If there are a bunch of errors so that we overflow the buffer,
> then general wisdom says that people want to see the first errors, as the later
> ones may just be secondary effects from the earlier ones.
> 
> But - lots of systems don't run mcelog(8) daemon.  So the buffer just fills
> up with the first 32 errors (perhaps all relatively harmless corrected errors)
> and then when some serious stuff happens we have no place to log :-)

Of course we do - we overwrite the first one. Changing it into a
circular buffer will give us last 32 errors logged.

> Perhaps we need separate buffers for UC=0 and UC=1?  Or something else??

Well, adding yet another arbitrary length buffer of struct mces used in
yet another context scenario doesn't help that either, does it?

This chunk particularly makes me go WTH?!:

+ * All valid error banks seen during MCE are temporarily saved here.
+ * There are multiple components which can report an error. For now
+ * 8 might be enough but it's subject to change in the future.
+ */
+#define MAX_ERRORS     8
+static struct mce banks_saved[MAX_ERRORS];

It sounds to me like we need to go back to the drawing board and analyze
why that thing happens first:

[  177.806166] Kernel panic - not syncing: Machine check from unknown source

Now this comes from mce_reign() which is entered by the CPU which
entered the #MC handler first, according to the comments above it.

So basically it tells me that we want all the MCEs from the last
"round," so to speak, where we had to summon all cores into the indian
clearing of #MC to quickly show each other's wounds :-) :-)

Now, if MCE_LOG_LEN records, aka 32, are not enough because the error
happened at some point in the past, there are two possibilities:

* error got logged into mcelog and is long out to dmesg.

So we go look at dmesg. Not very easy to do when we panic, I know, so we
better make sure we have serial connected.


 [ Btw., we can know when userspace is eating up error data:
   drivers/ras/debugfs.c. If it doesn't, we can then dump it to dmesg.
   We'll have to teach mcelog/ras daemons to open that file so that we
   don't issue to dmesg. ]


* error is not logged yet so still in mcelog and we simply dump it out
to dmesg.

In any case, we cannot have fixed-size buffer for some number of errors
and rely on it always having the error which caused the #MC as something
will consume it at some point anyway.

So maybe if we could get a more detailed explanation of when this thing
happens, then we might address it better.

And also:

        /*
         * No machine check event found. Must be some external
         * source or one CPU is hung. Panic.
         */
        if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
                mce_panic("Machine check from unknown source", NULL, NULL);

Provided this comment is correct, it doesn't sound like any MCE record
will ever tell us what causes the error as an external source or a hung
CPU doesn't generate an MCE record in any bank, does it?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21 18:13             ` Borislav Petkov
@ 2014-11-21 21:31               ` Luck, Tony
  2014-11-21 21:35                 ` Borislav Petkov
  2014-11-22  2:16               ` rui wang
  1 sibling, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2014-11-21 21:31 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: rui wang, linux-kernel, gong.chen, Wang, Rui Y

>
>        /*
>         * No machine check event found. Must be some external
>         * source or one CPU is hung. Panic.
>         */
>        if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
>                mce_panic("Machine check from unknown source", NULL, NULL);
>
> Provided this comment is correct, it doesn't sound like any MCE record
> will ever tell us what causes the error as an external source or a hung
> CPU doesn't generate an MCE record in any bank, does it?

That means there were no VALID=1, EN=1, S=1 errors anywhere.  But there
might be some other things logged that would help us understand.

We are into cpu errata territory here though ... we aren't supposed to get
machine checks that don't have a logged cause.  We panic for spurious
machine checks because we know something has gone horribly wrong,
even if we don't know what that something was.

-Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21 21:31               ` Luck, Tony
@ 2014-11-21 21:35                 ` Borislav Petkov
  2014-11-21 21:59                   ` Luck, Tony
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-11-21 21:35 UTC (permalink / raw)
  To: Luck, Tony; +Cc: rui wang, linux-kernel, gong.chen, Wang, Rui Y

On Fri, Nov 21, 2014 at 09:31:56PM +0000, Luck, Tony wrote:
> >
> >        /*
> >         * No machine check event found. Must be some external
> >         * source or one CPU is hung. Panic.
> >         */
> >        if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
> >                mce_panic("Machine check from unknown source", NULL, NULL);
> >
> > Provided this comment is correct, it doesn't sound like any MCE record
> > will ever tell us what causes the error as an external source or a hung
> > CPU doesn't generate an MCE record in any bank, does it?
> 
> That means there were no VALID=1, EN=1, S=1 errors anywhere.  But there
> might be some other things logged that would help us understand.

By "other things" you mean other MCEs?

> We are into cpu errata territory here though ... we aren't supposed to get
> machine checks that don't have a logged cause.  We panic for spurious
> machine checks because we know something has gone horribly wrong,
> even if we don't know what that something was.

Oh, cpu errata. So this would mean that we can't even rely on the
contents of the MCA banks, can we?

In any case, is any of the information in the MCA banks in such cases
even usable then? Because if not, we're definitely barking up the wrong
tree...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21 21:35                 ` Borislav Petkov
@ 2014-11-21 21:59                   ` Luck, Tony
  2014-11-23 20:55                     ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Luck, Tony @ 2014-11-21 21:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: rui wang, linux-kernel, gong.chen, Wang, Rui Y

>> That means there were no VALID=1, EN=1, S=1 errors anywhere.  But there
>> might be some other things logged that would help us understand.
>
> By "other things" you mean other MCEs?

Logs with EN=0 and/or S=0.  They may have interesting information, and have
a good chance of being useful (especially if they are from some functional
unit that isn't part of the buggy behavior). Bad data flowing through multiple
functional units can leave a trail of logged entries (perhaps as many as four
units may see and log a single error). Only one of them should signal the machine
check (to avoid shutdown because of nested machine check). 

> Oh, cpu errata. So this would mean that we can't even rely on the
> contents of the MCA banks, can we?
>
> In any case, is any of the information in the MCA banks in such cases
> even usable then? Because if not, we're definitely barking up the wrong
> tree...

See above - I think even if there is a bug in the core that isn't setting the
right bits in the MCi_STATUS register - we could get good data from
devices out in the uncore.

-Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21 18:13             ` Borislav Petkov
  2014-11-21 21:31               ` Luck, Tony
@ 2014-11-22  2:16               ` rui wang
  2014-11-22  9:44                 ` Borislav Petkov
  1 sibling, 1 reply; 16+ messages in thread
From: rui wang @ 2014-11-22  2:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, linux-kernel, gong.chen, Wang, Rui Y

On 11/22/14, Borislav Petkov <bp@alien8.de> wrote:
>... there are two possibilities:
>
> * error got logged into mcelog and is long out to dmesg.
>
> So we go look at dmesg. Not very easy to do when we panic, I know, so we
> better make sure we have serial connected.
>
>
>  [ Btw., we can know when userspace is eating up error data:
>    drivers/ras/debugfs.c. If it doesn't, we can then dump it to dmesg.
>    We'll have to teach mcelog/ras daemons to open that file so that we
>    don't issue to dmesg. ]
>
>
> * error is not logged yet so still in mcelog and we simply dump it out
> to dmesg.
>
> In any case, we cannot have fixed-size buffer for some number of errors
> and rely on it always having the error which caused the #MC as something
> will consume it at some point anyway.
>
> So maybe if we could get a more detailed explanation of when this thing
> happens, then we might address it better.
>

Hi Boris,
I think both possibilities are valid. But experiments show that the
error logs are not in the dmesg preserved by kdump in /var/crash/
after panic and reboot, and not in the mcelog.entry[] array in the
kernel. So they must be somewhere in user space memory. Even if we
have serial console connected we still can't catch them. The
difficulty is that there's no easy way to force a user space daemon to
do something during panic.

The new banks_saved[] array acts like a safeguard when you pass
something to someone else - to prevent it from getting lost in the
interim.

Thanks
Rui

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-22  2:16               ` rui wang
@ 2014-11-22  9:44                 ` Borislav Petkov
  2014-11-22 15:32                   ` rui wang
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2014-11-22  9:44 UTC (permalink / raw)
  To: rui wang; +Cc: Luck, Tony, linux-kernel, gong.chen, Wang, Rui Y

On Sat, Nov 22, 2014 at 10:16:49AM +0800, rui wang wrote:
> I think both possibilities are valid. But experiments show that the
> error logs are not in the dmesg preserved by kdump in /var/crash/
> after panic and reboot, and not in the mcelog.entry[] array in the
> kernel. So they must be somewhere in user space memory. Even if we
> have serial console connected we still can't catch them. The
> difficulty is that there's no easy way to force a user space daemon to
> do something during panic.
> 
> The new banks_saved[] array acts like a safeguard when you pass
> something to someone else - to prevent it from getting lost in the
> interim.

... and instead of duplicating the mcelog functionality partially by
adding yet another array of struct mces, simply change mcelog to not
zero out its contents and dump the last 32 errors that passed through
there.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-22  9:44                 ` Borislav Petkov
@ 2014-11-22 15:32                   ` rui wang
  2014-11-22 16:31                     ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: rui wang @ 2014-11-22 15:32 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, linux-kernel, gong.chen, Wang, Rui Y

On 11/22/14, Borislav Petkov <bp@alien8.de> wrote:
> On Sat, Nov 22, 2014 at 10:16:49AM +0800, rui wang wrote:
>> I think both possibilities are valid. But experiments show that the
>> error logs are not in the dmesg preserved by kdump in /var/crash/
>> after panic and reboot, and not in the mcelog.entry[] array in the
>> kernel. So they must be somewhere in user space memory. Even if we
>> have serial console connected we still can't catch them. The
>> difficulty is that there's no easy way to force a user space daemon to
>> do something during panic.
>>
>> The new banks_saved[] array acts like a safeguard when you pass
>> something to someone else - to prevent it from getting lost in the
>> interim.
>
> ... and instead of duplicating the mcelog functionality partially by
> adding yet another array of struct mces, simply change mcelog to not
> zero out its contents and dump the last 32 errors that passed through
> there.
>

But that means mcelog buffer will have to become circular, and we can
only dump the last 32 errors. There must be a reason why it wasn't
designed as circular. I guess its benefit is as Tony explained: on
systems where mcelog isn't run or run at a later time, we may lose the
first error which is more important.

There are valid reasons why people may not run mcelog, because they may
never see machine checks during their lifetime. However once their
machine panics due to a machine check, it suddenly becomes important.

Thanks
Rui

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-22 15:32                   ` rui wang
@ 2014-11-22 16:31                     ` Borislav Petkov
  0 siblings, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2014-11-22 16:31 UTC (permalink / raw)
  To: rui wang; +Cc: Luck, Tony, linux-kernel, gong.chen, Wang, Rui Y

On Sat, Nov 22, 2014 at 11:32:00PM +0800, rui wang wrote:
> But that means mcelog buffer will have to become circular, and we can
> only dump the last 32 errors. There must be a reason why it wasn't
> designed as circular.

Is there? Please do tell because I don't know the reason why.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
  2014-11-21 21:59                   ` Luck, Tony
@ 2014-11-23 20:55                     ` Borislav Petkov
  0 siblings, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2014-11-23 20:55 UTC (permalink / raw)
  To: Luck, Tony; +Cc: rui wang, linux-kernel, gong.chen, Wang, Rui Y

On Fri, Nov 21, 2014 at 09:59:49PM +0000, Luck, Tony wrote:
> > Oh, cpu errata. So this would mean that we can't even rely on the
> > contents of the MCA banks, can we?
> >
> > In any case, is any of the information in the MCA banks in such cases
> > even usable then? Because if not, we're definitely barking up the wrong
> > tree...
> 
> See above - I think even if there is a bug in the core that isn't setting the
> right bits in the MCi_STATUS register - we could get good data from
> devices out in the uncore.

Btw, since we're talking about errata - I guess you could use X86_BUG
and static_cpu_has_bug* to query in do_machine_check and modify logging
behavior of mce_log to bypass mce_severity and put all errors of the
last round in mce_log and then dump them out at panic time.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-11-23 20:55 UTC | newest]
