* [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
@ 2013-07-23 20:34 Luck, Tony
  2013-07-23 22:51 ` Tony Luck
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Luck, Tony @ 2013-07-23 20:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: Borislav Petkov, Chen Gong, Naveen N. Rao

The 0x1000 bit of the MCACOD field of machine check MCi_STATUS
registers is only defined for corrected errors (where it means
that hardware may be filtering errors; see SDM section 15.9.2.1).

For uncorrected errors it may or may not be set, so we should mask
it out when checking for the architecturally defined recoverable
error signatures (see SDM 15.9.3.1 and 15.9.3.2).

While fixing this - I also noticed a bug introduced by
  commit 33d7885b594e169256daef652e8d3527b2298e75
  x86/mce: Update MCE severity condition check
where we were including MCACOD bits in the check for the
unaffected thread(s) during a machine check.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/mce.h                | 3 ++-
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index fa5f71e..a528f28 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -33,10 +33,11 @@
 #define MCI_STATUS_S	 (1ULL<<56)  /* Signaled machine check */
 #define MCI_STATUS_AR	 (1ULL<<55)  /* Action required */
 #define MCACOD		  0xffff     /* MCA Error Code */
+#define MCACOD_UC	  0xefff     /* MCA Error Code - for UC errors */
 
 /* Architecturally defined codes from SDM Vol. 3B Chapter 15 */
 #define MCACOD_SCRUB	0x00C0	/* 0xC0-0xCF Memory Scrubbing */
-#define MCACOD_SCRUBMSK	0xfff0
+#define MCACOD_SCRUBMSK	0xeff0
 #define MCACOD_L3WB	0x017A	/* L3 Explicit Writeback */
 #define MCACOD_DATA	0x0134	/* Data Load */
 #define MCACOD_INSTR	0x0150	/* Instruction Fetch */
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index e2703520..7f6ab4e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -111,17 +111,17 @@ static struct severity {
 #ifdef	CONFIG_MEMORY_FAILURE
 	MCESEV(
 		KEEP, "Action required but unaffected thread is continuable",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR),
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR),
 		MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)
 		),
 	MCESEV(
 		AR, "Action required: data load error in a user process",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD_UC, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		USER
 		),
 	MCESEV(
 		AR, "Action required: instruction fetch error in a user process",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD_UC, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
 		USER
 		),
 #endif
@@ -137,7 +137,7 @@ static struct severity {
 		),
 	MCESEV(
 		AO, "Action optional: last level cache writeback error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|MCACOD_L3WB)
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD_UC, MCI_UC_S|MCACOD_L3WB)
 		),
 	MCESEV(
 		SOME, "Action optional: unknown MCACOD",
-- 
1.8.1.4
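
The effect of masking the 'F' bit can be sketched in a few lines of C. This is only an illustration, assuming a severity rule written as MASK(mask, result) matches when (status & mask) == result. The helper mcacod_matches() and the name MCACOD_ALL are hypothetical and do not appear in the patch; the mask and signature values are the ones from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCACOD_ALL   0xffffULL  /* hypothetical name: all 16 MCACOD bits */
#define MCACOD_UC    0xefffULL  /* from the patch: 0x1000 'F' bit masked out */
#define MCACOD_DATA  0x0134ULL  /* from the patch: data load signature */

/*
 * Assumed matching rule: keep only the bits selected by 'mask' and
 * compare what is left against the expected signature 'result'.
 */
static bool mcacod_matches(uint64_t status, uint64_t mask, uint64_t result)
{
	return (status & mask) == result;
}
```

With a data load error whose low status bits read 0x1134 (signature 0x0134 plus the 'F' bit), mcacod_matches(0x1134, MCACOD_ALL, MCACOD_DATA) is false, while mcacod_matches(0x1134, MCACOD_UC, MCACOD_DATA) is true. That is the case the patch is meant to recover.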


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-23 20:34 [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors Luck, Tony
@ 2013-07-23 22:51 ` Tony Luck
  2013-07-24  6:16   ` Chen Gong
  2013-07-24  6:19 ` Chen Gong
  2013-07-24 15:55 ` Naveen N. Rao
  2 siblings, 1 reply; 9+ messages in thread
From: Tony Luck @ 2013-07-23 22:51 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Borislav Petkov, Chen Gong, Naveen N. Rao

Gah ... there is another bug in that unaffected thread entry.  The check for
MCG_STATUS should be for RIPV=1 *and* EIPV=0

gmail will mess this patch up ... but should still be readable.

-Tony

---

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c
b/arch/x86/kernel/cpu/mcheck/mce-severity
index 7f6ab4e..48f0fd2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -112,7 +112,7 @@ static struct severity {
        MCESEV(
                KEEP, "Action required but unaffected thread is continuable",
                SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR,
MCI_UC_SAR|MCI_ADDR),
-               MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)
+               MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, MCG_STATUS_RIPV)
                ),
        MCESEV(
                AR, "Action required: data load error in a user process",
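
The difference between the two MCGMASK() forms can be sketched as follows. This is a rough illustration, assuming MCGMASK(mask, result) matches when (mcgstatus & mask) == result and that RIPV and EIPV are bits 0 and 1 of MCG_STATUS. The helper mcg_matches() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)  /* restart IP valid (assumed bit 0) */
#define MCG_STATUS_EIPV (1ULL << 1)  /* error IP valid (assumed bit 1) */

/* Assumed matching rule for an MCGMASK(mask, result) pair. */
static bool mcg_matches(uint64_t mcgstatus, uint64_t mask, uint64_t result)
{
	return (mcgstatus & mask) == result;
}

/* Old entry: only RIPV is examined, so EIPV may be 0 or 1. */
static bool old_entry_matches(uint64_t mcgstatus)
{
	return mcg_matches(mcgstatus, MCG_STATUS_RIPV, MCG_STATUS_RIPV);
}

/* Proposed entry: RIPV must be 1 and EIPV must be 0. */
static bool new_entry_matches(uint64_t mcgstatus)
{
	return mcg_matches(mcgstatus, MCG_STATUS_RIPV | MCG_STATUS_EIPV,
			   MCG_STATUS_RIPV);
}
```

Under these assumptions, a thread reporting RIPV=1, EIPV=1 matches the old entry but not the proposed one, while RIPV=1, EIPV=0 matches both.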

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-23 22:51 ` Tony Luck
@ 2013-07-24  6:16   ` Chen Gong
  2013-07-25 10:38     ` Naveen N. Rao
  0 siblings, 1 reply; 9+ messages in thread
From: Chen Gong @ 2013-07-24  6:16 UTC (permalink / raw)
  To: Tony Luck; +Cc: Linux Kernel Mailing List, Borislav Petkov, Naveen N. Rao

[-- Attachment #1: Type: text/plain, Size: 1654 bytes --]

On Tue, Jul 23, 2013 at 03:51:14PM -0700, Tony Luck wrote:
> Date: Tue, 23 Jul 2013 15:51:14 -0700
> From: Tony Luck <tony.luck@gmail.com>
> To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Cc: Borislav Petkov <bp@suse.de>, Chen Gong <gong.chen@linux.intel.com>,
>  "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
> Subject: Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when
>  parsing 'UC' errors.
> 
> Gah ... there is another bug in that unaffected thread entry.  The check for
> MCG_STATUS should be for RIPV=1 *and* EIPV=0
> 

I set "MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)" because
I want it to cover both Non-Affected Logical Processors (RIPV=1, EIPV=0)
and Affected Logical Processor/Recoverable continuable (RIPV=1, EIPV=1).

I think both of them are continuable, so both should be classified as
*KEEP*.

> gmail will mess this patch up ... but should still be readable.
> 
> -Tony
> 
> ---
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> b/arch/x86/kernel/cpu/mcheck/mce-severity
> index 7f6ab4e..48f0fd2 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -112,7 +112,7 @@ static struct severity {
>         MCESEV(
>                 KEEP, "Action required but unaffected thread is continuable",
>                 SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR,
> MCI_UC_SAR|MCI_ADDR),
> -               MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)
> +               MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, MCG_STATUS_RIPV)
>                 ),
>         MCESEV(
>                 AR, "Action required: data load error in a user process",

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-23 20:34 [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors Luck, Tony
  2013-07-23 22:51 ` Tony Luck
@ 2013-07-24  6:19 ` Chen Gong
  2013-07-24 15:55 ` Naveen N. Rao
  2 siblings, 0 replies; 9+ messages in thread
From: Chen Gong @ 2013-07-24  6:19 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-kernel, Borislav Petkov, Naveen N. Rao

[-- Attachment #1: Type: text/plain, Size: 3920 bytes --]

On Tue, Jul 23, 2013 at 01:34:42PM -0700, Luck, Tony wrote:
> Date: Tue, 23 Jul 2013 13:34:42 -0700
> From: "Luck, Tony" <tony.luck@intel.com>
> To: linux-kernel@vger.kernel.org
> Cc: Borislav Petkov <bp@suse.de>, Chen Gong <gong.chen@linux.intel.com>,
>  "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
> Subject: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when
>  parsing 'UC' errors.
> 
> The 0x1000 bit of the MCACOD field of machine check MCi_STATUS
> registers is only defined for corrected errors (where it means
> that hardware may be filtering errors; see SDM section 15.9.2.1).
> 
> For uncorrected errors it may or may not be set, so we should mask
> it out when checking for the architecturally defined recoverable
> error signatures (see SDM 15.9.3.1 and 15.9.3.2).
> 
> While fixing this - I also noticed a bug introduced by
>   commit 33d7885b594e169256daef652e8d3527b2298e75
>   x86/mce: Update MCE severity condition check
> where we were including MCACOD bits in the check for the
> unaffected thread(s) during a machine check.
> 

This bug has probably existed for a long time. How about splitting
this patch into two: one that updates the definition, and one that
fixes the bug? If so, the first patch can be sent to the *stable*
tree.

> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/mce.h                | 3 ++-
>  arch/x86/kernel/cpu/mcheck/mce-severity.c | 8 ++++----
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index fa5f71e..a528f28 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -33,10 +33,11 @@
>  #define MCI_STATUS_S	 (1ULL<<56)  /* Signaled machine check */
>  #define MCI_STATUS_AR	 (1ULL<<55)  /* Action required */
>  #define MCACOD		  0xffff     /* MCA Error Code */
> +#define MCACOD_UC	  0xefff     /* MCA Error Code - for UC errors */
>  
>  /* Architecturally defined codes from SDM Vol. 3B Chapter 15 */
>  #define MCACOD_SCRUB	0x00C0	/* 0xC0-0xCF Memory Scrubbing */
> -#define MCACOD_SCRUBMSK	0xfff0
> +#define MCACOD_SCRUBMSK	0xeff0
>  #define MCACOD_L3WB	0x017A	/* L3 Explicit Writeback */
>  #define MCACOD_DATA	0x0134	/* Data Load */
>  #define MCACOD_INSTR	0x0150	/* Instruction Fetch */
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> index e2703520..7f6ab4e 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -111,17 +111,17 @@ static struct severity {
>  #ifdef	CONFIG_MEMORY_FAILURE
>  	MCESEV(
>  		KEEP, "Action required but unaffected thread is continuable",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR),
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR),
>  		MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)
>  		),
>  	MCESEV(
>  		AR, "Action required: data load error in a user process",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD_UC, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
>  		USER
>  		),
>  	MCESEV(
>  		AR, "Action required: instruction fetch error in a user process",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD_UC, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
>  		USER
>  		),
>  #endif
> @@ -137,7 +137,7 @@ static struct severity {
>  		),
>  	MCESEV(
>  		AO, "Action optional: last level cache writeback error",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|MCACOD_L3WB)
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD_UC, MCI_UC_S|MCACOD_L3WB)
>  		),
>  	MCESEV(
>  		SOME, "Action optional: unknown MCACOD",
> -- 
> 1.8.1.4
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-23 20:34 [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors Luck, Tony
  2013-07-23 22:51 ` Tony Luck
  2013-07-24  6:19 ` Chen Gong
@ 2013-07-24 15:55 ` Naveen N. Rao
  2013-07-24 17:00   ` Luck, Tony
  2 siblings, 1 reply; 9+ messages in thread
From: Naveen N. Rao @ 2013-07-24 15:55 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-kernel, Borislav Petkov, Chen Gong

On 2013/07/23 01:34PM, Tony Luck wrote:
> The 0x1000 bit of the MCACOD field of machine check MCi_STATUS
> registers is only defined for corrected errors (where it means
> that hardware may be filtering errors; see SDM section 15.9.2.1).
> 
> For uncorrected errors it may or may not be set, so we should mask
> it out when checking for the architecturally defined recoverable
> error signatures (see SDM 15.9.3.1 and 15.9.3.2).
> 
> While fixing this - I also noticed a bug introduced by
>   commit 33d7885b594e169256daef652e8d3527b2298e75
>   x86/mce: Update MCE severity condition check
> where we were including MCACOD bits in the check for the
> unaffected thread(s) during a machine check.

Good catch!

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/mce.h                | 3 ++-
>  arch/x86/kernel/cpu/mcheck/mce-severity.c | 8 ++++----
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index fa5f71e..a528f28 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -33,10 +33,11 @@
>  #define MCI_STATUS_S	 (1ULL<<56)  /* Signaled machine check */
>  #define MCI_STATUS_AR	 (1ULL<<55)  /* Action required */
>  #define MCACOD		  0xffff     /* MCA Error Code */
> +#define MCACOD_UC	  0xefff     /* MCA Error Code - for UC errors */

How about just changing MCACOD to 0xefff? I don't think we ever care
about the 'F' bit, so we could simplify this by just changing MCACOD.

Regards,
Naveen

>  
>  /* Architecturally defined codes from SDM Vol. 3B Chapter 15 */
>  #define MCACOD_SCRUB	0x00C0	/* 0xC0-0xCF Memory Scrubbing */
> -#define MCACOD_SCRUBMSK	0xfff0
> +#define MCACOD_SCRUBMSK	0xeff0
>  #define MCACOD_L3WB	0x017A	/* L3 Explicit Writeback */
>  #define MCACOD_DATA	0x0134	/* Data Load */
>  #define MCACOD_INSTR	0x0150	/* Instruction Fetch */
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> index e2703520..7f6ab4e 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -111,17 +111,17 @@ static struct severity {
>  #ifdef	CONFIG_MEMORY_FAILURE
>  	MCESEV(
>  		KEEP, "Action required but unaffected thread is continuable",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR),
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR),
>  		MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)
>  		),
>  	MCESEV(
>  		AR, "Action required: data load error in a user process",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD_UC, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
>  		USER
>  		),
>  	MCESEV(
>  		AR, "Action required: instruction fetch error in a user process",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD_UC, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
>  		USER
>  		),
>  #endif
> @@ -137,7 +137,7 @@ static struct severity {
>  		),
>  	MCESEV(
>  		AO, "Action optional: last level cache writeback error",
> -		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|MCACOD_L3WB)
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD_UC, MCI_UC_S|MCACOD_L3WB)
>  		),
>  	MCESEV(
>  		SOME, "Action optional: unknown MCACOD",
> -- 
> 1.8.1.4
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-24 15:55 ` Naveen N. Rao
@ 2013-07-24 17:00   ` Luck, Tony
  0 siblings, 0 replies; 9+ messages in thread
From: Luck, Tony @ 2013-07-24 17:00 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: linux-kernel, Borislav Petkov, Chen Gong

> How about just changing MCACOD to 0xefff? I don't think we ever care
> about the 'F' bit, so we could simplify this by just changing MCACOD.

That certainly reduces the size of the patch ... I was a little worried about
just changing this because it doesn't match the definition of the MCACOD
field in the SDM (Figure 15-5 in 15.3.2.2 IA32_MCi_STATUS MSRS).

But auditing the existing uses of this define - we only currently use it in
places where we are looking for specific recoverable error signatures.
So I think a big comment in mce.h explaining the missing 'F' bit should
suffice.

I'm going to take Chen Gong's suggestion and break this into two patches,
one to fix the regression from the earlier cleanup. Another to tackle the
"F" bit issue.

New patches in a while.

-Tony
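
A rough sketch of the single-define approach discussed above, with the kind of explanatory comment mentioned for mce.h. This is not the follow-up patch itself, only an illustration of the idea. The comment wording and the helper is_scrub_error() are hypothetical; the scrub signature and mask values are taken from the diff earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only. The SDM defines MCACOD as the low 16 bits of
 * MCi_STATUS, but the 0x1000 'F' bit is only meaningful for corrected
 * errors. It is left out of the mask here on purpose, because the
 * define is only used to look for uncorrected error signatures.
 */
#define MCACOD           0xefffULL
#define MCACOD_SCRUB     0x00C0ULL  /* 0xC0-0xCF memory scrubbing */
#define MCACOD_SCRUBMSK  0xeff0ULL  /* ignore 'F' bit and low nibble */

/* Hypothetical helper: does the code fall in the scrub range? */
static bool is_scrub_error(uint64_t status)
{
	return (status & MCACOD_SCRUBMSK) == MCACOD_SCRUB;
}
```

With these values, codes 0x00C0 through 0x00CF are treated as scrub errors whether or not the 'F' bit is set, so 0x10C5 matches and 0x00D0 does not.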

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-24  6:16   ` Chen Gong
@ 2013-07-25 10:38     ` Naveen N. Rao
  2013-07-25 18:01       ` Luck, Tony
  0 siblings, 1 reply; 9+ messages in thread
From: Naveen N. Rao @ 2013-07-25 10:38 UTC (permalink / raw)
  To: gong.chen, tony.luck; +Cc: bp, linux-kernel

On 07/24/2013 11:46 AM, Chen Gong wrote:
> On Tue, Jul 23, 2013 at 03:51:14PM -0700, Tony Luck wrote:
>> Date: Tue, 23 Jul 2013 15:51:14 -0700
>> From: Tony Luck <tony.luck@gmail.com>
>> To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
>> Cc: Borislav Petkov <bp@suse.de>, Chen Gong <gong.chen@linux.intel.com>,
>>   "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
>> Subject: Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when
>>   parsing 'UC' errors.
>>
>> Gah ... there is another bug in that unaffected thread entry.  The check
>> for
>> MCG_STATUS should be for RIPV=1 *and* EIPV=0
>>
>
> I set "MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)" because
> I want it to cover both Non-Affected Logical Processors (RIPV=1, EIPV=0)
> and Affected Logical Processor/Recoverable continuable (RIPV=1, EIPV=1).
>
> I think both of them are continuable, so both should be classified as
> *KEEP*.

For affected logical processors, we won't be able to continue if we were in
kernel-space. Right? So, it looks like we should panic and I think this gets
covered by "Action required: unknown MCACOD" entry later on, though a more
explicit entry might help. For user-space, the next two entries cover AR.

Does the below help or am I reading this wrong?

Thanks,
Naveen

--
We have three categories under MCA Action Required (AR):
1. Unaffected threads/cpu (RIPV=1,EIPV=0): always continuable
2. Affected threads (RIPV=EIPV=1): continuable
3. Affected threads (RIPV=0): not continuable

The consolidated entry (Tony's new patch) should only cover (1).

(2) and (3) are covered for user-space by the two entries following the entry
for (1) for data load and instruction fetch errors.

(3) is covered for kernel-space by the earlier entry for "In kernel and no
restart IP" where we panic. The below patch is to make (2) explicit for
kernel-space.


Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index e2703520..585ddbb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -115,6 +115,12 @@ static struct severity {
 		MCGMASK(MCG_STATUS_RIPV, MCG_STATUS_RIPV)
 		),
 	MCESEV(
+		PANIC, "Action required but kernel thread is not continuable",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR),
+		MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, MCG_STATUS_RIPV|MCG_STATUS_EIPV),
+		KERNEL
+		),
+	MCESEV(
 		AR, "Action required: data load error in a user process",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		USER
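
The three Action Required categories listed in the description above can be written down as a small classifier. This is a minimal sketch, assuming RIPV and EIPV are bits 0 and 1 of MCG_STATUS. The enum and the function classify_ar() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)  /* restart IP valid (assumed bit 0) */
#define MCG_STATUS_EIPV (1ULL << 1)  /* error IP valid (assumed bit 1) */

/* Hypothetical names for categories (1), (2) and (3) above. */
enum ar_category {
	AR_UNAFFECTED,               /* (1) RIPV=1, EIPV=0 */
	AR_AFFECTED_CONTINUABLE,     /* (2) RIPV=1, EIPV=1 */
	AR_AFFECTED_NOT_CONTINUABLE  /* (3) RIPV=0 */
};

static enum ar_category classify_ar(uint64_t mcgstatus)
{
	bool ripv = (mcgstatus & MCG_STATUS_RIPV) != 0;
	bool eipv = (mcgstatus & MCG_STATUS_EIPV) != 0;

	if (!ripv)
		return AR_AFFECTED_NOT_CONTINUABLE;
	return eipv ? AR_AFFECTED_CONTINUABLE : AR_UNAFFECTED;
}
```

Only the first category is meant to hit the consolidated KEEP entry. The new PANIC entry targets the second category when the interrupted context is the kernel.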


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-25 10:38     ` Naveen N. Rao
@ 2013-07-25 18:01       ` Luck, Tony
  2013-07-31 10:06         ` Naveen N. Rao
  0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2013-07-25 18:01 UTC (permalink / raw)
  To: Naveen N. Rao, gong.chen; +Cc: bp, linux-kernel

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 951 bytes --]

	MCESEV(
+		PANIC, "Action required but kernel thread is not continuable",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR),
+		MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, MCG_STATUS_RIPV|MCG_STATUS_EIPV),
+		KERNEL
+		),
+	MCESEV(
 		AR, "Action required: data load error in a user process",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		USER

This just gives us a better panic message. Right?  Without this we'd keep walking the
severity table until we hit the "Action required: unknown MCACOD" entry which will
match and force a panic anyway.

So I might look for better wording.  As far as the h/w is concerned the thread is continuable.
Linux is just not smart enough (yet) to take the required recovery action.

-Tony
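
The first-match behaviour described above can be sketched with a heavily simplified table. This is an illustration only: it ignores the MCi_STATUS checks and assumes the error has already been identified as a signaled Action Required error. The struct, the enums and first_match() are hypothetical, and the tables keep just the entries needed for the comparison.

```c
#include <stddef.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)  /* assumed bit 0 */
#define MCG_STATUS_EIPV (1ULL << 1)  /* assumed bit 1 */
#define MCG_BOTH (MCG_STATUS_RIPV | MCG_STATUS_EIPV)

enum severity { SEV_KEEP, SEV_AR, SEV_PANIC };
enum context { CTX_ANY, CTX_KERNEL, CTX_USER };

struct rule {
	enum severity sev;
	const char *msg;
	uint64_t mcgmask, mcgres;  /* matches if (mcg & mcgmask) == mcgres */
	enum context ctx;          /* CTX_ANY matches every context */
};

/* Walk the table top to bottom and return the first matching rule. */
static const struct rule *first_match(const struct rule *t, size_t n,
				      uint64_t mcg, enum context ctx)
{
	for (size_t i = 0; i < n; i++) {
		if ((mcg & t[i].mcgmask) != t[i].mcgres)
			continue;
		if (t[i].ctx != CTX_ANY && t[i].ctx != ctx)
			continue;
		return &t[i];
	}
	return NULL;
}

/* Simplified table without the explicit kernel entry. */
static const struct rule without_entry[] = {
	{ SEV_KEEP, "Action required but unaffected thread is continuable",
	  MCG_BOTH, MCG_STATUS_RIPV, CTX_ANY },
	{ SEV_AR, "Action required: data load error in a user process",
	  0, 0, CTX_USER },
	{ SEV_PANIC, "Action required: unknown MCACOD", 0, 0, CTX_ANY },
};

/* Same table with the proposed explicit entry added. */
static const struct rule with_entry[] = {
	{ SEV_KEEP, "Action required but unaffected thread is continuable",
	  MCG_BOTH, MCG_STATUS_RIPV, CTX_ANY },
	{ SEV_PANIC, "Action required but kernel thread is not continuable",
	  MCG_BOTH, MCG_BOTH, CTX_KERNEL },
	{ SEV_AR, "Action required: data load error in a user process",
	  0, 0, CTX_USER },
	{ SEV_PANIC, "Action required: unknown MCACOD", 0, 0, CTX_ANY },
};
```

For a kernel thread with RIPV=EIPV=1, both tables yield SEV_PANIC. Only the rule that matches, and therefore the message, differs.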

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.
  2013-07-25 18:01       ` Luck, Tony
@ 2013-07-31 10:06         ` Naveen N. Rao
  0 siblings, 0 replies; 9+ messages in thread
From: Naveen N. Rao @ 2013-07-31 10:06 UTC (permalink / raw)
  To: Luck, Tony; +Cc: gong.chen, bp, linux-kernel

On 07/25/2013 11:31 PM, Luck, Tony wrote:
> 	MCESEV(
> +		PANIC, "Action required but kernel thread is not continuable",
> +		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR),
> +		MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, MCG_STATUS_RIPV|MCG_STATUS_EIPV),
> +		KERNEL
> +		),
> +	MCESEV(
>   		AR, "Action required: data load error in a user process",
>   		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
>   		USER
>
> This just gives us a better panic message. Right?  Without this we'd keep walking the
> severity table until we hit the "Action required: unknown MCACOD" entry which will
> match and force a panic anyway.

Yes, that's correct. But I felt it would be good to have this entry to 
make it explicit.

>
> So I might look for better wording.  As far as the h/w is concerned the thread is continuable.
> Linux is just not smart enough (yet) to take the required recovery action.

Ok, how about: "Action required but unable to recover kernel thread"


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 9+ messages in thread

Thread overview: 9+ messages
2013-07-23 20:34 [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors Luck, Tony
2013-07-23 22:51 ` Tony Luck
2013-07-24  6:16   ` Chen Gong
2013-07-25 10:38     ` Naveen N. Rao
2013-07-25 18:01       ` Luck, Tony
2013-07-31 10:06         ` Naveen N. Rao
2013-07-24  6:19 ` Chen Gong
2013-07-24 15:55 ` Naveen N. Rao
2013-07-24 17:00   ` Luck, Tony
