All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86, MCE, AMD: use macros to compute bank MSRs
@ 2014-09-23  2:16 Chen Yucong
  2014-09-23  8:19 ` [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it Chen Yucong
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Chen Yucong @ 2014-09-23  2:16 UTC (permalink / raw)
  To: tony.luck; +Cc: bp, linux-edac, linux-kernel, Chen Yucong

Avoid open coded calculations for bank MSRs by hiding the index
of higher bank MSRs in well-defined macros.

No semantic changes.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 5d4999f..f8c56bd 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -217,7 +217,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
 		for (block = 0; block < NR_BLOCKS; ++block) {
 			if (block == 0)
-				address = MSR_IA32_MC0_MISC + bank * 4;
+				address = MSR_IA32_MCx_MISC(bank);
 			else if (block == 1) {
 				address = (low & MASK_BLKPTR_LO) >> 21;
 				if (!address)
@@ -281,7 +281,7 @@ static void amd_threshold_interrupt(void)
 			continue;
 		for (block = 0; block < NR_BLOCKS; ++block) {
 			if (block == 0) {
-				address = MSR_IA32_MC0_MISC + bank * 4;
+				address = MSR_IA32_MCx_MISC(bank);
 			} else if (block == 1) {
 				address = (low & MASK_BLKPTR_LO) >> 21;
 				if (!address)
@@ -314,8 +314,7 @@ static void amd_threshold_interrupt(void)
 
 			if (high & MASK_OVERFLOW_HI) {
 				rdmsrl(address, m.misc);
-				rdmsrl(MSR_IA32_MC0_STATUS + bank * 4,
-				       m.status);
+				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
 				m.bank = K8_MCE_THRESHOLD_BASE
 				       + bank * NR_BLOCKS
 				       + block;
@@ -617,8 +616,7 @@ static int threshold_create_bank(unsigned int cpu, unsigned int bank)
 		}
 	}
 
-	err = allocate_threshold_blocks(cpu, bank, 0,
-					MSR_IA32_MC0_MISC + bank * 4);
+	err = allocate_threshold_blocks(cpu, bank, 0, MSR_IA32_MCx_MISC(bank));
 	if (!err)
 		goto out;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-23  2:16 [PATCH] x86, MCE, AMD: use macros to compute bank MSRs Chen Yucong
@ 2014-09-23  8:19 ` Chen Yucong
  2014-09-28  8:15   ` Chen Yucong
  2014-09-29 12:05   ` Borislav Petkov
  2014-09-28  8:09 ` [PATCH] x86, MCE, AMD: use macros to compute bank MSRs Chen Yucong
  2014-09-29 11:48 ` Borislav Petkov
  2 siblings, 2 replies; 28+ messages in thread
From: Chen Yucong @ 2014-09-23  8:19 UTC (permalink / raw)
  To: tony.luck; +Cc: bp, linux-edac, linux-kernel

machine_check_poll() will reset IA32_MCi_STATUS register to zero.
So we need to save the content of IA32_MCi_STATUS MSRs before
calling machine_check_poll() for logging threshold interrupt event.

mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f8c56bd..9148b4d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -275,6 +275,12 @@ static void amd_threshold_interrupt(void)
 
 	mce_setup(&m);
 
+	/*
+	 * mce_setup() can't gather the content of IA32_MCG_STATUS,
+	 * so it should be read explicitly.
+	 */
+	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
+
 	/* assume first bank caused it */
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
 		if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
@@ -305,6 +311,12 @@ static void amd_threshold_interrupt(void)
 			     (high & MASK_LOCKED_HI))
 				continue;
 
+			/*
+			 * machine_check_poll() will reset IA32_MCi_STATUS
+			 * register to zero, save it for use later.
+			 */
+			rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+
 			/*
 			 * Log the machine check that caused the threshold
 			 * event.
@@ -314,7 +326,6 @@ static void amd_threshold_interrupt(void)
 
 			if (high & MASK_OVERFLOW_HI) {
 				rdmsrl(address, m.misc);
-				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
 				m.bank = K8_MCE_THRESHOLD_BASE
 				       + bank * NR_BLOCKS
 				       + block;
-- 
1.7.10.4






^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: use macros to compute bank MSRs
  2014-09-23  2:16 [PATCH] x86, MCE, AMD: use macros to compute bank MSRs Chen Yucong
  2014-09-23  8:19 ` [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it Chen Yucong
@ 2014-09-28  8:09 ` Chen Yucong
  2014-09-29 11:48 ` Borislav Petkov
  2 siblings, 0 replies; 28+ messages in thread
From: Chen Yucong @ 2014-09-28  8:09 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, Andi Kleen, Luck, Tony

On Tue, 2014-09-23 at 10:16 +0800, Chen Yucong wrote:
> Avoid open coded calculations for bank MSRs by hiding the index
> of higher bank MSRs in well-defined macros.
> 
> No semantic changes.
> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c |   10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index 5d4999f..f8c56bd 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -217,7 +217,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
>  	for (bank = 0; bank < mca_cfg.banks; ++bank) {
>  		for (block = 0; block < NR_BLOCKS; ++block) {
>  			if (block == 0)
> -				address = MSR_IA32_MC0_MISC + bank * 4;
> +				address = MSR_IA32_MCx_MISC(bank);
>  			else if (block == 1) {
>  				address = (low & MASK_BLKPTR_LO) >> 21;
>  				if (!address)
> @@ -281,7 +281,7 @@ static void amd_threshold_interrupt(void)
>  			continue;
>  		for (block = 0; block < NR_BLOCKS; ++block) {
>  			if (block == 0) {
> -				address = MSR_IA32_MC0_MISC + bank * 4;
> +				address = MSR_IA32_MCx_MISC(bank);
>  			} else if (block == 1) {
>  				address = (low & MASK_BLKPTR_LO) >> 21;
>  				if (!address)
> @@ -314,8 +314,7 @@ static void amd_threshold_interrupt(void)
>  
>  			if (high & MASK_OVERFLOW_HI) {
>  				rdmsrl(address, m.misc);
> -				rdmsrl(MSR_IA32_MC0_STATUS + bank * 4,
> -				       m.status);
> +				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
>  				m.bank = K8_MCE_THRESHOLD_BASE
>  				       + bank * NR_BLOCKS
>  				       + block;
> @@ -617,8 +616,7 @@ static int threshold_create_bank(unsigned int cpu, unsigned int bank)
>  		}
>  	}
>  
> -	err = allocate_threshold_blocks(cpu, bank, 0,
> -					MSR_IA32_MC0_MISC + bank * 4);
> +	err = allocate_threshold_blocks(cpu, bank, 0, MSR_IA32_MCx_MISC(bank));
>  	if (!err)
>  		goto out;
>  
Hi Boris,

Can you review the above patch?

thx!
cyc


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-23  8:19 ` [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it Chen Yucong
@ 2014-09-28  8:15   ` Chen Yucong
  2014-09-29 12:05   ` Borislav Petkov
  1 sibling, 0 replies; 28+ messages in thread
From: Chen Yucong @ 2014-09-28  8:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-edac, linux-kernel, Luck, Tony, Borislav Petkov

On Tue, 2014-09-23 at 16:19 +0800, Chen Yucong wrote:
> machine_check_poll() will reset IA32_MCi_STATUS register to zero.
> So we need to save the content of IA32_MCi_STATUS MSRs before
> calling machine_check_poll() for logging threshold interrupt event.
> 
> mce_setup() does not gather the content of IA32_MCG_STATUS, so it
> should be read explicitly.
> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f8c56bd..9148b4d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -275,6 +275,12 @@ static void amd_threshold_interrupt(void)
>  
>  	mce_setup(&m);
>  
> +	/*
> +	 * mce_setup() can't gather the content of IA32_MCG_STATUS,
> +	 * so it should be read explicitly.
> +	 */
> +	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
> +
>  	/* assume first bank caused it */
>  	for (bank = 0; bank < mca_cfg.banks; ++bank) {
>  		if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
> @@ -305,6 +311,12 @@ static void amd_threshold_interrupt(void)
>  			     (high & MASK_LOCKED_HI))
>  				continue;
>  
> +			/*
> +			 * machine_check_poll() will reset IA32_MCi_STATUS
> +			 * register to zero, save it for use later.
> +			 */
> +			rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> +
>  			/*
>  			 * Log the machine check that caused the threshold
>  			 * event.
> @@ -314,7 +326,6 @@ static void amd_threshold_interrupt(void)
>  
>  			if (high & MASK_OVERFLOW_HI) {
>  				rdmsrl(address, m.misc);
> -				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
>  				m.bank = K8_MCE_THRESHOLD_BASE
>  				       + bank * NR_BLOCKS
>  				       + block;
Hi Andi,

Can you review the above patch?

thx!
cyc


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: use macros to compute bank MSRs
  2014-09-23  2:16 [PATCH] x86, MCE, AMD: use macros to compute bank MSRs Chen Yucong
  2014-09-23  8:19 ` [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it Chen Yucong
  2014-09-28  8:09 ` [PATCH] x86, MCE, AMD: use macros to compute bank MSRs Chen Yucong
@ 2014-09-29 11:48 ` Borislav Petkov
  2 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2014-09-29 11:48 UTC (permalink / raw)
  To: Chen Yucong; +Cc: tony.luck, linux-edac, linux-kernel

On Tue, Sep 23, 2014 at 10:16:01AM +0800, Chen Yucong wrote:
> Avoid open coded calculations for bank MSRs by hiding the index
> of higher bank MSRs in well-defined macros.
> 
> No semantic changes.
> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-23  8:19 ` [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it Chen Yucong
  2014-09-28  8:15   ` Chen Yucong
@ 2014-09-29 12:05   ` Borislav Petkov
  2014-09-30  0:39     ` Chen Yucong
  1 sibling, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2014-09-29 12:05 UTC (permalink / raw)
  To: Chen Yucong; +Cc: tony.luck, linux-edac, linux-kernel

On Tue, Sep 23, 2014 at 04:19:14PM +0800, Chen Yucong wrote:
> machine_check_poll() will reset IA32_MCi_STATUS register to zero.
> So we need to save the content of IA32_MCi_STATUS MSRs before
> calling machine_check_poll() for logging threshold interrupt event.
> 
> mce_setup() does not gather the content of IA32_MCG_STATUS, so it
> should be read explicitly.
> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f8c56bd..9148b4d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -275,6 +275,12 @@ static void amd_threshold_interrupt(void)
>  
>  	mce_setup(&m);
>  
> +	/*
> +	 * mce_setup() can't gather the content of IA32_MCG_STATUS,
> +	 * so it should be read explicitly.
> +	 */

No need for that comment.

> +	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
> +
>  	/* assume first bank caused it */
>  	for (bank = 0; bank < mca_cfg.banks; ++bank) {
>  		if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
> @@ -305,6 +311,12 @@ static void amd_threshold_interrupt(void)
>  			     (high & MASK_LOCKED_HI))
>  				continue;
>  
> +			/*
> +			 * machine_check_poll() will reset IA32_MCi_STATUS
> +			 * register to zero, save it for use later.
> +			 */
> +			rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);

Actually, to be more future-proof, I'd like to do the AMD-specific
logging first, i.e. before machine_check_poll() so that any future
changes there don't influence what we do in mce_amd.c.

So please move the machine_check_poll() call behind the

	if (high & MASK_OVERFLOW_HI) {

test and drop the return.

But the patch makes sense so good catch!

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-29 12:05   ` Borislav Petkov
@ 2014-09-30  0:39     ` Chen Yucong
  2014-09-30  7:25       ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Chen Yucong @ 2014-09-30  0:39 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, linux-edac, linux-kernel

On Mon, 2014-09-29 at 14:05 +0200, Borislav Petkov wrote:
> > machine_check_poll() will reset IA32_MCi_STATUS register to zero.
> > So we need to save the content of IA32_MCi_STATUS MSRs before
> > calling machine_check_poll() for logging threshold interrupt event.
> > 
> > mce_setup() does not gather the content of IA32_MCG_STATUS, so it
> > should be read explicitly.
> > 
> > Signed-off-by: Chen Yucong <slaoub@gmail.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c |   13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > index f8c56bd..9148b4d 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > @@ -275,6 +275,12 @@ static void amd_threshold_interrupt(void)
> >  
> >       mce_setup(&m);
> >  
> > +     /*
> > +      * mce_setup() can't gather the content of IA32_MCG_STATUS,
> > +      * so it should be read explicitly.
> > +      */
> 
> No need for that comment.
> 
> > +     rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
> > +
> >       /* assume first bank caused it */
> >       for (bank = 0; bank < mca_cfg.banks; ++bank) {
> >               if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
> > @@ -305,6 +311,12 @@ static void amd_threshold_interrupt(void)
> >                            (high & MASK_LOCKED_HI))
> >                               continue;
> >  
> > +                     /*
> > +                      * machine_check_poll() will reset
> IA32_MCi_STATUS
> > +                      * register to zero, save it for use later.
> > +                      */
> > +                     rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> 
> Actually, to be more future-proof, I'd like to do the AMD-specific
> logging first, i.e. before machine_check_poll() so that any future
> changes there don't influence what we do in mce_amd.c.
> 
> So please move the machine_check_poll() call behind the
> 
>         if (high & MASK_OVERFLOW_HI) { 
machine_check_poll() will scan all banks, so I think we can move it out
of the loop body.

thx!
cyc


From: Chen Yucong

machine_check_poll() will reset IA32_MCi_STATUS register to zero.
So we need to save the content of IA32_MCi_STATUS MSRs before
calling machine_check_poll() for logging threshold interrupt event.

mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly. And we also need to save MSR_IA32_MCx_ADDR
if MCI_STATUS_ADDRV bit field is valid.  

Signed-off-by: Chen Yucong <slaoub@gmail.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f8c56bd..f5a5beb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -274,6 +274,7 @@ static void amd_threshold_interrupt(void)
 	struct mce m;
 
 	mce_setup(&m);
+	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
 
 	/* assume first bank caused it */
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
@@ -305,24 +306,28 @@ static void amd_threshold_interrupt(void)
 			     (high & MASK_LOCKED_HI))
 				continue;
 
-			/*
-			 * Log the machine check that caused the threshold
-			 * event.
-			 */
-			machine_check_poll(MCP_TIMESTAMP,
-					this_cpu_ptr(&mce_poll_banks));
-
 			if (high & MASK_OVERFLOW_HI) {
 				rdmsrl(address, m.misc);
 				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+				if (m.status & MCI_STATUS_ADDRV)
+					rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
 				m.bank = K8_MCE_THRESHOLD_BASE
 				       + bank * NR_BLOCKS
 				       + block;
 				mce_log(&m);
-				return;
+
+				wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
+				goto log_mcheck;
 			}
 		}
 	}
+
+log_mcheck:
+	/*
+	 * Log the machine check that caused the threshold event.
+	 */
+	machine_check_poll(MCP_TIMESTAMP,
+				this_cpu_ptr(&mce_poll_banks));
 }
 
 /*
-- 
1.7.10.4





^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-30  0:39     ` Chen Yucong
@ 2014-09-30  7:25       ` Borislav Petkov
  2014-09-30  9:56         ` Chen Yucong
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2014-09-30  7:25 UTC (permalink / raw)
  To: Chen Yucong; +Cc: tony.luck, linux-edac, linux-kernel

On Tue, Sep 30, 2014 at 08:39:38AM +0800, Chen Yucong wrote:
> machine_check_poll() will scan all banks, so I think we can move it out
> of the loop body.

Ok.

> From: Chen Yucong
> 
> machine_check_poll() will reset IA32_MCi_STATUS register to zero.
> So we need to save the content of IA32_MCi_STATUS MSRs before
> calling machine_check_poll() for logging threshold interrupt event.
> 
> mce_setup() does not gather the content of IA32_MCG_STATUS, so it
> should be read explicitly. And we also need to save MSR_IA32_MCx_ADDR
> if MCI_STATUS_ADDRV bit field is valid.  
> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c |   21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f8c56bd..f5a5beb 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -274,6 +274,7 @@ static void amd_threshold_interrupt(void)
>  	struct mce m;
>  
>  	mce_setup(&m);
> +	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
>  
>  	/* assume first bank caused it */
>  	for (bank = 0; bank < mca_cfg.banks; ++bank) {
> @@ -305,24 +306,28 @@ static void amd_threshold_interrupt(void)
>  			     (high & MASK_LOCKED_HI))
>  				continue;
>  
> -			/*
> -			 * Log the machine check that caused the threshold
> -			 * event.
> -			 */
> -			machine_check_poll(MCP_TIMESTAMP,
> -					this_cpu_ptr(&mce_poll_banks));
> -
>  			if (high & MASK_OVERFLOW_HI) {
>  				rdmsrl(address, m.misc);
>  				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> +				if (m.status & MCI_STATUS_ADDRV)
> +					rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
>  				m.bank = K8_MCE_THRESHOLD_BASE
>  				       + bank * NR_BLOCKS
>  				       + block;
>  				mce_log(&m);
> -				return;
> +
> +				wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);

No, machine_check_poll will clear it anyway and now you're adding a
purely useless MSR write here which costs.

> +				goto log_mcheck;

Why goto? It will hit that machine_check_poll below even without it...

> +
> +log_mcheck:
> +	/*
> +	 * Log the machine check that caused the threshold event.
> +	 */
> +	machine_check_poll(MCP_TIMESTAMP,
> +				this_cpu_ptr(&mce_poll_banks));
>  }

Of course, the more important question is: how are you testing your patches?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-30  7:25       ` Borislav Petkov
@ 2014-09-30  9:56         ` Chen Yucong
  2014-09-30 10:09           ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Chen Yucong @ 2014-09-30  9:56 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, linux-edac, linux-kernel

On Tue, 2014-09-30 at 09:25 +0200, Borislav Petkov wrote:
> >                       if (high & MASK_OVERFLOW_HI) {
> >                               rdmsrl(address, m.misc);
> >                               rdmsrl(MSR_IA32_MCx_STATUS(bank),
> m.status);
> > +                             if (m.status & MCI_STATUS_ADDRV)
> >
> +                                     rdmsrl(MSR_IA32_MCx_ADDR(bank),
> m.addr);
> >                               m.bank = K8_MCE_THRESHOLD_BASE
> >                                      + bank * NR_BLOCKS
> >                                      + block;
> >                               mce_log(&m);
> > -                             return;
> > +
> > +                             wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
> 
> No, machine_check_poll will clear it anyway and now you're adding a
> purely useless MSR write here which costs.
I just clear it to avoid that the mce_log() call logs the above
threshold event again in machine_check_poll().
> 
> > +                             goto log_mcheck;
> 
> Why goto? It will hit that machine_check_poll below even without
> it... 
It is just used for scanning other banks for recording other valid 
error information.

thx!
cyc


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-30  9:56         ` Chen Yucong
@ 2014-09-30 10:09           ` Borislav Petkov
  2014-10-01  4:35             ` Chen Yucong
  2014-10-01  5:26             ` Chen Yucong
  0 siblings, 2 replies; 28+ messages in thread
From: Borislav Petkov @ 2014-09-30 10:09 UTC (permalink / raw)
  To: Chen Yucong; +Cc: tony.luck, linux-edac, linux-kernel

On Tue, Sep 30, 2014 at 05:56:31PM +0800, Chen Yucong wrote:
> I just clear it to avoid that the mce_log() call logs the above
> threshold event again in machine_check_poll().

Ok, that's a good point, please put it in the commit message.

> It is just used for scanning other banks for recording other valid
> error information.

This is actually not what we want - we want to log the errors which
cause the overflow first and then the rest. So you don't need the goto
but simply have the machine_check_poll() at the end.

Now let me repeat my question: how are you testing your patches?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-30 10:09           ` Borislav Petkov
@ 2014-10-01  4:35             ` Chen Yucong
  2014-10-02 13:12               ` Borislav Petkov
  2014-10-01  5:26             ` Chen Yucong
  1 sibling, 1 reply; 28+ messages in thread
From: Chen Yucong @ 2014-10-01  4:35 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, linux-edac, linux-kernel

On Tue, 2014-09-30 at 12:09 +0200, Borislav Petkov wrote:
> On Tue, Sep 30, 2014 at 05:56:31PM +0800, Chen Yucong wrote:
> > I just clear it to avoid that the mce_log() call logs the above
> > threshold event again in machine_check_poll().
> 
> Ok, that's a good point, please put it in the commit message.
> 
> > It is just used for scanning other banks for recording other valid
> > error information.
> 
> This is actually not what we want - we want to log the errors which
> cause the overflow first and then the rest. So you don't need the goto
> but simply have the machine_check_poll() at the end. 


From: Chen Yucong <slaoub@gmail.com>

machine_check_poll() will reset IA32_MCi_STATUS register to zero.
So we need to save the content of IA32_MCi_STATUS MSRs before
calling machine_check_poll() for logging threshold interrupt event.

mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
to avoid that mce_log() logs the processed threshold event again
at next time.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f8c56bd..643e6a2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -274,6 +274,7 @@ static void amd_threshold_interrupt(void)
 	struct mce m;
 
 	mce_setup(&m);
+	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
 
 	/* assume first bank caused it */
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
@@ -305,24 +306,27 @@ static void amd_threshold_interrupt(void)
 			     (high & MASK_LOCKED_HI))
 				continue;
 
-			/*
-			 * Log the machine check that caused the threshold
-			 * event.
-			 */
-			machine_check_poll(MCP_TIMESTAMP,
-					this_cpu_ptr(&mce_poll_banks));
-
 			if (high & MASK_OVERFLOW_HI) {
 				rdmsrl(address, m.misc);
 				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+				if (m.status & MCI_STATUS_ADDRV)
+					rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
 				m.bank = K8_MCE_THRESHOLD_BASE
 				       + bank * NR_BLOCKS
 				       + block;
 				mce_log(&m);
+
+				wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
 				return;
 			}
 		}
 	}
+
+	/*
+	 * Log the machine check that caused the threshold event.
+	 */
+	machine_check_poll(MCP_TIMESTAMP,
+				this_cpu_ptr(&mce_poll_banks));
 }
 
 /*
-- 
1.7.10.4




^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-09-30 10:09           ` Borislav Petkov
  2014-10-01  4:35             ` Chen Yucong
@ 2014-10-01  5:26             ` Chen Yucong
  2014-10-01 10:10               ` Borislav Petkov
  1 sibling, 1 reply; 28+ messages in thread
From: Chen Yucong @ 2014-10-01  5:26 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, linux-edac, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

On Tue, 2014-09-30 at 12:09 +0200, Borislav Petkov wrote:
> 
> Now let me repeat my question: how are you testing your patches?
> 
There are no any hardware facilities that can help me to inject some
MCE errors. So I have to modify the kernel source code for testing my
patches.

My method is based on the `mce-injection' that is better suited to 
Intel processors. So I have replaced rdmsrl/wrmsrl/rdmsr_safe with
mce_rdmsrl/mce_wrmsrl/mce_rdmsr_safe in mce_amd.c. But I use a new
kernel module for error injection instead of writing /dev/mcelog.

For more detailed information about testing, you can refer the 
attachments.

thx!
cyc 
  

[-- Attachment #2: amd-mce-injection.patch --]
[-- Type: text/x-patch, Size: 5299 bytes --]

diff -uNr amd_inject/linux-3.16.3/arch/x86/include/asm/mce.h linux-3.16.3/arch/x86/include/asm/mce.h
--- amd_inject/linux-3.16.3/arch/x86/include/asm/mce.h	2014-09-18 01:22:16.000000000 +0800
+++ linux-3.16.3/arch/x86/include/asm/mce.h	2014-10-01 09:36:06.302670241 +0800
@@ -166,6 +166,7 @@
 #endif
 
 #ifdef CONFIG_X86_MCE_AMD
+void raise_amd_threshold_event(void);
 void mce_amd_feature_init(struct cpuinfo_x86 *c);
 #else
 static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
@@ -185,10 +186,14 @@
 	MCP_DONTLOG = (1 << 2),		/* only clear, don't log */
 };
 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
+u64 mce_rdmsrl(u32 msr);
+void mce_wrmsrl(u32 msr, u64 v);
+int mce_rdmsr_safe(u32 msr, u32 *low, u32 *high);
 
 int mce_notify_irq(void);
 void mce_notify_process(void);
 
+extern int amd_inject;
 DECLARE_PER_CPU(struct mce, injectm);
 
 extern void register_mce_write_callback(ssize_t (*)(struct file *filp,
diff -uNr amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce_amd.c linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce_amd.c
--- amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce_amd.c	2014-09-18 01:22:16.000000000 +0800
+++ linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce_amd.c	2014-10-01 11:09:07.817585622 +0800
@@ -274,6 +274,7 @@
 	struct mce m;
 
 	mce_setup(&m);
+	m.mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
 
 	/* assume first bank caused it */
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
@@ -291,7 +292,7 @@
 				++address;
 			}
 
-			if (rdmsr_safe(address, &low, &high))
+			if (mce_rdmsr_safe(address, &low, &high))
 				break;
 
 			if (!(high & MASK_VALID_HI)) {
@@ -305,26 +306,35 @@
 			     (high & MASK_LOCKED_HI))
 				continue;
 
-			/*
-			 * Log the machine check that caused the threshold
-			 * event.
-			 */
-			machine_check_poll(MCP_TIMESTAMP,
-					&__get_cpu_var(mce_poll_banks));
-
 			if (high & MASK_OVERFLOW_HI) {
-				rdmsrl(address, m.misc);
-				rdmsrl(MSR_IA32_MC0_STATUS + bank * 4,
-				       m.status);
+				m.misc = mce_rdmsrl(address);
+				m.status = mce_rdmsrl(MSR_IA32_MC0_STATUS + bank * 4);
+				if (m.status & MCI_STATUS_ADDRV)
+					m.addr = mce_rdmsrl(MSR_IA32_MC0_ADDR + bank * 4);
 				m.bank = K8_MCE_THRESHOLD_BASE
 				       + bank * NR_BLOCKS
 				       + block;
 				mce_log(&m);
+				mce_wrmsrl(MSR_IA32_MC0_STATUS + bank * 4, 0);
 				return;
 			}
 		}
 	}
+
+	/*
+	 * Log the machine check that caused the threshold
+	 * event.
+	 */
+	machine_check_poll(MCP_TIMESTAMP,
+				&__get_cpu_var(mce_poll_banks));
+
+}
+
+void raise_amd_threshold_event(void)
+{
+	amd_threshold_interrupt();
 }
+EXPORT_SYMBOL_GPL(raise_amd_threshold_event);
 
 /*
  * Sysfs Interface
diff -uNr amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce.c linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce.c
--- amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce.c	2014-09-18 01:22:16.000000000 +0800
+++ linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce.c	2014-10-01 09:40:13.269228358 +0800
@@ -48,6 +48,9 @@
 
 #include "mce-internal.h"
 
+int amd_inject = 0;
+EXPORT_PER_CPU_SYMBOL_GPL(amd_inject);
+
 static DEFINE_MUTEX(mce_chrdev_read_mutex);
 
 #define rcu_dereference_check_mce(p) \
@@ -131,6 +134,7 @@
 	m->apicid = cpu_data(m->extcpu).initial_apicid;
 	rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
 }
+EXPORT_SYMBOL_GPL(mce_setup);
 
 DEFINE_PER_CPU(struct mce, injectm);
 EXPORT_PER_CPU_SYMBOL_GPL(injectm);
@@ -391,7 +395,7 @@
 }
 
 /* MSR access wrappers used for error injection */
-static u64 mce_rdmsrl(u32 msr)
+u64 mce_rdmsrl(u32 msr)
 {
 	u64 v;
 
@@ -415,8 +419,9 @@
 
 	return v;
 }
 
-static void mce_wrmsrl(u32 msr, u64 v)
+void mce_wrmsrl(u32 msr, u64 v)
 {
 	if (__this_cpu_read(injectm.finished)) {
 		int offset = msr_to_offset(msr);
@@ -427,6 +432,18 @@
 	}
 	wrmsrl(msr, v);
 }
+
+int mce_rdmsr_safe(u32 msr, u32 *low, u32 *high) 
+{
+        u64 __val = mce_rdmsrl(msr);
+
+        (*low) = (u32)__val;
+        (*high) = (u32)(__val >> 32);
+
+	return 0;
+}
 
 /*
  * Collect all global (w.r.t. this processor) status about this machine
@@ -1637,6 +1654,7 @@
 		mce_adjust_timer = mce_intel_adjust_timer;
 		break;
 	case X86_VENDOR_AMD:
+		amd_inject = 1;
 		mce_amd_feature_init(c);
 		break;
 	default:
diff -uNr amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce-inject.c linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce-inject.c
--- amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce-inject.c	2014-09-18 01:22:16.000000000 +0800
+++ linux-3.16.3/arch/x86/kernel/cpu/mcheck/mce-inject.c	2014-09-30 22:38:30.138557839 +0800
@@ -54,7 +54,10 @@
 
 	memset(&b, 0xff, sizeof(mce_banks_t));
 	local_irq_save(flags);
-	machine_check_poll(0, &b);
+	if (!amd_inject)
+		machine_check_poll(0, &b);
+	else 
+		mce_threshold_vector();
 	local_irq_restore(flags);
 	m->finished = 0;
 }
diff -uNr amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/threshold.c linux-3.16.3/arch/x86/kernel/cpu/mcheck/threshold.c
--- amd_inject/linux-3.16.3/arch/x86/kernel/cpu/mcheck/threshold.c	2014-09-18 01:22:16.000000000 +0800
+++ linux-3.16.3/arch/x86/kernel/cpu/mcheck/threshold.c	2014-10-01 08:49:06.140738192 +0800
@@ -17,6 +17,7 @@
 }
 
 void (*mce_threshold_vector)(void) = default_threshold_interrupt;
+EXPORT_SYMBOL_GPL(mce_threshold_vector);
 
 static inline void __smp_threshold_interrupt(void)
 {

[-- Attachment #3: amd_inject.c --]
[-- Type: text/x-csrc, Size: 1613 bytes --]

/*
 * Copyright Chen Yucong<slaoub@gmail.com> 2014 
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; version 2
 * of the License.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/smp.h>
#include <linux/cpu.h>
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/amd_nb.h>

#define MASK_OVERFLOW  0x0001000000000000

/* Update fake mce registers on current CPU. */
static void inject_mce(struct mce *m)
{
	struct mce *i = &per_cpu(injectm, m->extcpu);

	/* Make sure no one reads partially written injectm */
	i->finished = 0;
	mb();
	m->finished = 0;
	/* First set the fields after finished */
	i->extcpu = m->extcpu;
	mb();
	/* Now write record in order, finished last (except above) */
	memcpy(i, m, sizeof(struct mce));
	/* Finally activate it */
	mb();
	i->finished = 1;
}

static void raise_mce(void)
{
	struct mce m;

	mce_setup(&m);
	m.status = 0X8C00000000000000;
	m.misc = 0XC008000000000000 | MASK_OVERFLOW;
	//m.misc = 0XC008000000000000;
	m.bank = 4;
	m.addr = 0xabcdef;
	inject_mce(&m);

	raise_amd_threshold_event();
}

static int __init amd_inject_init(void)
{
	raise_mce();
	pr_info("amd_inject module loaded ...\n");

	return 0;
}

static void __exit amd_inject_exit(void)
{
	pr_info("amd_inject module unloaded ...\n");
}

module_init(amd_inject_init);
module_exit(amd_inject_exit);

/*
 * Cannot tolerate unloading currently because we cannot
 * guarantee all openers of mce_chrdev will get a reference to us.
 */
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-01  5:26             ` Chen Yucong
@ 2014-10-01 10:10               ` Borislav Petkov
  0 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2014-10-01 10:10 UTC (permalink / raw)
  To: Chen Yucong; +Cc: tony.luck, linux-edac, linux-kernel

On Wed, Oct 01, 2014 at 01:26:04PM +0800, Chen Yucong wrote:
> On Tue, 2014-09-30 at 12:09 +0200, Borislav Petkov wrote:
> > 
> > Now let me repeat my question: how are you testing your patches?
> > 
> There are no any hardware facilities that can help me to inject some
> MCE errors. So I have to modify the kernel source code for testing my
> patches.
> 
> My method is based on the `mce-injection' that is better suited to 
> Intel processors. So I have replaced rdmsrl/wrmsrl/rdmsr_safe with
> mce_rdmsrl/mce_wrmsrl/mce_rdmsr_safe in mce_amd.c. But I use a new
> kernel module for error injection instead of writing /dev/mcelog.
> 
> For more detailed information about testing, you can refer the 
> attachments.

Right, so you modprobe/rmmod when you inject, I see.

We actually have some functionality to test the decoding of MCEs, take a
look at drivers/edac/mce_amd_inj.c. I have patches somewhere which allow
it to raise real MCEs but didn't have the need to merge them yet - I
could try to dust them off...

I also have a patch converting this module to debugfs as sysfs is not
the right fs it should be using for injecting. Then it might be easy to
extend it to inject all kinds of errors into MCA... Also maybe do both
real injection into the hardware (dangerous) and do the fake thing which
mce-inject does. Oh well.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-01  4:35             ` Chen Yucong
@ 2014-10-02 13:12               ` Borislav Petkov
  2014-10-02 14:37                 ` Chen Yucong
       [not found]                 ` <CAOjmkp9qQiTbqU3NUhUDAoQAa8wAPJnE_qXbDuBKrA3ee1_APQ@mail.gmail.com>
  0 siblings, 2 replies; 28+ messages in thread
From: Borislav Petkov @ 2014-10-02 13:12 UTC (permalink / raw)
  To: Chen Yucong; +Cc: tony.luck, linux-edac, linux-kernel

On Wed, Oct 01, 2014 at 12:35:02PM +0800, Chen Yucong wrote:
> On Tue, 2014-09-30 at 12:09 +0200, Borislav Petkov wrote:
> > On Tue, Sep 30, 2014 at 05:56:31PM +0800, Chen Yucong wrote:
> > > I just clear it to avoid that the mce_log() call logs the above
> > > threshold event again in machine_check_poll().
> > 
> > Ok, that's a good point, please put it in the commit message.
> > 
> > > It is just used for scanning other banks for recording other valid
> > > error information.
> > 
> > This is actually not what we want - we want to log the errors which
> > cause the overflow first and then the rest. So you don't need the goto
> > but simply have the machine_check_poll() at the end. 
> 
> 
> From: Chen Yucong <slaoub@gmail.com>
> 
> machine_check_poll() will reset IA32_MCi_STATUS register to zero.
> So we need to save the content of IA32_MCi_STATUS MSRs before
> calling machine_check_poll() for logging threshold interrupt event.
> 
> mce_setup() does not gather the content of IA32_MCG_STATUS, so it
> should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
> to avoid that mce_log() logs the processed threshold event again
> at next time.
> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c |   18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f8c56bd..643e6a2 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -274,6 +274,7 @@ static void amd_threshold_interrupt(void)
>  	struct mce m;
>  
>  	mce_setup(&m);
> +	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
>  
>  	/* assume first bank caused it */
>  	for (bank = 0; bank < mca_cfg.banks; ++bank) {
> @@ -305,24 +306,27 @@ static void amd_threshold_interrupt(void)
>  			     (high & MASK_LOCKED_HI))
>  				continue;
>  
> -			/*
> -			 * Log the machine check that caused the threshold
> -			 * event.
> -			 */
> -			machine_check_poll(MCP_TIMESTAMP,
> -					this_cpu_ptr(&mce_poll_banks));
> -
>  			if (high & MASK_OVERFLOW_HI) {
>  				rdmsrl(address, m.misc);
>  				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> +				if (m.status & MCI_STATUS_ADDRV)
> +					rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
>  				m.bank = K8_MCE_THRESHOLD_BASE
>  				       + bank * NR_BLOCKS
>  				       + block;
>  				mce_log(&m);
> +
> +				wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
>  				return;

Ok, this return is still bugging me - we're logging the error which
caused the counter overflow but we go and explicitly clear _STATUS so
that machine_check_poll doesn't pick up the same error again.

Even though, machine_check_poll is intended to log the thresholding
error.

Which actually makes me think that that machine_check_poll is actually
completely useless there. IOW, how about that instead:

---
From: Chen Yucong <slaoub@gmail.com>
Date: Thu, 2 Oct 2014 14:48:19 +0200
Subject: [PATCH] x86, MCE, AMD: Correct thresholding error logging

mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
to avoid that mce_log() logs the processed threshold event again
at next time.

But we do the logging ourselves and machine_check_poll() is completely
useless there. So kill it.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 1c54d3d61a4d..9ce64955559d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -270,14 +270,13 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 static void amd_threshold_interrupt(void)
 {
 	u32 low = 0, high = 0, address = 0;
+	int cpu = smp_processor_id();
 	unsigned int bank, block;
 	struct mce m;
 
-	mce_setup(&m);
-
 	/* assume first bank caused it */
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
-		if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
+		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 		for (block = 0; block < NR_BLOCKS; ++block) {
 			if (block == 0) {
@@ -309,20 +308,21 @@ static void amd_threshold_interrupt(void)
 			 * Log the machine check that caused the threshold
 			 * event.
 			 */
-			machine_check_poll(MCP_TIMESTAMP,
-					&__get_cpu_var(mce_poll_banks));
-
-			if (high & MASK_OVERFLOW_HI) {
-				rdmsrl(address, m.misc);
-				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
-				m.bank = K8_MCE_THRESHOLD_BASE
-				       + bank * NR_BLOCKS
-				       + block;
-				mce_log(&m);
-				return;
-			}
+			if (high & MASK_OVERFLOW_HI)
+				goto log;
 		}
 	}
+	return;
+
+log:
+	mce_setup(&m);
+	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
+	rdmsrl(address, m.misc);
+	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+	m.bank = K8_MCE_THRESHOLD_BASE + bank * NR_BLOCKS + block;
+	mce_log(&m);
+
+	wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
 }
 
 /*
-- 
2.0.0

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-02 13:12               ` Borislav Petkov
@ 2014-10-02 14:37                 ` Chen Yucong
       [not found]                 ` <CAOjmkp9qQiTbqU3NUhUDAoQAa8wAPJnE_qXbDuBKrA3ee1_APQ@mail.gmail.com>
  1 sibling, 0 replies; 28+ messages in thread
From: Chen Yucong @ 2014-10-02 14:37 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: tony.luck, linux-edac, linux-kernel

On Thu, 2014-10-02 at 15:12 +0200, Borislav Petkov wrote:
> 
> Ok, this return is still bugging me - we're logging the error which
> caused the counter overflow but we go and explicitly clear _STATUS so
> that machine_check_poll doesn't pick up the same error again.
> 
> Even though, machine_check_poll is intended to log the thresholding
> error.
> 
> Which actually makes me think that that machine_check_poll is actually
> completely useless there. IOW, how about that instead: 

amd_threshold_interrup() is just used for logging threshold events. And
any 'valid' threshold events can be checked/logged in loop body.
Moreover, machine_check_poll() is unable to check additional MCx_MISCi.
So I agree with you on this change.

Thanks!
cyc 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
       [not found]                 ` <CAOjmkp9qQiTbqU3NUhUDAoQAa8wAPJnE_qXbDuBKrA3ee1_APQ@mail.gmail.com>
@ 2014-10-08 21:52                   ` Aravind Gopalakrishnan
  2014-10-08 22:57                     ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Aravind Gopalakrishnan @ 2014-10-08 21:52 UTC (permalink / raw)
  To: Borislav Petkov, slaoub; +Cc: Tony Luck, linux-edac, LKML

>
> Ok, this return is still bugging me - we're logging the error which
> caused the counter overflow but we go and explicitly clear _STATUS so
> that machine_check_poll doesn't pick up the same error again.
>
> Even though, machine_check_poll is intended to log the thresholding
> error.
>
> Which actually makes me think that that machine_check_poll is actually
> completely useless there. IOW, how about that instead:
>
> ---
> From: Chen Yucong <slaoub@gmail.com <mailto:slaoub@gmail.com>>
> Date: Thu, 2 Oct 2014 14:48:19 +0200
> Subject: [PATCH] x86, MCE, AMD: Correct thresholding error logging
>
> mce_setup() does not gather the content of IA32_MCG_STATUS, so it
> should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
> to avoid that mce_log() logs the processed threshold event again
> at next time.
>
> But we do the logging ourselves and machine_check_poll() is completely
> useless there. So kill it.
>
> Signed-off-by: Chen Yucong <slaoub@gmail.com <mailto:slaoub@gmail.com>>
> Signed-off-by: Borislav Petkov <bp@suse.de <mailto:bp@suse.de>>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index 1c54d3d61a4d..9ce64955559d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -270,14 +270,13 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
>  static void amd_threshold_interrupt(void)
>  {
>         u32 low = 0, high = 0, address = 0;
> +       int cpu = smp_processor_id();
>         unsigned int bank, block;
>         struct mce m;
>
> -       mce_setup(&m);
> -
>         /* assume first bank caused it */
>         for (bank = 0; bank < mca_cfg.banks; ++bank) {
> -               if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
> +               if (!(per_cpu(bank_map, cpu) & (1 << bank)))
>                         continue;
>                 for (block = 0; block < NR_BLOCKS; ++block) {
>                         if (block == 0) {
> @@ -309,20 +308,21 @@ static void amd_threshold_interrupt(void)
>                          * Log the machine check that caused the threshold
>                          * event.
>                          */
> -                       machine_check_poll(MCP_TIMESTAMP,
> -  &__get_cpu_var(mce_poll_banks));
> -
> -                       if (high & MASK_OVERFLOW_HI) {
> -                               rdmsrl(address, m.misc);
> -  rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> -                               m.bank = K8_MCE_THRESHOLD_BASE
> -                                      + bank * NR_BLOCKS
> -                                      + block;
> -                               mce_log(&m);
> -                               return;
> -                       }
> +                       if (high & MASK_OVERFLOW_HI)
> +                               goto log;
>                 }
>         }
> +       return;
> +
> +log:
> +       mce_setup(&m);
> +       rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
> +       rdmsrl(address, m.misc);
> +       rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> +       m.bank = K8_MCE_THRESHOLD_BASE + bank * NR_BLOCKS + block;


I am not understanding why m.bank is assigned this value..

It only causes incorrect decoding-
[  608.832916] DEBUG: raise_amd_threshold_event
[  608.832926] [Hardware Error]: Corrected error, no action required.
[  608.833143] [Hardware Error]: CPU:26 (15:2:0) 
MC165_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00000000000000
[  608.833551] [Hardware Error]: MC165_ADDR: 0x0000000000000000
[  608.833777] [Hardware Error]: cache level: RESV, tx: INSN
[  608.834034] amd_inject module loaded ...


(Obviously, as in amd_decode_mce() we switch (m->bank) for decoding the 
status and there is no bank 165)

OTOH, if m.bank = bank;
Then we get correct decoding info-
[   58.021978] DEBUG: raise_amd_threshold_event
[   58.021992] [Hardware Error]: Corrected error, no action required.
[   58.022155] [Hardware Error]: CPU:0 (15:60:0) 
MC4_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00000000000000
[   58.022393] [Hardware Error]: MC4_ADDR: 0x0000000000000000
[   58.022531] [Hardware Error]: MC4 Error (node 0): DRAM ECC error 
detected on the NB.
<snip..it's throws WARN as "Something is rotten in the state of Denmark".>
<.. but that's fine. we are just fake-injecting errors here.. :) >
[   58.022933] [Hardware Error]: cache level: RESV, tx: INSN
[   58.023084] amd_inject module loaded ...

Thanks,
-Aravind.

> +       mce_log(&m);
> +
> +       wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
>  }
>
>  /*
> --
> 2.0.0
>
> --
> Regards/Gruss,
>     Boris.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-08 21:52                   ` Fwd: " Aravind Gopalakrishnan
@ 2014-10-08 22:57                     ` Borislav Petkov
  2014-10-09 16:53                       ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2014-10-08 22:57 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: slaoub, Tony Luck, linux-edac, LKML

On Wed, Oct 08, 2014 at 04:52:06PM -0500, Aravind Gopalakrishnan wrote:
> I am not understanding why m.bank is assigned this value..

That's a very good question, see below for some history.

> 
> It only causes incorrect decoding-
> [  608.832916] DEBUG: raise_amd_threshold_event
> [  608.832926] [Hardware Error]: Corrected error, no action required.
> [  608.833143] [Hardware Error]: CPU:26 (15:2:0)
> MC165_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00000000000000
> [  608.833551] [Hardware Error]: MC165_ADDR: 0x0000000000000000
> [  608.833777] [Hardware Error]: cache level: RESV, tx: INSN
> [  608.834034] amd_inject module loaded ...
> 
> 
> (Obviously, as in amd_decode_mce() we switch (m->bank) for decoding the
> status and there is no bank 165)
> 
> OTOH, if m.bank = bank;
> Then we get correct decoding info-

Yes, and I think we should do that only if we're using the *last* error
to report the overflow with: we're reporting a thresholding counter
overflow and the bank on which it was detected on should, of course, be
part of the report.

The "funny" bank is some sort of a software defined banks thing which
got added in 2005 (see the patch I dug out below) and it was supposed
to be used (I'm guessing here) for reporting thermal events using MCA
(dumb idea, if you ask me) so since thermal events don't really have
a bank, they decided to have some sort of a software-defined MCA bank
which doesn't correspond to any hardware bank.

Then Jacob decided to use it for some reason too:

95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors")

maybe because thresholding errors don't have a bank associated with them
but if I'm not missing anything, they do!

Oh oh, ok, it just dawned on me! I think I know what it *might* have
been: they wanted to report the overflowing with a special error
signature which uses a software-defined bank. Ok, that actually makes
sense: when you see an error for a sw-defined bank, you're reporting an
thresholding counter overflow.

Which means that we shouldn't be populating m.status either, i.e. what
we did earlier:

	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);

because this is a special error type.

Hmm, it is too late here to think straight, more tomorrow. But Aravind,
that was a very good question, you actually made me dig into git history
:-)

Good night.


>From d2b6331397e634477b76f6fec119b7caf3ac564e Mon Sep 17 00:00:00 2001
From: Zwane Mwaikambo <zwane@linuxpower.ca>
Date: Mon, 3 Jan 2005 04:42:52 -0800
Subject: [PATCH] [PATCH] Intel thermal monitor for x86_64

Patch adds support for notification of overheating conditions on intel
x86_64 processors.  Tested on EM64T, test booted on AMD64.

Hardware courtesy of Intel Corporation

Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
 arch/x86_64/Kconfig            |  7 +++
 arch/x86_64/kernel/Makefile    |  1 +
 arch/x86_64/kernel/entry.S     |  3 ++
 arch/x86_64/kernel/i8259.c     |  2 +
 arch/x86_64/kernel/mce.c       | 14 +++++-
 arch/x86_64/kernel/mce_intel.c | 99 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86_64/kernel/traps.c     |  4 ++
 include/asm-x86_64/mce.h       | 13 ++++++
 8 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86_64/kernel/mce_intel.c

diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 4ffa04271050..bc317049ebed 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -338,6 +338,13 @@ config X86_MCE
 	   machine check error logs. See
 	   ftp://ftp.x86-64.org/pub/linux/tools/mcelog
 
+config X86_MCE_INTEL
+	bool "Intel MCE features"
+	depends on X86_MCE && X86_LOCAL_APIC
+	default y
+	help
+	   Additional support for intel specific MCE features such as
+	   the thermal monitor.
 endmenu
 
 #
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 2c0f3af82e5e..96a5111e96c6 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -10,6 +10,7 @@ obj-y	:= process.o semaphore.o signal.o entry.o traps.o irq.o \
 		setup64.o bootflag.o e820.o reboot.o warmreboot.o quirks.o
 
 obj-$(CONFIG_X86_MCE)         += mce.o
+obj-$(CONFIG_X86_MCE_INTEL)	+= mce_intel.o
 obj-$(CONFIG_MTRR)		+= ../../i386/kernel/cpu/mtrr/
 obj-$(CONFIG_ACPI_BOOT)		+= acpi/
 obj-$(CONFIG_X86_MSR)		+= msr.o
diff --git a/arch/x86_64/kernel/entry.S b/arch/x86_64/kernel/entry.S
index d8d906a7d8e1..ca050e729a85 100644
--- a/arch/x86_64/kernel/entry.S
+++ b/arch/x86_64/kernel/entry.S
@@ -538,6 +538,9 @@ retint_kernel:
 	CFI_ENDPROC
 	.endm
 
+ENTRY(thermal_interrupt)
+	apicinterrupt THERMAL_APIC_VECTOR,smp_thermal_interrupt
+
 #ifdef CONFIG_SMP	
 ENTRY(reschedule_interrupt)
 	apicinterrupt RESCHEDULE_VECTOR,smp_reschedule_interrupt
diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index 7929a2e534a6..04e6fdab46b6 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -476,6 +476,7 @@ void error_interrupt(void);
 void reschedule_interrupt(void);
 void call_function_interrupt(void);
 void invalidate_interrupt(void);
+void thermal_interrupt(void);
 
 static void setup_timer(void)
 {
@@ -550,6 +551,7 @@ void __init init_IRQ(void)
 	/* IPI for generic function call */
 	set_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
 #endif	
+	set_intr_gate(THERMAL_APIC_VECTOR, thermal_interrupt);
 
 #ifdef CONFIG_X86_LOCAL_APIC
 	/* self generated IPI for local APIC timer */
diff --git a/arch/x86_64/kernel/mce.c b/arch/x86_64/kernel/mce.c
index 5da150baf25e..6e717e470460 100644
--- a/arch/x86_64/kernel/mce.c
+++ b/arch/x86_64/kernel/mce.c
@@ -43,7 +43,7 @@ struct mce_log mcelog = {
 	MCE_LOG_LEN,
 }; 
 
-static void mce_log(struct mce *mce)
+void mce_log(struct mce *mce)
 {
 	unsigned next, entry;
 	mce->finished = 0;
@@ -305,6 +305,17 @@ static void __init mce_cpu_quirks(struct cpuinfo_x86 *c)
 	}
 }			
 
+static void __init mce_cpu_features(struct cpuinfo_x86 *c)
+{
+	switch (c->x86_vendor) {
+	case X86_VENDOR_INTEL:
+		mce_intel_feature_init(c);
+		break;
+	default:
+		break;
+	}
+}
+
 /* 
  * Called for each booted CPU to set up machine checks.
  * Must be called with preempt off. 
@@ -321,6 +332,7 @@ void __init mcheck_init(struct cpuinfo_x86 *c)
 		return;
 
 	mce_init(NULL);
+	mce_cpu_features(c);
 }
 
 /*
diff --git a/arch/x86_64/kernel/mce_intel.c b/arch/x86_64/kernel/mce_intel.c
new file mode 100644
index 000000000000..4db9a640069f
--- /dev/null
+++ b/arch/x86_64/kernel/mce_intel.c
@@ -0,0 +1,99 @@
+/*
+ * Intel specific MCE features.
+ * Copyright 2004 Zwane Mwaikambo <zwane@linuxpower.ca>
+ */
+
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/percpu.h>
+#include <asm/processor.h>
+#include <asm/msr.h>
+#include <asm/mce.h>
+#include <asm/hw_irq.h>
+
+static DEFINE_PER_CPU(unsigned long, next_check);
+
+asmlinkage void smp_thermal_interrupt(void)
+{
+	struct mce m;
+
+	ack_APIC_irq();
+
+	irq_enter();
+	if (time_before(jiffies, __get_cpu_var(next_check)))
+		goto done;
+
+	__get_cpu_var(next_check) = jiffies + HZ*300;
+	memset(&m, 0, sizeof(m));
+	m.cpu = smp_processor_id();
+	m.bank = MCE_THERMAL_BANK;
+	rdtscll(m.tsc);
+	rdmsrl(MSR_IA32_THERM_STATUS, m.status);
+	if (m.status & 0x1) {
+		printk(KERN_EMERG
+			"CPU%d: Temperature above threshold, cpu clock throttled\n", m.cpu);
+		add_taint(TAINT_MACHINE_CHECK);
+	} else {
+		printk(KERN_EMERG "CPU%d: Temperature/speed normal\n", m.cpu);
+	}
+
+	mce_log(&m);
+done:
+	irq_exit();
+}
+
+static void __init intel_init_thermal(struct cpuinfo_x86 *c)
+{
+	u32 l, h;
+	int tm2 = 0;
+	unsigned int cpu = smp_processor_id();
+
+	if (!cpu_has(c, X86_FEATURE_ACPI))
+		return;
+
+	if (!cpu_has(c, X86_FEATURE_ACC))
+		return;
+
+	/* first check if TM1 is already enabled by the BIOS, in which
+	 * case there might be some SMM goo which handles it, so we can't even
+	 * put a handler since it might be delivered via SMI already.
+	 */
+	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
+	h = apic_read(APIC_LVTTHMR);
+	if ((l & (1 << 3)) && (h & APIC_DM_SMI)) {
+		printk(KERN_DEBUG
+		       "CPU%d: Thermal monitoring handled by SMI\n", cpu);
+		return;
+	}
+
+	if (cpu_has(c, X86_FEATURE_TM2) && (l & (1 << 13)))
+		tm2 = 1;
+
+	if (h & APIC_VECTOR_MASK) {
+		printk(KERN_DEBUG
+		       "CPU%d: Thermal LVT vector (%#x) already "
+		       "installed\n", cpu, (h & APIC_VECTOR_MASK));
+		return;
+	}
+
+	h = THERMAL_APIC_VECTOR;
+	h |= (APIC_DM_FIXED | APIC_LVT_MASKED);
+	apic_write_around(APIC_LVTTHMR, h);
+
+	rdmsr(MSR_IA32_THERM_INTERRUPT, l, h);
+	wrmsr(MSR_IA32_THERM_INTERRUPT, l | 0x03, h);
+
+	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
+	wrmsr(MSR_IA32_MISC_ENABLE, l | (1 << 3), h);
+
+	l = apic_read(APIC_LVTTHMR);
+	apic_write_around(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
+	printk(KERN_INFO "CPU%d: Thermal monitoring enabled (%s)\n",
+		cpu, tm2 ? "TM2" : "TM1");
+	return;
+}
+
+void __init mce_intel_feature_init(struct cpuinfo_x86 *c)
+{
+	intel_init_thermal(c);
+}
diff --git a/arch/x86_64/kernel/traps.c b/arch/x86_64/kernel/traps.c
index 50e9621b0273..3ebfc9117d2a 100644
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -882,6 +882,10 @@ asmlinkage void do_spurious_interrupt_bug(struct pt_regs * regs)
 {
 }
 
+asmlinkage void __attribute__((weak)) smp_thermal_interrupt(void)
+{
+}
+
 /*
  *  'math_state_restore()' saves the current math information in the
  * old math state array, and gets the new ones from the current task
diff --git a/include/asm-x86_64/mce.h b/include/asm-x86_64/mce.h
index 1c84fa8758c3..869249db6795 100644
--- a/include/asm-x86_64/mce.h
+++ b/include/asm-x86_64/mce.h
@@ -64,4 +64,17 @@ struct mce_log {
 #define MCE_GET_LOG_LEN      _IOR('M', 2, int)
 #define MCE_GETCLEAR_FLAGS   _IOR('M', 3, int)
 
+/* Software defined banks */
+#define MCE_EXTENDED_BANK	128
+#define MCE_THERMAL_BANK	MCE_EXTENDED_BANK + 0
+
+void mce_log(struct mce *m);
+#ifdef CONFIG_X86_MCE_INTEL
+void mce_intel_feature_init(struct cpuinfo_x86 *c);
+#else
+static inline void mce_intel_feature_init(struct cpuinfo_x86 *c)
+{
+}
+#endif
+
 #endif
-- 
2.0.0


-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-08 22:57                     ` Borislav Petkov
@ 2014-10-09 16:53                       ` Aravind Gopalakrishnan
  2014-10-09 17:35                         ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Aravind Gopalakrishnan @ 2014-10-09 16:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: slaoub, Tony Luck, linux-edac, LKML

On Thu, Oct 09, 2014 at 12:57:50AM +0200, Borislav Petkov wrote:
> On Wed, Oct 08, 2014 at 04:52:06PM -0500, Aravind Gopalakrishnan wrote:
> > I am not understanding why m.bank is assigned this value..
> 
> That's a very good question, see below for some history.
> 
> > 
> > It only causes incorrect decoding-
> > [  608.832916] DEBUG: raise_amd_threshold_event
> > [  608.832926] [Hardware Error]: Corrected error, no action required.
> > [  608.833143] [Hardware Error]: CPU:26 (15:2:0)
> > MC165_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00000000000000
> > [  608.833551] [Hardware Error]: MC165_ADDR: 0x0000000000000000
> > [  608.833777] [Hardware Error]: cache level: RESV, tx: INSN
> > [  608.834034] amd_inject module loaded ...
> > 
> > 
> > (Obviously, as in amd_decode_mce() we switch (m->bank) for decoding the
> > status and there is no bank 165)
> > 
> > OTOH, if m.bank = bank;
> > Then we get correct decoding info-
> 
> Yes, and I think we should do that only if we're using the *last* error
> to report the overflow with: we're reporting a thresholding counter
> overflow and the bank on which it was detected on should, of course, be
> part of the report.
> 

How do you mean "last error"?
The interrupt is only fired upon overflow..

> The "funny" bank is some sort of a software defined banks thing which
> got added in 2005 (see the patch I dug out below) and it was supposed
> to be used (I'm guessing here) for reporting thermal events using MCA
> (dumb idea, if you ask me) so since thermal events don't really have
> a bank, they decided to have some sort of a software-defined MCA bank
> which doesn't correspond to any hardware bank.
> 
> Then Jacob decided to use it for some reason too:
> 
> 95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors")
> 
> maybe because thresholding errors don't have a bank associated with them
> but if I'm not missing anything, they do!
> 

Right. The thresholding registers are nothing but _MISC(x) where x is a
bank value.

> Oh oh, ok, it just dawned on me! I think I know what it *might* have
> been: they wanted to report the overflowing with a special error
> signature which uses a software-defined bank. Ok, that actually makes
> sense: when you see an error for a sw-defined bank, you're reporting an
> thresholding counter overflow.
> 
> Which means that we shouldn't be populating m.status either, i.e. what
> we did earlier:
> 
> 	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> 
> because this is a special error type.
>

How is it a "special error type"? It's still the same CE error that
we get notified with. Only difference being - now it's crossed a
specific 'threshold_limit'

So- I am not getting the rationale behind a S/W defined bank for reporting
this.

CE error if collected through polling gives proper decoding
info. So, why should this be any different for the same CE error for
which an interrupt is generated on crossing a threshold?

Thanks,
-Aravind

> Hmm, it is too late here to think straight, more tomorrow. But Aravind,
> that was a very good question, you actually made me dig into git history
> :-)
> 
> Good night.
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-09 16:53                       ` Aravind Gopalakrishnan
@ 2014-10-09 17:35                         ` Borislav Petkov
  2014-10-09 19:01                           ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2014-10-09 17:35 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: slaoub, Tony Luck, linux-edac, LKML

On Thu, Oct 09, 2014 at 11:53:39AM -0500, Aravind Gopalakrishnan wrote:
> How do you mean "last error"?
> The interrupt is only fired upon overflow..

And? Think about it, what is causing the overflow? A CE, right?

There was even a call to machine_check_poll() there which we removed,
but for another reason. In any case, you should have the error signature
in the MCA banks of the last error causing the overflow, right? This is
what I mean with last error.

However(!),...

> CE error if collected through polling gives proper decoding info. So,
> why should this be any different for the same CE error for which an
> interrupt is generated on crossing a threshold?

... we're currently using a special signature to signal the overflow
with the K8_MCE_THRESHOLD_BASE thing. You simply report a special bank
and this way you can tell userspace that this is an overflow error. I
think that was the reason behind the software-defined banks.

Now, we can also drop that and simply log a normal error but make sure
MASK_OVERFLOW_HI is passed onto userspace so that it can see that the
error is an overflow error. I.e., something like this:

        mce_setup(&m);
	// rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus); - not sure about this one - we're not looking at MCGSTATUS for CEs
        // rdmsrl(address, m.misc); - this MSR can be saved too as we're reading
	// the MISC register already.
        rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
        m.bank = bank;
        mce_log(&m);

so in the end it'll be something like this:

	mce_setup(&m);
	m.misc = (high << 32) | low;
	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
	m.bank = bank;
        mce_log(&m);

so I'm still on the fence about what we want to do and am expecting
arguments. I like the last one more because it is simpler and tools
don't need to know about the software-defined banks.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-09 17:35                         ` Borislav Petkov
@ 2014-10-09 19:01                           ` Aravind Gopalakrishnan
  2014-10-21 20:28                             ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Aravind Gopalakrishnan @ 2014-10-09 19:01 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: slaoub, Tony Luck, linux-edac, LKML

On 10/9/2014 12:35 PM, Borislav Petkov wrote:
> On Thu, Oct 09, 2014 at 11:53:39AM -0500, Aravind Gopalakrishnan wrote:
>> How do you mean "last error"?
>> The interrupt is only fired upon overflow..
> And? Think about it, what is causing the overflow? A CE, right?
>
> There was even a call to machine_check_poll() there which we removed,
> but for another reason. In any case, you should have the error signature
> in the MCA banks of the last error causing the overflow, right?

Right. I was not arguing that we shouldn't. Just wasn't clear on what 
you meant.
Anyway, Thanks for clarifying.

> This is
> what I mean with last error.
>
> However(!),...
>
>> CE error if collected through polling gives proper decoding info. So,
>> why should this be any different for the same CE error for which an
>> interrupt is generated on crossing a threshold?
> ... we're currently using a special signature to signal the overflow
> with the K8_MCE_THRESHOLD_BASE thing. You simply report a special bank
> and this way you can tell userspace that this is an overflow error. I
> think that was the reason behind the software-defined banks.
>
> Now, we can also drop that and simply log a normal error but make sure
> MASK_OVERFLOW_HI is passed onto userspace so that it can see that the
> error is an overflow error. I.e., something like this:
>
>          mce_setup(&m);
> 	// rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus); - not sure about this one - we're not looking at MCGSTATUS for CEs
That's right. Might as well remove it.

>          // rdmsrl(address, m.misc); - this MSR can be saved too as we're reading
> 	// the MISC register already.
>          rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
>          m.bank = bank;
>          mce_log(&m);
>
> so in the end it'll be something like this:
>
> 	mce_setup(&m);
> 	m.misc = (high << 32) | low;
> 	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> 	m.bank = bank;
>          mce_log(&m);
>
> so I'm still on the fence about what we want to do and am expecting
> arguments.

I actually agree with this approach. So no argument:)
> I like the last one more because it is simpler and tools
> don't need to know about the software-defined banks.
>

Thanks
-Aravind.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-09 19:01                           ` Aravind Gopalakrishnan
@ 2014-10-21 20:28                             ` Borislav Petkov
  2014-10-22  1:51                               ` Chen Yucong
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2014-10-21 20:28 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: slaoub, Tony Luck, linux-edac, LKML

On Thu, Oct 09, 2014 at 02:01:06PM -0500, Aravind Gopalakrishnan wrote:
> I actually agree with this approach. So no argument:)

Ok, thanks, here's a patch.

Btw, I'm pushing the whole queue to a ras-for-3.19 branch at
https://git.kernel.org/cgit/linux/kernel/git/bp/bp.git if you'd like to
take a look and see whether we haven't forgotten anything before I send
it to tip guys.

Thanks.

---
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86, MCE, AMD: Drop software-defined bank in error thresholding

Aravind had the good question about why we're assigning a
software-defined bank when reporting error thresholding errors instead
of simply using the bank which reports the last error causing the
overflow.

Digging through git history, it pointed to

95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors")

which added that functionality. The problem with this, however, is that
tools don't know about software-defined banks and get puzzled. So drop
that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the
thresholding interrupt.

Save us a couple of MSR reads while at it.

Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: https://lkml.kernel.org/r/5435B206.60402@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mce.h           | 1 -
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 958b90f761e5..276392f121fb 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -78,7 +78,6 @@
 /* Software defined banks */
 #define MCE_EXTENDED_BANK	128
 #define MCE_THERMAL_BANK	(MCE_EXTENDED_BANK + 0)
-#define K8_MCE_THRESHOLD_BASE   (MCE_EXTENDED_BANK + 1)
 
 #define MCE_LOG_LEN 32
 #define MCE_LOG_SIGNATURE	"MACHINECHECK"
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 9af7bd74828b..6606523ff1c1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -318,10 +318,9 @@ static void amd_threshold_interrupt(void)
 
 log:
 	mce_setup(&m);
-	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
-	rdmsrl(address, m.misc);
 	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
-	m.bank = K8_MCE_THRESHOLD_BASE + bank * NR_BLOCKS + block;
+	m.misc = ((u64)high << 32) | low;
+	m.bank = bank;
 	mce_log(&m);
 
 	wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
-- 
2.0.0


-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-21 20:28                             ` Borislav Petkov
@ 2014-10-22  1:51                               ` Chen Yucong
  2014-10-22  8:16                                 ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Chen Yucong @ 2014-10-22  1:51 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Aravind Gopalakrishnan, Tony Luck, linux-edac, LKML

On Tue, 2014-10-21 at 22:28 +0200, Borislav Petkov wrote:
> On Thu, Oct 09, 2014 at 02:01:06PM -0500, Aravind Gopalakrishnan wrote:
> > I actually agree with this approach. So no argument:)
> 
> Ok, thanks, here's a patch.
> 
> Btw, I'm pushing the whole queue to a ras-for-3.19 branch at
> https://git.kernel.org/cgit/linux/kernel/git/bp/bp.git if you'd like to
> take a look and see whether we haven't forgotten anything before I send
> it to tip guys.
> 
Hi Boris,

Can you check the following link? The link contains my reply about 
"x86, MCE, AMD: Move invariant code out from loop body". The reply
was sent to you on October 7, but until now, there aren't any comments
from you!

https://lkml.org/lkml/2014/10/7/84

Thanks!
cyc


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-22  1:51                               ` Chen Yucong
@ 2014-10-22  8:16                                 ` Borislav Petkov
  2014-10-22  8:53                                   ` Chen Yucong
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2014-10-22  8:16 UTC (permalink / raw)
  To: Chen Yucong; +Cc: Aravind Gopalakrishnan, Tony Luck, linux-edac, LKML

On Wed, Oct 22, 2014 at 09:51:18AM +0800, Chen Yucong wrote:
> Can you check the following link? The link contains my reply about
> "x86, MCE, AMD: Move invariant code out from loop body". The reply was
> sent to you on October 7, but until now, there aren't any comments
> from you!

https://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/commit/?h=ras-for-3.19&id=69b957583580bf40624553c64d802fefb54199cb

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-22  8:16                                 ` Borislav Petkov
@ 2014-10-22  8:53                                   ` Chen Yucong
  2014-10-22  9:30                                     ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Chen Yucong @ 2014-10-22  8:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Aravind Gopalakrishnan, Tony Luck, linux-edac, LKML

On Wed, 2014-10-22 at 10:16 +0200, Borislav Petkov wrote:
> On Wed, Oct 22, 2014 at 09:51:18AM +0800, Chen Yucong wrote:
> > Can you check the following link? The link contains my reply about
> > "x86, MCE, AMD: Move invariant code out from loop body". The reply was
> > sent to you on October 7, but until now, there aren't any comments
> > from you!
> 
> https://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/commit/?h=ras-for-3.19&id=69b957583580bf40624553c64d802fefb54199cb

I have checked this link! I mean that there is another reply that you
may not have noticed.

thx!
cyc



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-22  8:53                                   ` Chen Yucong
@ 2014-10-22  9:30                                     ` Borislav Petkov
  2014-10-29 15:59                                       ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2014-10-22  9:30 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: Chen Yucong, Tony Luck, linux-edac, LKML

Hi Aravind,

question: what's the story with MC?_MISC[IntP], is that bit still there?
Because I don't see it in my BKDGs here.

The background of the story is

https://lkml.org/lkml/2014/10/7/84

There's this thing we did at the time

f227d4306cf3 ("x86, MCE, AMD: Make APIC LVT thresholding interrupt optional")

which, AFAICR, is about some F15h versions having a counter but *not*
generating a thresholding interrupt. Can you confirm that is still
the case and we can have a counter but no interrupt gets generated on
overflow?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-22  9:30                                     ` Borislav Petkov
@ 2014-10-29 15:59                                       ` Aravind Gopalakrishnan
  2014-10-30 19:04                                         ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 28+ messages in thread
From: Aravind Gopalakrishnan @ 2014-10-29 15:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Chen Yucong, Tony Luck, linux-edac, LKML

On 10/22/2014 4:30 AM, Borislav Petkov wrote:
> Hi Aravind,
>
> question: what's the story with MC?_MISC[IntP], is that bit still there?
> Because I don't see it in my BKDGs here.

Yep, It exists.
Maybe you are referring to Fam15h M0h BKDG? I think the bit was 
introduced only from F15h M30h onwards.
The bit does *not* exist for bank=4, But-
if (bank ==4)
   return true;

takes care of that.

> The background of the story is
>
> https://lkml.org/lkml/2014/10/7/84
>
> There's this thing we did at the time
>
> f227d4306cf3 ("x86, MCE, AMD: Make APIC LVT thresholding interrupt optional")
>
> which, AFAICR, is about some F15h versions having a counter but *not*
> generating a thresholding interrupt. Can you confirm that is still
> the case and we can have a counter but no interrupt gets generated on
> overflow?
>

So yes, moving the assignment inside the if condition should work just fine.

I see the patch on your 'ras-for-3.19' branch does not have this, so 
I'll make this modification
to the branch before I test it.

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-29 15:59                                       ` Aravind Gopalakrishnan
@ 2014-10-30 19:04                                         ` Aravind Gopalakrishnan
  2014-10-30 21:39                                           ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Aravind Gopalakrishnan @ 2014-10-30 19:04 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Chen Yucong, Tony Luck, linux-edac, LKML

On 10/29/2014 10:59 AM, Aravind Gopalakrishnan wrote:
> On 10/22/2014 4:30 AM, Borislav Petkov wrote:
>> Hi Aravind,
>>
>> question: what's the story with MC?_MISC[IntP], is that bit still there?
>> Because I don't see it in my BKDGs here.
>
> Yep, It exists.
> Maybe you are referring to Fam15h M0h BKDG? I think the bit was 
> introduced only from F15h M30h onwards.
> The bit does *not* exist for bank=4, But-
> if (bank ==4)
>   return true;
>
> takes care of that.
>
>> The background of the story is
>>
>> https://lkml.org/lkml/2014/10/7/84
>>
>> There's this thing we did at the time
>>
>> f227d4306cf3 ("x86, MCE, AMD: Make APIC LVT thresholding interrupt 
>> optional")
>>
>> which, AFAICR, is about some F15h versions having a counter but *not*
>> generating a thresholding interrupt. Can you confirm that is still
>> the case and we can have a counter but no interrupt gets generated on
>> overflow?
>>
>
> So yes, moving the assignment inside the if condition should work just 
> fine.
>
> I see the patch on your 'ras-for-3.19' branch does not have this, so 
> I'll make this modification
> to the branch before I test it.
>

Hi Boris,
I have tested the branch with this bit:

  if (b.interrupt_capable) {
             ... ...
             if (mce_threshold_vector != amd_threshold_interrupt)
                     mce_threshold_vector = amd_threshold_interrupt;
     }

and it works fine.

Thanks,
-Aravind.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
  2014-10-30 19:04                                         ` Aravind Gopalakrishnan
@ 2014-10-30 21:39                                           ` Borislav Petkov
  0 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2014-10-30 21:39 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: Chen Yucong, Tony Luck, linux-edac, LKML

On Thu, Oct 30, 2014 at 02:04:17PM -0500, Aravind Gopalakrishnan wrote:
> Hi Boris,
> I have tested the branch with this bit:
> 
>  if (b.interrupt_capable) {
>             ... ...
>             if (mce_threshold_vector != amd_threshold_interrupt)
>                     mce_threshold_vector = amd_threshold_interrupt;
>     }
> 
> and it works fine.

Good, I'll adjust Chen's patch tomorrow then and send up.

Thanks a lot for testing!

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-10-30 21:40 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-23  2:16 [PATCH] x86, MCE, AMD: use macros to compute bank MSRs Chen Yucong
2014-09-23  8:19 ` [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it Chen Yucong
2014-09-28  8:15   ` Chen Yucong
2014-09-29 12:05   ` Borislav Petkov
2014-09-30  0:39     ` Chen Yucong
2014-09-30  7:25       ` Borislav Petkov
2014-09-30  9:56         ` Chen Yucong
2014-09-30 10:09           ` Borislav Petkov
2014-10-01  4:35             ` Chen Yucong
2014-10-02 13:12               ` Borislav Petkov
2014-10-02 14:37                 ` Chen Yucong
     [not found]                 ` <CAOjmkp9qQiTbqU3NUhUDAoQAa8wAPJnE_qXbDuBKrA3ee1_APQ@mail.gmail.com>
2014-10-08 21:52                   ` Fwd: " Aravind Gopalakrishnan
2014-10-08 22:57                     ` Borislav Petkov
2014-10-09 16:53                       ` Aravind Gopalakrishnan
2014-10-09 17:35                         ` Borislav Petkov
2014-10-09 19:01                           ` Aravind Gopalakrishnan
2014-10-21 20:28                             ` Borislav Petkov
2014-10-22  1:51                               ` Chen Yucong
2014-10-22  8:16                                 ` Borislav Petkov
2014-10-22  8:53                                   ` Chen Yucong
2014-10-22  9:30                                     ` Borislav Petkov
2014-10-29 15:59                                       ` Aravind Gopalakrishnan
2014-10-30 19:04                                         ` Aravind Gopalakrishnan
2014-10-30 21:39                                           ` Borislav Petkov
2014-10-01  5:26             ` Chen Yucong
2014-10-01 10:10               ` Borislav Petkov
2014-09-28  8:09 ` [PATCH] x86, MCE, AMD: use macros to compute bank MSRs Chen Yucong
2014-09-29 11:48 ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.