All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 19:15 ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-26 19:15 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad.

Software uses the valid bits to decide if the values can be used for
further processing or other actions. So setting the valid bits will have
software act on values that it shouldn't be acting on.

The recommendation to save all the register values does not mean that
the values are always valid.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 86079b90ebcf..42cf2880d0ed 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -446,20 +446,6 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
 		if (mca_cfg.rip_msr)
 			m->ip = mce_rdmsrl(mca_cfg.rip_msr);
 	}
-
-	/*
-	 * Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
-	 * MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
-	 * MCA_STATUS[SyndV] are zero.
-	 */
-	if (m->cpuvendor == X86_VENDOR_AMD) {
-		u64 status = MCI_STATUS_ADDRV | MCI_STATUS_MISCV;
-
-		if (mce_flags.smca)
-			status |= MCI_STATUS_SYNDV;
-
-		m->status |= status;
-	}
 }
 
 int mce_available(struct cpuinfo_x86 *c)
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 19:15 ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-26 19:15 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad.

Software uses the valid bits to decide if the values can be used for
further processing or other actions. So setting the valid bits will have
software act on values that it shouldn't be acting on.

The recommendation to save all the register values does not mean that
the values are always valid.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 86079b90ebcf..42cf2880d0ed 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -446,20 +446,6 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
 		if (mca_cfg.rip_msr)
 			m->ip = mce_rdmsrl(mca_cfg.rip_msr);
 	}
-
-	/*
-	 * Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
-	 * MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
-	 * MCA_STATUS[SyndV] are zero.
-	 */
-	if (m->cpuvendor == X86_VENDOR_AMD) {
-		u64 status = MCI_STATUS_ADDRV | MCI_STATUS_MISCV;
-
-		if (mce_flags.smca)
-			status |= MCI_STATUS_SYNDV;
-
-		m->status |= status;
-	}
 }
 
 int mce_available(struct cpuinfo_x86 *c)

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 19:15   ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-26 19:15 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
register should be saved if MCA_STATUS[ADDRV] is set. The same applies
to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
bits.

However, the Fam17h Processor Programming Reference states
"Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
MCA_STATUS[SyndV] are zero."

This is to ensure that all MCA state information is collected even if
software cannot act upon it (because the valid bits are cleared).

So always save the auxiliary MCA register contents even if the valid
bits are cleared. This should not affect error processing because
software should still check the valid bits before using the register
contents for error processing.

Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
Printing from EDAC/mce_amd is included here since we want to do this on
AMD systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c     | 23 +++++++----------------
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
 drivers/edac/mce_amd.c               | 10 +++-------
 3 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42cf2880d0ed..a556e1cadfbc 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
 	}
 
 	pr_emerg(HW_ERR "TSC %llx ", m->tsc);
-	if (m->addr)
-		pr_cont("ADDR %llx ", m->addr);
-	if (m->misc)
-		pr_cont("MISC %llx ", m->misc);
+	pr_cont("ADDR %016llx ", m->addr);
+	pr_cont("MISC %016llx\n", m->misc);
 
 	if (mce_flags.smca) {
-		if (m->synd)
-			pr_cont("SYND %llx ", m->synd);
-		if (m->ipid)
-			pr_cont("IPID %llx ", m->ipid);
+		pr_emerg(HW_ERR "IPID %016llx ", m->ipid);
+		pr_cont("SYND %016llx\n", m->synd);
 	}
 
-	pr_cont("\n");
 	/*
 	 * Note this output is parsed by external tools and old fields
 	 * should not be changed.
@@ -639,12 +634,10 @@ static struct notifier_block mce_default_nb = {
  */
 static void mce_read_aux(struct mce *m, int i)
 {
-	if (m->status & MCI_STATUS_MISCV)
-		m->misc = mce_rdmsrl(msr_ops.misc(i));
+	m->misc = mce_rdmsrl(msr_ops.misc(i));
+	m->addr = mce_rdmsrl(msr_ops.addr(i));
 
 	if (m->status & MCI_STATUS_ADDRV) {
-		m->addr = mce_rdmsrl(msr_ops.addr(i));
-
 		/*
 		 * Mask the reported address by the reported granularity.
 		 */
@@ -667,9 +660,7 @@ static void mce_read_aux(struct mce *m, int i)
 
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
-
-		if (m->status & MCI_STATUS_SYNDV)
-			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+		m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f7666eef4a87..5a37ae704578 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -799,13 +799,12 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	mce_setup(&m);
 
 	m.status = status;
+	m.addr	 = addr;
 	m.misc   = misc;
 	m.bank   = bank;
 	m.tsc	 = rdtsc();
 
 	if (m.status & MCI_STATUS_ADDRV) {
-		m.addr = addr;
-
 		/*
 		 * Extract [55:<lsb>] where lsb is the least significant
 		 * *valid* bit of the address bits.
@@ -819,9 +818,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
-
-		if (m.status & MCI_STATUS_SYNDV)
-			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
+		rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
 	}
 
 	mce_log(&m);
@@ -849,8 +846,7 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
 	if (!(status & MCI_STATUS_VAL))
 		return false;
 
-	if (status & MCI_STATUS_ADDRV)
-		rdmsrl(msr_addr, addr);
+	rdmsrl(msr_addr, addr);
 
 	__log_error(bank, status, addr, misc);
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 2ab4d61ee47e..004425cc8ddf 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -990,16 +990,12 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 	pr_cont("]: 0x%016llx\n", m->status);
 
-	if (m->status & MCI_STATUS_ADDRV)
-		pr_emerg(HW_ERR "Error Addr: 0x%016llx\n", m->addr);
+	pr_emerg(HW_ERR "Error Addr: 0x%016llx\n", m->addr);
+	pr_emerg(HW_ERR "Misc: 0x%016llx\n", m->misc);
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
-
-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
-
-		pr_cont("\n");
+		pr_cont(", Syndrome: 0x%016llx\n", m->synd);
 
 		decode_smca_error(m);
 		goto err_code;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 19:15   ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-26 19:15 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
register should be saved if MCA_STATUS[ADDRV] is set. The same applies
to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
bits.

However, the Fam17h Processor Programming Reference states
"Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
MCA_STATUS[SyndV] are zero."

This is to ensure that all MCA state information is collected even if
software cannot act upon it (because the valid bits are cleared).

So always save the auxiliary MCA register contents even if the valid
bits are cleared. This should not affect error processing because
software should still check the valid bits before using the register
contents for error processing.

Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
Printing from EDAC/mce_amd is included here since we want to do this on
AMD systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c     | 23 +++++++----------------
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
 drivers/edac/mce_amd.c               | 10 +++-------
 3 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42cf2880d0ed..a556e1cadfbc 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
 	}
 
 	pr_emerg(HW_ERR "TSC %llx ", m->tsc);
-	if (m->addr)
-		pr_cont("ADDR %llx ", m->addr);
-	if (m->misc)
-		pr_cont("MISC %llx ", m->misc);
+	pr_cont("ADDR %016llx ", m->addr);
+	pr_cont("MISC %016llx\n", m->misc);
 
 	if (mce_flags.smca) {
-		if (m->synd)
-			pr_cont("SYND %llx ", m->synd);
-		if (m->ipid)
-			pr_cont("IPID %llx ", m->ipid);
+		pr_emerg(HW_ERR "IPID %016llx ", m->ipid);
+		pr_cont("SYND %016llx\n", m->synd);
 	}
 
-	pr_cont("\n");
 	/*
 	 * Note this output is parsed by external tools and old fields
 	 * should not be changed.
@@ -639,12 +634,10 @@ static struct notifier_block mce_default_nb = {
  */
 static void mce_read_aux(struct mce *m, int i)
 {
-	if (m->status & MCI_STATUS_MISCV)
-		m->misc = mce_rdmsrl(msr_ops.misc(i));
+	m->misc = mce_rdmsrl(msr_ops.misc(i));
+	m->addr = mce_rdmsrl(msr_ops.addr(i));
 
 	if (m->status & MCI_STATUS_ADDRV) {
-		m->addr = mce_rdmsrl(msr_ops.addr(i));
-
 		/*
 		 * Mask the reported address by the reported granularity.
 		 */
@@ -667,9 +660,7 @@ static void mce_read_aux(struct mce *m, int i)
 
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
-
-		if (m->status & MCI_STATUS_SYNDV)
-			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+		m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f7666eef4a87..5a37ae704578 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -799,13 +799,12 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	mce_setup(&m);
 
 	m.status = status;
+	m.addr	 = addr;
 	m.misc   = misc;
 	m.bank   = bank;
 	m.tsc	 = rdtsc();
 
 	if (m.status & MCI_STATUS_ADDRV) {
-		m.addr = addr;
-
 		/*
 		 * Extract [55:<lsb>] where lsb is the least significant
 		 * *valid* bit of the address bits.
@@ -819,9 +818,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
-
-		if (m.status & MCI_STATUS_SYNDV)
-			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
+		rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
 	}
 
 	mce_log(&m);
@@ -849,8 +846,7 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
 	if (!(status & MCI_STATUS_VAL))
 		return false;
 
-	if (status & MCI_STATUS_ADDRV)
-		rdmsrl(msr_addr, addr);
+	rdmsrl(msr_addr, addr);
 
 	__log_error(bank, status, addr, misc);
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 2ab4d61ee47e..004425cc8ddf 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -990,16 +990,12 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 	pr_cont("]: 0x%016llx\n", m->status);
 
-	if (m->status & MCI_STATUS_ADDRV)
-		pr_emerg(HW_ERR "Error Addr: 0x%016llx\n", m->addr);
+	pr_emerg(HW_ERR "Error Addr: 0x%016llx\n", m->addr);
+	pr_emerg(HW_ERR "Misc: 0x%016llx\n", m->misc);
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
-
-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
-
-		pr_cont("\n");
+		pr_cont(", Syndrome: 0x%016llx\n", m->synd);
 
 		decode_smca_error(m);
 		goto err_code;

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 19:30   ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 19:30 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 02:15:25PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad.
> 
> Software uses the valid bits to decide if the values can be used for
> further processing or other actions. So setting the valid bits will have
> software act on values that it shouldn't be acting on.
> 
> The recommendation to save all the register values does not mean that
> the values are always valid.

So what does that

"Error handlers should save the values in MCA_ADDR, MCA_MISC0,
and MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
MCA_STATUS[SyndV] are zero."

*actually* mean then?

It is still in the PPR.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 19:30   ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 19:30 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 02:15:25PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad.
> 
> Software uses the valid bits to decide if the values can be used for
> further processing or other actions. So setting the valid bits will have
> software act on values that it shouldn't be acting on.
> 
> The recommendation to save all the register values does not mean that
> the values are always valid.

So what does that

"Error handlers should save the values in MCA_ADDR, MCA_MISC0,
and MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
MCA_STATUS[SyndV] are zero."

*actually* mean then?

It is still in the PPR.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 19:35     ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 19:35 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
> register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> bits.
> 
> However, the Fam17h Processor Programming Reference states
> "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> MCA_STATUS[SyndV] are zero."

Well, then you can't remove valid bit checks for older families. This
sounds like F17h only.

If so, it better be abstracted away cleanly and not changing the generic
code.

> 
> This is to ensure that all MCA state information is collected even if
> software cannot act upon it (because the valid bits are cleared).
> 
> So always save the auxiliary MCA register contents even if the valid
> bits are cleared. This should not affect error processing because
> software should still check the valid bits before using the register
> contents for error processing.
> 
> Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> Printing from EDAC/mce_amd is included here since we want to do this on
> AMD systems.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c     | 23 +++++++----------------
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
>  drivers/edac/mce_amd.c               | 10 +++-------
>  3 files changed, 13 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 42cf2880d0ed..a556e1cadfbc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
>  	}
>  
>  	pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> -	if (m->addr)
> -		pr_cont("ADDR %llx ", m->addr);
> -	if (m->misc)
> -		pr_cont("MISC %llx ", m->misc);
> +	pr_cont("ADDR %016llx ", m->addr);
> +	pr_cont("MISC %016llx\n", m->misc);

You simply can't do this - this is generic code, not AMD only.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 19:35     ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 19:35 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
> register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> bits.
> 
> However, the Fam17h Processor Programming Reference states
> "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> MCA_STATUS[SyndV] are zero."

Well, then you can't remove valid bit checks for older families. This
sounds like F17h only.

If so, it better be abstracted away cleanly and not changing the generic
code.

> 
> This is to ensure that all MCA state information is collected even if
> software cannot act upon it (because the valid bits are cleared).
> 
> So always save the auxiliary MCA register contents even if the valid
> bits are cleared. This should not affect error processing because
> software should still check the valid bits before using the register
> contents for error processing.
> 
> Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> Printing from EDAC/mce_amd is included here since we want to do this on
> AMD systems.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c     | 23 +++++++----------------
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
>  drivers/edac/mce_amd.c               | 10 +++-------
>  3 files changed, 13 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 42cf2880d0ed..a556e1cadfbc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
>  	}
>  
>  	pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> -	if (m->addr)
> -		pr_cont("ADDR %llx ", m->addr);
> -	if (m->misc)
> -		pr_cont("MISC %llx ", m->misc);
> +	pr_cont("ADDR %016llx ", m->addr);
> +	pr_cont("MISC %016llx\n", m->misc);

You simply can't do this - this is generic code, not AMD only.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 19:58     ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Ghannam, Yazen @ 2018-03-26 19:58 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-
> owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, March 26, 2018 3:31 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> valid bits are not set"
> 
> On Mon, Mar 26, 2018 at 02:15:25PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad.
> >
> > Software uses the valid bits to decide if the values can be used for
> > further processing or other actions. So setting the valid bits will have
> > software act on values that it shouldn't be acting on.
> >
> > The recommendation to save all the register values does not mean that
> > the values are always valid.
> 
> So what does that
> 
> "Error handlers should save the values in MCA_ADDR, MCA_MISC0,
> and MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> MCA_STATUS[SyndV] are zero."
> 
> *actually* mean then?
> 
> It is still in the PPR.
> 

We should always save as much of the error state as we can even if we
can't act upon it. Basically, we don't ever want to lose information in the
case of some unforeseen issue in the reporting mechanisms or something
else. There aren't any issues that require this change at the moment. But
I think the Design folks are being more conservative in ensuring that all
possible data is collected.

So at a minimum, we should always save and report as much as we can.
But we don't try any recovery actions unless we're sure the data is valid.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 19:58     ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-26 19:58 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1lZGFjLW93bmVyQHZn
ZXIua2VybmVsLm9yZyA8bGludXgtZWRhYy0NCj4gb3duZXJAdmdlci5rZXJuZWwub3JnPiBPbiBC
ZWhhbGYgT2YgQm9yaXNsYXYgUGV0a292DQo+IFNlbnQ6IE1vbmRheSwgTWFyY2ggMjYsIDIwMTgg
MzozMSBQTQ0KPiBUbzogR2hhbm5hbSwgWWF6ZW4gPFlhemVuLkdoYW5uYW1AYW1kLmNvbT4NCj4g
Q2M6IGxpbnV4LWVkYWNAdmdlci5rZXJuZWwub3JnOyBsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwu
b3JnOw0KPiB0b255Lmx1Y2tAaW50ZWwuY29tOyB4ODZAa2VybmVsLm9yZw0KPiBTdWJqZWN0OiBS
ZTogW1BBVENIIDEvMl0gUmV2ZXJ0ICJ4ODYvbWNlL0FNRDogQ29sbGVjdCBlcnJvciBpbmZvIGV2
ZW4gaWYNCj4gdmFsaWQgYml0cyBhcmUgbm90IHNldCINCj4gDQo+IE9uIE1vbiwgTWFyIDI2LCAy
MDE4IGF0IDAyOjE1OjI1UE0gLTA1MDAsIFlhemVuIEdoYW5uYW0gd3JvdGU6DQo+ID4gRnJvbTog
WWF6ZW4gR2hhbm5hbSA8eWF6ZW4uZ2hhbm5hbUBhbWQuY29tPg0KPiA+DQo+ID4gVGhpcyByZXZl
cnRzIGNvbW1pdCA0YjFlODQyNzZhNjE3Mjk4MGM1YmYzOWFhMDkxYmExM2U5MGQ2ZGFkLg0KPiA+
DQo+ID4gU29mdHdhcmUgdXNlcyB0aGUgdmFsaWQgYml0cyB0byBkZWNpZGUgaWYgdGhlIHZhbHVl
cyBjYW4gYmUgdXNlZCBmb3INCj4gPiBmdXJ0aGVyIHByb2Nlc3Npbmcgb3Igb3RoZXIgYWN0aW9u
cy4gU28gc2V0dGluZyB0aGUgdmFsaWQgYml0cyB3aWxsIGhhdmUNCj4gPiBzb2Z0d2FyZSBhY3Qg
b24gdmFsdWVzIHRoYXQgaXQgc2hvdWxkbid0IGJlIGFjdGluZyBvbi4NCj4gPg0KPiA+IFRoZSBy
ZWNvbW1lbmRhdGlvbiB0byBzYXZlIGFsbCB0aGUgcmVnaXN0ZXIgdmFsdWVzIGRvZXMgbm90IG1l
YW4gdGhhdA0KPiA+IHRoZSB2YWx1ZXMgYXJlIGFsd2F5cyB2YWxpZC4NCj4gDQo+IFNvIHdoYXQg
ZG9lcyB0aGF0DQo+IA0KPiAiRXJyb3IgaGFuZGxlcnMgc2hvdWxkIHNhdmUgdGhlIHZhbHVlcyBp
biBNQ0FfQUREUiwgTUNBX01JU0MwLA0KPiBhbmQgTUNBX1NZTkQgZXZlbiBpZiBNQ0FfU1RBVFVT
W0FkZHJWXSwgTUNBX1NUQVRVU1tNaXNjVl0sIGFuZA0KPiBNQ0FfU1RBVFVTW1N5bmRWXSBhcmUg
emVyby4iDQo+IA0KPiAqYWN0dWFsbHkqIG1lYW4gdGhlbj8NCj4gDQo+IEl0IGlzIHN0aWxsIGlu
IHRoZSBQUFIuDQo+IA0KDQpXZSBzaG91bGQgYWx3YXlzIHNhdmUgYXMgbXVjaCBvZiB0aGUgZXJy
b3Igc3RhdGUgYXMgd2UgY2FuIGV2ZW4gaWYgd2UNCmNhbid0IGFjdCB1cG9uIGl0LiBCYXNpY2Fs
bHksIHdlIGRvbid0IGV2ZXIgd2FudCB0byBsb3NlIGluZm9ybWF0aW9uIGluIHRoZQ0KY2FzZSBv
ZiBzb21lIHVuZm9yZXNlZW4gaXNzdWUgaW4gdGhlIHJlcG9ydGluZyBtZWNoYW5pc21zIG9yIHNv
bWV0aGluZw0KZWxzZS4gVGhlcmUgYXJlbid0IGFueSBpc3N1ZXMgdGhhdCByZXF1aXJlIHRoaXMg
Y2hhbmdlIGF0IHRoZSBtb21lbnQuIEJ1dA0KSSB0aGluayB0aGUgRGVzaWduIGZvbGtzIGFyZSBi
ZWluZyBtb3JlIGNvbnNlcnZhdGl2ZSBpbiBlbnN1cmluZyB0aGF0IGFsbA0KcG9zc2libGUgZGF0
YSBpcyBjb2xsZWN0ZWQuDQoNClNvIGF0IGEgbWluaW11bSwgd2Ugc2hvdWxkIGFsd2F5cyBzYXZl
IGFuZCByZXBvcnQgYXMgbXVjaCBhcyB3ZSBjYW4uDQpCdXQgd2UgZG9uJ3QgdHJ5IGFueSByZWNv
dmVyeSBhY3Rpb25zIHVubGVzcyB3ZSdyZSBzdXJlIHRoZSBkYXRhIGlzIHZhbGlkLg0KDQpUaGFu
a3MsDQpZYXplbg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 20:05       ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Ghannam, Yazen @ 2018-03-26 20:05 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-
> owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, March 26, 2018 3:35 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > The Intel SDM and AMD APM both state that the contents of the
> MCA_ADDR
> > register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> > to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> > bits.
> >
> > However, the Fam17h Processor Programming Reference states
> > "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> > MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> > MCA_STATUS[SyndV] are zero."
> 
> Well, then you can't remove valid bit checks for older families. This
> sounds like F17h only.
> 
> If so, it better be abstracted away cleanly and not changing the generic
> code.
> 

Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
to read the registers whether or not the valid bits are set.

> >
> > This is to ensure that all MCA state information is collected even if
> > software cannot act upon it (because the valid bits are cleared).
> >
> > So always save the auxiliary MCA register contents even if the valid
> > bits are cleared. This should not affect error processing because
> > software should still check the valid bits before using the register
> > contents for error processing.
> >
> > Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> > Printing from EDAC/mce_amd is included here since we want to do this on
> > AMD systems.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c     | 23 +++++++----------------
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
> >  drivers/edac/mce_amd.c               | 10 +++-------
> >  3 files changed, 13 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 42cf2880d0ed..a556e1cadfbc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
> >  	}
> >
> >  	pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> > -	if (m->addr)
> > -		pr_cont("ADDR %llx ", m->addr);
> > -	if (m->misc)
> > -		pr_cont("MISC %llx ", m->misc);
> > +	pr_cont("ADDR %016llx ", m->addr);
> > +	pr_cont("MISC %016llx\n", m->misc);
> 
> You simply can't do this - this is generic code, not AMD only.
> 

I can change this if you'd like. I just thought it would be simpler to
make the change here since we're just printing the values.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 20:05       ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-26 20:05 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-
> owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, March 26, 2018 3:35 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > The Intel SDM and AMD APM both state that the contents of the
> MCA_ADDR
> > register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> > to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> > bits.
> >
> > However, the Fam17h Processor Programming Reference states
> > "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> > MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> > MCA_STATUS[SyndV] are zero."
> 
> Well, then you can't remove valid bit checks for older families. This
> sounds like F17h only.
> 
> If so, it better be abstracted away cleanly and not changing the generic
> code.
> 

Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
to read the registers whether or not the valid bits are set.

> >
> > This is to ensure that all MCA state information is collected even if
> > software cannot act upon it (because the valid bits are cleared).
> >
> > So always save the auxiliary MCA register contents even if the valid
> > bits are cleared. This should not affect error processing because
> > software should still check the valid bits before using the register
> > contents for error processing.
> >
> > Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> > Printing from EDAC/mce_amd is included here since we want to do this on
> > AMD systems.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c     | 23 +++++++----------------
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
> >  drivers/edac/mce_amd.c               | 10 +++-------
> >  3 files changed, 13 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 42cf2880d0ed..a556e1cadfbc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
> >  	}
> >
> >  	pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> > -	if (m->addr)
> > -		pr_cont("ADDR %llx ", m->addr);
> > -	if (m->misc)
> > -		pr_cont("MISC %llx ", m->misc);
> > +	pr_cont("ADDR %016llx ", m->addr);
> > +	pr_cont("MISC %016llx\n", m->misc);
> 
> You simply can't do this - this is generic code, not AMD only.
> 

I can change this if you'd like. I just thought it would be simpler to
make the change here since we're just printing the values.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 20:07       ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 20:07 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> So at a minimum, we should always save and report as much as we can.

Only on Zen or all AMD families?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-26 20:07       ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 20:07 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> So at a minimum, we should always save and report as much as we can.

Only on Zen or all AMD families?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 20:09         ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 20:09 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 08:05:37PM +0000, Ghannam, Yazen wrote:
> Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
> to read the registers whether or not the valid bits are set.

No, this needs to be AMD-specific because it will confuse people using
Intel machines.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 20:09         ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-03-26 20:09 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Mar 26, 2018 at 08:05:37PM +0000, Ghannam, Yazen wrote:
> Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
> to read the registers whether or not the valid bits are set.

No, this needs to be AMD-specific because it will confuse people using
Intel machines.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 20:27           ` Luck, Tony
  0 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2018-03-26 20:27 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Ghannam, Yazen, linux-edac, linux-kernel, x86

On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> On Mon, Mar 26, 2018 at 08:05:37PM +0000, Ghannam, Yazen wrote:
> > Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
> > to read the registers whether or not the valid bits are set.
> 
> No, this needs to be AMD-specific because it will confuse people using
> Intel machines.

Worse than confusion it may even cause a crash on Intel. Quoting the
Intel SDM:

  15.3.2.3 IA32_MCi_ADDR MSRs

  The IA32_MCi_ADDR MSR contains the address of the code or data memory
  location that produced the machine- check error if the ADDRV flag in
  the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
  MSR”).  The IA32_MCi_ADDR register is either not implemented or
  contains no address if the ADDRV flag in the IA32_MCi_STATUS register
  is clear. When not implemented in the processor, all reads and writes
  to this MSR will cause a general protection exception.

Ditto for the MISC register.  Please don't read them unless
the ADDRV/MISCV bits are set in the corresponding STATUS
register.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-26 20:27           ` Luck, Tony
  0 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2018-03-26 20:27 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Ghannam, Yazen, linux-edac, linux-kernel, x86

On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> On Mon, Mar 26, 2018 at 08:05:37PM +0000, Ghannam, Yazen wrote:
> > Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
> > to read the registers whether or not the valid bits are set.
> 
> No, this needs to be AMD-specific because it will confuse people using
> Intel machines.

Worse than confusion it may even cause a crash on Intel. Quoting the
Intel SDM:

  15.3.2.3 IA32_MCi_ADDR MSRs

  The IA32_MCi_ADDR MSR contains the address of the code or data memory
  location that produced the machine- check error if the ADDRV flag in
  the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
  MSR”).  The IA32_MCi_ADDR register is either not implemented or
  contains no address if the ADDRV flag in the IA32_MCi_STATUS register
  is clear. When not implemented in the processor, all reads and writes
  to this MSR will cause a general protection exception.

Ditto for the MISC register.  Please don't read them unless
the ADDRV/MISCV bits are set in the corresponding STATUS
register.

Thanks

-Tony
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-27 14:02         ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Ghannam, Yazen @ 2018-03-27 14:02 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Monday, March 26, 2018 4:08 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> valid bits are not set"
> 
> On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > So at a minimum, we should always save and report as much as we can.
> 
> Only on Zen or all AMD families?
> 

I'll confirm with the HW folks. I understand it as a change in philosophy
rather than a change in hardware.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-27 14:02         ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-27 14:02 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Monday, March 26, 2018 4:08 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> valid bits are not set"
> 
> On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > So at a minimum, we should always save and report as much as we can.
> 
> Only on Zen or all AMD families?
> 

I'll confirm with the HW folks. I understand it as a change in philosophy
rather than a change in hardware.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-27 14:07             ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Ghannam, Yazen @ 2018-03-27 14:07 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov; +Cc: linux-edac, linux-kernel, x86

> -----Original Message-----
> From: Luck, Tony <tony.luck@intel.com>
> Sent: Monday, March 26, 2018 4:28 PM
> To: Borislav Petkov <bp@alien8.de>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-
> edac@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> > On Mon, Mar 26, 2018 at 08:05:37PM +0000, Ghannam, Yazen wrote:
> > > Sure, I can do that. But I didn't think it was necessary because it doesn't
> hurt
> > > to read the registers whether or not the valid bits are set.
> >
> > No, this needs to be AMD-specific because it will confuse people using
> > Intel machines.
> 
> Worse than confusion it may even cause a crash on Intel. Quoting the
> Intel SDM:
> 
>   15.3.2.3 IA32_MCi_ADDR MSRs
> 
>   The IA32_MCi_ADDR MSR contains the address of the code or data memory
>   location that produced the machine- check error if the ADDRV flag in
>   the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
>   MSR”).  The IA32_MCi_ADDR register is either not implemented or
>   contains no address if the ADDRV flag in the IA32_MCi_STATUS register
>   is clear. When not implemented in the processor, all reads and writes
>   to this MSR will cause a general protection exception.
> 
> Ditto for the MISC register.  Please don't read them unless
> the ADDRV/MISCV bits are set in the corresponding STATUS
> register.
> 

Okay, then we won't do this.

On AMD, the registers are implemented if MCA is implemented. They'll just
be read-as-zero if not available for some reason.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
@ 2018-03-27 14:07             ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-27 14:07 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov; +Cc: linux-edac, linux-kernel, x86

> -----Original Message-----
> From: Luck, Tony <tony.luck@intel.com>
> Sent: Monday, March 26, 2018 4:28 PM
> To: Borislav Petkov <bp@alien8.de>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-
> edac@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> > On Mon, Mar 26, 2018 at 08:05:37PM +0000, Ghannam, Yazen wrote:
> > > Sure, I can do that. But I didn't think it was necessary because it doesn't
> hurt
> > > to read the registers whether or not the valid bits are set.
> >
> > No, this needs to be AMD-specific because it will confuse people using
> > Intel machines.
> 
> Worse than confusion it may even cause a crash on Intel. Quoting the
> Intel SDM:
> 
>   15.3.2.3 IA32_MCi_ADDR MSRs
> 
>   The IA32_MCi_ADDR MSR contains the address of the code or data memory
>   location that produced the machine- check error if the ADDRV flag in
>   the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
>   MSR”).  The IA32_MCi_ADDR register is either not implemented or
>   contains no address if the ADDRV flag in the IA32_MCi_STATUS register
>   is clear. When not implemented in the processor, all reads and writes
>   to this MSR will cause a general protection exception.
> 
> Ditto for the MISC register.  Please don't read them unless
> the ADDRV/MISCV bits are set in the corresponding STATUS
> register.
> 

Okay, then we won't do this.

On AMD, the registers are implemented if MCA is implemented. They'll just
be read-as-zero if not available for some reason.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-27 15:59           ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Ghannam, Yazen @ 2018-03-27 15:59 UTC (permalink / raw)
  To: Ghannam, Yazen, Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-
> owner@vger.kernel.org> On Behalf Of Ghannam, Yazen
> Sent: Tuesday, March 27, 2018 10:02 AM
> To: Borislav Petkov <bp@alien8.de>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> valid bits are not set"
> 
> > -----Original Message-----
> > From: Borislav Petkov <bp@alien8.de>
> > Sent: Monday, March 26, 2018 4:08 PM
> > To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> > Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> > tony.luck@intel.com; x86@kernel.org
> > Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> > valid bits are not set"
> >
> > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > So at a minimum, we should always save and report as much as we can.
> >
> > Only on Zen or all AMD families?
> >
> 
> I'll confirm with the HW folks. I understand it as a change in philosophy
> rather than a change in hardware.
> 

So this recommendation could apply to all families, but it's okay if we just
apply this behavior to SMCA systems. That way we don't need to worry
about changing things on legacy systems.

I'll write a new patch that abstracts the register reads and applies the
different behaviors.

In any case, this patch should be reverted since faking the valid bits will
cause the downstream code in the notifier blocks to process errors they
shouldn't.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-03-27 15:59           ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-03-27 15:59 UTC (permalink / raw)
  To: Ghannam, Yazen, Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-
> owner@vger.kernel.org> On Behalf Of Ghannam, Yazen
> Sent: Tuesday, March 27, 2018 10:02 AM
> To: Borislav Petkov <bp@alien8.de>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> valid bits are not set"
> 
> > -----Original Message-----
> > From: Borislav Petkov <bp@alien8.de>
> > Sent: Monday, March 26, 2018 4:08 PM
> > To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> > Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> > tony.luck@intel.com; x86@kernel.org
> > Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> > valid bits are not set"
> >
> > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > So at a minimum, we should always save and report as much as we can.
> >
> > Only on Zen or all AMD families?
> >
> 
> I'll confirm with the HW folks. I understand it as a change in philosophy
> rather than a change in hardware.
> 

So this recommendation could apply to all families, but it's okay if we just
apply this behavior to SMCA systems. That way we don't need to worry
about changing things on legacy systems.

I'll write a new patch that abstracts the register reads and applies the
different behaviors.

In any case, this patch should be reverted since faking the valid bits will
cause the downstream code in the notifier blocks to process errors they
shouldn't.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [tip:ras/core] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
  2018-03-26 19:15 ` [1/2] " Yazen Ghannam
                   ` (2 preceding siblings ...)
  (?)
@ 2018-03-28 18:39 ` tip-bot for Yazen Ghannam
  -1 siblings, 0 replies; 29+ messages in thread
From: tip-bot for Yazen Ghannam @ 2018-03-28 18:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: yazen.ghannam, Yazen.Ghannam, mingo, linux-kernel, tglx, hpa

Commit-ID:  e2efacb6a54ab54626da3507be1008d0040492cc
Gitweb:     https://git.kernel.org/tip/e2efacb6a54ab54626da3507be1008d0040492cc
Author:     Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate: Mon, 26 Mar 2018 14:15:25 -0500
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 28 Mar 2018 20:34:59 +0200

Revert "x86/mce/AMD: Collect error info even if valid bits are not set"

This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad.

Software uses the valid bits to decide if the values can be used for
further processing or other actions. So setting the valid bits will have
software act on values that it shouldn't be acting on.

The recommendation to save all the register values does not mean that
the values are always valid.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: tony.luck@intel.com
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: bp@suse.de
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180326191526.64314-1-Yazen.Ghannam@amd.com

---
 arch/x86/kernel/cpu/mcheck/mce.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 21962c48dad7..3c1eec17312b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -446,20 +446,6 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
 		if (mca_cfg.rip_msr)
 			m->ip = mce_rdmsrl(mca_cfg.rip_msr);
 	}
-
-	/*
-	 * Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
-	 * MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
-	 * MCA_STATUS[SyndV] are zero.
-	 */
-	if (m->cpuvendor == X86_VENDOR_AMD) {
-		u64 status = MCI_STATUS_ADDRV | MCI_STATUS_MISCV;
-
-		if (mce_flags.smca)
-			status |= MCI_STATUS_SYNDV;
-
-		m->status |= status;
-	}
 }
 
 int mce_available(struct cpuinfo_x86 *c)

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-08-23 12:24             ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-08-23 12:24 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

Reviving an old issue while cleaning my inbox.

On Tue, Mar 27, 2018 at 03:59:37PM +0000, Ghannam, Yazen wrote:
> > > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > > So at a minimum, we should always save and report as much as we can.
> > >
> > > Only on Zen or all AMD families?
> > >
> > 
> > I'll confirm with the HW folks. I understand it as a change in philosophy
> > rather than a change in hardware.
> > 
> 
> So this recommendation could apply to all families, but it's okay if we just

Ok, so I think we should do this, still, as it is exactly what the
recommendation says: read the MSRs even if the valid bits are not set and it
doesn't set any Valid bits to confuse error handling downstream.

This way we'll collect all possible info and then mce_amd.c should stop looking
at the valid bits too and dump whatever has been logged.

Ok?

---
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4b767284b7f5..c14674025615 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -639,10 +639,12 @@ static struct notifier_block mce_default_nb = {
  */
 static void mce_read_aux(struct mce *m, int i)
 {
-	if (m->status & MCI_STATUS_MISCV)
+	bool is_amd = m->cpuvendor == X86_VENDOR_AMD;
+
+	if (m->status & MCI_STATUS_MISCV || is_amd)
 		m->misc = mce_rdmsrl(msr_ops.misc(i));
 
-	if (m->status & MCI_STATUS_ADDRV) {
+	if (m->status & MCI_STATUS_ADDRV || is_amd) {
 		m->addr = mce_rdmsrl(msr_ops.addr(i));
 
 		/*
@@ -668,7 +670,7 @@ static void mce_read_aux(struct mce *m, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV || is_amd)
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
 	}
 }

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-08-23 12:24             ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-08-23 12:24 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

Reviving an old issue while cleaning my inbox.

On Tue, Mar 27, 2018 at 03:59:37PM +0000, Ghannam, Yazen wrote:
> > > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > > So at a minimum, we should always save and report as much as we can.
> > >
> > > Only on Zen or all AMD families?
> > >
> > 
> > I'll confirm with the HW folks. I understand it as a change in philosophy
> > rather than a change in hardware.
> > 
> 
> So this recommendation could apply to all families, but it's okay if we just

Ok, so I think we should do this, still, as it is exactly what the
recommendation says: read the MSRs even if the valid bits are not set and it
doesn't set any Valid bits to confuse error handling downstream.

This way we'll collect all possible info and then mce_amd.c should stop looking
at the valid bits too and dump whatever has been logged.

Ok?

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4b767284b7f5..c14674025615 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -639,10 +639,12 @@ static struct notifier_block mce_default_nb = {
  */
 static void mce_read_aux(struct mce *m, int i)
 {
-	if (m->status & MCI_STATUS_MISCV)
+	bool is_amd = m->cpuvendor == X86_VENDOR_AMD;
+
+	if (m->status & MCI_STATUS_MISCV || is_amd)
 		m->misc = mce_rdmsrl(msr_ops.misc(i));
 
-	if (m->status & MCI_STATUS_ADDRV) {
+	if (m->status & MCI_STATUS_ADDRV || is_amd) {
 		m->addr = mce_rdmsrl(msr_ops.addr(i));
 
 		/*
@@ -668,7 +670,7 @@ static void mce_read_aux(struct mce *m, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV || is_amd)
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
 	}
 }

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-08-23 17:53               ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Ghannam, Yazen @ 2018-08-23 17:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org>
> On Behalf Of Borislav Petkov
> Sent: Thursday, August 23, 2018 7:24 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid
> bits are not set"
> 
> Reviving an old issue while cleaning my inbox.
> 
> On Tue, Mar 27, 2018 at 03:59:37PM +0000, Ghannam, Yazen wrote:
> > > > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > > > So at a minimum, we should always save and report as much as we can.
> > > >
> > > > Only on Zen or all AMD families?
> > > >
> > >
> > > I'll confirm with the HW folks. I understand it as a change in philosophy
> > > rather than a change in hardware.
> > >
> >
> > So this recommendation could apply to all families, but it's okay if we just
> 
> Ok, so I think we should do this, still, as it is exactly what the
> recommendation says: read the MSRs even if the valid bits are not set and it
> doesn't set any Valid bits to confuse error handling downstream.
> 
> This way we'll collect all possible info and then mce_amd.c should stop looking
> at the valid bits too and dump whatever has been logged.
> 
> Ok?
> 

Yes, this seems okay to me.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
@ 2018-08-23 17:53               ` Yazen Ghannam
  0 siblings, 0 replies; 29+ messages in thread
From: Yazen Ghannam @ 2018-08-23 17:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org>
> On Behalf Of Borislav Petkov
> Sent: Thursday, August 23, 2018 7:24 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid
> bits are not set"
> 
> Reviving an old issue while cleaning my inbox.
> 
> On Tue, Mar 27, 2018 at 03:59:37PM +0000, Ghannam, Yazen wrote:
> > > > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > > > So at a minimum, we should always save and report as much as we can.
> > > >
> > > > Only on Zen or all AMD families?
> > > >
> > >
> > > I'll confirm with the HW folks. I understand it as a change in philosophy
> > > rather than a change in hardware.
> > >
> >
> > So this recommendation could apply to all families, but it's okay if we just
> 
> Ok, so I think we should do this, still, as it is exactly what the
> recommendation says: read the MSRs even if the valid bits are not set and it
> doesn't set any Valid bits to confuse error handling downstream.
> 
> This way we'll collect all possible info and then mce_amd.c should stop looking
> at the valid bits too and dump whatever has been logged.
> 
> Ok?
> 

Yes, this seems okay to me.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-08-23 17:53 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-26 19:15 [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set" Yazen Ghannam
2018-03-26 19:15 ` [1/2] " Yazen Ghannam
2018-03-26 19:15 ` [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents Yazen Ghannam
2018-03-26 19:15   ` [2/2] " Yazen Ghannam
2018-03-26 19:35   ` [PATCH 2/2] " Borislav Petkov
2018-03-26 19:35     ` [2/2] " Borislav Petkov
2018-03-26 20:05     ` [PATCH 2/2] " Ghannam, Yazen
2018-03-26 20:05       ` [2/2] " Yazen Ghannam
2018-03-26 20:09       ` [PATCH 2/2] " Borislav Petkov
2018-03-26 20:09         ` [2/2] " Borislav Petkov
2018-03-26 20:27         ` [PATCH 2/2] " Luck, Tony
2018-03-26 20:27           ` [2/2] " Luck, Tony
2018-03-27 14:07           ` [PATCH 2/2] " Ghannam, Yazen
2018-03-27 14:07             ` [2/2] " Yazen Ghannam
2018-03-26 19:30 ` [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set" Borislav Petkov
2018-03-26 19:30   ` [1/2] " Borislav Petkov
2018-03-26 19:58   ` [PATCH 1/2] " Ghannam, Yazen
2018-03-26 19:58     ` [1/2] " Yazen Ghannam
2018-03-26 20:07     ` [PATCH 1/2] " Borislav Petkov
2018-03-26 20:07       ` [1/2] " Borislav Petkov
2018-03-27 14:02       ` [PATCH 1/2] " Ghannam, Yazen
2018-03-27 14:02         ` [1/2] " Yazen Ghannam
2018-03-27 15:59         ` [PATCH 1/2] " Ghannam, Yazen
2018-03-27 15:59           ` [1/2] " Yazen Ghannam
2018-08-23 12:24           ` [PATCH 1/2] " Borislav Petkov
2018-08-23 12:24             ` [1/2] " Borislav Petkov
2018-08-23 17:53             ` [PATCH 1/2] " Ghannam, Yazen
2018-08-23 17:53               ` [1/2] " Yazen Ghannam
2018-03-28 18:39 ` [tip:ras/core] " tip-bot for Yazen Ghannam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.