linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support
@ 2019-09-11 12:01 Tony W Wang-oc
  2019-09-11 14:35 ` Borislav Petkov
  2019-09-13 18:10 ` Luck, Tony
  0 siblings, 2 replies; 5+ messages in thread
From: Tony W Wang-oc @ 2019-09-11 12:01 UTC (permalink / raw)
  To: tony.luck, Borislav Petkov (bp@alien8.de),
	tglx, mingo, hpa, x86, linux-edac, linux-kernel, yazen.ghannam,
	vishal.l.verma, qiuxu.zhuo
  Cc: David Wang, Cooper Yan(BJ-RD), Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD)

All Zhaoxin newer CPUs support MCE that compatible with Intel's
"Machine-Check Architecture", so add support for Zhaoxin MCE in
mce/core.c.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
v2->v3:
 - Make ifelse-case to switch-case
 - Simplify Zhaoxin CPU FMS checking

 arch/x86/kernel/cpu/mce/core.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 743370e..7bcd8c1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -488,8 +488,9 @@ int mce_usable_address(struct mce *m)
 	if (!(m->status & MCI_STATUS_ADDRV))
 		return 0;
 
-	/* Checks after this one are Intel-specific: */
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+	/* Checks after this one are Intel/Zhaoxin-specific: */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
 		return 1;
 
 	if (!(m->status & MCI_STATUS_MISCV))
@@ -507,10 +508,13 @@ EXPORT_SYMBOL_GPL(mce_usable_address);
 
 bool mce_is_memory_error(struct mce *m)
 {
-	if (m->cpuvendor == X86_VENDOR_AMD ||
-	    m->cpuvendor == X86_VENDOR_HYGON) {
+	switch (m->cpuvendor) {
+	case X86_VENDOR_AMD:
+	case X86_VENDOR_HYGON:
 		return amd_mce_is_memory_error(m);
-	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
+
+	case X86_VENDOR_INTEL:
+	case X86_VENDOR_ZHAOXIN:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -527,9 +531,10 @@ bool mce_is_memory_error(struct mce *m)
 		return (m->status & 0xef80) == BIT(7) ||
 		       (m->status & 0xef00) == BIT(8) ||
 		       (m->status & 0xeffc) == 0xc;
-	}
 
-	return false;
+	default:
+		return false;
+	}
 }
 EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
@@ -1697,6 +1702,18 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		if (c->x86 == 6 && c->x86_model == 45)
 			quirk_no_way_out = quirk_sandybridge_ifu;
 	}
+
+	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		/*
+		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
+		 * synchronization with a one second timeout.
+		 */
+		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+			if (cfg->monarch_timeout < 0)
+				cfg->monarch_timeout = USEC_PER_SEC;
+		}
+	}
+
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -2014,15 +2031,16 @@ static void mce_disable_error_reporting(void)
 static void vendor_disable_error_reporting(void)
 {
 	/*
-	 * Don't clear on Intel or AMD or Hygon CPUs. Some of these MSRs
-	 * are socket-wide.
+	 * Don't clear on Intel or AMD or Hygon or Zhaoxin CPUs. Some of these
+	 * MSRs are socket-wide.
 	 * Disabling them for just a single offlined CPU is bad, since it will
 	 * inhibit reporting for all shared resources on the socket like the
 	 * last level cache (LLC), the integrated memory controller (iMC), etc.
 	 */
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
 		return;
 
 	mce_disable_error_reporting();
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support
  2019-09-11 12:01 [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support Tony W Wang-oc
@ 2019-09-11 14:35 ` Borislav Petkov
  2019-09-13 18:10 ` Luck, Tony
  1 sibling, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2019-09-11 14:35 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: tony.luck, tglx, mingo, hpa, x86, linux-edac, linux-kernel,
	yazen.ghannam, vishal.l.verma, qiuxu.zhuo, David Wang,
	Cooper Yan(BJ-RD), Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD)

On Wed, Sep 11, 2019 at 12:01:42PM +0000, Tony W Wang-oc wrote:
> All Zhaoxin newer CPUs support MCE that compatible with Intel's
> "Machine-Check Architecture", so add support for Zhaoxin MCE in
> mce/core.c.
> 
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
> v2->v3:
>  - Make ifelse-case to switch-case
>  - Simplify Zhaoxin CPU FMS checking

Btw, for future submissions - you don't have to do it now - search the
web for threaded mails, look at git-send-email's manpage and especially
the --thread and --chain-reply-to options. Also, look at lkml for
examples.

IOW, patchsets should have a 0/N message and all the others should
be sent as a reply to that message, i.e., shallow threading, as the
git-send-email manpage calls it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support
  2019-09-11 12:01 [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support Tony W Wang-oc
  2019-09-11 14:35 ` Borislav Petkov
@ 2019-09-13 18:10 ` Luck, Tony
  2019-09-13 21:16   ` Borislav Petkov
  1 sibling, 1 reply; 5+ messages in thread
From: Luck, Tony @ 2019-09-13 18:10 UTC (permalink / raw)
  To: Tony W Wang-oc
  Cc: Borislav Petkov (bp@alien8.de),
	tglx, mingo, hpa, x86, linux-edac, linux-kernel, yazen.ghannam,
	vishal.l.verma, qiuxu.zhuo, David Wang, Cooper Yan(BJ-RD),
	Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD)

On Wed, Sep 11, 2019 at 12:01:42PM +0000, Tony W Wang-oc wrote:
> +	/* Checks after this one are Intel/Zhaoxin-specific: */
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
> +	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)


Is it time to have a big cleanup on how we handle similarities
and oddities in the MCE subsystem?  We've been adding ad-hoc
tests like this in random places ... and it all looks very
messy.  Lines that mention x86_vendor|x86|x86_model below
arch/x86/kernel/cpu/mce/ currently look like this:

arch/x86/kernel/cpu/mce/amd.c:		   (c->x86_model >= 0x10 && c->x86_model <= 0x2F)) {
arch/x86/kernel/cpu/mce/amd.c:	    c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
arch/x86/kernel/cpu/mce/amd.c:	} else if (c->x86 == 0x17 &&
arch/x86/kernel/cpu/mce/amd.c:	if (c->x86 == 0x15 && bank == 4) {
arch/x86/kernel/cpu/mce/amd.c:	if (c->x86 == 0x17 &&
arch/x86/kernel/cpu/mce/core.c:	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
arch/x86/kernel/cpu/mce/core.c:	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
arch/x86/kernel/cpu/mce/core.c:	     c->x86 > 6) {
arch/x86/kernel/cpu/mce/core.c:	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
arch/x86/kernel/cpu/mce/core.c:	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
arch/x86/kernel/cpu/mce/core.c:	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 < 0x11 && cfg->bootlog < 0) {
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 0x15 && c->x86_model <= 0xf)
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86 != 5)
arch/x86/kernel/cpu/mce/core.c:		if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
arch/x86/kernel/cpu/mce/core.c:	if ((c->x86 == 6 && c->x86_model == 0xf && c->x86_stepping >= 0xe) ||
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && c->x86_model == 45)
arch/x86/kernel/cpu/mce/core.c:		if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_AMD) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_INTEL) {
arch/x86/kernel/cpu/mce/core.c:	if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
arch/x86/kernel/cpu/mce/core.c:	m->cpuvendor = boot_cpu_data.x86_vendor;
arch/x86/kernel/cpu/mce/core.c:	switch (c->x86_vendor) {
arch/x86/kernel/cpu/mce/inject.c:	    boot_cpu_data.x86 < 0x17) {
arch/x86/kernel/cpu/mce/inject.c:	m->cpuvendor = boot_cpu_data.x86_vendor;
arch/x86/kernel/cpu/mce/intel.c:	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
arch/x86/kernel/cpu/mce/intel.c:	switch (c->x86_model) {
arch/x86/kernel/cpu/mce/severity.c:	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
arch/x86/kernel/cpu/mce/severity.c:	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
arch/x86/kernel/cpu/mce/therm_throt.c:		if (c->x86 == 6 && (c->x86_model == 9 || c->x86_model == 13)) {

Maybe we can X86_VENDOR_ZHAOXIN to this jumble with the excuse that
it is already so ugly that this patch series only makes things 5% worse?

Or should we make a big table of CPU vendors/families/models and use
x86_match_cpu() to pick out what are running on and set some bits/flags
(like X86_FEATURE/X86_BUG) which we can use in the code to do the
right thing in each place?

E.g. default for Intel and Zhaoxin vendors would be to set MCE_INTEL_LIKE.

Thoughts?

-Tony

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support
  2019-09-13 18:10 ` Luck, Tony
@ 2019-09-13 21:16   ` Borislav Petkov
  0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2019-09-13 21:16 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Tony W Wang-oc, tglx, mingo, hpa, x86, linux-edac, linux-kernel,
	yazen.ghannam, vishal.l.verma, qiuxu.zhuo, David Wang,
	Cooper Yan(BJ-RD), Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD)

On Fri, Sep 13, 2019 at 11:10:31AM -0700, Luck, Tony wrote:
> Is it time to have a big cleanup on how we handle similarities
> and oddities in the MCE subsystem?  We've been adding ad-hoc
> tests like this in random places ... and it all looks very
> messy.

Hohum, it has been bothering me for a while now too. ;-\

> Or should we make a big table of CPU vendors/families/models and use
> x86_match_cpu() to pick out what are running on and set some bits/flags
> (like X86_FEATURE/X86_BUG) which we can use in the code to do the
> right thing in each place?

Yes, that. And I have started doing something along those lines, see
struct mce_vendor_flags.

If we did the X86_FEATURE/BUG things, we would still end up using those
new definitions in the MCA code only so I think having our own bits in a
bitfield would be cleaner/nicer.

Anyway, detection can be all done in __mcheck_cpu_init_early() or
somewhere similar, all matching flags/bits set and then the rest of the
code would query only them.

We can also merge mce_vendor_flags into mca_cfg as that thing is used
everywhere.

Another advantage of having our own flags is that we can define them as
we like and stick them all in internal.h so no exposure to the outside.

And so on.

> E.g. default for Intel and Zhaoxin vendors would be to set MCE_INTEL_LIKE.
> 
> Thoughts?

Yah, I think that's a good idea and I think we should do it. Not
immediately but work towards it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support
@ 2019-09-16 11:36 Tony W Wang-oc
  0 siblings, 0 replies; 5+ messages in thread
From: Tony W Wang-oc @ 2019-09-16 11:36 UTC (permalink / raw)
  To: tony.luck, Borislav Petkov (bp@alien8.de),
	tglx, mingo, hpa, x86, linux-edac, linux-kernel, yazen.ghannam,
	vishal.l.verma, qiuxu.zhuo
  Cc: David Wang, Cooper Yan(BJ-RD), Qiyuan Wang(BJ-RD), Herry Yang(BJ-RD)

All Zhaoxin newer CPUs support MCE that compatible with Intel's
"Machine-Check Architecture", so add support for Zhaoxin MCE in
mce/core.c.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/kernel/cpu/mce/core.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 743370e..7bcd8c1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -488,8 +488,9 @@ int mce_usable_address(struct mce *m)
 	if (!(m->status & MCI_STATUS_ADDRV))
 		return 0;
 
-	/* Checks after this one are Intel-specific: */
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+	/* Checks after this one are Intel/Zhaoxin-specific: */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
 		return 1;
 
 	if (!(m->status & MCI_STATUS_MISCV))
@@ -507,10 +508,13 @@ EXPORT_SYMBOL_GPL(mce_usable_address);
 
 bool mce_is_memory_error(struct mce *m)
 {
-	if (m->cpuvendor == X86_VENDOR_AMD ||
-	    m->cpuvendor == X86_VENDOR_HYGON) {
+	switch (m->cpuvendor) {
+	case X86_VENDOR_AMD:
+	case X86_VENDOR_HYGON:
 		return amd_mce_is_memory_error(m);
-	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
+
+	case X86_VENDOR_INTEL:
+	case X86_VENDOR_ZHAOXIN:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -527,9 +531,10 @@ bool mce_is_memory_error(struct mce *m)
 		return (m->status & 0xef80) == BIT(7) ||
 		       (m->status & 0xef00) == BIT(8) ||
 		       (m->status & 0xeffc) == 0xc;
-	}
 
-	return false;
+	default:
+		return false;
+	}
 }
 EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
@@ -1697,6 +1702,18 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		if (c->x86 == 6 && c->x86_model == 45)
 			quirk_no_way_out = quirk_sandybridge_ifu;
 	}
+
+	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		/*
+		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
+		 * synchronization with a one second timeout.
+		 */
+		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+			if (cfg->monarch_timeout < 0)
+				cfg->monarch_timeout = USEC_PER_SEC;
+		}
+	}
+
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -2014,15 +2031,16 @@ static void mce_disable_error_reporting(void)
 static void vendor_disable_error_reporting(void)
 {
 	/*
-	 * Don't clear on Intel or AMD or Hygon CPUs. Some of these MSRs
-	 * are socket-wide.
+	 * Don't clear on Intel or AMD or Hygon or Zhaoxin CPUs. Some of these
+	 * MSRs are socket-wide.
 	 * Disabling them for just a single offlined CPU is bad, since it will
 	 * inhibit reporting for all shared resources on the socket like the
 	 * last level cache (LLC), the integrated memory controller (iMC), etc.
 	 */
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
 		return;
 
 	mce_disable_error_reporting();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-09-16 11:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-11 12:01 [PATCH v3 1/4] x86/mce: Add Zhaoxin MCE support Tony W Wang-oc
2019-09-11 14:35 ` Borislav Petkov
2019-09-13 18:10 ` Luck, Tony
2019-09-13 21:16   ` Borislav Petkov
2019-09-16 11:36 Tony W Wang-oc

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).