* [PATCH V2 0/2] Rework mce_severity
@ 2015-03-20 16:31 Aravind Gopalakrishnan
  2015-03-20 16:31 ` [PATCH V2 1/2] x86, mce, severities: Add AMD severities function Aravind Gopalakrishnan
  2015-03-20 16:31 ` [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer Aravind Gopalakrishnan
  0 siblings, 2 replies; 7+ messages in thread
From: Aravind Gopalakrishnan @ 2015-03-20 16:31 UTC
  To: tglx, mingo, hpa, tony.luck, bp, slaoub, luto, x86, linux-kernel,
	linux-edac
  Cc: Aravind Gopalakrishnan

Patch1: Introduce AMD severities function
Patch2: Initialise mce_severity function pointer to choose between
	Intel or AMD grading mechanisms

Aravind Gopalakrishnan (2):
  x86, mce, severities: Add AMD severities function
  x86, mce, severities: Define mce_severity function pointer

 arch/x86/include/asm/mce.h                |  8 ++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |  3 +-
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 80 ++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mcheck/mce.c          | 10 ++++
 4 files changed, 99 insertions(+), 2 deletions(-)

-- 
1.9.1



* [PATCH V2 1/2] x86, mce, severities: Add AMD severities function
  2015-03-20 16:31 [PATCH V2 0/2] Rework mce_severity Aravind Gopalakrishnan
@ 2015-03-20 16:31 ` Aravind Gopalakrishnan
  2015-03-20 16:31 ` [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer Aravind Gopalakrishnan
  1 sibling, 0 replies; 7+ messages in thread
From: Aravind Gopalakrishnan @ 2015-03-20 16:31 UTC
  To: tglx, mingo, hpa, tony.luck, bp, slaoub, luto, x86, linux-kernel,
	linux-edac
  Cc: Aravind Gopalakrishnan

Add a severities function that caters to AMD processors.
This allows us to do some vendor-specific work within the
function if necessary.

Also, introduce a vendor flag bitfield which contains
vendor-specific flags. The severities code uses this to define
error scope based on the presence of the flags field.

This is based on work by Boris Petkov.

Testing details:
Tested the patch for any regressions on F15hM0h-0fh (Orochi)
and F15hM60h-6fh (Carrizo) and it works fine.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
---
Changes from V1:
 - Test mce_flags.overflow_recov once instead of multiple times

 arch/x86/include/asm/mce.h                |  6 ++++
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 53 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce.c          |  9 ++++++
 3 files changed, 68 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index fd38a23..b574fbf 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -116,6 +116,12 @@ struct mca_config {
 	u32 rip_msr;
 };
 
+struct mce_vendor_flags {
+	__u64		overflow_recov	: 1, /* cpuid_ebx(80000007) */
+			__reserved_0	: 63;
+};
+extern struct mce_vendor_flags mce_flags;
+
 extern struct mca_config mca_cfg;
 extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 8bb4330..4f8f87d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -186,12 +186,65 @@ static int error_context(struct mce *m)
 	return ((m->cs & 3) == 3) ? IN_USER : IN_KERNEL;
 }
 
+/* keeping mce_severity_amd in sync with AMD error scope hierarchy table */
+static int mce_severity_amd(struct mce *m, enum context ctx)
+{
+	enum context ctx = error_context(m);
+	/* Processor Context Corrupt, no need to fumble too much, die! */
+	if (m->status & MCI_STATUS_PCC)
+		return MCE_PANIC_SEVERITY;
+
+	if (m->status & MCI_STATUS_UC) {
+		/*
+		 * On older systems, where overflow_recov flag is not
+		 * present, we should simply PANIC if Overflow occurs.
+		 * If overflow_recov flag set, then SW can try
+		 * to at least kill process to salvage system operation.
+		 */
+
+		if (mce_flags.overflow_recov) {
+			/* software can try to contain */
+			if (!(m->mcgstatus & MCG_STATUS_RIPV))
+				if (ctx == IN_KERNEL)
+					return MCE_PANIC_SEVERITY;
+
+				/* kill current process */
+				return MCE_AR_SEVERITY;
+		} else {
+			/* at least one error was not logged */
+			if (m->status & MCI_STATUS_OVER)
+				return MCE_PANIC_SEVERITY;
+		}
+		/*
+		 * any other case, return MCE_UC_SEVERITY so that
+		 * we log the error and exit #MC handler.
+		 */
+		return MCE_UC_SEVERITY;
+	}
+
+	/*
+	 * deferred error: poll handler catches these and adds to mce_ring
+	 * so memory-failure can take recovery actions.
+	 */
+	if (m->status & MCI_STATUS_DEFERRED)
+		return MCE_DEFERRED_SEVERITY;
+
+	/*
+	 * corrected error: poll handler catches these and passes
+	 * responsibility of decoding the error to EDAC
+	 */
+	return MCE_KEEP_SEVERITY;
+}
+
 int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)
 {
 	enum exception excp = (is_excp ? EXCP_CONTEXT : NO_EXCP);
 	enum context ctx = error_context(m);
 	struct severity *s;
 
+	if (m->cpuvendor == X86_VENDOR_AMD)
+		return mce_severity_amd(m, ctx);
+
 	for (s = severities;; s++) {
 		if ((m->status & s->mask) != s->result)
 			continue;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 17ad025..680cfb2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -65,6 +65,7 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
 struct mce_bank *mce_banks __read_mostly;
+struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
 	.bootlog  = -1,
@@ -1532,6 +1533,13 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		 if (c->x86 == 6 && cfg->banks > 0)
 			mce_banks[0].ctl = 0;
 
+		/*
+		 * overflow_recov is supported for F15h Models 00h-0fh
+		 * even though we don't have cpuid bit for this
+		 */
+		if (c->x86 == 0x15 && c->x86_model <= 0xf)
+			mce_flags.overflow_recov = 1;
+
 		 /*
 		  * Turn off MC4_MISC thresholding banks on those models since
 		  * they're not supported there.
@@ -1637,6 +1645,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 		break;
 	case X86_VENDOR_AMD:
 		mce_amd_feature_init(c);
+		mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
 		break;
 	default:
 		break;
-- 
1.9.1
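
For readers who want to follow the grading order outside the kernel, the
decision tree in mce_severity_amd() can be sketched as a standalone
function. This is a minimal sketch under stated assumptions, not the patch
itself: grade_amd(), struct mce_sketch and the SEV_* names are hypothetical,
and the bit positions are assumed stand-ins for the kernel's definitions.
The RIPV test and the kernel-context test are written as one condition,
which is how the unbraced nested ifs in the patch behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, stand-ins for the kernel's MCA definitions. */
#define MCI_STATUS_OVER     (1ULL << 62) /* an earlier error was overwritten */
#define MCI_STATUS_UC       (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_PCC      (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_DEFERRED (1ULL << 44) /* deferred error */
#define MCG_STATUS_RIPV     (1ULL << 0)  /* restart IP valid */

enum context { IN_KERNEL, IN_USER };

/* Hypothetical, simplified severity levels for this sketch only. */
enum severity { SEV_KEEP, SEV_DEFERRED, SEV_UC, SEV_AR, SEV_PANIC };

/* Hypothetical reduced record: only the two registers the grader reads. */
struct mce_sketch {
	uint64_t status;
	uint64_t mcgstatus;
};

static enum severity grade_amd(const struct mce_sketch *m, enum context ctx,
			       bool overflow_recov)
{
	/* Corrupt processor context is never recoverable. */
	if (m->status & MCI_STATUS_PCC)
		return SEV_PANIC;

	if (m->status & MCI_STATUS_UC) {
		if (overflow_recov) {
			/* No valid restart IP while in the kernel: give up. */
			if (!(m->mcgstatus & MCG_STATUS_RIPV) &&
			    ctx == IN_KERNEL)
				return SEV_PANIC;

			/* Otherwise try to contain by killing the process. */
			return SEV_AR;
		}

		/* Without overflow recovery, a lost error means panic. */
		if (m->status & MCI_STATUS_OVER)
			return SEV_PANIC;

		/* Log the error and leave the handler. */
		return SEV_UC;
	}

	/* Deferred errors are picked up by the poll handler. */
	if (m->status & MCI_STATUS_DEFERRED)
		return SEV_DEFERRED;

	/* Corrected error. */
	return SEV_KEEP;
}
```

Note that with overflow_recov set, every uncorrected error without PCC
ends in either panic or action-required in this sketch; the UC result is
only reachable on parts without the flag.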



* [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer
  2015-03-20 16:31 [PATCH V2 0/2] Rework mce_severity Aravind Gopalakrishnan
  2015-03-20 16:31 ` [PATCH V2 1/2] x86, mce, severities: Add AMD severities function Aravind Gopalakrishnan
@ 2015-03-20 16:31 ` Aravind Gopalakrishnan
  2015-03-20 22:31   ` Luck, Tony
  1 sibling, 1 reply; 7+ messages in thread
From: Aravind Gopalakrishnan @ 2015-03-20 16:31 UTC
  To: tglx, mingo, hpa, tony.luck, bp, slaoub, luto, x86, linux-kernel,
	linux-edac
  Cc: Aravind Gopalakrishnan

Rename mce_severity() to mce_severity_intel() and make mce_severity a
function pointer that is set to either mce_severity_intel() or
mce_severity_amd() during init, depending on which processor we are on.

This avoids testing the vendor on every call to mce_severity(), and it
is a cleaner way to select the grading function.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/mce.h                |  2 ++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |  3 ++-
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 35 ++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/mcheck/mce.c          |  1 +
 4 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index b574fbf..1f5a86d 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -134,9 +134,11 @@ extern int mce_p5_enabled;
 #ifdef CONFIG_X86_MCE
 int mcheck_init(void);
 void mcheck_cpu_init(struct cpuinfo_x86 *c);
+void mcheck_vendor_init_severity(void);
 #else
 static inline int mcheck_init(void) { return 0; }
 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
+static inline void mcheck_vendor_init_severity(void) {}
 #endif
 
 #ifdef CONFIG_X86_ANCIENT_MCE
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index e12f0bf..4758f5f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -24,7 +24,8 @@ struct mce_bank {
 	char			attrname[ATTR_LEN];	/* attribute name */
 };
 
-int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp);
+extern int (*mce_severity)(struct mce *a, int tolerant,
+			    char **msg, bool is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
 extern struct mce_bank *mce_banks;
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 4f8f87d..683a06f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -187,7 +187,8 @@ static int error_context(struct mce *m)
 }
 
 /* keeping mce_severity_amd in sync with AMD error scope hierarchy table */
-static int mce_severity_amd(struct mce *m, enum context ctx)
+static int mce_severity_amd(struct mce *m, int tolerant,
+			    char **msg, bool is_excp)
 {
 	enum context ctx = error_context(m);
 	/* Processor Context Corrupt, no need to fumble too much, die! */
@@ -236,15 +237,13 @@ static int mce_severity_amd(struct mce *m, enum context ctx)
 	return MCE_KEEP_SEVERITY;
 }
 
-int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)
+static int mce_severity_intel(struct mce *m, int tolerant,
+			      char **msg, bool is_excp)
 {
 	enum exception excp = (is_excp ? EXCP_CONTEXT : NO_EXCP);
 	enum context ctx = error_context(m);
 	struct severity *s;
 
-	if (m->cpuvendor == X86_VENDOR_AMD)
-		return mce_severity_amd(m, ctx);
-
 	for (s = severities;; s++) {
 		if ((m->status & s->mask) != s->result)
 			continue;
@@ -269,6 +268,32 @@ int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)
 	}
 }
 
+static int mce_severity_default(struct mce *m, int tolerant,
+				char **msg, bool is_excp)
+{
+	return MCE_PANIC_SEVERITY;
+}
+
+int (*mce_severity)(struct mce *m, int tolerant, char **msg, bool is_excp) =
+		    mce_severity_default;
+
+void __init mcheck_vendor_init_severity(void)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	switch (c->x86_vendor) {
+	case X86_VENDOR_INTEL:
+		mce_severity = mce_severity_intel;
+		break;
+	case X86_VENDOR_AMD:
+		mce_severity = mce_severity_amd;
+		break;
+	default:
+		WARN_ONCE(1, "WTF!?");
+		break;
+	}
+}
+
 #ifdef CONFIG_DEBUG_FS
 static void *s_start(struct seq_file *f, loff_t *pos)
 {
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 680cfb2..f22e76f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2030,6 +2030,7 @@ __setup("mce", mcheck_enable);
 int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
+	mcheck_vendor_init_severity();
 
 	return 0;
 }
-- 
1.9.1
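
The dispatch mechanism in this patch can be illustrated in isolation. The
following is a rough sketch, assuming stub graders that return fixed values
so the selection is observable; severity_fn, vendor_init_severity() and the
enum names are hypothetical and only mirror the shape of the posted code.
Where the patch calls WARN_ONCE() for an unknown vendor, the sketch returns
false instead.

```c
#include <stdbool.h>

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };
enum severity { SEV_KEEP, SEV_UC, SEV_PANIC };

/* Stub graders: each returns a fixed value so the dispatch is visible. */
static enum severity severity_intel(void)   { return SEV_UC; }
static enum severity severity_amd(void)     { return SEV_KEEP; }
static enum severity severity_default(void) { return SEV_PANIC; }

/* The pointer starts at the default grader, as in the posted patch. */
static enum severity (*severity_fn)(void) = severity_default;

/*
 * Pick the grader once at init time. Returns false for an unrecognised
 * vendor and leaves the pointer untouched in that case.
 */
static bool vendor_init_severity(enum vendor v)
{
	switch (v) {
	case VENDOR_INTEL:
		severity_fn = severity_intel;
		return true;
	case VENDOR_AMD:
		severity_fn = severity_amd;
		return true;
	default:
		return false;
	}
}
```

Callers then invoke severity_fn() directly, so the vendor is examined once
during init rather than on every machine check.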



* RE: [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer
  2015-03-20 16:31 ` [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer Aravind Gopalakrishnan
@ 2015-03-20 22:31   ` Luck, Tony
  2015-03-21  2:35     ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 7+ messages in thread
From: Luck, Tony @ 2015-03-20 22:31 UTC
  To: Aravind Gopalakrishnan, tglx, mingo, hpa, bp, slaoub, luto, x86,
	linux-kernel, linux-edac

+	default:
+		WARN_ONCE(1, "WTF!?");
+		break;

You meant to type:

		mce_severity = mce_severity_default;

just there, right?

-Tony


* Re: [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer
  2015-03-20 22:31   ` Luck, Tony
@ 2015-03-21  2:35     ` Aravind Gopalakrishnan
  2015-03-21  6:10       ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Aravind Gopalakrishnan @ 2015-03-21  2:35 UTC
  To: Luck, Tony, tglx, mingo, hpa, bp, slaoub, luto, x86,
	linux-kernel, linux-edac


On 3/20/15 5:31 PM, Luck, Tony wrote:
> +	default:
> +		WARN_ONCE(1, "WTF!?");
> +		break;
>
> You meant to type:
>
> 		mce_severity = mce_severity_default;
>
> just there, right?
>
Other function pointers in the mce code like unexpected_machine_check
and default_threshold_interrupt are assigned to the respective
function pointers when they are defined.

So, I just followed a similar assignment for mce_severity_default.

I can do that in the default statement if you prefer.

Thanks,
-Aravind.



* Re: [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer
  2015-03-21  2:35     ` Aravind Gopalakrishnan
@ 2015-03-21  6:10       ` Borislav Petkov
  2015-03-23 14:20         ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2015-03-21  6:10 UTC
  To: Aravind Gopalakrishnan
  Cc: Luck, Tony, tglx, mingo, hpa, slaoub, luto, x86, linux-kernel,
	linux-edac

On Fri, Mar 20, 2015 at 09:35:26PM -0500, Aravind Gopalakrishnan wrote:
> Other function pointers in the mce code like unexpected_machine_check
> and default_threshold_interrupt are assigned to the respective
> function pointers when they are defined.

The "WTF?!" would still fire and we don't want that.

Also, I'm not sure about returning MCE_PANIC_SEVERITY by default.
I mean, the code for !(Intel || AMD) has worked just fine with the
original severities, i.e., mce_severity_intel() now.

So maybe we should assign mce_severity_intel() on static init of the
mce_severity pointer and override it only on AMD...

This keeps the old behaviour for other machines, in the manner of
letting sleeping dogs lie...
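
A rough sketch of that alternative, assuming stub graders with fixed
return values: the pointer is statically initialised to the Intel grader
and only AMD overrides it, so there is no default case left to warn in.
The names grade_fn, vendor_override() and the enums are hypothetical and
not taken from the thread.

```c
enum cpu_vendor { CPU_INTEL, CPU_AMD, CPU_OTHER };
enum grade { GRADE_KEEP, GRADE_UC };

/* Stub graders standing in for the Intel and AMD severity functions. */
static enum grade grade_intel(void)    { return GRADE_UC; }
static enum grade grade_amd_stub(void) { return GRADE_KEEP; }

/* Every vendor starts with the Intel grader, which is the old behaviour. */
static enum grade (*grade_fn)(void) = grade_intel;

/* Only AMD replaces the grader; all other vendors are left alone. */
static void vendor_override(enum cpu_vendor v)
{
	if (v == CPU_AMD)
		grade_fn = grade_amd_stub;
}
```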

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH V2 2/2] x86, mce, severities: Define mce_severity function pointer
  2015-03-21  6:10       ` Borislav Petkov
@ 2015-03-23 14:20         ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 7+ messages in thread
From: Aravind Gopalakrishnan @ 2015-03-23 14:20 UTC
  To: Borislav Petkov
  Cc: Luck, Tony, tglx, mingo, hpa, slaoub, luto, x86, linux-kernel,
	linux-edac

On 3/21/2015 1:10 AM, Borislav Petkov wrote:
> On Fri, Mar 20, 2015 at 09:35:26PM -0500, Aravind Gopalakrishnan wrote:
>> Other function pointers in the mce code like unexpected_machine_check
>> and default_threshold_interrupt are assigned to the respective
>> function pointers when they are defined.
> The "WTF?!" would still fire and we don't want that.

Ah. Ok, I misunderstood. Will clear this.

> Also, I'm not sure about returning MCE_PANIC_SEVERITY by default.
> I mean, the code for !(Intel || AMD) has worked just fine with the
> original severities, i.e., mce_severity_intel() now.
>
> So maybe we should assign mce_severity_intel() on static init of the
> mce_severity pointer and override it only on AMD...
>
> This keeps the old behaviour for other machines, in the manner of
> letting sleeping dogs lie...
>

Ok, I'll do that and resend.

Thanks,
-Aravind.
