linux-kernel.vger.kernel.org archive mirror
* [Patch V1 0/3] x86 Local Machine Check Exception (LMCE)
@ 2015-05-29 16:27 Ashok Raj
  2015-05-29 16:28 ` [Patch V1 1/3] x86, mce: Add LMCE definitions Ashok Raj
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ashok Raj @ 2015-05-29 16:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ashok Raj, linux-edac, Borislav Petkov, Tony Luck

Historically, machine checks on Intel x86 processors have been broadcast to
all logical processors in the system. Upcoming CPUs will support an opt-in
mechanism to request that some machine checks be delivered only to the
logical processor experiencing the fault.

For more details, see the Intel SDM, Volume 3, Chapter 15, Machine Check
Architecture.

Ashok Raj (3):
  x86, mce: Add LMCE definitions.
  x86, mce: Add infrastructure required to support LMCE
  x86, mce: Handling LMCE events

 Documentation/x86/x86_64/boot-options.txt |  3 ++
 arch/x86/include/asm/mce.h                | 10 ++++
 arch/x86/include/uapi/asm/msr-index.h     |  2 +
 arch/x86/kernel/cpu/mcheck/mce.c          | 28 ++++++++++--
 arch/x86/kernel/cpu/mcheck/mce_intel.c    | 76 +++++++++++++++++++++++++++++++
 5 files changed, 116 insertions(+), 3 deletions(-)

-- 
1.9.1



* [Patch V1 1/3] x86, mce: Add LMCE definitions.
  2015-05-29 16:27 [Patch V1 0/3] x86 Local Machine Check Exception (LMCE) Ashok Raj
@ 2015-05-29 16:28 ` Ashok Raj
  2015-05-29 16:28 ` [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE Ashok Raj
  2015-05-29 16:28 ` [Patch V1 3/3] x86, mce: Handling LMCE events Ashok Raj
  2 siblings, 0 replies; 10+ messages in thread
From: Ashok Raj @ 2015-05-29 16:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ashok Raj, linux-edac, Borislav Petkov, Tony Luck

Add required definitions to support Local Machine Check Exceptions.

See http://www.intel.com/sdm Volume 3, System Programming Guide, chapter 15
for more information on MSR's and documentation on Local MCE.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/mce.h            | 5 +++++
 arch/x86/include/uapi/asm/msr-index.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1f5a86d..677a408 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -17,11 +17,16 @@
 #define MCG_EXT_CNT(c)		(((c) & MCG_EXT_CNT_MASK) >> MCG_EXT_CNT_SHIFT)
 #define MCG_SER_P		(1ULL<<24)   /* MCA recovery/new status bits */
 #define MCG_ELOG_P		(1ULL<<26)   /* Extended error log supported */
+#define MCG_LMCE_P		(1ULL<<27)   /* Local machine check supported */
 
 /* MCG_STATUS register defines */
 #define MCG_STATUS_RIPV  (1ULL<<0)   /* restart ip valid */
 #define MCG_STATUS_EIPV  (1ULL<<1)   /* ip points to correct instruction */
 #define MCG_STATUS_MCIP  (1ULL<<2)   /* machine check in progress */
+#define MCG_STATUS_LMCES (1ULL<<3)   /* LMCE signaled */
+
+/* MCG_EXT_CTL register defines */
+#define MCG_EXT_CTL_LMCE_EN (1ULL<<0) /* Enable LMCE */
 
 /* MCi_STATUS register defines */
 #define MCI_STATUS_VAL   (1ULL<<63)  /* valid error */
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index c469490..e28d5a2 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -56,6 +56,7 @@
 #define MSR_IA32_MCG_CAP		0x00000179
 #define MSR_IA32_MCG_STATUS		0x0000017a
 #define MSR_IA32_MCG_CTL		0x0000017b
+#define MSR_IA32_MCG_EXT_CTL		0x000004d0
 
 #define MSR_OFFCORE_RSP_0		0x000001a6
 #define MSR_OFFCORE_RSP_1		0x000001a7
@@ -379,6 +380,7 @@
 #define FEATURE_CONTROL_LOCKED				(1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
 #define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX	(1<<2)
+#define FEATURE_CONTROL_LMCE_SUPPORT_ENABLED		(1<<20)
 
 #define MSR_IA32_APICBASE		0x0000001b
 #define MSR_IA32_APICBASE_BSP		(1<<8)
-- 
1.9.1
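
For illustration only: a minimal userspace sketch of how the bits defined
above would be tested against raw register values. The bit positions are
copied from this patch; the helper names cap_has_lmce() and
status_is_local() are hypothetical and are not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the definitions added in this patch. */
#define MCG_LMCE_P          (1ULL << 27) /* IA32_MCG_CAP: LMCE supported */
#define MCG_STATUS_LMCES    (1ULL << 3)  /* IA32_MCG_STATUS: LMCE signaled */
#define MCG_EXT_CTL_LMCE_EN (1ULL << 0)  /* IA32_MCG_EXT_CTL: enable LMCE */

/* Hypothetical helper: does a raw IA32_MCG_CAP value advertise LMCE? */
static bool cap_has_lmce(uint64_t mcg_cap)
{
	return (mcg_cap & MCG_LMCE_P) != 0;
}

/* Hypothetical helper: was this machine check delivered locally? */
static bool status_is_local(uint64_t mcg_status)
{
	return (mcg_status & MCG_STATUS_LMCES) != 0;
}
```

The helpers take plain integers, so they can be exercised without any MSR
access; in the kernel the values would come from rdmsrl().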



* [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE
  2015-05-29 16:27 [Patch V1 0/3] x86 Local Machine Check Exception (LMCE) Ashok Raj
  2015-05-29 16:28 ` [Patch V1 1/3] x86, mce: Add LMCE definitions Ashok Raj
@ 2015-05-29 16:28 ` Ashok Raj
  2015-05-29 17:35   ` Borislav Petkov
  2015-05-29 17:44   ` Borislav Petkov
  2015-05-29 16:28 ` [Patch V1 3/3] x86, mce: Handling LMCE events Ashok Raj
  2 siblings, 2 replies; 10+ messages in thread
From: Ashok Raj @ 2015-05-29 16:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ashok Raj, linux-edac, Borislav Petkov, Tony Luck

Initialization and handling for LMCE
- boot time option to disable LMCE for that boot instance
- Check for capability via IA32_MCG_CAP
- provide ability to enable/disable LMCE on demand.

See http://www.intel.com/sdm Volume 3 System Programming Guide, Chapter 15
for more information on MSR's and documentation on Local MCE.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 Documentation/x86/x86_64/boot-options.txt |  3 ++
 arch/x86/include/asm/mce.h                |  5 +++
 arch/x86/kernel/cpu/mcheck/mce.c          |  3 ++
 arch/x86/kernel/cpu/mcheck/mce_intel.c    | 75 +++++++++++++++++++++++++++++++
 4 files changed, 86 insertions(+)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 5223479..79edee0 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -31,6 +31,9 @@ Machine check
 		(e.g. BIOS or hardware monitoring applications), conflicting
 		with OS's error handling, and you cannot deactivate the agent,
 		then this option will be a help.
+   mce=no_lmce
+		Do not opt-in to Local MCE delivery. Use legacy method
+		to broadcast MCE's.
    mce=bootlog
 		Enable logging of machine checks left over from booting.
 		Disabled by default on AMD because some BIOS leave bogus ones.
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 677a408..8ba4d7a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -109,6 +109,7 @@ struct mce_log {
 struct mca_config {
 	bool dont_log_ce;
 	bool cmci_disabled;
+	bool lmce_disabled;
 	bool ignore_ce;
 	bool disabled;
 	bool ser;
@@ -173,12 +174,16 @@ void cmci_clear(void);
 void cmci_reenable(void);
 void cmci_rediscover(void);
 void cmci_recheck(void);
+void lmce_clear(void);
+void lmce_enable(void);
 #else
 static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
 static inline void cmci_clear(void) {}
 static inline void cmci_reenable(void) {}
 static inline void cmci_rediscover(void) {}
 static inline void cmci_recheck(void) {}
+static inline void lmce_clear(void) {}
+static inline void lmce_enable(void) {}
 #endif
 
 #ifdef CONFIG_X86_MCE_AMD
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e535533..d10aada 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1976,6 +1976,7 @@ void mce_disable_bank(int bank)
 /*
  * mce=off Disables machine check
  * mce=no_cmci Disables CMCI
+ * mce=no_lmce Disables LMCE
  * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
  * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
  * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
@@ -1999,6 +2000,8 @@ static int __init mcheck_enable(char *str)
 		cfg->disabled = true;
 	else if (!strcmp(str, "no_cmci"))
 		cfg->cmci_disabled = true;
+	else if (!strcmp(str, "no_lmce"))
+		cfg->lmce_disabled = true;
 	else if (!strcmp(str, "dont_log_ce"))
 		cfg->dont_log_ce = true;
 	else if (!strcmp(str, "ignore_ce"))
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index b4a41cf..be3a5c6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -70,6 +70,10 @@ enum {
 
 static atomic_t cmci_storm_on_cpus;
 
+#define FEATURE_CONTROL_LMCE_BITS	((FEATURE_CONTROL_LOCKED) | \
+					 (FEATURE_CONTROL_LMCE_SUPPORT_ENABLED))
+#define MCG_CAP_LMCE_BITS		((MCG_SER_P) | (MCG_LMCE_P))
+
 static int cmci_supported(int *banks)
 {
 	u64 cap;
@@ -91,6 +95,34 @@ static int cmci_supported(int *banks)
 	return !!(cap & MCG_CMCI_P);
 }
 
+static bool lmce_supported(void)
+{
+	u64 cap, feature_ctl;
+	bool lmce_bios_support, retval;
+
+	if (mca_cfg.lmce_disabled)
+		return false;
+
+	rdmsrl(MSR_IA32_MCG_CAP, cap);
+	rdmsrl(MSR_IA32_FEATURE_CONTROL, feature_ctl);
+
+	/*
+	 * BIOS should indicate support for LMCE by setting
+	 * bit20 in IA32_FEATURE_CONTROL. without which touching
+	 * MCG_EXT_CTL will generate #GP fault.
+	 */
+	lmce_bios_support = ((feature_ctl & (FEATURE_CONTROL_LMCE_BITS)) ==
+			(FEATURE_CONTROL_LMCE_BITS));
+
+	/*
+	 * MCG_CAP should indicate both MCG_SER_P and MCG_LMCE_P
+	 */
+	cap = ((cap & MCG_CAP_LMCE_BITS) == (MCG_CAP_LMCE_BITS));
+	retval = (cap && lmce_bios_support);
+
+	return retval;
+}
+
 bool mce_intel_cmci_poll(void)
 {
 	if (__this_cpu_read(cmci_storm_state) == CMCI_STORM_NONE)
@@ -405,6 +437,49 @@ static void intel_init_cmci(void)
 	cmci_recheck();
 }
 
+static void __lmce_enable(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_IA32_MCG_EXT_CTL, val);
+	val |= MCG_EXT_CTL_LMCE_EN;
+	wrmsrl(MSR_IA32_MCG_EXT_CTL, val);
+}
+
+
+void intel_init_lmce(void)
+{
+	if (!lmce_supported())
+		return;
+
+	__lmce_enable();
+}
+
+void lmce_enable(void)
+{
+	intel_init_lmce();
+}
+
+void lmce_disable(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_IA32_MCG_EXT_CTL, val);
+	val &= ~MCG_EXT_CTL_LMCE_EN;
+	wrmsrl(MSR_IA32_MCG_EXT_CTL, val);
+}
+
+/*
+ * Disable LMCE on this CPU for all banks it owns when it goes down.
+ * This allows other CPUs to claim the banks on rediscovery.
+ */
+void lmce_clear(void)
+{
+	if (!lmce_supported())
+		return;
+	lmce_disable();
+}
+
 void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
-- 
1.9.1
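
For illustration only: a minimal userspace sketch of the read-modify-write
pattern used by __lmce_enable() and lmce_disable() above. The register is
simulated with a plain variable, and sim_rdmsr(), sim_wrmsr(),
sim_lmce_enable() and sim_lmce_disable() are hypothetical stand-ins, not
kernel interfaces.

```c
#include <stdint.h>

#define MCG_EXT_CTL_LMCE_EN (1ULL << 0) /* bit position from patch 1/3 */

/* Assumption: a plain variable stands in for the per-CPU IA32_MCG_EXT_CTL. */
static uint64_t sim_mcg_ext_ctl;

static uint64_t sim_rdmsr(void)
{
	return sim_mcg_ext_ctl;
}

static void sim_wrmsr(uint64_t val)
{
	sim_mcg_ext_ctl = val;
}

/* Set only the LMCE enable bit, leaving every other bit as it was read. */
static void sim_lmce_enable(void)
{
	uint64_t val = sim_rdmsr();

	val |= MCG_EXT_CTL_LMCE_EN;
	sim_wrmsr(val);
}

/* Clear only the LMCE enable bit, leaving every other bit as it was read. */
static void sim_lmce_disable(void)
{
	uint64_t val = sim_rdmsr();

	val &= ~MCG_EXT_CTL_LMCE_EN;
	sim_wrmsr(val);
}
```

Reading before writing keeps any other bits in the register intact, which
is why neither function writes a constant.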



* [Patch V1 3/3] x86, mce: Handling LMCE events
  2015-05-29 16:27 [Patch V1 0/3] x86 Local Machine Check Exception (LMCE) Ashok Raj
  2015-05-29 16:28 ` [Patch V1 1/3] x86, mce: Add LMCE definitions Ashok Raj
  2015-05-29 16:28 ` [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE Ashok Raj
@ 2015-05-29 16:28 ` Ashok Raj
  2015-05-29 17:36   ` Borislav Petkov
  2 siblings, 1 reply; 10+ messages in thread
From: Ashok Raj @ 2015-05-29 16:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ashok Raj, linux-edac, Borislav Petkov, Tony Luck

This patch changes do_machine_check() to handle an MCE signaled as a
local MCE. Typically only recoverable (SRAR) errors will be signaled as
LMCE, but the architecture does not restrict LMCE to those errors.

When errors are signaled as LMCE, there is no need for the MCE handler to
perform rendezvous with other logical processors, unlike earlier processors
that would broadcast machine check errors.

See http://www.intel.com/sdm Volume 3, Chapter 15 for more information
on MSR's and documentation on Local MCE.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c       | 25 ++++++++++++++++++++++---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |  1 +
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d10aada..c130391 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1047,6 +1047,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	char *msg = "Unknown";
 	u64 recover_paddr = ~0ull;
 	int flags = MF_ACTION_REQUIRED;
+	int lmce = 0;
 
 	prev_state = ist_enter(regs);
 
@@ -1074,11 +1075,19 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		kill_it = 1;
 
 	/*
+	 * Check if this MCE is signaled to only this logical processor
+	 */
+	if (m.mcgstatus & MCG_STATUS_LMCES)
+		lmce = 1;
+	/*
 	 * Go through all the banks in exclusion of the other CPUs.
 	 * This way we don't report duplicated events on shared banks
 	 * because the first one to see it will clear it.
+	 * If this is a Local MCE, then no need to perform rendezvous.
 	 */
-	order = mce_start(&no_way_out);
+	if (!lmce)
+		order = mce_start(&no_way_out);
+
 	for (i = 0; i < cfg->banks; i++) {
 		__clear_bit(i, toclear);
 		if (!test_bit(i, valid_banks))
@@ -1155,8 +1164,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	 * Do most of the synchronization with other CPUs.
 	 * When there's any problem use only local no_way_out state.
 	 */
-	if (mce_end(order) < 0)
-		no_way_out = worst >= MCE_PANIC_SEVERITY;
+	if (!lmce) {
+		if (mce_end(order) < 0)
+			no_way_out = worst >= MCE_PANIC_SEVERITY;
+	} else {
+		/*
+		 * Local MCE skipped calling mce_reign()
+		 * If we found a fatal error, we need to panic here.
+		 */
+		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
+			mce_panic("Machine check from unknown source",
+				NULL, NULL);
+	}
 
 	/*
 	 * At insane "tolerant" levels we take no action. Otherwise
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index be3a5c6..73a2844 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -484,4 +484,5 @@ void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
 	intel_init_cmci();
+	intel_init_lmce();
 }
-- 
1.9.1
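
For illustration only: a minimal userspace sketch of the decision this
patch adds to do_machine_check(). The enum, sim_pick_path() and its
parameters are hypothetical; panic_severity stands in for
MCE_PANIC_SEVERITY, whose value is not assumed here.

```c
#include <stdint.h>

#define MCG_STATUS_LMCES (1ULL << 3) /* bit position from patch 1/3 */

/* Hypothetical outcome labels, used by this sketch only. */
enum sim_path {
	SIM_RENDEZVOUS,     /* broadcast MCE: synchronize with other CPUs */
	SIM_LOCAL_PANIC,    /* local and fatal: panic on this CPU */
	SIM_LOCAL_CONTINUE, /* local and not fatal: no rendezvous needed */
};

static enum sim_path sim_pick_path(uint64_t mcgstatus, int worst,
				   int panic_severity, int tolerant)
{
	/* Without LMCES the MCE was broadcast, so the rendezvous still runs. */
	if (!(mcgstatus & MCG_STATUS_LMCES))
		return SIM_RENDEZVOUS;

	/* Local MCE: no other CPU will panic for us, so decide here. */
	if (worst >= panic_severity && tolerant < 3)
		return SIM_LOCAL_PANIC;

	return SIM_LOCAL_CONTINUE;
}
```

The sketch only models which path is taken; bank scanning and severity
grading are left out.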



* Re: [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE
  2015-05-29 16:28 ` [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE Ashok Raj
@ 2015-05-29 17:35   ` Borislav Petkov
  2015-05-29 17:44   ` Borislav Petkov
  1 sibling, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2015-05-29 17:35 UTC (permalink / raw)
  To: Ashok Raj; +Cc: linux-kernel, linux-edac, Borislav Petkov, Tony Luck

On Fri, May 29, 2015 at 09:28:01AM -0700, Ashok Raj wrote:
> Initialization and handling for LMCE
> - boot time option to disable LMCE for that boot instance
> - Check for capability via IA32_MCG_CAP
> - provide ability to enable/disable LMCE on demand.
> 
> See http://www.intel.com/sdm Volume 3 System Programming Guide, Chapter 15
> for more information on MSR's and documentation on Local MCE.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  Documentation/x86/x86_64/boot-options.txt |  3 ++
>  arch/x86/include/asm/mce.h                |  5 +++
>  arch/x86/kernel/cpu/mcheck/mce.c          |  3 ++
>  arch/x86/kernel/cpu/mcheck/mce_intel.c    | 75 +++++++++++++++++++++++++++++++
>  4 files changed, 86 insertions(+)

> +static bool lmce_supported(void)
> +{
> +	u64 cap, feature_ctl;
> +	bool lmce_bios_support, retval;
> +
> +	if (mca_cfg.lmce_disabled)
> +		return false;
> +
> +	rdmsrl(MSR_IA32_MCG_CAP, cap);
> +	rdmsrl(MSR_IA32_FEATURE_CONTROL, feature_ctl);
> +
> +	/*
> +	 * BIOS should indicate support for LMCE by setting
> +	 * bit20 in IA32_FEATURE_CONTROL. without which touching
> +	 * MCG_EXT_CTL will generate #GP fault.
> +	 */
> +	lmce_bios_support = ((feature_ctl & (FEATURE_CONTROL_LMCE_BITS)) ==
> +			(FEATURE_CONTROL_LMCE_BITS));
> +
> +	/*
> +	 * MCG_CAP should indicate both MCG_SER_P and MCG_LMCE_P
> +	 */
> +	cap = ((cap & MCG_CAP_LMCE_BITS) == (MCG_CAP_LMCE_BITS));

No need for those local definitions of MSR bits. Simply do:

	if ((feature_ctl & (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE_SUPPORT_ENABLED)) !=
			   (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE_SUPPORT_ENABLED))
		return false;

Same with cap:

	if ((cap & (.. | ..)) != ( .. | ..))
		return false;

	return true;

Also, shorten those bit definitions.
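
For illustration only: in C, != binds more tightly than &, so a masked
comparison needs parentheses around the mask operation. The sketch below
contrasts how an unparenthesized "v & mask != want" is parsed with the
intended test. The function names are hypothetical; the bit positions come
from patch 1/3.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCKED               (1ULL << 0)
#define FEATURE_CONTROL_LMCE_SUPPORT_ENABLED (1ULL << 20)
#define LMCE_REQUIRED (FEATURE_CONTROL_LOCKED | \
		       FEATURE_CONTROL_LMCE_SUPPORT_ENABLED)

/*
 * How C parses "v & mask != want": the comparison runs first and yields
 * 0 or 1, and only then is the result ANDed with v.
 */
static bool rejects_as_parsed(uint64_t v, uint64_t mask, uint64_t want)
{
	return (v & (uint64_t)(mask != want)) != 0;
}

/* Intended test: reject unless every required bit is set. */
static bool rejects_as_intended(uint64_t v, uint64_t mask, uint64_t want)
{
	return (v & mask) != want;
}
```

With mask and want both equal to LMCE_REQUIRED, the first form never
rejects anything, because the inner comparison is always 0.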


> +	retval = (cap && lmce_bios_support);
> +
> +	return retval;
> +}
> +
>  bool mce_intel_cmci_poll(void)
>  {
>  	if (__this_cpu_read(cmci_storm_state) == CMCI_STORM_NONE)
> @@ -405,6 +437,49 @@ static void intel_init_cmci(void)
>  	cmci_recheck();
>  }
>  
> +static void __lmce_enable(void)
> +{
> +	u64 val;
> +
> +	rdmsrl(MSR_IA32_MCG_EXT_CTL, val);
> +	val |= MCG_EXT_CTL_LMCE_EN;
> +	wrmsrl(MSR_IA32_MCG_EXT_CTL, val);
> +}

Called only in intel_init_lmce(), merge it into it.

> +
> +
> +void intel_init_lmce(void)
> +{
> +	if (!lmce_supported())
> +		return;
> +
> +	__lmce_enable();
> +}
> +
> +void lmce_enable(void)

This one's unused, drop it.

> +{
> +	intel_init_lmce();
> +}
> +
> +void lmce_disable(void)
> +{
> +	u64 val;
> +
> +	rdmsrl(MSR_IA32_MCG_EXT_CTL, val);
> +	val &= ~MCG_EXT_CTL_LMCE_EN;
> +	wrmsrl(MSR_IA32_MCG_EXT_CTL, val);
> +}

Called only in lmce_clear(), merge it into it.

> +
> +/*
> + * Disable LMCE on this CPU for all banks it owns when it goes down.
> + * This allows other CPUs to claim the banks on rediscovery.
> + */
> +void lmce_clear(void)
> +{
> +	if (!lmce_supported())
> +		return;
> +	lmce_disable();
> +}
> +
>  void mce_intel_feature_init(struct cpuinfo_x86 *c)
>  {
>  	intel_init_thermal(c);
> -- 
> 1.9.1
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [Patch V1 3/3] x86, mce: Handling LMCE events
  2015-05-29 16:28 ` [Patch V1 3/3] x86, mce: Handling LMCE events Ashok Raj
@ 2015-05-29 17:36   ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2015-05-29 17:36 UTC (permalink / raw)
  To: Ashok Raj; +Cc: linux-kernel, linux-edac, Borislav Petkov, Tony Luck

On Fri, May 29, 2015 at 09:28:02AM -0700, Ashok Raj wrote:
> This patch changes do_machine_check() to handle an MCE signaled as a
> local MCE. Typically only recoverable (SRAR) errors will be signaled as
> LMCE, but the architecture does not restrict LMCE to those errors.
> 
> When errors are signaled as LMCE, there is no need for the MCE handler to
> perform rendezvous with other logical processors, unlike earlier processors
> that would broadcast machine check errors.
> 
> See http://www.intel.com/sdm Volume 3, Chapter 15 for more information
> on MSR's and documentation on Local MCE.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c       | 25 ++++++++++++++++++++++---
>  arch/x86/kernel/cpu/mcheck/mce_intel.c |  1 +
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index d10aada..c130391 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1047,6 +1047,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	char *msg = "Unknown";
>  	u64 recover_paddr = ~0ull;
>  	int flags = MF_ACTION_REQUIRED;
> +	int lmce = 0;
>  
>  	prev_state = ist_enter(regs);
>  
> @@ -1074,11 +1075,19 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  		kill_it = 1;
>  
>  	/*
> +	 * Check if this MCE is signaled to only this logical processor
> +	 */
> +	if (m.mcgstatus & MCG_STATUS_LMCES)
> +		lmce = 1;

	else
		/*
		 * Go through ...
		 * ...
		 */
		order = mce_start(&no_way_out);


> +	/*
>  	 * Go through all the banks in exclusion of the other CPUs.
>  	 * This way we don't report duplicated events on shared banks
>  	 * because the first one to see it will clear it.
> +	 * If this is a Local MCE, then no need to perform rendezvous.
>  	 */
> -	order = mce_start(&no_way_out);
> +	if (!lmce)
> +		order = mce_start(&no_way_out);
> +
>  	for (i = 0; i < cfg->banks; i++) {
>  		__clear_bit(i, toclear);
>  		if (!test_bit(i, valid_banks))
-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE
  2015-05-29 16:28 ` [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE Ashok Raj
  2015-05-29 17:35   ` Borislav Petkov
@ 2015-05-29 17:44   ` Borislav Petkov
  2015-06-01 17:09     ` Raj, Ashok
  2015-06-01 18:48     ` Raj, Ashok
  1 sibling, 2 replies; 10+ messages in thread
From: Borislav Petkov @ 2015-05-29 17:44 UTC (permalink / raw)
  To: Ashok Raj; +Cc: linux-kernel, linux-edac, Borislav Petkov, Tony Luck

On Fri, May 29, 2015 at 09:28:01AM -0700, Ashok Raj wrote:
> Initialization and handling for LMCE
> - boot time option to disable LMCE for that boot instance
> - Check for capability via IA32_MCG_CAP
> - provide ability to enable/disable LMCE on demand.
> 
> See http://www.intel.com/sdm Volume 3 System Programming Guide, Chapter 15
> for more information on MSR's and documentation on Local MCE.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  Documentation/x86/x86_64/boot-options.txt |  3 ++
>  arch/x86/include/asm/mce.h                |  5 +++
>  arch/x86/kernel/cpu/mcheck/mce.c          |  3 ++
>  arch/x86/kernel/cpu/mcheck/mce_intel.c    | 75 +++++++++++++++++++++++++++++++
>  4 files changed, 86 insertions(+)

...

> +static bool lmce_supported(void)
> +{
> +	u64 cap, feature_ctl;
> +	bool lmce_bios_support, retval;
> +
> +	if (mca_cfg.lmce_disabled)
> +		return false;
> +
> +	rdmsrl(MSR_IA32_MCG_CAP, cap);
> +	rdmsrl(MSR_IA32_FEATURE_CONTROL, feature_ctl);

One more thing: You should check MCG_LMCE_P *first* and only read
MSR_IA32_FEATURE_CONTROL if MCG_LMCE_P is set - otherwise this'll start
blowing up on older machines which don't sport that new MSR and on kvm.
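
For illustration only: a minimal userspace sketch of the ordering
suggested above, where the capability bit is tested first and the second
register is read only if it is set. The machine state is simulated; struct
sim_cpu, sim_read_feature_ctl() and sim_lmce_supported() are hypothetical
names, and the MCG_SER_P check from the patch is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_LMCE_P                           (1ULL << 27)
#define FEATURE_CONTROL_LOCKED               (1ULL << 0)
#define FEATURE_CONTROL_LMCE_SUPPORT_ENABLED (1ULL << 20)

/* Simulated per-CPU state; every name in this struct is hypothetical. */
struct sim_cpu {
	uint64_t mcg_cap;
	uint64_t feature_ctl;
	int feature_ctl_reads; /* counts accesses to the second register */
};

static uint64_t sim_read_feature_ctl(struct sim_cpu *cpu)
{
	cpu->feature_ctl_reads++;
	return cpu->feature_ctl;
}

static bool sim_lmce_supported(struct sim_cpu *cpu)
{
	const uint64_t need = FEATURE_CONTROL_LOCKED |
			      FEATURE_CONTROL_LMCE_SUPPORT_ENABLED;

	/* Capability bit first: bail out before touching anything else. */
	if (!(cpu->mcg_cap & MCG_LMCE_P))
		return false;

	/* Only now read the second register and require both bits. */
	return (sim_read_feature_ctl(cpu) & need) == need;
}
```

The read counter exists only so the ordering can be observed in a test; it
has no counterpart in the kernel code.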

> +	/*
> +	 * BIOS should indicate support for LMCE by setting
> +	 * bit20 in IA32_FEATURE_CONTROL. without which touching
> +	 * MCG_EXT_CTL will generate #GP fault.
> +	 */
> +	lmce_bios_support = ((feature_ctl & (FEATURE_CONTROL_LMCE_BITS)) ==
> +			(FEATURE_CONTROL_LMCE_BITS));
> +
> +	/*
> +	 * MCG_CAP should indicate both MCG_SER_P and MCG_LMCE_P
> +	 */

Also, why do we need to look at MCG_SER_P for LMCE?

Btw, we do that already in __mcheck_cpu_cap_init() so you could check
mca_cfg.ser here instead.

> +	cap = ((cap & MCG_CAP_LMCE_BITS) == (MCG_CAP_LMCE_BITS));
> +	retval = (cap && lmce_bios_support);

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE
  2015-05-29 17:44   ` Borislav Petkov
@ 2015-06-01 17:09     ` Raj, Ashok
  2015-06-01 18:48     ` Raj, Ashok
  1 sibling, 0 replies; 10+ messages in thread
From: Raj, Ashok @ 2015-06-01 17:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-edac, Borislav Petkov, Tony Luck, Ashok Raj




* Re: [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE
  2015-05-29 17:44   ` Borislav Petkov
  2015-06-01 17:09     ` Raj, Ashok
@ 2015-06-01 18:48     ` Raj, Ashok
  2015-06-01 19:37       ` Borislav Petkov
  1 sibling, 1 reply; 10+ messages in thread
From: Raj, Ashok @ 2015-06-01 18:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-edac, Borislav Petkov, Tony Luck, Ashok Raj

Hi Boris

If you got a blank email, sorry about that. It's been a while since I used
mutt and my setup was probably goofed up. Or I might have read your
signature a bit too literally :-)

> > +
> > +	if (mca_cfg.lmce_disabled)
> > +		return false;
> > +
> > +	rdmsrl(MSR_IA32_MCG_CAP, cap);
> > +	rdmsrl(MSR_IA32_FEATURE_CONTROL, feature_ctl);
> 

> One more thing: You should check MCG_LMCE_P *first* and only read
> MSR_IA32_FEATURE_CONTROL if MCG_LMCE_P is set - otherwise this'll start
> blowing up on older machines which don't sport that new MSR and on kvm.

I did reorganize this to read better in my upcoming post. But in general,
reading FEATURE_CONTROL isn't bad. It won't trip on a #GP, for example;
FEATURE_CONTROL has been around for a while. It would only be bad if we
set reserved bits without checking.
> 
> > +	lmce_bios_support = ((feature_ctl & (FEATURE_CONTROL_LMCE_BITS)) ==
> > +			(FEATURE_CONTROL_LMCE_BITS));
> > +
> Also, why do we need to look at MCG_SER_P for LMCE?

Good point. It's required by the architecture, since LMCE depends on
recovery support in the processor to work. I forgot to add that to the SDM
when I made those updates. I will update the SDM appropriately on my next
attempt at it.

> 
> Btw, we do that already in __mcheck_cpu_cap_init() so you could check
> mca_cfg.ser here instead.

I could have used mca_cfg. But, just being paranoid, it would be safer to
test per-CPU instead of taking the global value based on the BSP, just in
case someone puts together a system with slightly different capabilities.

> 
> 
> ECO tip #101: Trim your mails when you reply.

Sorry about my config challenges.. hopefully this makes it out with 
all the responses :-)

Cheers,
Ashok




* Re: [Patch V1 2/3] x86, mce: Add infrastructure required to support LMCE
  2015-06-01 18:48     ` Raj, Ashok
@ 2015-06-01 19:37       ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2015-06-01 19:37 UTC (permalink / raw)
  To: Raj, Ashok; +Cc: linux-kernel, linux-edac, Borislav Petkov, Tony Luck

On Mon, Jun 01, 2015 at 11:48:57AM -0700, Raj, Ashok wrote:
> If you got a blank email, sorry about that. It's been a while since I used
> mutt and my setup was probably goofed up. Or I might have read your
> signature a bit too literally :-)

LOL.

> I did reorganize this to read better in my upcoming post. But in
> general, reading FEATURE_CONTROL isn't bad. It won't trip on a #GP, for
> example; FEATURE_CONTROL has been around for a while.

Are you sure? Have you checked booting on qemu+kvm too?

I mean, I'm fine with whatever MSR access ordering we do as long as
nothing breaks. Obviously.

> Good point. It's required by the architecture, since LMCE depends on
> recovery support in the processor to work. I forgot to add that to the
> SDM when I made those updates. I will update the SDM appropriately on
> my next attempt at it.

Ok, cool. Please add this fact to the commit message too.

> Sorry about my config challenges.. hopefully this makes it out with
> all the responses :-)

No worries, looks good :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

