All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline
@ 2015-09-03 18:17 Ashok Raj
  2015-09-04 11:50 ` Borislav Petkov
  0 siblings, 1 reply; 5+ messages in thread
From: Ashok Raj @ 2015-09-03 18:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ashok Raj, Boris Petkov, linux-edac, Tony Luck, Serge Ayoun

During CPU offline, or during suspend/resume operations, its not safe to
clear MCi_CTL. These MSR's are either thread scoped (meaning private to
thread), or core scoped (private to threads in that core only), or socket
scope i.e visible and controllable from all threads in the socket.

When we turn off during CPU_OFFLINE, just offlining a single CPU will
stop signaling for all the socket wide resources, such as LLC, iMC for e.g.

It is true for Intel CPU's. But there seems some history that other processors
may require to turn these off during every CPU offline.

Intel Secure Guard eXtentions will be disabled when these controls are cleared
from a security perspective. This patch enables SGX to work across
suspend/resume.

- Consolidated some code to use sharing
- Minor changes to some prototypes to fit usage.
- Left handling same for non-Intel CPU models to avoid any unknown regressions.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Serge Ayoun <serge.ayoun@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d350858..5498a79 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2100,7 +2100,7 @@ int __init mcheck_init(void)
  * Disable machine checks on suspend and shutdown. We can't really handle
  * them later.
  */
-static int mce_disable_error_reporting(void)
+static void mce_disable_error_reporting(void)
 {
 	int i;
 
@@ -2110,17 +2110,40 @@ static int mce_disable_error_reporting(void)
 		if (b->init)
 			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
 	}
-	return 0;
+	return;
+}
+
+static void _vendor_disable_error_reporting(void)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	switch (c->x86_vendor) {
+	case X86_VENDOR_INTEL:
+		/*
+		 * Don't clear on Intel CPU's. Some of these MSR's are
+		 * socket wide. Disabling them for just a single cpu offline
+		 * is bad, since it will inhibit reporting for all shared
+		 * resources.. such as LLC, iMC for e.g.
+		 */
+		break;
+	default:
+		/*
+		 * Disble MCE reporting for all other CPU Vendor.
+		 * Don't want to break functionality on those
+		 */
+		mce_disable_error_reporting();
+	}
 }
 
 static int mce_syscore_suspend(void)
 {
-	return mce_disable_error_reporting();
+	_vendor_disable_error_reporting();
+	return 0;
 }
 
 static void mce_syscore_shutdown(void)
 {
-	mce_disable_error_reporting();
+	_vendor_disable_error_reporting();
 }
 
 /*
@@ -2400,19 +2423,14 @@ static void mce_device_remove(unsigned int cpu)
 static void mce_disable_cpu(void *h)
 {
 	unsigned long action = *(unsigned long *)h;
-	int i;
 
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 
 	if (!(action & CPU_TASKS_FROZEN))
 		cmci_clear();
-	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
 
-		if (b->init)
-			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
-	}
+	_vendor_disable_error_reporting();
 }
 
 static void mce_reenable_cpu(void *h)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline
  2015-09-03 18:17 [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline Ashok Raj
@ 2015-09-04 11:50 ` Borislav Petkov
  2015-09-04 16:38   ` Luck, Tony
  2015-09-04 17:59   ` Raj, Ashok
  0 siblings, 2 replies; 5+ messages in thread
From: Borislav Petkov @ 2015-09-04 11:50 UTC (permalink / raw)
  To: Ashok Raj; +Cc: linux-kernel, Boris Petkov, linux-edac, Tony Luck, Serge Ayoun

On Thu, Sep 03, 2015 at 02:17:04PM -0400, Ashok Raj wrote:
> During CPU offline, or during suspend/resume operations, its not safe to
> clear MCi_CTL. These MSR's are either thread scoped (meaning private to
> thread), or core scoped (private to threads in that core only), or socket
> scope i.e visible and controllable from all threads in the socket.
> 
> When we turn off during CPU_OFFLINE, just offlining a single CPU will
> stop signaling for all the socket wide resources, such as LLC, iMC for e.g.
> 
> It is true for Intel CPU's. But there seems some history that other processors
> may require to turn these off during every CPU offline.
> 
> Intel Secure Guard eXtentions will be disabled when these controls are cleared
> from a security perspective. This patch enables SGX to work across
> suspend/resume.

What does that mean? What does SGX have to do with MCI_CTL registers?
Explain that in the commit message so that !Intel people can understand.

> - Consolidated some code to use sharing
> - Minor changes to some prototypes to fit usage.
> - Left handling same for non-Intel CPU models to avoid any unknown regressions.

For the whole patch text do:

s/cpu/CPU/

s/CPU's/CPUs/ and s/MSR's/MSRs/ if you mean plural. Also spellcheck all text.

> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Serge Ayoun <serge.ayoun@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 38 ++++++++++++++++++++++++++++----------
>  1 file changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index d350858..5498a79 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2100,7 +2100,7 @@ int __init mcheck_init(void)
>   * Disable machine checks on suspend and shutdown. We can't really handle
>   * them later.
>   */
> -static int mce_disable_error_reporting(void)
> +static void mce_disable_error_reporting(void)
>  {
>  	int i;
>  
> @@ -2110,17 +2110,40 @@ static int mce_disable_error_reporting(void)
>  		if (b->init)
>  			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
>  	}
> -	return 0;
> +	return;
> +}
> +
> +static void _vendor_disable_error_reporting(void)

Why the "_" prepended here?

> +{
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	switch (c->x86_vendor) {
> +	case X86_VENDOR_INTEL:
> +		/*
> +		 * Don't clear on Intel CPU's. Some of these MSR's are
> +		 * socket wide. Disabling them for just a single cpu offline
> +		 * is bad, since it will inhibit reporting for all shared
> +		 * resources.. such as LLC, iMC for e.g.
> +		 */
> +		break;
> +	default:
> +		/*
> +		 * Disble MCE reporting for all other CPU Vendor.
> +		 * Don't want to break functionality on those
> +		 */
> +		mce_disable_error_reporting();
> +	}

I think the switch-case makes this unnecessarily bloated as code. Just
do:

	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
		return;

	mce_disable_error_reporting();

...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline
  2015-09-04 11:50 ` Borislav Petkov
@ 2015-09-04 16:38   ` Luck, Tony
  2015-09-04 17:59   ` Raj, Ashok
  1 sibling, 0 replies; 5+ messages in thread
From: Luck, Tony @ 2015-09-04 16:38 UTC (permalink / raw)
  To: Borislav Petkov, Raj, Ashok
  Cc: linux-kernel, Boris Petkov, linux-edac, Ayoun, Serge

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 612 bytes --]

> What does that mean? What does SGX have to do with MCI_CTL registers?
> Explain that in the commit message so that !Intel people can understand.

I think the SGX folk are worried that it might be possible to compromise the
integrity of code running in SGX if an attacker has control of the host and can
inject errors which are ignored because of MCi_CTL settings. Hence they have
the h/w drop out of SGX if they see anyone tamper with MCi_CTL

-Tony
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline
  2015-09-04 11:50 ` Borislav Petkov
  2015-09-04 16:38   ` Luck, Tony
@ 2015-09-04 17:59   ` Raj, Ashok
  2015-09-05 15:32     ` Borislav Petkov
  1 sibling, 1 reply; 5+ messages in thread
From: Raj, Ashok @ 2015-09-04 17:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Boris Petkov, linux-edac, Tony Luck, Serge Ayoun

Hi Boris

On Fri, Sep 04, 2015 at 01:50:29PM +0200, Borislav Petkov wrote:
> > Intel Secure Guard eXtentions will be disabled when these controls are cleared
> > from a security perspective. This patch enables SGX to work across
> > suspend/resume.
> 
> What does that mean? What does SGX have to do with MCI_CTL registers?
> Explain that in the commit message so that !Intel people can understand.

I will add something more in the commit log and resubmit. Tony had the right 
explanation in his response.. 


> For the whole patch text do:
> 
> s/cpu/CPU/
> 
> s/CPU's/CPUs/ and s/MSR's/MSRs/ if you mean plural. Also spellcheck all text.

Will do.

> > +static void _vendor_disable_error_reporting(void)
> 
> Why the "_" prepended here?

Hummm.. just tried to keep the style as other parts of mcheck_cpu_init()
where we have _mcheck_cpu_init_vendor(). Its not a rule, but sometimes 
local static functions have "__" varient.

Maybe it should have been the double "__"? 

> 
> > +{
> > +	struct cpuinfo_x86 *c = &boot_cpu_data;
> > +
> > +	switch (c->x86_vendor) {
> > +	case X86_VENDOR_INTEL:
> > +		/*
> > +		 * Don't clear on Intel CPU's. Some of these MSR's are
> > +		 * socket wide. Disabling them for just a single cpu offline
> > +		 * is bad, since it will inhibit reporting for all shared
> > +		 * resources.. such as LLC, iMC for e.g.
> > +		 */
> > +		break;
> > +	default:
> > +		/*
> > +		 * Disble MCE reporting for all other CPU Vendor.
> > +		 * Don't want to break functionality on those
> > +		 */
> > +		mce_disable_error_reporting();
> > +	}
> 
> I think the switch-case makes this unnecessarily bloated as code. Just
> do:


Makes sense... i will switch it as suggested.

> 
> 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> 		return;
> 
> 	mce_disable_error_reporting();
> 
> ...
> 

Cheers,
Ashok


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline
  2015-09-04 17:59   ` Raj, Ashok
@ 2015-09-05 15:32     ` Borislav Petkov
  0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2015-09-05 15:32 UTC (permalink / raw)
  To: Raj, Ashok; +Cc: linux-kernel, Boris Petkov, linux-edac, Tony Luck, Serge Ayoun

On Fri, Sep 04, 2015 at 01:59:31PM -0400, Raj, Ashok wrote:
> Hummm.. just tried to keep the style as other parts of mcheck_cpu_init()
> where we have _mcheck_cpu_init_vendor().

But there's nothing init-like to disabling error reporting and this
function is not being called by mcheck_cpu_init()...

> Its not a rule, but sometimes local static functions have "__"
> varient.

Yes, that's internal functions, roughly.

> Maybe it should have been the double "__"?

Not at all - there's no layering here or whatever. Simply calling it
vendor_disable_error_reporting() is perfectly fine IMO.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-09-05 15:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-03 18:17 [Patch V0] x86, mce: Don't clear global error reporting banks during cpu_offline Ashok Raj
2015-09-04 11:50 ` Borislav Petkov
2015-09-04 16:38   ` Luck, Tony
2015-09-04 17:59   ` Raj, Ashok
2015-09-05 15:32     ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.