linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/umwait: Fix error handling in umwait_init()
@ 2019-08-10  1:40 Fenghua Yu
  2019-08-10  5:25 ` Thomas Gleixner
  2019-08-12 12:55 ` [tip:x86/urgent] " tip-bot for Fenghua Yu
  0 siblings, 2 replies; 4+ messages in thread
From: Fenghua Yu @ 2019-08-10  1:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Valdis Kletnieks,
	x86, linux-kernel
  Cc: Fenghua Yu

Currently, failure of cpuhp_setup_state() is ignored and the syscore
ops and the control interfaces can still be added even after the
failure. But, this error handling will cause a few issues:

1. The CPUs may have different values in the IA32_UMWAIT_CONTROL
   MSR because there is no way to roll back the control MSR on
   the CPUs which already set the MSR before the failure.
2. If the sysfs interface is added successfully, there will be a mismatch
   between the global control value and the control MSR:
   - The interface shows the default global control value. But,
     the control MSR is not set to the value because the CPU online
     function, which is supposed to set the MSR to the value,
     is not installed.
   - If the sysadmin changes the global control value through
     the interface, the control MSR on all current online CPUs is
     set to the new value. But, the control MSR on newly onlined CPUs
     after the value change will not be set to the new value due to
     lack of the CPU online function.
3. On resume from suspend/hibernation, the boot CPU restores the control
   MSR to the global control value through the syscore ops. But, the
   control MSR on all APs is not set due to lake of the CPU online
   function.

To solve the issues and enforce consistent behavior on the failure
of the CPU hotplug setup, make the following changes:

1. Cache the original control MSR value which is configured by
   hardware or BIOS before kernel boot. This value is likely to
   be 0. But it could be a different number as well. Cache the
   control MSR only once before the MSR is changed.
2. Add the CPU offline function so that the MSR is restored to the
   original control value on all CPUs on the failure.
3. On the failure, exit from cpumait_init() so that the syscore ops
   and the control interfaces are not added.

Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/umwait.c | 39 +++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 6a204e7336c1..95581f4cf6d8 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -17,6 +17,12 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
 
+/*
+ * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
+ * hardware or BIOS before kernel boot.
+ */
+static u32 orig_umwait_control_cached __read_mostly;
+
 /*
  * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
  * the sysfs write functions.
@@ -52,6 +58,23 @@ static int umwait_cpu_online(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * The CPU hotplug callback sets the control MSR to the original control
+ * value.
+ */
+static int umwait_cpu_offline(unsigned int cpu)
+{
+	/*
+	 * This code is protected by the CPU hotplug already and
+	 * orig_umwait_control_cached is never changed after it caches
+	 * the original control MSR value in umwait_init(). So there
+	 * is no race condition here.
+	 */
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);
+
+	return 0;
+}
+
 /*
  * On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
  * is the only active CPU at this time. The MSR is set up on the APs via the
@@ -185,8 +208,22 @@ static int __init umwait_init(void)
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/*
+	 * Cache the original control MSR value before the control MSR is
+	 * changed. This is the only place where orig_umwait_control_cached
+	 * is modified.
+	 */
+	rdmsrl(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached);
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait:online",
-				umwait_cpu_online, NULL);
+				umwait_cpu_online, umwait_cpu_offline);
+	if (ret < 0) {
+		/*
+		 * On failure, the control MSR on all CPUs has the
+		 * original control value.
+		 */
+		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] x86/umwait: Fix error handling in umwait_init()
  2019-08-10  1:40 [PATCH] x86/umwait: Fix error handling in umwait_init() Fenghua Yu
@ 2019-08-10  5:25 ` Thomas Gleixner
  2019-08-10 13:09   ` Thomas Gleixner
  2019-08-12 12:55 ` [tip:x86/urgent] " tip-bot for Fenghua Yu
  1 sibling, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2019-08-10  5:25 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, Valdis Kletnieks, x86, linux-kernel

On Fri, 9 Aug 2019, Fenghua Yu wrote:
> +/*
> + * The CPU hotplug callback sets the control MSR to the original control
> + * value.
> + */
> +static int umwait_cpu_offline(unsigned int cpu)
> +{
> +	/*
> +	 * This code is protected by the CPU hotplug already and
> +	 * orig_umwait_control_cached is never changed after it caches
> +	 * the original control MSR value in umwait_init(). So there
> +	 * is no race condition here.
> +	 */
> +	wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);

Even my brain compiler emits an error here.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] x86/umwait: Fix error handling in umwait_init()
  2019-08-10  5:25 ` Thomas Gleixner
@ 2019-08-10 13:09   ` Thomas Gleixner
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Gleixner @ 2019-08-10 13:09 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, Valdis Kletnieks, x86, linux-kernel

On Sat, 10 Aug 2019, Thomas Gleixner wrote:
> On Fri, 9 Aug 2019, Fenghua Yu wrote:
> > +/*
> > + * The CPU hotplug callback sets the control MSR to the original control
> > + * value.
> > + */
> > +static int umwait_cpu_offline(unsigned int cpu)
> > +{
> > +	/*
> > +	 * This code is protected by the CPU hotplug already and
> > +	 * orig_umwait_control_cached is never changed after it caches
> > +	 * the original control MSR value in umwait_init(). So there
> > +	 * is no race condition here.
> > +	 */
> > +	wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);
> 
> Even my brain compiler emits an error here.

Didn't parse correctly due to not enough coffee and dirty glasses. Sorry
for the noise.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip:x86/urgent] x86/umwait: Fix error handling in umwait_init()
  2019-08-10  1:40 [PATCH] x86/umwait: Fix error handling in umwait_init() Fenghua Yu
  2019-08-10  5:25 ` Thomas Gleixner
@ 2019-08-12 12:55 ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-08-12 12:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tglx, mingo, valdis.kletnieks, hpa, fenghua.yu

Commit-ID:  e7409258845a0f64967f8377e99294d438137537
Gitweb:     https://git.kernel.org/tip/e7409258845a0f64967f8377e99294d438137537
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Fri, 9 Aug 2019 18:40:37 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 12 Aug 2019 14:51:13 +0200

x86/umwait: Fix error handling in umwait_init()

Currently, failure of cpuhp_setup_state() is ignored and the syscore ops
and the control interfaces can still be added even after the failure. But,
this error handling will cause a few issues:

1. The CPUs may have different values in the IA32_UMWAIT_CONTROL
   MSR because there is no way to roll back the control MSR on
   the CPUs which already set the MSR before the failure.

2. If the sysfs interface is added successfully, there will be a mismatch
   between the global control value and the control MSR:
   - The interface shows the default global control value. But,
     the control MSR is not set to the value because the CPU online
     function, which is supposed to set the MSR to the value,
     is not installed.
   - If the sysadmin changes the global control value through
     the interface, the control MSR on all current online CPUs is
     set to the new value. But, the control MSR on newly onlined CPUs
     after the value change will not be set to the new value due to
     lack of the CPU online function.

3. On resume from suspend/hibernation, the boot CPU restores the control
   MSR to the global control value through the syscore ops. But, the
   control MSR on all APs is not set due to lake of the CPU online
   function.

To solve the issues and enforce consistent behavior on the failure
of the CPU hotplug setup, make the following changes:

1. Cache the original control MSR value which is configured by
   hardware or BIOS before kernel boot. This value is likely to
   be 0. But it could be a different number as well. Cache the
   control MSR only once before the MSR is changed.
2. Add the CPU offline function so that the MSR is restored to the
   original control value on all CPUs on the failure.
3. On the failure, exit from cpumait_init() so that the syscore ops
   and the control interfaces are not added.

Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1565401237-60936-1-git-send-email-fenghua.yu@intel.com

---
 arch/x86/kernel/cpu/umwait.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 6a204e7336c1..32b4dc9030aa 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -17,6 +17,12 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
 
+/*
+ * Cache the original IA32_UMWAIT_CONTROL MSR value which is configured by
+ * hardware or BIOS before kernel boot.
+ */
+static u32 orig_umwait_control_cached __ro_after_init;
+
 /*
  * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
  * the sysfs write functions.
@@ -52,6 +58,23 @@ static int umwait_cpu_online(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * The CPU hotplug callback sets the control MSR to the original control
+ * value.
+ */
+static int umwait_cpu_offline(unsigned int cpu)
+{
+	/*
+	 * This code is protected by the CPU hotplug already and
+	 * orig_umwait_control_cached is never changed after it caches
+	 * the original control MSR value in umwait_init(). So there
+	 * is no race condition here.
+	 */
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);
+
+	return 0;
+}
+
 /*
  * On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
  * is the only active CPU at this time. The MSR is set up on the APs via the
@@ -185,8 +208,22 @@ static int __init umwait_init(void)
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/*
+	 * Cache the original control MSR value before the control MSR is
+	 * changed. This is the only place where orig_umwait_control_cached
+	 * is modified.
+	 */
+	rdmsrl(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached);
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait:online",
-				umwait_cpu_online, NULL);
+				umwait_cpu_online, umwait_cpu_offline);
+	if (ret < 0) {
+		/*
+		 * On failure, the control MSR on all CPUs has the
+		 * original control value.
+		 */
+		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);
 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-08-12 12:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-10  1:40 [PATCH] x86/umwait: Fix error handling in umwait_init() Fenghua Yu
2019-08-10  5:25 ` Thomas Gleixner
2019-08-10 13:09   ` Thomas Gleixner
2019-08-12 12:55 ` [tip:x86/urgent] " tip-bot for Fenghua Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).