linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] x86/umwait: Enable user wait instructions
@ 2019-06-07 22:00 Fenghua Yu
  2019-06-07 22:00 ` [PATCH v4 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
                   ` (5 more replies)
  0 siblings, 6 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-07 22:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Today, if an application needs to wait for a very short duration
they have to have spinloops. Spinloops consume more power and continue
to use execution resources that could hurt its thread siblings in a core
with hyperthreads. New instructions umonitor, umwait and tpause allow
a low power alternative waiting at the same time could improve the HT
sibling perform while giving it any power headroom. These instructions
can be used in both user space and kernel space.

A new MSR IA32_UMWAIT_CONTROL allows kernel to set a time limit in
TSC-quanta that prevents user applications from waiting for a long time.
This allows applications to yield the CPU and the user application
should consider using other alternatives to wait.

The processor supports two levels of optimized states: a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state with deeper power saving
and higher exit latency). The above MSR can be used to restrict
entry to C0.2 and then any request for C0.2 will revert to C0.1.

This patch set covers feature discovery, provides initial values for
the MSR, adds some sysfs control files for admin to tweak the values
in the MSR if needed.

The sysfs interface files are in /sys/devices/system/cpu/umwait_control/

GCC 9 enables intrinsics for the instructions. To use the instructions,
user applications should include <immintrin.h> and be compiled with
-mwaitpkg.

Detailed information on the instructions, the MSR, and syntax of the
intrinsics can be found in the latest Intel Architecture Instruction
Set Extensions and Future Features Programming Reference and Intel 64
and IA-32 Architectures Software Developer's Manual.

Changelog:
v4:
- Error out when bit[1:0] in IA32_UMWAIT_CONTROL is not zero per
Andy Lutomirski's comment.
- Use umwait_control_cached to cache IA32_UMWAIT_CONTROL MSR. This
variable replaces the two previous variables umwait_max_time and
umwait_c0_2_enabled. The code is simpler than before and the cached MSR
will be easier to be used in future KVM support.

v3:
Address issues pointed out by Andy Lutomirski:
- Change default umwait max time to 100k TSC cycles
- Setting up MSR on BSP during resume suspend/hibernation
- A few other naming and coding changes as suggested
- Some security concerns of the user wait instructions are not issues
of the patches and cannot be addressed in the patch set. They will be
discussed on lkml.

Plus:
- Add ABI document entry for umwait control sysfs interfaces

v2:
- Address comments from Thomas Gleixner and Andy Lutomirski
- Remove vDSO functions
- Add sysfs control file for umwait max time

v1:
Based on comments from Thomas:
- Change user APIs to vDSO functions
- Changed sysfs per comments from Thomas.
- Change patch descriptions etc

Fenghua Yu (5):
  x86/cpufeatures: Enumerate user wait instructions
  x86/umwait: Initialize umwait control values
  x86/umwait: Add sysfs interface to control umwait C0.2 state
  x86/umwait: Add sysfs interface to control umwait maximum time
  x86/umwait: Document umwait control sysfs interfaces

 .../ABI/testing/sysfs-devices-system-cpu      |  21 ++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   4 +
 arch/x86/power/Makefile                       |   1 +
 arch/x86/power/umwait.c                       | 182 ++++++++++++++++++
 5 files changed, 209 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

-- 
2.19.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4 1/5] x86/cpufeatures: Enumerate user wait instructions
  2019-06-07 22:00 [PATCH v4 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
@ 2019-06-07 22:00 ` Fenghua Yu
  2019-06-07 22:00 ` [PATCH v4 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-07 22:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

umonitor, umwait, and tpause are a set of user wait instructions.

umonitor arms address monitoring hardware using an address. The
address range is determined by using CPUID.0x5. A store to
an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

umwait instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state
(C0.2 state).

tpause instructs the processor to enter an implementation-dependent
optimized state C0.1 or C0.2 state and wake up when time-stamp counter
reaches specified timeout.

The three instructions may be executed at any privilege level.

The instructions provide power saving method while waiting in
user space. Additionally, they can allow a sibling hyperthread to
make faster progress while this thread is waiting. One example of an
application usage of umwait is when waiting for input data from another
application, such as a user level multi-threaded packet processing
engine.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

Detailed information on the instructions and CPUID feature WAITPKG flag
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference and Intel 64 and IA-32
Architectures Software Developer's Manual.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..b8bd428ae5bc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -322,6 +322,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-07 22:00 [PATCH v4 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
  2019-06-07 22:00 ` [PATCH v4 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
@ 2019-06-07 22:00 ` Fenghua Yu
  2019-06-08 22:52   ` Andy Lutomirski
  2019-06-11  8:50   ` Peter Zijlstra
  2019-06-07 22:00 ` [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-07 22:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

umwait or tpause allows processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period
specified by the instruction or until the system time limit or until
a store to the monitored address range in umwait.

IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
on the processor and set maximum time the processor can reside in
C0.1 or C0.2.

By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.

Default maximum umwait time is 100000 cycles. A later patch provides
a sysfs interface to adjust this value.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/msr-index.h |  4 +++
 arch/x86/power/Makefile          |  1 +
 arch/x86/power/umwait.c          | 56 ++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..af502e947298 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,10 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL			0xe1
+#define MSR_IA32_UMWAIT_CONTROL_C02		BIT(0)
+#define MSR_IA32_UMWAIT_CONTROL_MAX_TIME	0xfffffffc
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index 37923d715741..62e2c609d1fe 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -8,3 +8,4 @@ CFLAGS_cpu.o	:= $(nostackp)
 
 obj-$(CONFIG_PM_SLEEP)		+= cpu.o
 obj-$(CONFIG_HIBERNATION)	+= hibernate_$(BITS).o hibernate_asm_$(BITS).o hibernate.o
+obj-y				+= umwait.o
diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
new file mode 100644
index 000000000000..23151e77c138
--- /dev/null
+++ b/arch/x86/power/umwait.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/syscore_ops.h>
+#include <linux/suspend.h>
+#include <linux/cpu.h>
+#include <asm/msr.h>
+
+/*
+ * Cache IA32_UMWAIT_CONTROL MSR in this variable. All CPUs have the same
+ * MSR value. By default, umwait max time is 100000 in TSC-quanta and C0.2
+ * is enabled
+ */
+static u32 umwait_control_cached = 100000;
+
+/* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+
+	return 0;
+}
+
+/*
+ * On resume, set up IA32_UMWAIT_CONTROL MSR on BP which is the only active
+ * CPU at this time. Setting up the MSR on APs when they are re-added later
+ * using CPU hotplug.
+ * The MSR on BP is supposed not to be changed during suspend and thus it's
+ * unnecessary to set it again during resume from suspend. But at this point
+ * we don't know resume is from suspend or hibernation. To simplify the
+ * situation, just set up the MSR on resume from suspend.
+ */
+static void umwait_syscore_resume(void)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+}
+
+static struct syscore_ops umwait_syscore_ops = {
+	.resume	= umwait_syscore_resume,
+};
+
+static int __init umwait_init(void)
+{
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
+		return -ENODEV;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
+				umwait_cpu_online, NULL);
+	if (ret < 0)
+		return ret;
+
+	register_syscore_ops(&umwait_syscore_ops);
+
+	return 0;
+}
+device_initcall(umwait_init);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-07 22:00 [PATCH v4 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
  2019-06-07 22:00 ` [PATCH v4 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
  2019-06-07 22:00 ` [PATCH v4 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
@ 2019-06-07 22:00 ` Fenghua Yu
  2019-06-08 22:50   ` Andy Lutomirski
                     ` (2 more replies)
  2019-06-07 22:00 ` [PATCH v4 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-07 22:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

C0.2 state in umwait and tpause instructions can be enabled or disabled
on a processor through IA32_UMWAIT_CONTROL MSR register.

By default, C0.2 is enabled and the user wait instructions result in
lower power consumption with slower wakeup time.

But in real time systems which require faster wakeup time although power
savings could be smaller, the administrator needs to disable C0.2 and all
C0.2 requests from user applications revert to C0.1.

A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
created to allow the administrator to control C0.2 state during run time.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/power/umwait.c | 89 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index 23151e77c138..9c176f3e59b6 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -11,10 +11,18 @@
  */
 static u32 umwait_control_cached = 100000;
 
+/*
+ * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
+ * to guarantee all CPUs have the same MSR value.
+ */
+static DEFINE_MUTEX(umwait_lock);
+
 /* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
 static int umwait_cpu_online(unsigned int cpu)
 {
+	mutex_lock(&umwait_lock);
 	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	mutex_unlock(&umwait_lock);
 
 	return 0;
 }
@@ -30,6 +38,7 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
+	/* No need to lock because only BP is running now. */
 	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
 }
 
@@ -37,17 +46,95 @@ static struct syscore_ops umwait_syscore_ops = {
 	.resume	= umwait_syscore_resume,
 };
 
+static void umwait_control_msr_update(void *unused)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+}
+
+static u32 get_umwait_control_c02(void)
+{
+	return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02;
+}
+
+static u32 get_umwait_control_max_time(void)
+{
+	return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
+}
+
+static ssize_t
+enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	 /*
+	  * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+	  * Otherwise, C0.2 is enabled. Show the opposite of bit 0.
+	  */
+	return sprintf(buf, "%d\n", !(bool)get_umwait_control_c02());
+}
+
+static ssize_t enable_c02_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	u32 umwait_control_c02;
+	bool c02_enabled;
+	int ret;
+
+	ret = kstrtobool(buf, &c02_enabled);
+	if (ret)
+		return ret;
+
+	mutex_lock(&umwait_lock);
+
+	/*
+	 * The value of bit 0 in IA32_UMWAIT_CONTROL MSR is opposite of
+	 * c02_enabled.
+	 */
+	umwait_control_c02 = (u32)!c02_enabled;
+	if (umwait_control_c02 == get_umwait_control_c02())
+		goto out_unlock;
+
+	umwait_control_cached = umwait_control_c02 | get_umwait_control_max_time();
+	/* Enable/disable C0.2 state on all CPUs */
+	on_each_cpu(umwait_control_msr_update, NULL, 1);
+
+out_unlock:
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(enable_c02);
+
+static struct attribute *umwait_attrs[] = {
+	&dev_attr_enable_c02.attr,
+	NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+	.attrs = umwait_attrs,
+	.name = "umwait_control",
+};
+
 static int __init umwait_init(void)
 {
+	struct device *dev;
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/* Add umwait control interface. */
+	dev = cpu_subsys.dev_root;
+	ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
+	if (ret)
+		return ret;
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
 				umwait_cpu_online, NULL);
-	if (ret < 0)
+	if (ret < 0) {
+		sysfs_remove_group(&dev->kobj, &umwait_attr_group);
+
 		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 4/5] x86/umwait: Add sysfs interface to control umwait maximum time
  2019-06-07 22:00 [PATCH v4 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (2 preceding siblings ...)
  2019-06-07 22:00 ` [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
@ 2019-06-07 22:00 ` Fenghua Yu
  2019-06-07 22:00 ` [PATCH v4 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
  2019-06-11  9:01 ` [PATCH v4 0/5] x86/umwait: Enable user wait instructions Peter Zijlstra
  5 siblings, 0 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-07 22:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that processor can stay in C0.1 or C0.2. A zero value means no maximum
time.

Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.

Users can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. Low order
two bits must be zero.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/power/umwait.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index 9c176f3e59b6..7fa381e3fd4e 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -104,8 +104,47 @@ static ssize_t enable_c02_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(enable_c02);
 
+static ssize_t
+max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", get_umwait_control_max_time());
+}
+
+static ssize_t max_time_store(struct device *kobj,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	u32 max_time;
+	int ret;
+
+	ret = kstrtou32(buf, 0, &max_time);
+	if (ret)
+		return ret;
+
+	/* bits[1:0] must be zero */
+	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME)
+		return -EINVAL;
+
+	mutex_lock(&umwait_lock);
+
+	if (max_time == get_umwait_control_max_time())
+		goto out_unlock;
+
+	umwait_control_cached = max_time | get_umwait_control_c02();
+
+	/* Update umwait max time on all CPUs */
+	on_each_cpu(umwait_control_msr_update, NULL, 1);
+
+out_unlock:
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(max_time);
+
 static struct attribute *umwait_attrs[] = {
 	&dev_attr_enable_c02.attr,
+	&dev_attr_max_time.attr,
 	NULL
 };
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 5/5] x86/umwait: Document umwait control sysfs interfaces
  2019-06-07 22:00 [PATCH v4 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (3 preceding siblings ...)
  2019-06-07 22:00 ` [PATCH v4 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
@ 2019-06-07 22:00 ` Fenghua Yu
  2019-06-11  9:01 ` [PATCH v4 0/5] x86/umwait: Enable user wait instructions Peter Zijlstra
  5 siblings, 0 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-07 22:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Since two new sysfs interface files are created for umwait control, add
an ABI document entry for the files:
	/sys/devices/system/cpu/umwait_control/enable_c02
	/sys/devices/system/cpu/umwait_control/max_time

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..d22cdadd3161 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -538,3 +538,24 @@ Description:	Intel Energy and Performance Bias Hint (EPB)
 
 		This attribute is present for all online CPUs supporting the
 		Intel EPB feature.
+
+What:		/sys/devices/system/cpu/umwait_control
+		/sys/devices/system/cpu/umwait_control/enable_c02
+		/sys/devices/system/cpu/umwait_control/max_time
+Date:		May 2019
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Umwait control
+
+		enable_c02: Read/write interface to control umwait C0.2 state
+			Read returns C0.2 state status:
+				0: C0.2 is disabled
+				1: C0.2 is enabled
+
+			Write 'Yy1' or [oO][nN] for on to enable C0.2 state.
+			Write 'Nn0' or [oO][fF] for off to disable C0.2 state.
+
+		max_time: Read/write interface to control umwait maximum time
+			  in TSC-quanta that the CPU can reside in either C0.1
+			  or C0.2 state. The time is an unsigned 32-bit number.
+			  Note that a value of zero means there is no limit.
+			  Low order two bits must be zero.
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-07 22:00 ` [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
@ 2019-06-08 22:50   ` Andy Lutomirski
  2019-06-10  3:53     ` Fenghua Yu
  2019-06-08 22:52   ` Andy Lutomirski
  2019-06-11  8:54   ` Peter Zijlstra
  2 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-08 22:50 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> C0.2 state in umwait and tpause instructions can be enabled or disabled
> on a processor through IA32_UMWAIT_CONTROL MSR register.
>
> By default, C0.2 is enabled and the user wait instructions result in
> lower power consumption with slower wakeup time.
>
> But in real time systems which require faster wakeup time although power
> savings could be smaller, the administrator needs to disable C0.2 and all
> C0.2 requests from user applications revert to C0.1.
>
> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> created to allow the administrator to control C0.2 state during run time.

This looks better than the previous version.  I think the locking is
still rather confused.  You have a mutex that you hold while changing
the value, which is entirely reasonable.  But, of the code paths that
write the MSR, only one takes the mutex.

I think you should consider making a function that just does:

wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);

and using it in all the places that update the MSR.  The only thing
that should need the lock is the sysfs code to avoid accidentally
corrupting the value, but that code should also use WRITE_ONCE to do
its update.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-07 22:00 ` [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
  2019-06-08 22:50   ` Andy Lutomirski
@ 2019-06-08 22:52   ` Andy Lutomirski
  2019-06-10  4:04     ` Fenghua Yu
  2019-06-11  8:54   ` Peter Zijlstra
  2 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-08 22:52 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> C0.2 state in umwait and tpause instructions can be enabled or disabled
> on a processor through IA32_UMWAIT_CONTROL MSR register.
>

> +static u32 get_umwait_control_c02(void)
> +{
> +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02;
> +}
> +
> +static u32 get_umwait_control_max_time(void)
> +{
> +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> +}
> +

I'm not convinced that these helpers make the code any more readable.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-07 22:00 ` [PATCH v4 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
@ 2019-06-08 22:52   ` Andy Lutomirski
  2019-06-10  4:13     ` Fenghua Yu
  2019-06-11  8:50   ` Peter Zijlstra
  1 sibling, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-08 22:52 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> umwait or tpause allows processor to enter a light-weight
> power/performance optimized state (C0.1 state) or an improved
> power/performance optimized state (C0.2 state) for a period
> specified by the instruction or until the system time limit or until
> a store to the monitored address range in umwait.
>
> IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> on the processor and set maximum time the processor can reside in
> C0.1 or C0.2.
>
> By default C0.2 is enabled so the user wait instructions can enter the
> C0.2 state to save more power with slower wakeup time.

Sounds good, but:

> +#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)

> +static u32 umwait_control_cached = 100000;

The code seems to disagree.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-08 22:50   ` Andy Lutomirski
@ 2019-06-10  3:53     ` Fenghua Yu
  2019-06-10  4:24       ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-10  3:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > on a processor through IA32_UMWAIT_CONTROL MSR register.
> >
> > By default, C0.2 is enabled and the user wait instructions result in
> > lower power consumption with slower wakeup time.
> >
> > But in real time systems which require faster wakeup time although power
> > savings could be smaller, the administrator needs to disable C0.2 and all
> > C0.2 requests from user applications revert to C0.1.
> >
> > A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > created to allow the administrator to control C0.2 state during run time.
> 
> This looks better than the previous version.  I think the locking is
> still rather confused.  You have a mutex that you hold while changing
> the value, which is entirely reasonable.  But, of the code paths that
> write the MSR, only one takes the mutex.
> 
> I think you should consider making a function that just does:
> 
> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> 
> and using it in all the places that update the MSR.  The only thing
> that should need the lock is the sysfs code to avoid accidentally
> corrupting the value, but that code should also use WRITE_ONCE to do
> its update.

Based on the comment, the illustrative CPU online and enable_c02 store
functions would be:

umwait_cpu_online()
{
        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
        return 0;
}

enable_c02_store()
{
       mutex_lock(&umwait_lock);
       umwait_control_c02 = (u32)!c02_enabled;
       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
       on_each_cpu(umwait_control_msr_update, NULL, 1);
       mutex_unlock(&umwait_lock);
}

Then suppose umwait_control_cached = 100000 initially and only CPU0 is
running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
onlining CPU1 in the same time:

1. On CPU1, read umwait_control_cached to eax as 100000 in
umwait_cpu_online()
2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
4. On CPU0, wrmsr with 100001 in enabled_c02_store()

The result is CPU0 and CPU1 have different MSR values.

The problem is because there is no wrmsr serialization b/w uwait_cpu_online()
and enable_c02_store(). The WRITE_ONCE() and READ_ONCE() only serialize
access to umwait_control_cached. But we need to serialize wrmsr() as well to
guarantee all CPUs have the same MSR value.

So does it make sense to keep the mutex and locking as the current patch does?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-08 22:52   ` Andy Lutomirski
@ 2019-06-10  4:04     ` Fenghua Yu
  2019-06-10  4:26       ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-10  4:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Sat, Jun 08, 2019 at 03:52:03PM -0700, Andy Lutomirski wrote:
> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > on a processor through IA32_UMWAIT_CONTROL MSR register.
> >
> 
> > +static u32 get_umwait_control_c02(void)
> > +{
> > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02;
> > +}
> > +
> > +static u32 get_umwait_control_max_time(void)
> > +{
> > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> > +}
> > +
> 
> I'm not convinced that these helpers make the code any more readable.

The helpers reduce length of statements that call them. Otherwise, all of
the statements would be easily over 80 characters.

Plus, each of the helpers is called multiple places in #0003 and #0004.
So the helpers make the patches smaller and cleaner.

So is it still OK to keep the helpers?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-08 22:52   ` Andy Lutomirski
@ 2019-06-10  4:13     ` Fenghua Yu
  2019-06-10  4:27       ` Andy Lutomirski
  2019-06-11 20:46       ` Thomas Gleixner
  0 siblings, 2 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-10  4:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Sat, Jun 08, 2019 at 03:52:42PM -0700, Andy Lutomirski wrote:
> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > umwait or tpause allows processor to enter a light-weight
> > power/performance optimized state (C0.1 state) or an improved
> > power/performance optimized state (C0.2 state) for a period
> > specified by the instruction or until the system time limit or until
> > a store to the monitored address range in umwait.
> >
> > IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> > on the processor and set maximum time the processor can reside in
> > C0.1 or C0.2.
> >
> > By default C0.2 is enabled so the user wait instructions can enter the
> > C0.2 state to save more power with slower wakeup time.
> 
> Sounds good, but:
> 
> > +#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)
> 
> > +static u32 umwait_control_cached = 100000;
> 
> The code seems to disagree.

The definition of bit[0] is: C0.2 is disabled when bit[0]=1. So
100000 means C0.2 is enabled (and max time is 100000).

Would it be better to change 
+#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)
to
+#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLED            BIT(0)
?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-10  3:53     ` Fenghua Yu
@ 2019-06-10  4:24       ` Andy Lutomirski
  2019-06-10  6:02         ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-10  4:24 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > on a processor through IA32_UMWAIT_CONTROL MSR register.
> > >
> > > By default, C0.2 is enabled and the user wait instructions result in
> > > lower power consumption with slower wakeup time.
> > >
> > > But in real time systems which require faster wakeup time although power
> > > savings could be smaller, the administrator needs to disable C0.2 and all
> > > C0.2 requests from user applications revert to C0.1.
> > >
> > > A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > > created to allow the administrator to control C0.2 state during run time.
> >
> > This looks better than the previous version.  I think the locking is
> > still rather confused.  You have a mutex that you hold while changing
> > the value, which is entirely reasonable.  But, of the code paths that
> > write the MSR, only one takes the mutex.
> >
> > I think you should consider making a function that just does:
> >
> > wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> >
> > and using it in all the places that update the MSR.  The only thing
> > that should need the lock is the sysfs code to avoid accidentally
> > corrupting the value, but that code should also use WRITE_ONCE to do
> > its update.
>
> Based on the comment, the illustrative CPU online and enable_c02 store
> functions would be:
>
> umwait_cpu_online()
> {
>         wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
>         return 0;
> }
>
> enable_c02_store()
> {
>        mutex_lock(&umwait_lock);
>        umwait_control_c02 = (u32)!c02_enabled;
>        WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
>        on_each_cpu(umwait_control_msr_update, NULL, 1);
>        mutex_unlock(&umwait_lock);
> }
>
> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> onlining CPU1 in the same time:
>
> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> umwait_cpu_online()
> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
>
> The result is CPU0 and CPU1 have different MSR values.

Yes, but only transiently, because you didn't finish your example.

Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-10  4:04     ` Fenghua Yu
@ 2019-06-10  4:26       ` Andy Lutomirski
  2019-06-17 22:48         ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-10  4:26 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Sun, Jun 9, 2019 at 9:14 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Sat, Jun 08, 2019 at 03:52:03PM -0700, Andy Lutomirski wrote:
> > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > on a processor through IA32_UMWAIT_CONTROL MSR register.
> > >
> >
> > > +static u32 get_umwait_control_c02(void)
> > > +{
> > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02;
> > > +}
> > > +
> > > +static u32 get_umwait_control_max_time(void)
> > > +{
> > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> > > +}
> > > +
> >
> > I'm not convinced that these helpers make the code any more readable.
>
> The helpers reduce length of statements that call them. Otherwise, all of
> the statements would be easily over 80 characters.
>
> Plus, each of the helpers is called multiple places in #0003 and #0004.
> So the helpers make the patches smaller and cleaner.
>

I was imagining things like:

umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_C02;
if (whatever condition)
  umwait_control_cached |= MSR_IA32_UMWAIT_CONTROL_C02;
umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
umwait_control_cached |= new_max_time;

You could save 8 characters by just calling the variable umwait_control.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-10  4:13     ` Fenghua Yu
@ 2019-06-10  4:27       ` Andy Lutomirski
  2019-06-11 20:46       ` Thomas Gleixner
  1 sibling, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-10  4:27 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Sun, Jun 9, 2019 at 9:23 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Sat, Jun 08, 2019 at 03:52:42PM -0700, Andy Lutomirski wrote:
> > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > > umwait or tpause allows processor to enter a light-weight
> > > power/performance optimized state (C0.1 state) or an improved
> > > power/performance optimized state (C0.2 state) for a period
> > > specified by the instruction or until the system time limit or until
> > > a store to the monitored address range in umwait.
> > >
> > > IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> > > on the processor and set maximum time the processor can reside in
> > > C0.1 or C0.2.
> > >
> > > By default C0.2 is enabled so the user wait instructions can enter the
> > > C0.2 state to save more power with slower wakeup time.
> >
> > Sounds good, but:
> >
> > > +#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)
> >
> > > +static u32 umwait_control_cached = 100000;
> >
> > The code seems to disagree.
>
> The definition of bit[0] is: C0.2 is disabled when bit[0]=1. So
> 100000 means C0.2 is enabled (and max time is 100000).
>
> Would it be better to change
> +#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)
> to
> +#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLED            BIT(0)

Sounds like a good improvement.

Thanks,
Andy

> ?
>
> Thanks.
>
> -Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-10  4:24       ` Andy Lutomirski
@ 2019-06-10  6:02         ` Fenghua Yu
  2019-06-10 13:41           ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-10  6:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> > > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >
> > > > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > > on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > >
> > > > By default, C0.2 is enabled and the user wait instructions result in
> > > > lower power consumption with slower wakeup time.
> > > >
> > > > But in real time systems which require faster wakeup time although power
> > > > savings could be smaller, the administrator needs to disable C0.2 and all
> > > > C0.2 requests from user applications revert to C0.1.
> > > >
> > > > A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > > > created to allow the administrator to control C0.2 state during run time.
> > >
> > > This looks better than the previous version.  I think the locking is
> > > still rather confused.  You have a mutex that you hold while changing
> > > the value, which is entirely reasonable.  But, of the code paths that
> > > write the MSR, only one takes the mutex.
> > >
> > > I think you should consider making a function that just does:
> > >
> > > wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > >
> > > and using it in all the places that update the MSR.  The only thing
> > > that should need the lock is the sysfs code to avoid accidentally
> > > corrupting the value, but that code should also use WRITE_ONCE to do
> > > its update.
> >
> > Based on the comment, the illustrative CPU online and enable_c02 store
> > functions would be:
> >
> > umwait_cpu_online()
> > {
> >         wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> >         return 0;
> > }
> >
> > enable_c02_store()
> > {
> >        mutex_lock(&umwait_lock);
> >        umwait_control_c02 = (u32)!c02_enabled;
> >        WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
> >        on_each_cpu(umwait_control_msr_update, NULL, 1);
> >        mutex_unlock(&umwait_lock);
> > }
> >
> > Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> > running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> > onlining CPU1 in the same time:
> >
> > 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > umwait_cpu_online()
> > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
> >
> > The result is CPU0 and CPU1 have different MSR values.
> 
> Yes, but only transiently, because you didn't finish your example.
> 
> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.

There is no sync on wrmsr on CPU0 and CPU1. So a better sequence to
describe the problem is changing the order of wrmsr:

1. On CPU1, read umwait_control_cached to eax as 100000 in
umwait_cpu_online()
2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()

So CPU1 and CPU0 have different MSR values. This won't be transient.

So we do need the mutex as in the current patch, right?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-10  6:02         ` Fenghua Yu
@ 2019-06-10 13:41           ` Andy Lutomirski
  2019-06-17 20:27             ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-10 13:41 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86



> On Jun 9, 2019, at 11:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
>> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
>>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>>> 
>>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
>>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>>>>> 
>>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled
>>>>> on a processor through IA32_UMWAIT_CONTROL MSR register.
>>>>> 
>>>>> By default, C0.2 is enabled and the user wait instructions result in
>>>>> lower power consumption with slower wakeup time.
>>>>> 
>>>>> But in real time systems which require faster wakeup time although power
>>>>> savings could be smaller, the administrator needs to disable C0.2 and all
>>>>> C0.2 requests from user applications revert to C0.1.
>>>>> 
>>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
>>>>> created to allow the administrator to control C0.2 state during run time.
>>>> 
>>>> This looks better than the previous version.  I think the locking is
>>>> still rather confused.  You have a mutex that you hold while changing
>>>> the value, which is entirely reasonable.  But, of the code paths that
>>>> write the MSR, only one takes the mutex.
>>>> 
>>>> I think you should consider making a function that just does:
>>>> 
>>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
>>>> 
>>>> and using it in all the places that update the MSR.  The only thing
>>>> that should need the lock is the sysfs code to avoid accidentally
>>>> corrupting the value, but that code should also use WRITE_ONCE to do
>>>> its update.
>>> 
>>> Based on the comment, the illustrative CPU online and enable_c02 store
>>> functions would be:
>>> 
>>> umwait_cpu_online()
>>> {
>>>        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
>>>        return 0;
>>> }
>>> 
>>> enable_c02_store()
>>> {
>>>       mutex_lock(&umwait_lock);
>>>       umwait_control_c02 = (u32)!c02_enabled;
>>>       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
>>>       on_each_cpu(umwait_control_msr_update, NULL, 1);
>>>       mutex_unlock(&umwait_lock);
>>> }
>>> 
>>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
>>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
>>> onlining CPU1 in the same time:
>>> 
>>> 1. On CPU1, read umwait_control_cached to eax as 100000 in
>>> umwait_cpu_online()
>>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
>>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
>>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
>>> 
>>> The result is CPU0 and CPU1 have different MSR values.
>> 
>> Yes, but only transiently, because you didn't finish your example.
>> 
>> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.
> 
> There is no sync on wrmsr on CPU0 and CPU1.

What do you mean by sync?

> So a better sequence to
> describe the problem is changing the order of wrmsr:
> 
> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> umwait_cpu_online()
> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
> 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> 
> So CPU1 and CPU0 have different MSR values. This won't be transient.

You are still ignoring the wrmsr on CPU1 due to on_each_cpu().


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-07 22:00 ` [PATCH v4 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
  2019-06-08 22:52   ` Andy Lutomirski
@ 2019-06-11  8:50   ` Peter Zijlstra
  2019-06-11 17:04     ` Fenghua Yu
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2019-06-11  8:50 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, Jun 07, 2019 at 03:00:34PM -0700, Fenghua Yu wrote:
> umwait or tpause allows processor to enter a light-weight
> power/performance optimized state (C0.1 state) or an improved
> power/performance optimized state (C0.2 state) for a period
> specified by the instruction or until the system time limit or until
> a store to the monitored address range in umwait.
> 
> IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> on the processor and set maximum time the processor can reside in
> C0.1 or C0.2.
> 
> By default C0.2 is enabled so the user wait instructions can enter the
> C0.2 state to save more power with slower wakeup time.
> 
> Default maximum umwait time is 100000 cycles. A later patch provides
> a sysfs interface to adjust this value.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/msr-index.h |  4 +++
>  arch/x86/power/Makefile          |  1 +
>  arch/x86/power/umwait.c          | 56 ++++++++++++++++++++++++++++++++

Why is this in power/, this isn't in the least related to
suspend/hybernate. arch/x86/kernel/cpu/ might be a better place for
instruction support.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-07 22:00 ` [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
  2019-06-08 22:50   ` Andy Lutomirski
  2019-06-08 22:52   ` Andy Lutomirski
@ 2019-06-11  8:54   ` Peter Zijlstra
  2019-06-11 16:04     ` Andy Lutomirski
  2 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2019-06-11  8:54 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, Jun 07, 2019 at 03:00:35PM -0700, Fenghua Yu wrote:
> C0.2 state in umwait and tpause instructions can be enabled or disabled
> on a processor through IA32_UMWAIT_CONTROL MSR register.
> 
> By default, C0.2 is enabled and the user wait instructions result in
> lower power consumption with slower wakeup time.
> 
> But in real time systems which require faster wakeup time although power
> savings could be smaller, the administrator needs to disable C0.2 and all
> C0.2 requests from user applications revert to C0.1.
> 
> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> created to allow the administrator to control C0.2 state during run time.

We already have an interface for applications to convey their latency
requirements (pm-qos). We do not need another magic sys variable.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 0/5] x86/umwait: Enable user wait instructions
  2019-06-07 22:00 [PATCH v4 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (4 preceding siblings ...)
  2019-06-07 22:00 ` [PATCH v4 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
@ 2019-06-11  9:01 ` Peter Zijlstra
  2019-06-11 17:37   ` Fenghua Yu
  5 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2019-06-11  9:01 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, Jun 07, 2019 at 03:00:32PM -0700, Fenghua Yu wrote:
> Today, if an application needs to wait for a very short duration
> they have to have spinloops. Spinloops consume more power and continue
> to use execution resources that could hurt its thread siblings in a core
> with hyperthreads. New instructions umonitor, umwait and tpause allow
> a low power alternative waiting at the same time could improve the HT
> sibling perform while giving it any power headroom. These instructions
> can be used in both user space and kernel space.
> 
> A new MSR IA32_UMWAIT_CONTROL allows kernel to set a time limit in
> TSC-quanta that prevents user applications from waiting for a long time.
> This allows applications to yield the CPU and the user application
> should consider using other alternatives to wait.

I'm confused on the purpose of this control; what do we win by limiting
this time?

>  .../ABI/testing/sysfs-devices-system-cpu      |  21 ++
>  arch/x86/include/asm/cpufeatures.h            |   1 +
>  arch/x86/include/asm/msr-index.h              |   4 +
>  arch/x86/power/Makefile                       |   1 +
>  arch/x86/power/umwait.c                       | 182 ++++++++++++++++++

You seem to miss the arch/x86/lib/delay.c change to use this fancy new
stuff for udelay(). I'm thinking that's exactly what TPAUSE is good for.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-11  8:54   ` Peter Zijlstra
@ 2019-06-11 16:04     ` Andy Lutomirski
  2019-06-11 17:27       ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-11 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Andy Lutomirski, Ashok Raj, Tony Luck,
	Ravi V Shankar, linux-kernel, x86



> On Jun 11, 2019, at 1:54 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Fri, Jun 07, 2019 at 03:00:35PM -0700, Fenghua Yu wrote:
>> C0.2 state in umwait and tpause instructions can be enabled or disabled
>> on a processor through IA32_UMWAIT_CONTROL MSR register.
>> 
>> By default, C0.2 is enabled and the user wait instructions result in
>> lower power consumption with slower wakeup time.
>> 
>> But in real time systems which require faster wakeup time although power
>> savings could be smaller, the administrator needs to disable C0.2 and all
>> C0.2 requests from user applications revert to C0.1.
>> 
>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
>> created to allow the administrator to control C0.2 state during run time.
> 
> We already have an interface for applications to convey their latency
> requirements (pm-qos). We do not need another magic sys variable.

I’m not sure I agree.  This isn’t an overall latency request, and setting an absurdly low pm_qos will badly hurt idle power and turbo performance.  Also, pm_qos isn’t exactly beautiful.

(I speak from some experience. I may be literally the only person to write a driver that listens to dev_pm_qos latency requests. And, in my production box, I directly disable c states instead of messing with pm_qos.)

I do wonder whether anyone will ever use this particular control, though.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-11  8:50   ` Peter Zijlstra
@ 2019-06-11 17:04     ` Fenghua Yu
  0 siblings, 0 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-11 17:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Tue, Jun 11, 2019 at 10:50:36AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 07, 2019 at 03:00:34PM -0700, Fenghua Yu wrote:
> > umwait or tpause allows processor to enter a light-weight
> > power/performance optimized state (C0.1 state) or an improved
> > power/performance optimized state (C0.2 state) for a period
> > specified by the instruction or until the system time limit or until
> > a store to the monitored address range in umwait.
> > 
> > IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> > on the processor and set maximum time the processor can reside in
> > C0.1 or C0.2.
> > 
> > By default C0.2 is enabled so the user wait instructions can enter the
> > C0.2 state to save more power with slower wakeup time.
> > 
> > Default maximum umwait time is 100000 cycles. A later patch provides
> > a sysfs interface to adjust this value.
> > 
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > Reviewed-by: Ashok Raj <ashok.raj@intel.com>
> > Reviewed-by: Andy Lutomirski <luto@kernel.org>
> > ---
> >  arch/x86/include/asm/msr-index.h |  4 +++
> >  arch/x86/power/Makefile          |  1 +
> >  arch/x86/power/umwait.c          | 56 ++++++++++++++++++++++++++++++++
> 
> Why is this in power/, this isn't in the least related to
> suspend/hybernate. arch/x86/kernel/cpu/ might be a better place for
> instruction support.

Ok. I can move umwait.c to arch/x86/kernel/cpu/umwait.c.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-11 16:04     ` Andy Lutomirski
@ 2019-06-11 17:27       ` Peter Zijlstra
  2019-06-17 15:14         ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2019-06-11 17:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Andy Lutomirski, Ashok Raj, Tony Luck,
	Ravi V Shankar, linux-kernel, x86


(can you, perchance, look at a MUA that isn't 'broken' ?)

On Tue, Jun 11, 2019 at 09:04:30AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Jun 11, 2019, at 1:54 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> >> On Fri, Jun 07, 2019 at 03:00:35PM -0700, Fenghua Yu wrote:
> >> C0.2 state in umwait and tpause instructions can be enabled or disabled
> >> on a processor through IA32_UMWAIT_CONTROL MSR register.
> >> 
> >> By default, C0.2 is enabled and the user wait instructions result in
> >> lower power consumption with slower wakeup time.
> >> 
> >> But in real time systems which require faster wakeup time although power
> >> savings could be smaller, the administrator needs to disable C0.2 and all
> >> C0.2 requests from user applications revert to C0.1.
> >> 
> >> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> >> created to allow the administrator to control C0.2 state during run time.
> > 
> > We already have an interface for applications to convey their latency
> > requirements (pm-qos). We do not need another magic sys variable.
> 
> I’m not sure I agree.  This isn’t an overall latency request, and
> setting an absurdly low pm_qos will badly hurt idle power and turbo
> performance.  Also, pm_qos isn’t exactly beautiful.
> 
> (I speak from some experience. I may be literally the only person to
> write a driver that listens to dev_pm_qos latency requests. And, in my
> production box, I directly disable c states instead of messing with
> pm_qos.)
> 
> I do wonder whether anyone will ever use this particular control, though.

I agree that pm-qos is pretty terrible; but that doesn't mean we should
just add random control files all over the place.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 0/5] x86/umwait: Enable user wait instructions
  2019-06-11  9:01 ` [PATCH v4 0/5] x86/umwait: Enable user wait instructions Peter Zijlstra
@ 2019-06-11 17:37   ` Fenghua Yu
  2019-06-17 14:19     ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-11 17:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Tue, Jun 11, 2019 at 11:01:45AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 07, 2019 at 03:00:32PM -0700, Fenghua Yu wrote:
> > Today, if an application needs to wait for a very short duration
> > they have to have spinloops. Spinloops consume more power and continue
> > to use execution resources that could hurt its thread siblings in a core
> > with hyperthreads. New instructions umonitor, umwait and tpause allow
> > a low power alternative waiting at the same time could improve the HT
> > sibling perform while giving it any power headroom. These instructions
> > can be used in both user space and kernel space.
> > 
> > A new MSR IA32_UMWAIT_CONTROL allows kernel to set a time limit in
> > TSC-quanta that prevents user applications from waiting for a long time.
> > This allows applications to yield the CPU and the user application
> > should consider using other alternatives to wait.
> 
> I'm confused on the purpose of this control; what do we win by limiting
> this time?

In previous patches, there is no time limit (max time is 0 which means no
time limit).

Andy Lutomirski proposed to set the time limit:

https://lkml.org/lkml/2019/2/26/735

"So I propose setting the timeout to either 100 microseconds or 100k
"cycles" by default.  In the event someone determines that they save
materially more power or gets materially better performance with a
longer timeout, we can revisit the value."

Does it make sense?

> 
> >  .../ABI/testing/sysfs-devices-system-cpu      |  21 ++
> >  arch/x86/include/asm/cpufeatures.h            |   1 +
> >  arch/x86/include/asm/msr-index.h              |   4 +
> >  arch/x86/power/Makefile                       |   1 +
> >  arch/x86/power/umwait.c                       | 182 ++++++++++++++++++
> 
> You seem to miss the arch/x86/lib/delay.c change to use this fancy new
> stuff for udelay(). I'm thinking that's exactly what TPAUSE is good for.

There may be other places to use the instructions. But I think this
patch set just first enables basic functionalities. We can focus on how to
use the instructions in the future.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-10  4:13     ` Fenghua Yu
  2019-06-10  4:27       ` Andy Lutomirski
@ 2019-06-11 20:46       ` Thomas Gleixner
  2019-06-17 20:46         ` Fenghua Yu
  1 sibling, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2019-06-11 20:46 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Sun, 9 Jun 2019, Fenghua Yu wrote:

> On Sat, Jun 08, 2019 at 03:52:42PM -0700, Andy Lutomirski wrote:
> > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > > umwait or tpause allows processor to enter a light-weight
> > > power/performance optimized state (C0.1 state) or an improved
> > > power/performance optimized state (C0.2 state) for a period
> > > specified by the instruction or until the system time limit or until
> > > a store to the monitored address range in umwait.
> > >
> > > IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> > > on the processor and set maximum time the processor can reside in
> > > C0.1 or C0.2.
> > >
> > > By default C0.2 is enabled so the user wait instructions can enter the
> > > C0.2 state to save more power with slower wakeup time.
> > 
> > Sounds good, but:
> > 
> > > +#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)
> > 
> > > +static u32 umwait_control_cached = 100000;
> > 
> > The code seems to disagree.
> 
> The definition of bit[0] is: C0.2 is disabled when bit[0]=1. So
> 100000 means C0.2 is enabled (and max time is 100000).

which is totally non obvious. If you have to encode the control bit, then
please make it explicit, i.e. mask out the disable bit in the initializer.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 0/5] x86/umwait: Enable user wait instructions
  2019-06-11 17:37   ` Fenghua Yu
@ 2019-06-17 14:19     ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2019-06-17 14:19 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Tue, Jun 11, 2019 at 10:37:34AM -0700, Fenghua Yu wrote:
> On Tue, Jun 11, 2019 at 11:01:45AM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 07, 2019 at 03:00:32PM -0700, Fenghua Yu wrote:
> > > Today, if an application needs to wait for a very short duration
> > > they have to have spinloops. Spinloops consume more power and continue
> > > to use execution resources that could hurt its thread siblings in a core
> > > with hyperthreads. New instructions umonitor, umwait and tpause allow
> > > a low power alternative waiting at the same time could improve the HT
> > > sibling perform while giving it any power headroom. These instructions
> > > can be used in both user space and kernel space.
> > > 
> > > A new MSR IA32_UMWAIT_CONTROL allows kernel to set a time limit in
> > > TSC-quanta that prevents user applications from waiting for a long time.
> > > This allows applications to yield the CPU and the user application
> > > should consider using other alternatives to wait.
> > 
> > I'm confused on the purpose of this control; what do we win by limiting
> > this time?
> 
> In previous patches, there is no time limit (max time is 0 which means no
> time limit).
> 
> Andy Lutomirski proposed to set the time limit:
> 
> https://lkml.org/lkml/2019/2/26/735
> 
> "So I propose setting the timeout to either 100 microseconds or 100k
> "cycles" by default.  In the event someone determines that they save
> materially more power or gets materially better performance with a
> longer timeout, we can revisit the value."
> 
> Does it make sense?

You quoted exactly the wrong part of that message; Andy's concern was
with NOHZ_FULL. And I think we should preserve that concern in both the
code and Changelog introducing this limit.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-11 17:27       ` Peter Zijlstra
@ 2019-06-17 15:14         ` Andy Lutomirski
  2019-06-17 18:11           ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-17 15:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Andy Lutomirski, Ashok Raj, Tony Luck,
	Ravi V Shankar, linux-kernel, x86

On Tue, Jun 11, 2019 at 10:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> (can you, perchance, look at a MUA that isn't 'broken' ?)
>
> On Tue, Jun 11, 2019 at 09:04:30AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Jun 11, 2019, at 1:54 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > >> On Fri, Jun 07, 2019 at 03:00:35PM -0700, Fenghua Yu wrote:
> > >> C0.2 state in umwait and tpause instructions can be enabled or disabled
> > >> on a processor through IA32_UMWAIT_CONTROL MSR register.
> > >>
> > >> By default, C0.2 is enabled and the user wait instructions result in
> > >> lower power consumption with slower wakeup time.
> > >>
> > >> But in real time systems which require faster wakeup time although power
> > >> savings could be smaller, the administrator needs to disable C0.2 and all
> > >> C0.2 requests from user applications revert to C0.1.
> > >>
> > >> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > >> created to allow the administrator to control C0.2 state during run time.
> > >
> > > We already have an interface for applications to convey their latency
> > > requirements (pm-qos). We do not need another magic sys variable.
> >
> > I’m not sure I agree.  This isn’t an overall latency request, and
> > setting an absurdly low pm_qos will badly hurt idle power and turbo
> > performance.  Also, pm_qos isn’t exactly beautiful.
> >
> > (I speak from some experience. I may be literally the only person to
> > write a driver that listens to dev_pm_qos latency requests. And, in my
> > production box, I directly disable c states instead of messing with
> > pm_qos.)
> >
> > I do wonder whether anyone will ever use this particular control, though.
>
> I agree that pm-qos is pretty terrible; but that doesn't mean we should
> just add random control files all over the place.

I don't think pm-qos is expressive enough.  It seems entirely
reasonable to want to do a C0.1 wait for lower latency *while waiting*
but still want full power-saving idle when not waiting.

Do we even know what the C0.2 and C0.1 latencies are?  And why is this
thing an MSR instead of a flag passed to UMWAIT?

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-17 15:14         ` Andy Lutomirski
@ 2019-06-17 18:11           ` Fenghua Yu
  0 siblings, 0 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-17 18:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, Jun 17, 2019 at 08:14:44AM -0700, Andy Lutomirski wrote:
> On Tue, Jun 11, 2019 at 10:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >
> > (can you, perchance, look at a MUA that isn't 'broken' ?)
> >
> > On Tue, Jun 11, 2019 at 09:04:30AM -0700, Andy Lutomirski wrote:
> > >
> > >
> > > > On Jun 11, 2019, at 1:54 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > >> On Fri, Jun 07, 2019 at 03:00:35PM -0700, Fenghua Yu wrote:
> > > >> C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > >> on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > >>
> > > >> By default, C0.2 is enabled and the user wait instructions result in
> > > >> lower power consumption with slower wakeup time.
> > > >>
> > > >> But in real time systems which require faster wakeup time although power
> > > >> savings could be smaller, the administrator needs to disable C0.2 and all
> > > >> C0.2 requests from user applications revert to C0.1.
> > > >>
> > > >> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > > >> created to allow the administrator to control C0.2 state during run time.
> > > >
> > > > We already have an interface for applications to convey their latency
> > > > requirements (pm-qos). We do not need another magic sys variable.
> > >
> > > I’m not sure I agree.  This isn’t an overall latency request, and
> > > setting an absurdly low pm_qos will badly hurt idle power and turbo
> > > performance.  Also, pm_qos isn’t exactly beautiful.
> > >
> > > (I speak from some experience. I may be literally the only person to
> > > write a driver that listens to dev_pm_qos latency requests. And, in my
> > > production box, I directly disable c states instead of messing with
> > > pm_qos.)
> > >
> > > I do wonder whether anyone will ever use this particular control, though.
> >
> > I agree that pm-qos is pretty terrible; but that doesn't mean we should
> > just add random control files all over the place.
> 
> I don't think pm-qos is expressive enough.  It seems entirely
> reasonable to want to do a C0.1 wait for lower latency *while waiting*
> but still want full power-saving idle when not waiting.
> 
> Do we even know what the C0.2 and C0.1 latencies are?  And why is this
> thing an MSR instead of a flag passed to UMWAIT?

I will still keep this sysfs interface in the next version of patches, right?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-10 13:41           ` Andy Lutomirski
@ 2019-06-17 20:27             ` Fenghua Yu
  2019-06-17 23:02               ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-17 20:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, Jun 10, 2019 at 06:41:31AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Jun 9, 2019, at 11:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > 
> >> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
> >>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >>> 
> >>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> >>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >>>>> 
> >>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled
> >>>>> on a processor through IA32_UMWAIT_CONTROL MSR register.
> >>>>> 
> >>>>> By default, C0.2 is enabled and the user wait instructions result in
> >>>>> lower power consumption with slower wakeup time.
> >>>>> 
> >>>>> But in real time systems which require faster wakeup time although power
> >>>>> savings could be smaller, the administrator needs to disable C0.2 and all
> >>>>> C0.2 requests from user applications revert to C0.1.
> >>>>> 
> >>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> >>>>> created to allow the administrator to control C0.2 state during run time.
> >>>> 
> >>>> This looks better than the previous version.  I think the locking is
> >>>> still rather confused.  You have a mutex that you hold while changing
> >>>> the value, which is entirely reasonable.  But, of the code paths that
> >>>> write the MSR, only one takes the mutex.
> >>>> 
> >>>> I think you should consider making a function that just does:
> >>>> 
> >>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> >>>> 
> >>>> and using it in all the places that update the MSR.  The only thing
> >>>> that should need the lock is the sysfs code to avoid accidentally
> >>>> corrupting the value, but that code should also use WRITE_ONCE to do
> >>>> its update.
> >>> 
> >>> Based on the comment, the illustrative CPU online and enable_c02 store
> >>> functions would be:
> >>> 
> >>> umwait_cpu_online()
> >>> {
> >>>        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> >>>        return 0;
> >>> }
> >>> 
> >>> enable_c02_store()
> >>> {
> >>>       mutex_lock(&umwait_lock);
> >>>       umwait_control_c02 = (u32)!c02_enabled;
> >>>       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
> >>>       on_each_cpu(umwait_control_msr_update, NULL, 1);
> >>>       mutex_unlock(&umwait_lock);
> >>> }
> >>> 
> >>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> >>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> >>> onlining CPU1 in the same time:
> >>> 
> >>> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> >>> umwait_cpu_online()
> >>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> >>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> >>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
> >>> 
> >>> The result is CPU0 and CPU1 have different MSR values.
> >> 
> >> Yes, but only transiently, because you didn't finish your example.
> >> 
> >> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.
> > 
> > There is no sync on wrmsr on CPU0 and CPU1.
> 
> What do you mean by sync?
> 
> > So a better sequence to
> > describe the problem is changing the order of wrmsr:
> > 
> > 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > umwait_cpu_online()
> > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
> > 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > 
> > So CPU1 and CPU0 have different MSR values. This won't be transient.
> 
> You are still ignoring the wrmsr on CPU1 due to on_each_cpu().
> 

Initially umwait_control_cached is 100000 and CPU0 is online while CPU1
is going to be online:

1. On CPU1, cpu_online_mask=0x3 in start_secondary()
2. On CPU1, read umwait_control_cached to eax as 100000 in umwait_cpu_online()
3. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
4. On CPU0, execute one_each_cpu() in enabled_c02_store():
    wrmsr with 100001 on CPU0
    wrmsr with 100001 on CPU1
5. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()

So the MSR is 100000 on CPU1 and 100001 on CPU0. The MSRs are different on
the CPUs.

Is this a right sequence to demonstrate locking issue without the mutex
locking?

Thanks.

-Fenghua


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-11 20:46       ` Thomas Gleixner
@ 2019-06-17 20:46         ` Fenghua Yu
  2019-06-18  5:43           ` Thomas Gleixner
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-17 20:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Tue, Jun 11, 2019 at 10:46:55PM +0200, Thomas Gleixner wrote:
> On Sun, 9 Jun 2019, Fenghua Yu wrote:
> 
> > On Sat, Jun 08, 2019 at 03:52:42PM -0700, Andy Lutomirski wrote:
> > > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >
> > > > umwait or tpause allows processor to enter a light-weight
> > > > power/performance optimized state (C0.1 state) or an improved
> > > > power/performance optimized state (C0.2 state) for a period
> > > > specified by the instruction or until the system time limit or until
> > > > a store to the monitored address range in umwait.
> > > >
> > > > IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> > > > on the processor and set maximum time the processor can reside in
> > > > C0.1 or C0.2.
> > > >
> > > > By default C0.2 is enabled so the user wait instructions can enter the
> > > > C0.2 state to save more power with slower wakeup time.
> > > 
> > > Sounds good, but:
> > > 
> > > > +#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)
> > > 
> > > > +static u32 umwait_control_cached = 100000;
> > > 
> > > The code seems to disagree.
> > 
> > The definition of bit[0] is: C0.2 is disabled when bit[0]=1. So
> > 100000 means C0.2 is enabled (and max time is 100000).
> 
> which is totally non obvious. If you have to encode the control bit, then
> please make it explicit, i.e. mask out the disable bit in the initializer.

Is this right?

static u32 umwait_control_cached = 100000 & ~MSR_IA32_UMWAIT_CONTROL_C02_DISABLED;

Thanks.

-Fenghua


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-10  4:26       ` Andy Lutomirski
@ 2019-06-17 22:48         ` Fenghua Yu
  2019-06-17 22:59           ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-17 22:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Sun, Jun 09, 2019 at 09:26:29PM -0700, Andy Lutomirski wrote:
> On Sun, Jun 9, 2019 at 9:14 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > On Sat, Jun 08, 2019 at 03:52:03PM -0700, Andy Lutomirski wrote:
> > > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >
> > > > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > > on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > >
> > >
> > > > +static u32 get_umwait_control_c02(void)
> > > > +{
> > > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02;
> > > > +}
> > > > +
> > > > +static u32 get_umwait_control_max_time(void)
> > > > +{
> > > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> > > > +}
> > > > +
> > >
> > > I'm not convinced that these helpers make the code any more readable.
> >
> > The helpers reduce length of statements that call them. Otherwise, all of
> > the statements would be easily over 80 characters.
> >
> > Plus, each of the helpers is called multiple places in #0003 and #0004.
> > So the helpers make the patches smaller and cleaner.
> >
> 
> I was imagining things like:
> 
> umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_C02;
> if (whatever condition)
>   umwait_control_cached |= MSR_IA32_UMWAIT_CONTROL_C02;
> umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> umwait_control_cached |= new_max_time;

How about this statement?
With the helpers:
        umwait_control_cached = max_time | get_umwait_control_c02();
If there is no helpers, the above statement will need two statements:
	umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
	umwait_control_cached |= max_time;

Another example:
With the helpers:
        if (umwait_control_c02 == get_umwait_control_c02())
If no helpers, the above statement will be long:
       if (umwait_control_c02 == (umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED))

There are quite a few places like above examples.

The helpers can reduce the length of those long lines and make code more
readable and shorter, right?

Can I still keep the helpers?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-17 22:59           ` Andy Lutomirski
@ 2019-06-17 22:51             ` Fenghua Yu
  0 siblings, 0 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-17 22:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Mon, Jun 17, 2019 at 03:59:28PM -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 3:57 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > On Sun, Jun 09, 2019 at 09:26:29PM -0700, Andy Lutomirski wrote:
> > > On Sun, Jun 9, 2019 at 9:14 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >
> > > > On Sat, Jun 08, 2019 at 03:52:03PM -0700, Andy Lutomirski wrote:
> > > > > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > > >
> > > > > > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > > > > on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > > > >
> > > > >
> > > > > > +static u32 get_umwait_control_c02(void)
> > > > > > +{
> > > > > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02;
> > > > > > +}
> > > > > > +
> > > > > > +static u32 get_umwait_control_max_time(void)
> > > > > > +{
> > > > > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> > > > > > +}
> > > > > > +
> > > > >
> > > > > I'm not convinced that these helpers make the code any more readable.
> > > >
> > > > The helpers reduce length of statements that call them. Otherwise, all of
> > > > the statements would be easily over 80 characters.
> > > >
> > > > Plus, each of the helpers is called multiple places in #0003 and #0004.
> > > > So the helpers make the patches smaller and cleaner.
> > > >
> > >
> > > I was imagining things like:
> > >
> > > umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_C02;
> > > if (whatever condition)
> > >   umwait_control_cached |= MSR_IA32_UMWAIT_CONTROL_C02;
> > > umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> > > umwait_control_cached |= new_max_time;
> >
> > How about this statement?
> > With the helpers:
> >         umwait_control_cached = max_time | get_umwait_control_c02();
> > If there is no helpers, the above statement will need two statements:
> >         umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> >         umwait_control_cached |= max_time;
> >
> > Another example:
> > With the helpers:
> >         if (umwait_control_c02 == get_umwait_control_c02())
> > If no helpers, the above statement will be long:
> >        if (umwait_control_c02 == (umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED))
> >
> > There are quite a few places like above examples.
> >
> > The helpers can reduce the length of those long lines and make code more
> > readable and shorter, right?
> >
> > Can I still keep the helpers?
> >
> 
> Sure, unless someone else objects.

Thank you very much for your advice!

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-17 22:48         ` Fenghua Yu
@ 2019-06-17 22:59           ` Andy Lutomirski
  2019-06-17 22:51             ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-17 22:59 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, Jun 17, 2019 at 3:57 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Sun, Jun 09, 2019 at 09:26:29PM -0700, Andy Lutomirski wrote:
> > On Sun, Jun 9, 2019 at 9:14 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > > On Sat, Jun 08, 2019 at 03:52:03PM -0700, Andy Lutomirski wrote:
> > > > On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > >
> > > > > C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > > > on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > > >
> > > >
> > > > > +static u32 get_umwait_control_c02(void)
> > > > > +{
> > > > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02;
> > > > > +}
> > > > > +
> > > > > +static u32 get_umwait_control_max_time(void)
> > > > > +{
> > > > > +       return umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> > > > > +}
> > > > > +
> > > >
> > > > I'm not convinced that these helpers make the code any more readable.
> > >
> > > The helpers reduce length of statements that call them. Otherwise, all of
> > > the statements would be easily over 80 characters.
> > >
> > > Plus, each of the helpers is called multiple places in #0003 and #0004.
> > > So the helpers make the patches smaller and cleaner.
> > >
> >
> > I was imagining things like:
> >
> > umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_C02;
> > if (whatever condition)
> >   umwait_control_cached |= MSR_IA32_UMWAIT_CONTROL_C02;
> > umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> > umwait_control_cached |= new_max_time;
>
> How about this statement?
> With the helpers:
>         umwait_control_cached = max_time | get_umwait_control_c02();
> If there is no helpers, the above statement will need two statements:
>         umwait_control_cached &= ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
>         umwait_control_cached |= max_time;
>
> Another example:
> With the helpers:
>         if (umwait_control_c02 == get_umwait_control_c02())
> If no helpers, the above statement will be long:
>        if (umwait_control_c02 == (umwait_control_cached & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED))
>
> There are quite a few places like above examples.
>
> The helpers can reduce the length of those long lines and make code more
> readable and shorter, right?
>
> Can I still keep the helpers?
>

Sure, unless someone else objects.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-17 20:27             ` Fenghua Yu
@ 2019-06-17 23:02               ` Andy Lutomirski
  2019-06-17 23:11                 ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-17 23:02 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, Jun 17, 2019 at 1:36 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Mon, Jun 10, 2019 at 06:41:31AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Jun 9, 2019, at 11:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > >> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
> > >>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >>>
> > >>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> > >>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >>>>>
> > >>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled
> > >>>>> on a processor through IA32_UMWAIT_CONTROL MSR register.
> > >>>>>
> > >>>>> By default, C0.2 is enabled and the user wait instructions result in
> > >>>>> lower power consumption with slower wakeup time.
> > >>>>>
> > >>>>> But in real time systems which require faster wakeup time although power
> > >>>>> savings could be smaller, the administrator needs to disable C0.2 and all
> > >>>>> C0.2 requests from user applications revert to C0.1.
> > >>>>>
> > >>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > >>>>> created to allow the administrator to control C0.2 state during run time.
> > >>>>
> > >>>> This looks better than the previous version.  I think the locking is
> > >>>> still rather confused.  You have a mutex that you hold while changing
> > >>>> the value, which is entirely reasonable.  But, of the code paths that
> > >>>> write the MSR, only one takes the mutex.
> > >>>>
> > >>>> I think you should consider making a function that just does:
> > >>>>
> > >>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > >>>>
> > >>>> and using it in all the places that update the MSR.  The only thing
> > >>>> that should need the lock is the sysfs code to avoid accidentally
> > >>>> corrupting the value, but that code should also use WRITE_ONCE to do
> > >>>> its update.
> > >>>
> > >>> Based on the comment, the illustrative CPU online and enable_c02 store
> > >>> functions would be:
> > >>>
> > >>> umwait_cpu_online()
> > >>> {
> > >>>        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > >>>        return 0;
> > >>> }
> > >>>
> > >>> enable_c02_store()
> > >>> {
> > >>>       mutex_lock(&umwait_lock);
> > >>>       umwait_control_c02 = (u32)!c02_enabled;
> > >>>       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
> > >>>       on_each_cpu(umwait_control_msr_update, NULL, 1);
> > >>>       mutex_unlock(&umwait_lock);
> > >>> }
> > >>>
> > >>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> > >>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> > >>> onlining CPU1 in the same time:
> > >>>
> > >>> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > >>> umwait_cpu_online()
> > >>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > >>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > >>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
> > >>>
> > >>> The result is CPU0 and CPU1 have different MSR values.
> > >>
> > >> Yes, but only transiently, because you didn't finish your example.
> > >>
> > >> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.
> > >
> > > There is no sync on wrmsr on CPU0 and CPU1.
> >
> > What do you mean by sync?
> >
> > > So a better sequence to
> > > describe the problem is changing the order of wrmsr:
> > >
> > > 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > umwait_cpu_online()
> > > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
> > > 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > >
> > > So CPU1 and CPU0 have different MSR values. This won't be transient.
> >
> > You are still ignoring the wrmsr on CPU1 due to on_each_cpu().
> >
>
> Initially umwait_control_cached is 100000 and CPU0 is online while CPU1
> is going to be online:
>
> 1. On CPU1, cpu_online_mask=0x3 in start_secondary()
> 2. On CPU1, read umwait_control_cached to eax as 100000 in umwait_cpu_online()
> 3. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> 4. On CPU0, execute one_each_cpu() in enabled_c02_store():
>     wrmsr with 100001 on CPU0
>     wrmsr with 100001 on CPU1
> 5. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
>
> So the MSR is 100000 on CPU1 and 100001 on CPU0. The MSRs are different on
> the CPUs.
>
> Is this a right sequence to demonstrate locking issue without the mutex
> locking?
>

Fair enough.  I would fix it differently, though:

static void update_this_cpu_umwait_msr(void)
{
  WARN_ON_ONCE(!irqs_disabled());  /* or local_irq_save() */

  /* We need to prevent umwait_control from being changed *and*
completing its WRMSR between our read and our WRMSR.  By turning IRQs
off here, we ensure that no sysfs write happens on this CPU and we
also make sure that any concurrent sysfs write from a different CPU
will not finish updating us via IPI until we're done. */
  wrmsrl(MSR_..., READ_ONCE(umwait_control), 0);
}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-17 23:02               ` Andy Lutomirski
@ 2019-06-17 23:11                 ` Fenghua Yu
  2019-06-17 23:41                   ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-17 23:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Mon, Jun 17, 2019 at 04:02:50PM -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 1:36 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > On Mon, Jun 10, 2019 at 06:41:31AM -0700, Andy Lutomirski wrote:
> > >
> > >
> > > > On Jun 9, 2019, at 11:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >
> > > >> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
> > > >>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >>>
> > > >>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> > > >>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >>>>>
> > > >>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > >>>>> on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > >>>>>
> > > >>>>> By default, C0.2 is enabled and the user wait instructions result in
> > > >>>>> lower power consumption with slower wakeup time.
> > > >>>>>
> > > >>>>> But in real time systems which require faster wakeup time although power
> > > >>>>> savings could be smaller, the administrator needs to disable C0.2 and all
> > > >>>>> C0.2 requests from user applications revert to C0.1.
> > > >>>>>
> > > >>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > > >>>>> created to allow the administrator to control C0.2 state during run time.
> > > >>>>
> > > >>>> This looks better than the previous version.  I think the locking is
> > > >>>> still rather confused.  You have a mutex that you hold while changing
> > > >>>> the value, which is entirely reasonable.  But, of the code paths that
> > > >>>> write the MSR, only one takes the mutex.
> > > >>>>
> > > >>>> I think you should consider making a function that just does:
> > > >>>>
> > > >>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > >>>>
> > > >>>> and using it in all the places that update the MSR.  The only thing
> > > >>>> that should need the lock is the sysfs code to avoid accidentally
> > > >>>> corrupting the value, but that code should also use WRITE_ONCE to do
> > > >>>> its update.
> > > >>>
> > > >>> Based on the comment, the illustrative CPU online and enable_c02 store
> > > >>> functions would be:
> > > >>>
> > > >>> umwait_cpu_online()
> > > >>> {
> > > >>>        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > >>>        return 0;
> > > >>> }
> > > >>>
> > > >>> enable_c02_store()
> > > >>> {
> > > >>>       mutex_lock(&umwait_lock);
> > > >>>       umwait_control_c02 = (u32)!c02_enabled;
> > > >>>       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
> > > >>>       on_each_cpu(umwait_control_msr_update, NULL, 1);
> > > >>>       mutex_unlock(&umwait_lock);
> > > >>> }
> > > >>>
> > > >>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> > > >>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> > > >>> onlining CPU1 in the same time:
> > > >>>
> > > >>> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > >>> umwait_cpu_online()
> > > >>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > >>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > >>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
> > > >>>
> > > >>> The result is CPU0 and CPU1 have different MSR values.
> > > >>
> > > >> Yes, but only transiently, because you didn't finish your example.
> > > >>
> > > >> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.
> > > >
> > > > There is no sync on wrmsr on CPU0 and CPU1.
> > >
> > > What do you mean by sync?
> > >
> > > > So a better sequence to
> > > > describe the problem is changing the order of wrmsr:
> > > >
> > > > 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > > umwait_cpu_online()
> > > > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
> > > > 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > >
> > > > So CPU1 and CPU0 have different MSR values. This won't be transient.
> > >
> > > You are still ignoring the wrmsr on CPU1 due to on_each_cpu().
> > >
> >
> > Initially umwait_control_cached is 100000 and CPU0 is online while CPU1
> > is going to be online:
> >
> > 1. On CPU1, cpu_online_mask=0x3 in start_secondary()
> > 2. On CPU1, read umwait_control_cached to eax as 100000 in umwait_cpu_online()
> > 3. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > 4. On CPU0, execute one_each_cpu() in enabled_c02_store():
> >     wrmsr with 100001 on CPU0
> >     wrmsr with 100001 on CPU1
> > 5. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> >
> > So the MSR is 100000 on CPU1 and 100001 on CPU0. The MSRs are different on
> > the CPUs.
> >
> > Is this a right sequence to demonstrate locking issue without the mutex
> > locking?
> >
> 
> Fair enough.  I would fix it differently, though:
> 
> static void update_this_cpu_umwait_msr(void)
> {
>   WARN_ON_ONCE(!irqs_disabled());  /* or local_irq_save() */
> 
>   /* We need to prevent umwait_control from being changed *and*
> completing its WRMSR between our read and our WRMSR.  By turning IRQs
> off here, we ensure that no sysfs write happens on this CPU and we
> also make sure that any concurrent sysfs write from a different CPU
> will not finish updating us via IPI until we're done. */
>   wrmsrl(MSR_..., READ_ONCE(umwait_control), 0);
> }

If no other objections, then I will keep the current mutex lock/unlock to
protect wrmsr and the umwait_control_cached variable.

Thanks.

-Fenghua


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-17 23:11                 ` Fenghua Yu
@ 2019-06-17 23:41                   ` Andy Lutomirski
  2019-06-18  0:00                     ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-17 23:41 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, Jun 17, 2019 at 4:20 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Mon, Jun 17, 2019 at 04:02:50PM -0700, Andy Lutomirski wrote:
> > On Mon, Jun 17, 2019 at 1:36 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > > On Mon, Jun 10, 2019 at 06:41:31AM -0700, Andy Lutomirski wrote:
> > > >
> > > >
> > > > > On Jun 9, 2019, at 11:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > >
> > > > >> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
> > > > >>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > >>>
> > > > >>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> > > > >>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > >>>>>
> > > > >>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > > >>>>> on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > > >>>>>
> > > > >>>>> By default, C0.2 is enabled and the user wait instructions result in
> > > > >>>>> lower power consumption with slower wakeup time.
> > > > >>>>>
> > > > >>>>> But in real time systems which require faster wakeup time although power
> > > > >>>>> savings could be smaller, the administrator needs to disable C0.2 and all
> > > > >>>>> C0.2 requests from user applications revert to C0.1.
> > > > >>>>>
> > > > >>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > > > >>>>> created to allow the administrator to control C0.2 state during run time.
> > > > >>>>
> > > > >>>> This looks better than the previous version.  I think the locking is
> > > > >>>> still rather confused.  You have a mutex that you hold while changing
> > > > >>>> the value, which is entirely reasonable.  But, of the code paths that
> > > > >>>> write the MSR, only one takes the mutex.
> > > > >>>>
> > > > >>>> I think you should consider making a function that just does:
> > > > >>>>
> > > > >>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > > >>>>
> > > > >>>> and using it in all the places that update the MSR.  The only thing
> > > > >>>> that should need the lock is the sysfs code to avoid accidentally
> > > > >>>> corrupting the value, but that code should also use WRITE_ONCE to do
> > > > >>>> its update.
> > > > >>>
> > > > >>> Based on the comment, the illustrative CPU online and enable_c02 store
> > > > >>> functions would be:
> > > > >>>
> > > > >>> umwait_cpu_online()
> > > > >>> {
> > > > >>>        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > > >>>        return 0;
> > > > >>> }
> > > > >>>
> > > > >>> enable_c02_store()
> > > > >>> {
> > > > >>>       mutex_lock(&umwait_lock);
> > > > >>>       umwait_control_c02 = (u32)!c02_enabled;
> > > > >>>       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
> > > > >>>       on_each_cpu(umwait_control_msr_update, NULL, 1);
> > > > >>>       mutex_unlock(&umwait_lock);
> > > > >>> }
> > > > >>>
> > > > >>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> > > > >>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> > > > >>> onlining CPU1 in the same time:
> > > > >>>
> > > > >>> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > > >>> umwait_cpu_online()
> > > > >>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > >>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > > >>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
> > > > >>>
> > > > >>> The result is CPU0 and CPU1 have different MSR values.
> > > > >>
> > > > >> Yes, but only transiently, because you didn't finish your example.
> > > > >>
> > > > >> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.
> > > > >
> > > > > There is no sync on wrmsr on CPU0 and CPU1.
> > > >
> > > > What do you mean by sync?
> > > >
> > > > > So a better sequence to
> > > > > describe the problem is changing the order of wrmsr:
> > > > >
> > > > > 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > > > umwait_cpu_online()
> > > > > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > > 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
> > > > > 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > > >
> > > > > So CPU1 and CPU0 have different MSR values. This won't be transient.
> > > >
> > > > You are still ignoring the wrmsr on CPU1 due to on_each_cpu().
> > > >
> > >
> > > Initially umwait_control_cached is 100000 and CPU0 is online while CPU1
> > > is going to be online:
> > >
> > > 1. On CPU1, cpu_online_mask=0x3 in start_secondary()
> > > 2. On CPU1, read umwait_control_cached to eax as 100000 in umwait_cpu_online()
> > > 3. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > 4. On CPU0, execute one_each_cpu() in enabled_c02_store():
> > >     wrmsr with 100001 on CPU0
> > >     wrmsr with 100001 on CPU1
> > > 5. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > >
> > > So the MSR is 100000 on CPU1 and 100001 on CPU0. The MSRs are different on
> > > the CPUs.
> > >
> > > Is this a right sequence to demonstrate locking issue without the mutex
> > > locking?
> > >
> >
> > Fair enough.  I would fix it differently, though:
> >
> > static void update_this_cpu_umwait_msr(void)
> > {
> >   WARN_ON_ONCE(!irqs_disabled());  /* or local_irq_save() */
> >
> >   /* We need to prevent umwait_control from being changed *and*
> > completing its WRMSR between our read and our WRMSR.  By turning IRQs
> > off here, we ensure that no sysfs write happens on this CPU and we
> > also make sure that any concurrent sysfs write from a different CPU
> > will not finish updating us via IPI until we're done. */
> >   wrmsrl(MSR_..., READ_ONCE(umwait_control), 0);
> > }
>
> If no other objections, then I will keep the current mutex lock/unlock to
> protect wrmsr and the umwait_control_cached variable.
>

I don't think that's sufficient.  In your current code, you hold the
mutex in some places and not in others, and there's no explanation.
And I think you're relying on the IRQs-off protection in at least one
code path already, so you're not gaining any simplicity.  At the very
least, you need to add some extensive comments everywhere if you want
to keep the mutex, but I think it's simpler and clearer if you just
use the same logic everywhere, for example, as I proposed above.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-17 23:41                   ` Andy Lutomirski
@ 2019-06-18  0:00                     ` Fenghua Yu
  2019-06-18  0:19                       ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Fenghua Yu @ 2019-06-18  0:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Mon, Jun 17, 2019 at 04:41:38PM -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 4:20 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > On Mon, Jun 17, 2019 at 04:02:50PM -0700, Andy Lutomirski wrote:
> > > On Mon, Jun 17, 2019 at 1:36 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > >
> > > > On Mon, Jun 10, 2019 at 06:41:31AM -0700, Andy Lutomirski wrote:
> > > > >
> > > > >
> > > > > > On Jun 9, 2019, at 11:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > > >
> > > > > >> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
> > > > > >>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > > >>>
> > > > > >>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> > > > > >>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > > >>>>>
> > > > > >>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > > > >>>>> on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > > > >>>>>
> > > > > >>>>> By default, C0.2 is enabled and the user wait instructions result in
> > > > > >>>>> lower power consumption with slower wakeup time.
> > > > > >>>>>
> > > > > >>>>> But in real time systems which require faster wakeup time although power
> > > > > >>>>> savings could be smaller, the administrator needs to disable C0.2 and all
> > > > > >>>>> C0.2 requests from user applications revert to C0.1.
> > > > > >>>>>
> > > > > >>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > > > > >>>>> created to allow the administrator to control C0.2 state during run time.
> > > > > >>>>
> > > > > >>>> This looks better than the previous version.  I think the locking is
> > > > > >>>> still rather confused.  You have a mutex that you hold while changing
> > > > > >>>> the value, which is entirely reasonable.  But, of the code paths that
> > > > > >>>> write the MSR, only one takes the mutex.
> > > > > >>>>
> > > > > >>>> I think you should consider making a function that just does:
> > > > > >>>>
> > > > > >>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > > > >>>>
> > > > > >>>> and using it in all the places that update the MSR.  The only thing
> > > > > >>>> that should need the lock is the sysfs code to avoid accidentally
> > > > > >>>> corrupting the value, but that code should also use WRITE_ONCE to do
> > > > > >>>> its update.
> > > > > >>>
> > > > > >>> Based on the comment, the illustrative CPU online and enable_c02 store
> > > > > >>> functions would be:
> > > > > >>>
> > > > > >>> umwait_cpu_online()
> > > > > >>> {
> > > > > >>>        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > > > >>>        return 0;
> > > > > >>> }
> > > > > >>>
> > > > > >>> enable_c02_store()
> > > > > >>> {
> > > > > >>>       mutex_lock(&umwait_lock);
> > > > > >>>       umwait_control_c02 = (u32)!c02_enabled;
> > > > > >>>       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
> > > > > >>>       on_each_cpu(umwait_control_msr_update, NULL, 1);
> > > > > >>>       mutex_unlock(&umwait_lock);
> > > > > >>> }
> > > > > >>>
> > > > > >>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> > > > > >>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> > > > > >>> onlining CPU1 in the same time:
> > > > > >>>
> > > > > >>> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > > > >>> umwait_cpu_online()
> > > > > >>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > > >>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > > > >>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
> > > > > >>>
> > > > > >>> The result is CPU0 and CPU1 have different MSR values.
> > > > > >>
> > > > > >> Yes, but only transiently, because you didn't finish your example.
> > > > > >>
> > > > > >> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.
> > > > > >
> > > > > > There is no sync on wrmsr on CPU0 and CPU1.
> > > > >
> > > > > What do you mean by sync?
> > > > >
> > > > > > So a better sequence to
> > > > > > describe the problem is changing the order of wrmsr:
> > > > > >
> > > > > > 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > > > > umwait_cpu_online()
> > > > > > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > > > 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
> > > > > > 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > > > >
> > > > > > So CPU1 and CPU0 have different MSR values. This won't be transient.
> > > > >
> > > > > You are still ignoring the wrmsr on CPU1 due to on_each_cpu().
> > > > >
> > > >
> > > > Initially umwait_control_cached is 100000 and CPU0 is online while CPU1
> > > > is going to be online:
> > > >
> > > > 1. On CPU1, cpu_online_mask=0x3 in start_secondary()
> > > > 2. On CPU1, read umwait_control_cached to eax as 100000 in umwait_cpu_online()
> > > > 3. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > 4. On CPU0, execute one_each_cpu() in enabled_c02_store():
> > > >     wrmsr with 100001 on CPU0
> > > >     wrmsr with 100001 on CPU1
> > > > 5. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > >
> > > > So the MSR is 100000 on CPU1 and 100001 on CPU0. The MSRs are different on
> > > > the CPUs.
> > > >
> > > > Is this a right sequence to demonstrate locking issue without the mutex
> > > > locking?
> > > >
> > >
> > > Fair enough.  I would fix it differently, though:
> > >
> > > static void update_this_cpu_umwait_msr(void)
> > > {
> > >   WARN_ON_ONCE(!irqs_disabled());  /* or local_irq_save() */
> > >
> > >   /* We need to prevent umwait_control from being changed *and*
> > > completing its WRMSR between our read and our WRMSR.  By turning IRQs
> > > off here, we ensure that no sysfs write happens on this CPU and we
> > > also make sure that any concurrent sysfs write from a different CPU
> > > will not finish updating us via IPI until we're done. */
> > >   wrmsrl(MSR_..., READ_ONCE(umwait_control), 0);
> > > }
> >
> > If no other objections, then I will keep the current mutex lock/unlock to
> > protect wrmsr and the umwait_control_cached variable.
> >
> 
> I don't think that's sufficient.  In your current code, you hold the
> mutex in some places and not in others, and there's no explanation.

The mutex is used in sysfs writing and cpu online.

But it's not used in syscore resume because only BP is running syscore
resume.

> And I think you're relying on the IRQs-off protection in at least one
> code path already, so you're not gaining any simplicity. 

I don't rely on IRQs-off protection. I only use mutex to protect.

> At the very
> least, you need to add some extensive comments everywhere if you want
> to keep the mutex, 

I have comment on why no need for mutex protection in syscore resume. But
I can add more comments on the locking.

> but I think it's simpler and clearer if you just
> use the same logic everywhere, for example, as I proposed above.

But using irqs_disabled() before wrmsr() and READ_ONCE/WRITE_ONCE for
umwait_control_cached alone are not sufficient. The mutex is still needed
to protect sysfs writing, is that right? Without mutex, one_each_cpu()
can write different values on CPUs, right?

If irqs disabling, READ_ONCE/WRITE_ONCE, and mutex are all used to protect,
isn't that more complex than just using mutex?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-18  0:00                     ` Fenghua Yu
@ 2019-06-18  0:19                       ` Andy Lutomirski
  2019-06-18  2:32                         ` Fenghua Yu
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2019-06-18  0:19 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, Jun 17, 2019 at 5:09 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Mon, Jun 17, 2019 at 04:41:38PM -0700, Andy Lutomirski wrote:
> > On Mon, Jun 17, 2019 at 4:20 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 04:02:50PM -0700, Andy Lutomirski wrote:
> > > > On Mon, Jun 17, 2019 at 1:36 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > >
> > > > > On Mon, Jun 10, 2019 at 06:41:31AM -0700, Andy Lutomirski wrote:
> > > > > >
> > > > > >
> > > > > > > On Jun 9, 2019, at 11:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > > > >
> > > > > > >> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote:
> > > > > > >>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > > > >>>
> > > > > > >>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote:
> > > > > > >>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > > > > > >>>>>
> > > > > > >>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled
> > > > > > >>>>> on a processor through IA32_UMWAIT_CONTROL MSR register.
> > > > > > >>>>>
> > > > > > >>>>> By default, C0.2 is enabled and the user wait instructions result in
> > > > > > >>>>> lower power consumption with slower wakeup time.
> > > > > > >>>>>
> > > > > > >>>>> But in real time systems which require faster wakeup time although power
> > > > > > >>>>> savings could be smaller, the administrator needs to disable C0.2 and all
> > > > > > >>>>> C0.2 requests from user applications revert to C0.1.
> > > > > > >>>>>
> > > > > > >>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
> > > > > > >>>>> created to allow the administrator to control C0.2 state during run time.
> > > > > > >>>>
> > > > > > >>>> This looks better than the previous version.  I think the locking is
> > > > > > >>>> still rather confused.  You have a mutex that you hold while changing
> > > > > > >>>> the value, which is entirely reasonable.  But, of the code paths that
> > > > > > >>>> write the MSR, only one takes the mutex.
> > > > > > >>>>
> > > > > > >>>> I think you should consider making a function that just does:
> > > > > > >>>>
> > > > > > >>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > > > > >>>>
> > > > > > >>>> and using it in all the places that update the MSR.  The only thing
> > > > > > >>>> that should need the lock is the sysfs code to avoid accidentally
> > > > > > >>>> corrupting the value, but that code should also use WRITE_ONCE to do
> > > > > > >>>> its update.
> > > > > > >>>
> > > > > > >>> Based on the comment, the illustrative CPU online and enable_c02 store
> > > > > > >>> functions would be:
> > > > > > >>>
> > > > > > >>> umwait_cpu_online()
> > > > > > >>> {
> > > > > > >>>        wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> > > > > > >>>        return 0;
> > > > > > >>> }
> > > > > > >>>
> > > > > > >>> enable_c02_store()
> > > > > > >>> {
> > > > > > >>>       mutex_lock(&umwait_lock);
> > > > > > >>>       umwait_control_c02 = (u32)!c02_enabled;
> > > > > > >>>       WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time());
> > > > > > >>>       on_each_cpu(umwait_control_msr_update, NULL, 1);
> > > > > > >>>       mutex_unlock(&umwait_lock);
> > > > > > >>> }
> > > > > > >>>
> > > > > > >>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is
> > > > > > >>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is
> > > > > > >>> onlining CPU1 in the same time:
> > > > > > >>>
> > > > > > >>> 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > > > > >>> umwait_cpu_online()
> > > > > > >>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > > > >>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > > > > >>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store()
> > > > > > >>>
> > > > > > >>> The result is CPU0 and CPU1 have different MSR values.
> > > > > > >>
> > > > > > >> Yes, but only transiently, because you didn't finish your example.
> > > > > > >>
> > > > > > >> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated.
> > > > > > >
> > > > > > > There is no sync on wrmsr on CPU0 and CPU1.
> > > > > >
> > > > > > What do you mean by sync?
> > > > > >
> > > > > > > So a better sequence to
> > > > > > > describe the problem is changing the order of wrmsr:
> > > > > > >
> > > > > > > 1. On CPU1, read umwait_control_cached to eax as 100000 in
> > > > > > > umwait_cpu_online()
> > > > > > > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > > > > 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store()
> > > > > > > 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > > > > >
> > > > > > > So CPU1 and CPU0 have different MSR values. This won't be transient.
> > > > > >
> > > > > > You are still ignoring the wrmsr on CPU1 due to on_each_cpu().
> > > > > >
> > > > >
> > > > > Initially umwait_control_cached is 100000 and CPU0 is online while CPU1
> > > > > is going to be online:
> > > > >
> > > > > 1. On CPU1, cpu_online_mask=0x3 in start_secondary()
> > > > > 2. On CPU1, read umwait_control_cached to eax as 100000 in umwait_cpu_online()
> > > > > 3. On CPU0, write 100001 to umwait_control_cached in enable_c02_store()
> > > > > 4. On CPU0, execute one_each_cpu() in enabled_c02_store():
> > > > >     wrmsr with 100001 on CPU0
> > > > >     wrmsr with 100001 on CPU1
> > > > > 5. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online()
> > > > >
> > > > > So the MSR is 100000 on CPU1 and 100001 on CPU0. The MSRs are different on
> > > > > the CPUs.
> > > > >
> > > > > Is this a right sequence to demonstrate locking issue without the mutex
> > > > > locking?
> > > > >
> > > >
> > > > Fair enough.  I would fix it differently, though:
> > > >
> > > > static void update_this_cpu_umwait_msr(void)
> > > > {
> > > >   WARN_ON_ONCE(!irqs_disabled());  /* or local_irq_save() */
> > > >
> > > >   /* We need to prevent umwait_control from being changed *and*
> > > > completing its WRMSR between our read and our WRMSR.  By turning IRQs
> > > > off here, we ensure that no sysfs write happens on this CPU and we
> > > > also make sure that any concurrent sysfs write from a different CPU
> > > > will not finish updating us via IPI until we're done. */
> > > >   wrmsrl(MSR_..., READ_ONCE(umwait_control), 0);
> > > > }
> > >
> > > If no other objections, then I will keep the current mutex lock/unlock to
> > > protect wrmsr and the umwait_control_cached variable.
> > >
> >
> > I don't think that's sufficient.  In your current code, you hold the
> > mutex in some places and not in others, and there's no explanation.
>
> The mutex is used in sysfs writing and cpu online.
>
> But it's not used in syscore resume because only BP is running syscore
> resume.
>
> > And I think you're relying on the IRQs-off protection in at least one
> > code path already, so you're not gaining any simplicity.
>
> I don't rely on IRQs-off protection. I only use mutex to protect.

You're relying on being single-threaded in umwait_syscore_resume().
Do you actually know that's safe?  You say it's because you're single
threaded, but what if you were suspended in the middle of a sysfs
operation?  I think it's fine, but it needs an argument along the
lines of the argument for why the irqs disabled case is okay.

>
> > At the very
> > least, you need to add some extensive comments everywhere if you want
> > to keep the mutex,
>
> I have comment on why no need for mutex protection in syscore resume. But
> I can add more comments on the locking.
>
> > but I think it's simpler and clearer if you just
> > use the same logic everywhere, for example, as I proposed above.
>
> But using irqs_disabled() before wrmsr() and READ_ONCE/WRITE_ONCE for
> umwait_control_cached alone are not sufficient. The mutex is still needed
> to protect sysfs writing, is that right? Without mutex, one_each_cpu()
> can write different values on CPUs, right?

Yes, you probably need a mutex to prevent two sysfs writers from
clobbering each other.

>
> If irqs disabling, READ_ONCE/WRITE_ONCE, and mutex are all used to protect,
> isn't that more complex than just using mutex?

But you're already using a mutex and a comment.  And you're hoping
that the syscore resume callback reads something sensible despite the
lack of READ_ONCE / WRITE_ONCE.  The compiler is unlikely to butcher
this too badly, but still.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-18  0:19                       ` Andy Lutomirski
@ 2019-06-18  2:32                         ` Fenghua Yu
  0 siblings, 0 replies; 40+ messages in thread
From: Fenghua Yu @ 2019-06-18  2:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Mon, Jun 17, 2019 at 05:19:02PM -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 5:09 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> But you're already using a mutex and a comment.  And you're hoping
> that the syscore resume callback reads something sensible despite the
> lack of READ_ONCE / WRITE_ONCE.  The compiler is unlikely to butcher
> this too badly, but still.

You are right, syscore_resume will be wrong if suspend in middle of
sysfs writing.

Ok. I change this patch based on your proposed locking. Is this patch
right? Should I use WRITE_ONCE/READ_ONCE on each access of
umwait_control_cached?

Thanks.

-Fenghua

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index 9594af9f657e..d17572605c1a 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -11,10 +11,34 @@
  */
 static u32 umwait_control_cached = 100000 & ~MSR_IA32_UMWAIT_CONTROL_C02_DISABLED;
 
+/*
+ * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
+ * in writing sysfs to ensure all CPUs have the same MSR value.
+ */
+static DEFINE_MUTEX(umwait_lock);
+
+static void update_this_cpu_umwait_control_msr(void)
+{
+	unsigned long flags;
+
+	/*
+	 * We need to prevent umwait_control_cached from being changed *and*
+	 * completing its WRMSR between our read and our WRMSR. By turning
+	 * IRQs off here, ensure that no sysfs write happens on this CPU
+	 * and we also make sure that any concurrent sysfs write from a
+	 * different CPU will not finish updating us via IPI until we're done.
+	 */
+	local_irq_save(flags);
+
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
+
+	local_irq_restore(flags);
+}
+
 /* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
 static int umwait_cpu_online(unsigned int cpu)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	update_this_cpu_umwait_control_msr();
 
 	return 0;
 }
@@ -30,24 +54,102 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	update_this_cpu_umwait_control_msr();
 }
 
 static struct syscore_ops umwait_syscore_ops = {
 	.resume	= umwait_syscore_resume,
 };
 
+static void umwait_control_msr_update(void *unused)
+{
+	update_this_cpu_umwait_control_msr();
+}
+
+static u32 get_umwait_control_c02(void)
+{
+	return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED;
+}
+
+static u32 get_umwait_control_max_time(void)
+{
+	return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
+}
+
+static ssize_t
+enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	 /*
+	  * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+	  * Otherwise, C0.2 is enabled. Show the opposite of bit 0.
+	  */
+	return sprintf(buf, "%d\n", !(bool)get_umwait_control_c02());
+}
+
+static ssize_t enable_c02_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	u32 umwait_control_c02;
+	bool c02_enabled;
+	int ret;
+
+	ret = kstrtobool(buf, &c02_enabled);
+	if (ret)
+		return ret;
+
+	mutex_lock(&umwait_lock);
+
+	/*
+	 * The value of bit 0 in IA32_UMWAIT_CONTROL MSR is opposite of
+	 * c02_enabled.
+	 */
+	umwait_control_c02 = (u32)!c02_enabled;
+	if (umwait_control_c02 == get_umwait_control_c02())
+		goto out_unlock;
+
+	WRITE_ONCE(umwait_control_cached, umwait_control_c02 | get_umwait_control_max_time());
+	/* Enable/disable C0.2 state on all CPUs */
+	on_each_cpu(umwait_control_msr_update, NULL, 1);
+
+out_unlock:
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(enable_c02);
+
+static struct attribute *umwait_attrs[] = {
+	&dev_attr_enable_c02.attr,
+	NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+	.attrs = umwait_attrs,
+	.name = "umwait_control",
+};
+
 static int __init umwait_init(void)
 {
+	struct device *dev;
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/* Add umwait control interface. */
+	dev = cpu_subsys.dev_root;
+	ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
+	if (ret)
+		return ret;
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
 				umwait_cpu_online, NULL);
-	if (ret < 0)
+	if (ret < 0) {
+		sysfs_remove_group(&dev->kobj, &umwait_attr_group);
+
 		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/5] x86/umwait: Initialize umwait control values
  2019-06-17 20:46         ` Fenghua Yu
@ 2019-06-18  5:43           ` Thomas Gleixner
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Gleixner @ 2019-06-18  5:43 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Ashok Raj, Tony Luck, Ravi V Shankar, linux-kernel, x86

On Mon, 17 Jun 2019, Fenghua Yu wrote:
> On Tue, Jun 11, 2019 at 10:46:55PM +0200, Thomas Gleixner wrote:
> > On Sun, 9 Jun 2019, Fenghua Yu wrote:
> > > > Sounds good, but:
> > > > 
> > > > > +#define MSR_IA32_UMWAIT_CONTROL_C02            BIT(0)
> > > > 
> > > > > +static u32 umwait_control_cached = 100000;
> > > > 
> > > > The code seems to disagree.
> > > 
> > > The definition of bit[0] is: C0.2 is disabled when bit[0]=1. So
> > > 100000 means C0.2 is enabled (and max time is 100000).
> > 
> > which is totally non obvious. If you have to encode the control bit, then
> > please make it explicit, i.e. mask out the disable bit in the initializer.
> 
> Is this right?
> 
> static u32 umwait_control_cached = 100000 & ~MSR_IA32_UMWAIT_CONTROL_C02_DISABLED;

Works, but looks pretty odd. I'd rather create an explicit initializer
macro, something like:

    	   UMWAIT_CTRL_VAL(100000, UMWAIT_DISABLED);

Hmm?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-06-18  7:05 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-07 22:00 [PATCH v4 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
2019-06-07 22:00 ` [PATCH v4 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
2019-06-07 22:00 ` [PATCH v4 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
2019-06-08 22:52   ` Andy Lutomirski
2019-06-10  4:13     ` Fenghua Yu
2019-06-10  4:27       ` Andy Lutomirski
2019-06-11 20:46       ` Thomas Gleixner
2019-06-17 20:46         ` Fenghua Yu
2019-06-18  5:43           ` Thomas Gleixner
2019-06-11  8:50   ` Peter Zijlstra
2019-06-11 17:04     ` Fenghua Yu
2019-06-07 22:00 ` [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
2019-06-08 22:50   ` Andy Lutomirski
2019-06-10  3:53     ` Fenghua Yu
2019-06-10  4:24       ` Andy Lutomirski
2019-06-10  6:02         ` Fenghua Yu
2019-06-10 13:41           ` Andy Lutomirski
2019-06-17 20:27             ` Fenghua Yu
2019-06-17 23:02               ` Andy Lutomirski
2019-06-17 23:11                 ` Fenghua Yu
2019-06-17 23:41                   ` Andy Lutomirski
2019-06-18  0:00                     ` Fenghua Yu
2019-06-18  0:19                       ` Andy Lutomirski
2019-06-18  2:32                         ` Fenghua Yu
2019-06-08 22:52   ` Andy Lutomirski
2019-06-10  4:04     ` Fenghua Yu
2019-06-10  4:26       ` Andy Lutomirski
2019-06-17 22:48         ` Fenghua Yu
2019-06-17 22:59           ` Andy Lutomirski
2019-06-17 22:51             ` Fenghua Yu
2019-06-11  8:54   ` Peter Zijlstra
2019-06-11 16:04     ` Andy Lutomirski
2019-06-11 17:27       ` Peter Zijlstra
2019-06-17 15:14         ` Andy Lutomirski
2019-06-17 18:11           ` Fenghua Yu
2019-06-07 22:00 ` [PATCH v4 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
2019-06-07 22:00 ` [PATCH v4 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
2019-06-11  9:01 ` [PATCH v4 0/5] x86/umwait: Enable user wait instructions Peter Zijlstra
2019-06-11 17:37   ` Fenghua Yu
2019-06-17 14:19     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).