All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] x86/umwait: Enable user wait instructions
@ 2019-05-24 23:55 Fenghua Yu
  2019-05-24 23:55 ` [PATCH v3 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Fenghua Yu @ 2019-05-24 23:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Andrew Cooper, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Today, if an application needs to wait for a very short duration
they have to have spinloops. Spinloops consume more power and continue
to use execution resources that could hurt its thread siblings in a core
with hyperthreads. New instructions umonitor, umwait and tpause allow
a low power alternative waiting at the same time could improve the HT
sibling perform while giving it any power headroom. These instructions
can be used in both user space and kernel space.

A new MSR IA32_UMWAIT_CONTROL allows kernel to set a time limit in
TSC-quanta that prevents user applications from waiting for a long time.
This allows applications to yield the CPU and the user application
should consider using other alternatives to wait.

The processor supports two levels of optimized states: a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state with deeper power saving
and higher exit latency). The above MSR can be used to restrict
entry to C0.2 and then any request for C0.2 will revert to C0.1.

This patch set covers feature discovery, provides initial values for
the MSR, adds some sysfs control files for admin to tweak the values
in the MSR if needed.

The sysfs interface files are in /sys/devices/system/cpu/umwait_control/

GCC 9 enables intrinsics for the instructions. To use the instructions,
user applications should include <immintrin.h> and be compiled with
-mwaitpkg.

Detailed information on the instructions, the MSR, and syntax of the
intrinsics can be found in the latest Intel Architecture Instruction
Set Extensions and Future Features Programming Reference and Intel 64
and IA-32 Architectures Software Developer's Manual.

Changelog:
v3:
Address issues pointed out by Andy Lutomirski:
- Change default umwait max time to 100k TSC cycles
- Setting up MSR on BSP during resume suspend/hibernation
- A few other naming and coding changes as suggested
- Some security concerns of the user wait instructions are not issues
of the patches and cannot be addressed in the patch set. They will be
discussed on lkml.

Plus:
- Add ABI document entry for umwait control sysfs interfaces

v2:
- Address comments from Thomas Gleixner and Andy Lutomirski
- Remove vDSO functions
- Add sysfs control file for umwait max time

v1:
Based on comments from Thomas:
- Change user APIs to vDSO functions
- Changed sysfs per comments from Thomas.
- Change patch descriptions etc

Fenghua Yu (5):
  x86/cpufeatures: Enumerate user wait instructions
  x86/umwait: Initialize umwait control values
  x86/umwait: Add sysfs interface to control umwait C0.2 state
  x86/umwait: Add sysfs interface to control umwait maximum time
  x86/umwait: Document umwait control sysfs interfaces

 .../ABI/testing/sysfs-devices-system-cpu      |  21 ++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   4 +
 arch/x86/power/Makefile                       |   1 +
 arch/x86/power/umwait.c                       | 179 ++++++++++++++++++
 5 files changed, 206 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

-- 
2.19.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/5] x86/cpufeatures: Enumerate user wait instructions
  2019-05-24 23:55 [PATCH v3 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
@ 2019-05-24 23:55 ` Fenghua Yu
  2019-05-30 14:37   ` Andy Lutomirski
  2019-05-24 23:55 ` [PATCH v3 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Fenghua Yu @ 2019-05-24 23:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Andrew Cooper, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

umonitor, umwait, and tpause are a set of user wait instructions.

umonitor arms address monitoring hardware using an address. The
address range is determined by using CPUID.0x5. A store to
an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

umwait instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state
(C0.2 state).

tpause instructs the processor to enter an implementation-dependent
optimized state C0.1 or C0.2 state and wake up when time-stamp counter
reaches specified timeout.

The three instructions may be executed at any privilege level.

The instructions provide power saving method while waiting in
user space. Additionally, they can allow a sibling hyperthread to
make faster progress while this thread is waiting. One example of an
application usage of umwait is when waiting for input data from another
application, such as a user level multi-threaded packet processing
engine.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

Detailed information on the instructions and CPUID feature WAITPKG flag
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference and Intel 64 and IA-32
Architectures Software Developer's Manual.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..b8bd428ae5bc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -322,6 +322,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/5] x86/umwait: Initialize umwait control values
  2019-05-24 23:55 [PATCH v3 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
  2019-05-24 23:55 ` [PATCH v3 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
@ 2019-05-24 23:55 ` Fenghua Yu
  2019-05-30 21:05   ` Andy Lutomirski
  2019-05-24 23:56 ` [PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Fenghua Yu @ 2019-05-24 23:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Andrew Cooper, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

umwait or tpause allows processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period
specified by the instruction or until the system time limit or until
a store to the monitored address range in umwait.

IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
on the processor and set maximum time the processor can reside in
C0.1 or C0.2.

By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.

Default maximum umwait time is 100000 cycles. A later patch provides
a sysfs interface to adjust this value.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/msr-index.h |  4 ++
 arch/x86/power/Makefile          |  1 +
 arch/x86/power/umwait.c          | 75 ++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..af502e947298 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,10 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL			0xe1
+#define MSR_IA32_UMWAIT_CONTROL_C02		BIT(0)
+#define MSR_IA32_UMWAIT_CONTROL_MAX_TIME	0xfffffffc
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index 37923d715741..62e2c609d1fe 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -8,3 +8,4 @@ CFLAGS_cpu.o	:= $(nostackp)
 
 obj-$(CONFIG_PM_SLEEP)		+= cpu.o
 obj-$(CONFIG_HIBERNATION)	+= hibernate_$(BITS).o hibernate_asm_$(BITS).o hibernate.o
+obj-y				+= umwait.o
diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
new file mode 100644
index 000000000000..80cc53a9c2d0
--- /dev/null
+++ b/arch/x86/power/umwait.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/syscore_ops.h>
+#include <linux/suspend.h>
+#include <linux/cpu.h>
+#include <asm/msr.h>
+
+static bool umwait_c0_2_enabled = true;
+/* Umwait max time is in TSC-quanta. Bits[1:0] are zero. */
+static u32 umwait_max_time = 100000;
+
+/* Return value that will be used to set IA32_UMWAIT_CONTROL MSR */
+static u32 umwait_compute_msr_value(void)
+{
+	/*
+	 * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+	 * Otherwise, C0.2 is enabled.
+	 * So the value in bit 0 is opposite of umwait_c0_2_enabled.
+	 */
+	u32 umwait_c0_2_disabled = umwait_c0_2_enabled ? 0 : 1;
+
+	return (umwait_c0_2_disabled & MSR_IA32_UMWAIT_CONTROL_C02) |
+	       (umwait_max_time & MSR_IA32_UMWAIT_CONTROL_MAX_TIME);
+}
+
+static void umwait_control_msr_update(void)
+{
+	u32 msr_val;
+
+	msr_val = umwait_compute_msr_value();
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
+}
+
+/* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	umwait_control_msr_update();
+
+	return 0;
+}
+
+/*
+ * On resume, set up IA32_UMWAIT_CONTROL MSR on BP which is the only active
+ * CPU at this time. Setting up the MSR on APs when they are re-added later
+ * using CPU hotplug.
+ * The MSR on BP is supposed not to be changed during suspend and thus it's
+ * unnecessary to set it again during resume from suspend. But at this point
+ * we don't know resume is from suspend or hiberation. To simplify the
+ * situation, just set up the MSR on resume from suspend.
+ */
+static void umwait_syscore_resume(void)
+{
+	umwait_control_msr_update();
+}
+
+static struct syscore_ops umwait_syscore_ops = {
+	.resume	= umwait_syscore_resume,
+};
+
+static int __init umwait_init(void)
+{
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
+		return -ENODEV;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
+				umwait_cpu_online, NULL);
+	if (ret < 0)
+		return ret;
+
+	register_syscore_ops(&umwait_syscore_ops);
+
+	return 0;
+}
+device_initcall(umwait_init);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-05-24 23:55 [PATCH v3 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
  2019-05-24 23:55 ` [PATCH v3 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
  2019-05-24 23:55 ` [PATCH v3 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
@ 2019-05-24 23:56 ` Fenghua Yu
  2019-05-30 21:10   ` Andy Lutomirski
  2019-05-24 23:56 ` [PATCH v3 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
  2019-05-24 23:56 ` [PATCH v3 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
  4 siblings, 1 reply; 11+ messages in thread
From: Fenghua Yu @ 2019-05-24 23:56 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Andrew Cooper, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

C0.2 state in umwait and tpause instructions can be enabled or disabled
on a processor through IA32_UMWAIT_CONTROL MSR register.

By default, C0.2 is enabled and the user wait instructions result in
lower power consumption with slower wakeup time.

But in real time systems which requrie faster wakeup time although power
savings could be smaller, the administrator needs to disable C0.2 and all
C0.2 requests from user applications revert to C0.1.

A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c0_2" is
created to allow the administrator to control C0.2 state during run time.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/power/umwait.c | 75 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 71 insertions(+), 4 deletions(-)

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index 80cc53a9c2d0..cf5de7e1cc24 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -7,6 +7,7 @@
 static bool umwait_c0_2_enabled = true;
 /* Umwait max time is in TSC-quanta. Bits[1:0] are zero. */
 static u32 umwait_max_time = 100000;
+static DEFINE_MUTEX(umwait_lock);
 
 /* Return value that will be used to set IA32_UMWAIT_CONTROL MSR */
 static u32 umwait_compute_msr_value(void)
@@ -22,7 +23,7 @@ static u32 umwait_compute_msr_value(void)
 	       (umwait_max_time & MSR_IA32_UMWAIT_CONTROL_MAX_TIME);
 }
 
-static void umwait_control_msr_update(void)
+static void umwait_control_msr_update(void *unused)
 {
 	u32 msr_val;
 
@@ -33,7 +34,9 @@ static void umwait_control_msr_update(void)
 /* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
 static int umwait_cpu_online(unsigned int cpu)
 {
-	umwait_control_msr_update();
+	mutex_lock(&umwait_lock);
+	umwait_control_msr_update(NULL);
+	mutex_unlock(&umwait_lock);
 
 	return 0;
 }
@@ -49,24 +52,88 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
-	umwait_control_msr_update();
+	/* No need to lock because only BP is running now. */
+	umwait_control_msr_update(NULL);
 }
 
 static struct syscore_ops umwait_syscore_ops = {
 	.resume	= umwait_syscore_resume,
 };
 
+static ssize_t
+enable_c0_2_show(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	return sprintf(buf, "%d\n", umwait_c0_2_enabled);
+}
+
+static void umwait_control_msr_update_all_cpus(void)
+{
+	u32 msr_val;
+
+	msr_val = umwait_compute_msr_value();
+	/* All CPUs have same umwait control setting */
+	on_each_cpu(umwait_control_msr_update, NULL, 1);
+}
+
+static ssize_t enable_c0_2_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	bool c0_2_enabled;
+	int ret;
+
+	ret = kstrtobool(buf, &c0_2_enabled);
+	if (ret)
+		return ret;
+
+	mutex_lock(&umwait_lock);
+
+	if (umwait_c0_2_enabled == c0_2_enabled)
+		goto out_unlock;
+
+	umwait_c0_2_enabled = c0_2_enabled;
+	/* Enable/disable C0.2 state on all CPUs */
+	umwait_control_msr_update_all_cpus();
+
+out_unlock:
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(enable_c0_2);
+
+static struct attribute *umwait_attrs[] = {
+	&dev_attr_enable_c0_2.attr,
+	NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+	.attrs = umwait_attrs,
+	.name = "umwait_control",
+};
+
 static int __init umwait_init(void)
 {
+	struct device *dev;
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/* Add umwait control interface. */
+	dev = cpu_subsys.dev_root;
+	ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
+	if (ret)
+		return ret;
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
 				umwait_cpu_online, NULL);
-	if (ret < 0)
+	if (ret < 0) {
+		sysfs_remove_group(&dev->kobj, &umwait_attr_group);
+
 		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/5] x86/umwait: Add sysfs interface to control umwait maximum time
  2019-05-24 23:55 [PATCH v3 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (2 preceding siblings ...)
  2019-05-24 23:56 ` [PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
@ 2019-05-24 23:56 ` Fenghua Yu
  2019-05-30 21:11   ` Andy Lutomirski
  2019-05-24 23:56 ` [PATCH v3 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
  4 siblings, 1 reply; 11+ messages in thread
From: Fenghua Yu @ 2019-05-24 23:56 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Andrew Cooper, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that processor can stay in C0.1 or C0.2. A zero value means no maximum
time.

Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.

Users can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. Low order
two bits are ignored.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/power/umwait.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index cf5de7e1cc24..61076aad7138 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -103,8 +103,45 @@ static ssize_t enable_c0_2_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(enable_c0_2);
 
+static ssize_t
+max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", umwait_max_time);
+}
+
+static ssize_t max_time_store(struct device *kobj,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	u32 max_time;
+	int ret;
+
+	ret = kstrtou32(buf, 0, &max_time);
+	if (ret)
+		return ret;
+
+	mutex_lock(&umwait_lock);
+
+	/* Only get max time value from bits[31:2] */
+	max_time &= MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
+	if (umwait_max_time == max_time)
+		goto out_unlock;
+
+	umwait_max_time = max_time;
+
+	/* Update umwait max time on all CPUs */
+	umwait_control_msr_update_all_cpus();
+
+out_unlock:
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(max_time);
+
 static struct attribute *umwait_attrs[] = {
 	&dev_attr_enable_c0_2.attr,
+	&dev_attr_max_time.attr,
 	NULL
 };
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 5/5] x86/umwait: Document umwait control sysfs interfaces
  2019-05-24 23:55 [PATCH v3 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (3 preceding siblings ...)
  2019-05-24 23:56 ` [PATCH v3 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
@ 2019-05-24 23:56 ` Fenghua Yu
  4 siblings, 0 replies; 11+ messages in thread
From: Fenghua Yu @ 2019-05-24 23:56 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Andrew Cooper, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Since two new sysfs interface files are created for umwait control, add
an ABI document entry for the files:
	/sys/devices/system/cpu/umwait_control/enable_c0_2
	/sys/devices/system/cpu/umwait_control/max_time

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..bbf65ae447ff 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -538,3 +538,24 @@ Description:	Intel Energy and Performance Bias Hint (EPB)
 
 		This attribute is present for all online CPUs supporting the
 		Intel EPB feature.
+
+What:		/sys/devices/system/cpu/umwait_control
+		/sys/devices/system/cpu/umwait_control/enable_c0_2
+		/sys/devices/system/cpu/umwait_control/max_time
+Date:		May 2019
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Umwait control
+
+		enable_c0_2: Read/write interface to control umwait C0.2 state
+			Read returns C0.2 state status:
+				0: C0.2 is disabled
+				1: C0.2 is enabled
+
+			Write 'Yy1' or [oO][nN] for on to enable C0.2 state.
+			Write 'Nn0' or [oO][fF] for off to disable C0.2 state.
+
+		max_time: Read/write interface to control umwait maximum time
+			  in TSC-quanta that the CPU can reside in either C0.1
+			  or C0.2 state. The time is represented as an unsigned
+			  integer decimal value. Bits[1:0] are ignore.
+			  A zero value indicates no maximum time.
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/5] x86/cpufeatures: Enumerate user wait instructions
  2019-05-24 23:55 ` [PATCH v3 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
@ 2019-05-30 14:37   ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2019-05-30 14:37 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andrew Cooper, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, May 24, 2019 at 5:05 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> umonitor, umwait, and tpause are a set of user wait instructions.
>
> umonitor arms address monitoring hardware using an address. The
> address range is determined by using CPUID.0x5. A store to
> an address within the specified address range triggers the
> monitoring hardware to wake up the processor waiting in umwait.
>
> umwait instructs the processor to enter an implementation-dependent
> optimized state while monitoring a range of addresses. The optimized
> state may be either a light-weight power/performance optimized state
> (C0.1 state) or an improved power/performance optimized state
> (C0.2 state).
>
> tpause instructs the processor to enter an implementation-dependent
> optimized state C0.1 or C0.2 state and wake up when time-stamp counter
> reaches specified timeout.
>
> The three instructions may be executed at any privilege level.
>
> The instructions provide power saving method while waiting in
> user space. Additionally, they can allow a sibling hyperthread to
> make faster progress while this thread is waiting. One example of an
> application usage of umwait is when waiting for input data from another
> application, such as a user level multi-threaded packet processing
> engine.
>
> Availability of the user wait instructions is indicated by the presence
> of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].
>
> Detailed information on the instructions and CPUID feature WAITPKG flag
> can be found in the latest Intel Architecture Instruction Set Extensions
> and Future Features Programming Reference and Intel 64 and IA-32
> Architectures Software Developer's Manual.
>

Reviewed-by: Andy Lutomirski <luto@kernel.org>

> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 75f27ee2c263..b8bd428ae5bc 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -322,6 +322,7 @@
>  #define X86_FEATURE_UMIP               (16*32+ 2) /* User Mode Instruction Protection */
>  #define X86_FEATURE_PKU                        (16*32+ 3) /* Protection Keys for Userspace */
>  #define X86_FEATURE_OSPKE              (16*32+ 4) /* OS Protection Keys Enable */
> +#define X86_FEATURE_WAITPKG            (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
>  #define X86_FEATURE_AVX512_VBMI2       (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
>  #define X86_FEATURE_GFNI               (16*32+ 8) /* Galois Field New Instructions */
>  #define X86_FEATURE_VAES               (16*32+ 9) /* Vector AES */
> --
> 2.19.1
>


-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/5] x86/umwait: Initialize umwait control values
  2019-05-24 23:55 ` [PATCH v3 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
@ 2019-05-30 21:05   ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2019-05-30 21:05 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andrew Cooper, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, May 24, 2019 at 5:05 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> umwait or tpause allows processor to enter a light-weight
> power/performance optimized state (C0.1 state) or an improved
> power/performance optimized state (C0.2 state) for a period
> specified by the instruction or until the system time limit or until
> a store to the monitored address range in umwait.
>
> IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
> on the processor and set maximum time the processor can reside in
> C0.1 or C0.2.
>
> By default C0.2 is enabled so the user wait instructions can enter the
> C0.2 state to save more power with slower wakeup time.
>
> Default maximum umwait time is 100000 cycles. A later patch provides
> a sysfs interface to adjust this value.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

with the caveat that we should really clean up our CPU init code to
have a function like cpu_prepare_for_user_code() that is called on all
CPUs after every boot, resume, etc before running user code.  This
would subsume syscall_init().

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-05-24 23:56 ` [PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
@ 2019-05-30 21:10   ` Andy Lutomirski
  2019-05-31  1:17     ` Yu, Fenghua
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2019-05-30 21:10 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andrew Cooper, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, May 24, 2019 at 5:05 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> C0.2 state in umwait and tpause instructions can be enabled or disabled
> on a processor through IA32_UMWAIT_CONTROL MSR register.
>
> By default, C0.2 is enabled and the user wait instructions result in
> lower power consumption with slower wakeup time.
>
> But in real time systems which requrie faster wakeup time although power
> savings could be smaller, the administrator needs to disable C0.2 and all
> C0.2 requests from user applications revert to C0.1.
>
> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c0_2" is
> created to allow the administrator to control C0.2 state during run time.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/power/umwait.c | 75 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 71 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
> index 80cc53a9c2d0..cf5de7e1cc24 100644
> --- a/arch/x86/power/umwait.c
> +++ b/arch/x86/power/umwait.c
> @@ -7,6 +7,7 @@
>  static bool umwait_c0_2_enabled = true;
>  /* Umwait max time is in TSC-quanta. Bits[1:0] are zero. */
>  static u32 umwait_max_time = 100000;
> +static DEFINE_MUTEX(umwait_lock);
>
>  /* Return value that will be used to set IA32_UMWAIT_CONTROL MSR */
>  static u32 umwait_compute_msr_value(void)
> @@ -22,7 +23,7 @@ static u32 umwait_compute_msr_value(void)
>                (umwait_max_time & MSR_IA32_UMWAIT_CONTROL_MAX_TIME);
>  }
>
> -static void umwait_control_msr_update(void)
> +static void umwait_control_msr_update(void *unused)
>  {
>         u32 msr_val;
>
> @@ -33,7 +34,9 @@ static void umwait_control_msr_update(void)
>  /* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
>  static int umwait_cpu_online(unsigned int cpu)
>  {
> -       umwait_control_msr_update();
> +       mutex_lock(&umwait_lock);
> +       umwait_control_msr_update(NULL);
> +       mutex_unlock(&umwait_lock);

What's the mutex for?  Can't you just use READ_ONCE?

> +static void umwait_control_msr_update_all_cpus(void)
> +{
> +       u32 msr_val;
> +
> +       msr_val = umwait_compute_msr_value();
> +       /* All CPUs have same umwait control setting */
> +       on_each_cpu(umwait_control_msr_update, NULL, 1);

Why are you calling umwait_compute_msr_value()?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 4/5] x86/umwait: Add sysfs interface to control umwait maximum time
  2019-05-24 23:56 ` [PATCH v3 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
@ 2019-05-30 21:11   ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2019-05-30 21:11 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andrew Cooper, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Fri, May 24, 2019 at 5:05 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
> that processor can stay in C0.1 or C0.2. A zero value means no maximum
> time.
>
> Each instruction sets its own deadline in the instruction's implicit
> input EDX:EAX value. The instruction wakes up if the time-stamp counter
> reaches or exceeds the specified deadline, or the umwait maximum time
> expires, or a store happens in the monitored address range in umwait.
>
> Users can write an unsigned 32-bit number to
> /sys/devices/system/cpu/umwait_control/max_time to change the default
> value. Note that a value of zero means there is no limit. Low order
> two bits are ignored.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/power/umwait.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
> index cf5de7e1cc24..61076aad7138 100644
> --- a/arch/x86/power/umwait.c
> +++ b/arch/x86/power/umwait.c
> @@ -103,8 +103,45 @@ static ssize_t enable_c0_2_store(struct device *dev,
>  }
>  static DEVICE_ATTR_RW(enable_c0_2);
>
> +static ssize_t
> +max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
> +{
> +       return sprintf(buf, "%u\n", umwait_max_time);
> +}
> +
> +static ssize_t max_time_store(struct device *kobj,
> +                             struct device_attribute *attr,
> +                             const char *buf, size_t count)
> +{
> +       u32 max_time;
> +       int ret;
> +
> +       ret = kstrtou32(buf, 0, &max_time);
> +       if (ret)
> +               return ret;
> +
> +       mutex_lock(&umwait_lock);
> +
> +       /* Only get max time value from bits[31:2] */
> +       max_time &= MSR_IA32_UMWAIT_CONTROL_MAX_TIME;

I think you should error out if high bits are set.  I'm okay with
masking off low bits, except that an input of 1 should not turn into
0, since 0 is special IIRC.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-05-30 21:10   ` Andy Lutomirski
@ 2019-05-31  1:17     ` Yu, Fenghua
  0 siblings, 0 replies; 11+ messages in thread
From: Yu, Fenghua @ 2019-05-31  1:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andrew Cooper, Raj, Ashok, Luck, Tony, Shankar, Ravi V,
	linux-kernel, x86

> On Thursday, May 30, 2019 2:11 PM Andy Lutomirski [mailto:luto@kernel.org] wrote:
> On Fri, May 24, 2019 at 5:05 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > C0.2 state in umwait and tpause instructions can be enabled or
> > disabled on a processor through IA32_UMWAIT_CONTROL MSR register.
> >
> > By default, C0.2 is enabled and the user wait instructions result in
> > lower power consumption with slower wakeup time.
> >
> > But in real time systems which requrie faster wakeup time although
> > power savings could be smaller, the administrator needs to disable
> > C0.2 and all
> > C0.2 requests from user applications revert to C0.1.
> >
> > A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c0_2"
> > is created to allow the administrator to control C0.2 state during run time.
> >
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > Reviewed-by: Ashok Raj <ashok.raj@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/power/umwait.c | 75
> > ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 71 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c index
> > 80cc53a9c2d0..cf5de7e1cc24 100644
> > --- a/arch/x86/power/umwait.c
> > +++ b/arch/x86/power/umwait.c
> > @@ -7,6 +7,7 @@
> >  static bool umwait_c0_2_enabled = true;
> >  /* Umwait max time is in TSC-quanta. Bits[1:0] are zero. */  static
> > u32 umwait_max_time = 100000;
> > +static DEFINE_MUTEX(umwait_lock);
> >
> >  /* Return value that will be used to set IA32_UMWAIT_CONTROL MSR */
> > static u32 umwait_compute_msr_value(void) @@ -22,7 +23,7 @@ static
> u32
> > umwait_compute_msr_value(void)
> >                (umwait_max_time & MSR_IA32_UMWAIT_CONTROL_MAX_TIME);
> >  }
> >
> > -static void umwait_control_msr_update(void)
> > +static void umwait_control_msr_update(void *unused)
> >  {
> >         u32 msr_val;
> >
> > @@ -33,7 +34,9 @@ static void umwait_control_msr_update(void)
> >  /* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global
> > setting. */  static int umwait_cpu_online(unsigned int cpu)  {
> > -       umwait_control_msr_update();
> > +       mutex_lock(&umwait_lock);
> > +       umwait_control_msr_update(NULL);
> > +       mutex_unlock(&umwait_lock);
> 
> What's the mutex for?  Can't you just use READ_ONCE?

umwait_control_msr_update() will write both umwait_c0_2_enabled and umwait_max_time (which also can be
changed through sysfs in the next patch) to the TEST_CTRL MSR.

Just using READ_ONCE() for the two variables cannot guarantee all CPUs have the same setting of C0.2 and max time.
READ_ONCE() and WRITE_ONCE() can only guarantee atomicity for reading and writng the same variable.

For e.g. without mutex protection:

initial values: umwait_c0_2_enabled=1 and umwait_max_time=100000

1. umwait_cpu_online(X): read umwait_c0_2_enabled as 1
2. enable_c0_2_store(): umwait_c0_2_enabled = 0 and update all online CPUs as C0.2 disabled.
3. umwait_cpu_online(X): read umwait_max_time=100000
4. umwait_cpu_online(Y): read umwait_c0_2_enabled as 0
5. umwait_max_time_store(): umwait_max_time=500 and update all online CPUs as max time = 500 cycles.
6. umwait_cpu_online(Y): read umwait_max_time as 500
7. umwati_cpu_online(X): wrmsr() enables C0.2 and sets max time 100000 on CPU X
8. umwait_cpu_online(Y): disables C0.2 and sets  max time 500 on CPU Y

With the mutex to protect the two variables and wrmsr(), each CPU will have the same setting of C0.2 and max time.

> 
> > +static void umwait_control_msr_update_all_cpus(void)
> > +{
> > +       u32 msr_val;
> > +
> > +       msr_val = umwait_compute_msr_value();
> > +       /* All CPUs have same umwait control setting */
> > +       on_each_cpu(umwait_control_msr_update, NULL, 1);
> 
> Why are you calling umwait_compute_msr_value()?

Umwait_compute_msr_value() computes the TEST_CTL value from two variables umwait_c0_2_enabled and umwait_max_time.
Any of the two variables may be changed when  umwait_control_msr_update_all_cpus() is called. So need to re-calculate the
MSR value then write the value to MSR on all CPUs.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-05-31  1:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-24 23:55 [PATCH v3 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
2019-05-24 23:55 ` [PATCH v3 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
2019-05-30 14:37   ` Andy Lutomirski
2019-05-24 23:55 ` [PATCH v3 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
2019-05-30 21:05   ` Andy Lutomirski
2019-05-24 23:56 ` [PATCH v3 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
2019-05-30 21:10   ` Andy Lutomirski
2019-05-31  1:17     ` Yu, Fenghua
2019-05-24 23:56 ` [PATCH v3 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
2019-05-30 21:11   ` Andy Lutomirski
2019-05-24 23:56 ` [PATCH v3 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.