All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/5] x86/umwait: Enable user wait instructions
@ 2019-06-20  1:33 Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Fenghua Yu @ 2019-06-20  1:33 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Today, if an application needs to wait for a very short duration
they have to have spinloops. Spinloops consume more power and continue
to use execution resources that could hurt its thread siblings in a core
with hyperthreads. New instructions umonitor, umwait and tpause allow
a low power alternative waiting at the same time could improve the HT
sibling perform while giving it any power headroom. These instructions
can be used in both user space and kernel space.

A new MSR IA32_UMWAIT_CONTROL allows kernel to set a time limit in
TSC-quanta that prevents user applications from waiting for a long time.
This allows applications to yield the CPU and the user application
should consider using other alternatives to wait.

A quote from Andy Lutomirski on setting the time limit:

"What I want to avoid is the case where it works dramatically
differently on NO_HZ_FULL systems as compared to everything else.
Also, UMWAIT may behave a bit differently if the max timeout is hit,
and I'd like that path to get exercised widely by making it happen
even on default configs.

So I propose setting the timeout to either 100 microseconds or 100k
"cycles" by default."

The processor supports two levels of optimized states: a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state with deeper power saving
and higher exit latency). The above MSR can be used to restrict
entry to C0.2 and then any request for C0.2 will revert to C0.1.

This patch set covers feature discovery, provides initial values for
the MSR, adds some sysfs control files for admin to tweak the values
in the MSR if needed.

The sysfs interface files are in /sys/devices/system/cpu/umwait_control/

GCC 9 enables intrinsics for the instructions. To use the instructions,
user applications should include <immintrin.h> and be compiled with
-mwaitpkg.

Detailed information on the instructions, the MSR, and syntax of the
intrinsics can be found in the latest Intel Architecture Instruction
Set Extensions and Future Features Programming Reference and Intel 64
and IA-32 Architectures Software Developer's Manual.

Changelog:
v5:
- Change locking from mutex to disabling irq before wrmsr per
Andy Lutomirski's comment
- Add macro UMWAIT_CTRL_VAL to explicitly disable C0.2 per
Thomas Gleixner's comment
- Move umwait.c to arch/x86/kernel/cpu/ per Peter Zijlstra's comment
- Add justification of max time 100k per Peter Zijlstra's comment

v4:
- Error out when bit[1:0] in IA32_UMWAIT_CONTROL is not zero per
Andy Lutomirski's comment.
- Use umwait_control_cached to cache IA32_UMWAIT_CONTROL MSR. This
variable replaces the two previous variables umwait_max_time and
umwait_c0_2_enabled. The code is simpler than before and the cached MSR
will be easier to be used in future KVM support.

v3:
Address issues pointed out by Andy Lutomirski:
- Change default umwait max time to 100k TSC cycles
- Setting up MSR on BSP during resume suspend/hibernation
- A few other naming and coding changes as suggested
- Some security concerns of the user wait instructions are not issues
of the patches and cannot be addressed in the patch set. They will be
discussed on lkml.

Plus:
- Add ABI document entry for umwait control sysfs interfaces

v2:
- Address comments from Thomas Gleixner and Andy Lutomirski
- Remove vDSO functions
- Add sysfs control file for umwait max time

v1:
Based on comments from Thomas:
- Change user APIs to vDSO functions
- Changed sysfs per comments from Thomas.
- Change patch descriptions etc

Fenghua Yu (5):
  x86/cpufeatures: Enumerate user wait instructions
  x86/umwait: Initialize umwait control values
  x86/umwait: Add sysfs interface to control umwait C0.2 state
  x86/umwait: Add sysfs interface to control umwait maximum time
  x86/umwait: Document umwait control sysfs interfaces

 .../ABI/testing/sysfs-devices-system-cpu      |  21 ++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   4 +
 arch/x86/kernel/cpu/Makefile                  |   1 +
 arch/x86/kernel/cpu/umwait.c                  | 205 ++++++++++++++++++
 5 files changed, 232 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/umwait.c

-- 
2.19.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 1/5] x86/cpufeatures: Enumerate user wait instructions
  2019-06-20  1:33 [PATCH v5 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
@ 2019-06-20  1:33 ` Fenghua Yu
  2019-06-24  0:01   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Fenghua Yu @ 2019-06-20  1:33 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

umonitor, umwait, and tpause are a set of user wait instructions.

umonitor arms address monitoring hardware using an address. The
address range is determined by using CPUID.0x5. A store to
an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

umwait instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state
(C0.2 state).

tpause instructs the processor to enter an implementation-dependent
optimized state C0.1 or C0.2 state and wake up when time-stamp counter
reaches specified timeout.

The three instructions may be executed at any privilege level.

The instructions provide power saving method while waiting in
user space. Additionally, they can allow a sibling hyperthread to
make faster progress while this thread is waiting. One example of an
application usage of umwait is when waiting for input data from another
application, such as a user level multi-threaded packet processing
engine.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

Detailed information on the instructions and CPUID feature WAITPKG flag
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference and Intel 64 and IA-32
Architectures Software Developer's Manual.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..b8bd428ae5bc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -322,6 +322,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 2/5] x86/umwait: Initialize umwait control values
  2019-06-20  1:33 [PATCH v5 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
@ 2019-06-20  1:33 ` Fenghua Yu
  2019-06-23 22:39   ` Thomas Gleixner
  2019-06-24  0:01   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Fenghua Yu @ 2019-06-20  1:33 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

umwait or tpause allows processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period
specified by the instruction or until the system time limit or until
a store to the monitored address range in umwait.

IA32_UMWAIT_CONTROL MSR register allows kernel to enable/disable C0.2
on the processor and set maximum time the processor can reside in
C0.1 or C0.2.

By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.

Andy Lutomirski proposes setting maximum umwait time to 100000 cycles
by default. A quote from Andy:

"What I want to avoid is the case where it works dramatically differently
on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
behave a bit differently if the max timeout is hit, and I'd like that path
to get exercised widely by making it happen even on default configs."

A later patch provides a sysfs interface to adjust this value.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/msr-index.h |  4 +++
 arch/x86/kernel/cpu/Makefile     |  1 +
 arch/x86/kernel/cpu/umwait.c     | 62 ++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/umwait.c

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..3b057079d6b5 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,10 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL			0xe1
+#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLED	BIT(0)
+#define MSR_IA32_UMWAIT_CONTROL_MAX_TIME	0xfffffffc
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 5102bf7c8192..b4c81e9a18c6 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -24,6 +24,7 @@ obj-y			+= match.o
 obj-y			+= bugs.o
 obj-y			+= aperfmperf.o
 obj-y			+= cpuid-deps.o
+obj-y			+= umwait.o
 
 obj-$(CONFIG_PROC_FS)	+= proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
new file mode 100644
index 000000000000..b0bf7adde36f
--- /dev/null
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/syscore_ops.h>
+#include <linux/suspend.h>
+#include <linux/cpu.h>
+#include <asm/msr.h>
+
+#define UMWAIT_C02_ENABLED	(0 & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED)
+
+#define UMWAIT_CTRL_VAL(max_time, c02_disabled)				\
+	(((max_time) & MSR_IA32_UMWAIT_CONTROL_MAX_TIME) |		\
+	((c02_disabled) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED))
+
+/*
+ * Cache IA32_UMWAIT_CONTROL MSR in this variable. All CPUs have the same
+ * MSR value. By default, umwait max time is 100000 in TSC-quanta and C0.2
+ * is enabled
+ */
+static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLED);
+
+/* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+
+	return 0;
+}
+
+/*
+ * On resume, set up IA32_UMWAIT_CONTROL MSR on BP which is the only active
+ * CPU at this time. Setting up the MSR on APs when they are re-added later
+ * using CPU hotplug.
+ * The MSR on BP is supposed not to be changed during suspend and thus it's
+ * unnecessary to set it again during resume from suspend. But at this point
+ * we don't know resume is from suspend or hibernation. To simplify the
+ * situation, just set up the MSR on resume from suspend.
+ */
+static void umwait_syscore_resume(void)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+}
+
+static struct syscore_ops umwait_syscore_ops = {
+	.resume	= umwait_syscore_resume,
+};
+
+static int __init umwait_init(void)
+{
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
+		return -ENODEV;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
+				umwait_cpu_online, NULL);
+	if (ret < 0)
+		return ret;
+
+	register_syscore_ops(&umwait_syscore_ops);
+
+	return 0;
+}
+device_initcall(umwait_init);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-20  1:33 [PATCH v5 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
@ 2019-06-20  1:33 ` Fenghua Yu
  2019-06-23 22:40   ` Thomas Gleixner
  2019-06-24  0:02   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Fenghua Yu @ 2019-06-20  1:33 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

C0.2 state in umwait and tpause instructions can be enabled or disabled
on a processor through IA32_UMWAIT_CONTROL MSR register.

By default, C0.2 is enabled and the user wait instructions result in
lower power consumption with slower wakeup time.

But in real time systems which require faster wakeup time although power
savings could be smaller, the administrator needs to disable C0.2 and all
C0.2 requests from user applications revert to C0.1.

A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is
created to allow the administrator to control C0.2 state during run time.

Andy Lutomirski suggests to turn off local irqs before writing
MSR_TEST_CTL to ensure msr_test_ctl_cached is not changed by sysfs write
on this CPU or by any concurrent sysfs write from a different CPU via IPI
until we're done.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/umwait.c | 109 ++++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index b0bf7adde36f..3bd6d37a7b2c 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -17,10 +17,34 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLED);
 
+/*
+ * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
+ * in writing sysfs to ensure all CPUs have the same MSR value.
+ */
+static DEFINE_MUTEX(umwait_lock);
+
+static void update_this_cpu_umwait_control_msr(void)
+{
+	unsigned long flags;
+
+	/*
+	 * We need to prevent umwait_control_cached from being changed *and*
+	 * completing its WRMSR between our read and our WRMSR. By turning
+	 * IRQs off here, ensure that no sysfs write happens on this CPU
+	 * and we also make sure that any concurrent sysfs write from a
+	 * different CPU will not finish updating us via IPI until we're done.
+	 */
+	local_irq_save(flags);
+
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
+
+	local_irq_restore(flags);
+}
+
 /* Set up IA32_UMWAIT_CONTROL MSR on CPU using the current global setting. */
 static int umwait_cpu_online(unsigned int cpu)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	update_this_cpu_umwait_control_msr();
 
 	return 0;
 }
@@ -36,24 +60,103 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	update_this_cpu_umwait_control_msr();
 }
 
 static struct syscore_ops umwait_syscore_ops = {
 	.resume	= umwait_syscore_resume,
 };
 
+static void umwait_control_msr_update(void *unused)
+{
+	update_this_cpu_umwait_control_msr();
+}
+
+static u32 get_umwait_ctrl_c02(void)
+{
+	return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED;
+}
+
+static u32 get_umwait_ctrl_max_time(void)
+{
+	return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
+}
+
+static ssize_t
+enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	 /*
+	  * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+	  * Otherwise, C0.2 is enabled. Show the opposite of bit 0.
+	  */
+	return sprintf(buf, "%d\n", !(bool)get_umwait_ctrl_c02());
+}
+
+static ssize_t enable_c02_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	u32 umwait_c02;
+	bool c02_enabled;
+	int ret;
+
+	ret = kstrtobool(buf, &c02_enabled);
+	if (ret)
+		return ret;
+
+	mutex_lock(&umwait_lock);
+
+	/*
+	 * The value of bit 0 in IA32_UMWAIT_CONTROL MSR is opposite of
+	 * c02_enabled.
+	 */
+	umwait_c02 = (u32)!c02_enabled;
+	if (umwait_c02 == get_umwait_ctrl_c02())
+		goto out_unlock;
+
+	WRITE_ONCE(umwait_control_cached,
+		   UMWAIT_CTRL_VAL(get_umwait_ctrl_max_time(), umwait_c02));
+	/* Enable/disable C0.2 state on all CPUs */
+	on_each_cpu(umwait_control_msr_update, NULL, 1);
+
+out_unlock:
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(enable_c02);
+
+static struct attribute *umwait_attrs[] = {
+	&dev_attr_enable_c02.attr,
+	NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+	.attrs = umwait_attrs,
+	.name = "umwait_control",
+};
+
 static int __init umwait_init(void)
 {
+	struct device *dev;
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
 		return -ENODEV;
 
+	/* Add umwait control interface. */
+	dev = cpu_subsys.dev_root;
+	ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
+	if (ret)
+		return ret;
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
 				umwait_cpu_online, NULL);
-	if (ret < 0)
+	if (ret < 0) {
+		sysfs_remove_group(&dev->kobj, &umwait_attr_group);
+
 		return ret;
+	}
 
 	register_syscore_ops(&umwait_syscore_ops);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 4/5] x86/umwait: Add sysfs interface to control umwait maximum time
  2019-06-20  1:33 [PATCH v5 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (2 preceding siblings ...)
  2019-06-20  1:33 ` [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
@ 2019-06-20  1:33 ` Fenghua Yu
  2019-06-23 22:40   ` Thomas Gleixner
  2019-06-24  0:03   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
  2019-06-20  1:33 ` [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
  2019-06-20 16:25 ` [PATCH v5 0/5] x86/umwait: Enable user wait instructions Andy Lutomirski
  5 siblings, 2 replies; 19+ messages in thread
From: Fenghua Yu @ 2019-06-20  1:33 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that processor can stay in C0.1 or C0.2. A zero value means no maximum
time.

Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.

Users can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. Low order
two bits must be zero.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/umwait.c | 40 ++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 3bd6d37a7b2c..4b2aff7b2d4d 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -126,8 +126,48 @@ static ssize_t enable_c02_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(enable_c02);
 
+static ssize_t
+max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", get_umwait_ctrl_max_time());
+}
+
+static ssize_t max_time_store(struct device *kobj,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	u32 max_time;
+	int ret;
+
+	ret = kstrtou32(buf, 0, &max_time);
+	if (ret)
+		return ret;
+
+	/* bits[1:0] must be zero */
+	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME)
+		return -EINVAL;
+
+	mutex_lock(&umwait_lock);
+
+	if (max_time == get_umwait_ctrl_max_time())
+		goto out_unlock;
+
+	WRITE_ONCE(umwait_control_cached,
+		   UMWAIT_CTRL_VAL(max_time, get_umwait_ctrl_c02()));
+
+	/* Update umwait max time on all CPUs */
+	on_each_cpu(umwait_control_msr_update, NULL, 1);
+
+out_unlock:
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(max_time);
+
 static struct attribute *umwait_attrs[] = {
 	&dev_attr_enable_c02.attr,
+	&dev_attr_max_time.attr,
 	NULL
 };
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces
  2019-06-20  1:33 [PATCH v5 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (3 preceding siblings ...)
  2019-06-20  1:33 ` [PATCH v5 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
@ 2019-06-20  1:33 ` Fenghua Yu
  2019-06-23 22:42   ` Thomas Gleixner
  2019-06-24  0:03   ` [tip:x86/cpu] Documentation/ABI: " tip-bot for Fenghua Yu
  2019-06-20 16:25 ` [PATCH v5 0/5] x86/umwait: Enable user wait instructions Andy Lutomirski
  5 siblings, 2 replies; 19+ messages in thread
From: Fenghua Yu @ 2019-06-20  1:33 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ashok Raj, Tony Luck,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Since two new sysfs interface files are created for umwait control, add
an ABI document entry for the files:
	/sys/devices/system/cpu/umwait_control/enable_c02
	/sys/devices/system/cpu/umwait_control/max_time

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..d22cdadd3161 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -538,3 +538,24 @@ Description:	Intel Energy and Performance Bias Hint (EPB)
 
 		This attribute is present for all online CPUs supporting the
 		Intel EPB feature.
+
+What:		/sys/devices/system/cpu/umwait_control
+		/sys/devices/system/cpu/umwait_control/enable_c02
+		/sys/devices/system/cpu/umwait_control/max_time
+Date:		May 2019
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Umwait control
+
+		enable_c02: Read/write interface to control umwait C0.2 state
+			Read returns C0.2 state status:
+				0: C0.2 is disabled
+				1: C0.2 is enabled
+
+			Write 'Yy1' or [oO][nN] for on to enable C0.2 state.
+			Write 'Nn0' or [oO][fF] for off to disable C0.2 state.
+
+		max_time: Read/write interface to control umwait maximum time
+			  in TSC-quanta that the CPU can reside in either C0.1
+			  or C0.2 state. The time is an unsigned 32-bit number.
+			  Note that a value of zero means there is no limit.
+			  Low order two bits must be zero.
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 0/5] x86/umwait: Enable user wait instructions
  2019-06-20  1:33 [PATCH v5 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
                   ` (4 preceding siblings ...)
  2019-06-20  1:33 ` [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
@ 2019-06-20 16:25 ` Andy Lutomirski
  2019-06-20 23:28   ` Fenghua Yu
  5 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2019-06-20 16:25 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Ashok Raj, Tony Luck,
	Ravi V Shankar, linux-kernel, x86

On Wed, Jun 19, 2019 at 6:43 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> The sysfs interface files are in /sys/devices/system/cpu/umwait_control/

This might be a silly question, but: what do we envision as the use
case for changing the C0.2 setting?  I'm wondering if we'll ever end
up wanting it as a prctl() instead of a sysfs file.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 0/5] x86/umwait: Enable user wait instructions
  2019-06-20 16:25 ` [PATCH v5 0/5] x86/umwait: Enable user wait instructions Andy Lutomirski
@ 2019-06-20 23:28   ` Fenghua Yu
  0 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2019-06-20 23:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Thu, Jun 20, 2019 at 09:25:44AM -0700, Andy Lutomirski wrote:
> On Wed, Jun 19, 2019 at 6:43 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> > The sysfs interface files are in /sys/devices/system/cpu/umwait_control/
> 
> This might be a silly question, but: what do we envision as the use
> case for changing the C0.2 setting?  I'm wondering if we'll ever end
> up wanting it as a prctl() instead of a sysfs file.

There may be some use cases, e.g. C0.2 state is enabled for saving more
power when the system has less workloads and is disabled for better
performance when the system is busy, or a real time system wants to disable
C0.2 for better response time, etc.

We thought about controling C0.2 per process before. But if doing so, the
umwait control MSR is per proces and needs to be saved/restored in
context switch. xsave/xrestore doesn't support the MSR. So the overhead
of saving/restoring the MSR could be high, especially the overhead
may hurt real time apps.

And there is no clear usage cases for changing C0.2 per process.

We hope the current patches to be available in upstream first for its
simplity and usage.

If we find usage of controling C0.2 per process, we can add code later
and/or may have xsave/xrestore support for the MSR to speed up context
switch.

The current C0.2 control won't block potential per process control if
the per process control is supported in the future.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 2/5] x86/umwait: Initialize umwait control values
  2019-06-20  1:33 ` [PATCH v5 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
@ 2019-06-23 22:39   ` Thomas Gleixner
  2019-06-24 22:12     ` Fenghua Yu
  2019-06-24  0:01   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-23 22:39 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Wed, 19 Jun 2019, Fenghua Yu wrote:
>  
> +#define MSR_IA32_UMWAIT_CONTROL			0xe1
> +#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLED	BIT(0)
> +#define MSR_IA32_UMWAIT_CONTROL_MAX_TIME	0xfffffffc

Errm, no! That's not maxtime, that's the time field mask in the
MSR. Throughout the code you use that as a mask, which is not really
obvious.

> +	(((max_time) & MSR_IA32_UMWAIT_CONTROL_MAX_TIME) |		\

and later on:

	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME)

What? How is anyone supposed to understand that?

	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_TIME_MASK)

makes it entirely clear that the value is not allowed to have any bits
outside of the mask set.

> +
> +#define UMWAIT_C02_ENABLED	(0 & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED)

The AND is there for maximal confusion of the reader?

> +/*
> + * On resume, set up IA32_UMWAIT_CONTROL MSR on BP which is the only active
> + * CPU at this time. Setting up the MSR on APs when they are re-added later
> + * using CPU hotplug.
> + * The MSR on BP is supposed not to be changed during suspend and thus it's
> + * unnecessary to set it again during resume from suspend. But at this point
> + * we don't know resume is from suspend or hibernation. To simplify the
> + * situation, just set up the MSR on resume from suspend.

We also do not trust any firmware by default whatever it is supposed to do.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-20  1:33 ` [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
@ 2019-06-23 22:40   ` Thomas Gleixner
  2019-06-24  0:02   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-23 22:40 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Wed, 19 Jun 2019, Fenghua Yu wrote:
> C0.2 state in umwait and tpause instructions can be enabled or disabled
> on a processor through IA32_UMWAIT_CONTROL MSR register.

  through the IA32....CONTROL MSR.

MSR register, IOW: Machine Specific Register register.

> 
> Andy Lutomirski suggests to turn off local irqs before writing
> MSR_TEST_CTL to ensure msr_test_ctl_cached is not changed by sysfs write

What has MSR_TEST_CTL to do with this?

> +/*
> + * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
> + * in writing sysfs to ensure all CPUs have the same MSR value.
> + */
> +static DEFINE_MUTEX(umwait_lock);
> +
> +static void update_this_cpu_umwait_control_msr(void)

Why is this not following the umwait_ namespace as everything else?

> +{
> +	unsigned long flags;
> +
> +	/*
> +	 * We need to prevent umwait_control_cached from being changed *and*
> +	 * completing its WRMSR between our read and our WRMSR. By turning

Huch? How does umwait_control_cached complete its WRMSR?

> +	 * IRQs off here, ensure that no sysfs write happens on this CPU

How would that happen? If this is called from a hotplugged CPU then that
CPU cannot handle sysfs writes even if the hotplug thread on which this
runs is preempted. The CPU is not marked active yet and cannot schedule
user space tasks. Doing the wrmsr(MSR, READ()) preemptible would just make
the race window larger.

> +	 * and we also make sure that any concurrent sysfs write from a
> +	 * different CPU will not finish updating us via IPI until we're done.

  will not finish updating us?

> +	local_irq_save(flags);

That local_irq_save() belongs into the cpu hotplug callback and there it
wants to be a local_irq_disable(). The IPI runs already with interrupts
disabled. And that comment wants to be in the hotplug function as well.

With that you can also spare the indirection via that extra IPI function
below and just add the void *unused argument to this and invoke it from the
hotplug callback with NULL.

> +
> +	wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> +
> +	local_irq_restore(flags);
> +}

> +static u32 get_umwait_ctrl_c02(void)
> +{
> +	return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED;
> +}
> +
> +static u32 get_umwait_ctrl_max_time(void)
> +{
> +	return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> +}
> +
> +static ssize_t
> +enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	 /*
> +	  * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
> +	  * Otherwise, C0.2 is enabled. Show the opposite of bit 0.
> +	  */
> +	return sprintf(buf, "%d\n", !(bool)get_umwait_ctrl_c02());

Eeew. !(bool)....

> +}
> +
> +static ssize_t enable_c02_store(struct device *dev,
> +				struct device_attribute *attr,
> +				const char *buf, size_t count)
> +{
> +	u32 umwait_c02;
> +	bool c02_enabled;

Your naming conventions are inconsistent. The file is named 'enable', which
is fine, but here you read the written value into a variable named 'enabled'

> +	int ret;
> +
> +	ret = kstrtobool(buf, &c02_enabled);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&umwait_lock);
> +
> +	/*
> +	 * The value of bit 0 in IA32_UMWAIT_CONTROL MSR is opposite of
> +	 * c02_enabled.

How many of those comments do we need?

> +	 */
> +	umwait_c02 = (u32)!c02_enabled;

Umpf.

> +	if (umwait_c02 == get_umwait_ctrl_c02())
> +		goto out_unlock;
> +
> +	WRITE_ONCE(umwait_control_cached,
> +		   UMWAIT_CTRL_VAL(get_umwait_ctrl_max_time(), umwait_c02));

And how often do we need to read that cached value? Why not doing the
obvious, read the ctrl value once and then work from there.

    	ctrl = READ_ONCE(umwait_control_cached);

        if (c02_enable == umwait_c02_enabled(ctrl))
                goto out_unlock;

        ctrl = umwait_max_time(ctrl);
        if (!c02_enable)
                ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;

        WRITE_ONCE(umwait_control_cached, ctrl);

That does not need any comment about the inverted bit and whatever. It's
just clear what it does. The only place where this comment is required is
in umwait_c02_enabled() which returns the enabled state from the control
value argument as boolean. That makes also the weird type casts go away.

>  static int __init umwait_init(void)
>  {
> +	struct device *dev;
>  	int ret;
>  
>  	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
>  		return -ENODEV;
>  
> +	/* Add umwait control interface. */
> +	dev = cpu_subsys.dev_root;
> +	ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
> +	if (ret)
> +		return ret;

When that fails then all CPUs have some random possibly nonsensical value
in the control MSR, whatever the firmware or reset default set it to.

Not that it matters much, because boot will probably not work at all, but
you can spare the removal below, when you move that sysfs thing to the very
end. And it makes more sense that way.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 4/5] x86/umwait: Add sysfs interface to control umwait maximum time
  2019-06-20  1:33 ` [PATCH v5 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
@ 2019-06-23 22:40   ` Thomas Gleixner
  2019-06-24  0:03   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-23 22:40 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Wed, 19 Jun 2019, Fenghua Yu wrote:

> Users can write an unsigned 32-bit number to
> /sys/devices/system/cpu/umwait_control/max_time to change the default

Users? Administrators can. Users NOT.

> value. Note that a value of zero means there is no limit. Low order
> two bits must be zero.

...

> +static ssize_t max_time_store(struct device *kobj,
> +			      struct device_attribute *attr,
> +			      const char *buf, size_t count)
> +{
> +	u32 max_time;
> +	int ret;
> +
> +	ret = kstrtou32(buf, 0, &max_time);
> +	if (ret)
> +		return ret;
> +
> +	/* bits[1:0] must be zero */
> +	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME)
> +		return -EINVAL;
> +
> +	mutex_lock(&umwait_lock);
> +
> +	if (max_time == get_umwait_ctrl_max_time())
> +		goto out_unlock;
> +
> +	WRITE_ONCE(umwait_control_cached,
> +		   UMWAIT_CTRL_VAL(max_time, get_umwait_ctrl_c02()));

Same convoluted logic with reading the cached value twice to confuse the
reader.

	ctrl = READ_ONCE(umwait_control_cached);
	if (max_time == umwait_ctrl_max_time(ctrl))
		goto out_unlock;

	ctrl = (ctrl & ~MSR_IA32_UMWAIT_CONTROL_TIME_MASK) | max_time;
	WRITE_ONCE(umwait_control_cached, ctrl);

Simple, right?

But this can be done even simpler with a shared update function:

static void umwait_update_control(u32 maxtime, bool c02_enable)
{
        u32 ctrl = maxtime & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;

        if (!c02_enable)
                ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;

        WRITE_ONCE(umwait_control_cached, ctrl);
        /* Propagate to all CPUs */
        on_each_cpu(umwait_update_control_msr, NULL, 1);
}

With that both functions become trivial and do not have duplicated code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces
  2019-06-20  1:33 ` [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
@ 2019-06-23 22:42   ` Thomas Gleixner
  2019-06-23 22:46     ` Thomas Gleixner
  2019-06-24  0:03   ` [tip:x86/cpu] Documentation/ABI: " tip-bot for Fenghua Yu
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-23 22:42 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Wed, 19 Jun 2019, Fenghua Yu wrote:
> +Description:	Umwait control
> +
> +		enable_c02: Read/write interface to control umwait C0.2 state
> +			Read returns C0.2 state status:
> +				0: C0.2 is disabled
> +				1: C0.2 is enabled
> +
> +			Write 'Yy1' or [oO][nN] for on to enable C0.2 state.

  Write 'Yy1' ? You meant [Yy1] I assume.

> +			Write 'Nn0' or [oO][fF] for off to disable C0.2 state.
  
What about avoiding all that unreadable confusion?

                        Write 'y' or '1'  or 'on' to enable C0.2 state.
                        Write 'n' or '0'  or 'of' to disable C0.2 state.

                        The interface is case insensitive.
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces
  2019-06-23 22:42   ` Thomas Gleixner
@ 2019-06-23 22:46     ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2019-06-23 22:46 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, 24 Jun 2019, Thomas Gleixner wrote:
> On Wed, 19 Jun 2019, Fenghua Yu wrote:
> > +Description:	Umwait control
> > +
> > +		enable_c02: Read/write interface to control umwait C0.2 state
> > +			Read returns C0.2 state status:
> > +				0: C0.2 is disabled
> > +				1: C0.2 is enabled
> > +
> > +			Write 'Yy1' or [oO][nN] for on to enable C0.2 state.
> 
>   Write 'Yy1' ? You meant [Yy1] I assume.
> 
> > +			Write 'Nn0' or [oO][fF] for off to disable C0.2 state.
>   
> What about avoiding all that unreadable confusion?
> 
>                         Write 'y' or '1'  or 'on' to enable C0.2 state.
>                         Write 'n' or '0'  or 'of' to disable C0.2 state.
> 
>                         The interface is case insensitive.

Don't try to fixup all that in a hurry.

I've already done most of it while trying to ready if for merging. If you
see the tip bot mails coming in, make sure to double check my modifications
and yell if I screwed up on the way.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [tip:x86/cpu] x86/cpufeatures: Enumerate user wait instructions
  2019-06-20  1:33 ` [PATCH v5 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
@ 2019-06-24  0:01   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-06-24  0:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, tony.luck, ashok.raj, linux-kernel, fenghua.yu, tglx, luto,
	mingo, peterz, ravi.v.shankar, bp

Commit-ID:  6dbbf5ec9e1e9f607a4c51266d0f9a63ba754b63
Gitweb:     https://git.kernel.org/tip/6dbbf5ec9e1e9f607a4c51266d0f9a63ba754b63
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Wed, 19 Jun 2019 18:33:54 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 24 Jun 2019 01:44:19 +0200

x86/cpufeatures: Enumerate user wait instructions

umonitor, umwait, and tpause are a set of user wait instructions.

umonitor arms address monitoring hardware using an address. The
address range is determined by using CPUID.0x5. A store to
an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

umwait instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state
(C0.2 state).

tpause instructs the processor to enter an implementation-dependent
optimized state C0.1 or C0.2 state and wake up when time-stamp counter
reaches specified timeout.

The three instructions may be executed at any privilege level.

The instructions provide power saving method while waiting in
user space. Additionally, they can allow a sibling hyperthread to
make faster progress while this thread is waiting. One example of an
application usage of umwait is when waiting for input data from another
application, such as a user level multi-threaded packet processing
engine.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

Detailed information on the instructions and CPUID feature WAITPKG flag
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference and Intel 64 and IA-32
Architectures Software Developer's Manual.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua.yu@intel.com

---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8ecd9fac97c3..998c2cc08363 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -330,6 +330,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip:x86/cpu] x86/umwait: Initialize umwait control values
  2019-06-20  1:33 ` [PATCH v5 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
  2019-06-23 22:39   ` Thomas Gleixner
@ 2019-06-24  0:01   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 19+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-06-24  0:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, bp, tony.luck, fenghua.yu, ravi.v.shankar, tglx, mingo,
	luto, peterz, ashok.raj, linux-kernel

Commit-ID:  bd688c69b7e6693de3bd78f38fd63f7850c2711e
Gitweb:     https://git.kernel.org/tip/bd688c69b7e6693de3bd78f38fd63f7850c2711e
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Wed, 19 Jun 2019 18:33:55 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 24 Jun 2019 01:44:19 +0200

x86/umwait: Initialize umwait control values

umwait or tpause allows the processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period specified by
the instruction or until the system time limit or until a store to the
monitored address range in umwait.

IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
the processor and to set the maximum time the processor can reside in C0.1
or C0.2.

By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.

Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
default. A quote from Andy:

  "What I want to avoid is the case where it works dramatically differently
   on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
   behave a bit differently if the max timeout is hit, and I'd like that
   path to get exercised widely by making it happen even on default
   configs."

A sysfs interface to adjust the time and the C0.2 enablement is provided in
a follow up change.

[ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
  	MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
  	mask throughout the code.
	Massaged comments and changelog ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com

---
 arch/x86/include/asm/msr-index.h |  9 ++++++
 arch/x86/kernel/cpu/Makefile     |  1 +
 arch/x86/kernel/cpu/umwait.c     | 62 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..6b4fc2788078 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,15 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL			0xe1
+#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLE	BIT(0)
+#define MSR_IA32_UMWAIT_CONTROL_RESERVED	BIT(1)
+/*
+ * The time field is bit[31:2], but representing a 32bit value with
+ * bit[1:0] zero.
+ */
+#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index a7d9a4cb3ab6..4b4eb06e117c 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -24,6 +24,7 @@ obj-y			+= match.o
 obj-y			+= bugs.o
 obj-y			+= aperfmperf.o
 obj-y			+= cpuid-deps.o
+obj-y			+= umwait.o
 
 obj-$(CONFIG_PROC_FS)	+= proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
new file mode 100644
index 000000000000..0a113c731df3
--- /dev/null
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/syscore_ops.h>
+#include <linux/suspend.h>
+#include <linux/cpu.h>
+
+#include <asm/msr.h>
+
+#define UMWAIT_C02_ENABLE	0
+
+#define UMWAIT_CTRL_VAL(maxtime, c02_disable)				\
+	(((maxtime) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
+	((c02_disable) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE))
+
+/*
+ * Cache IA32_UMWAIT_CONTROL MSR. This is a systemwide control. By default,
+ * umwait max time is 100000 in TSC-quanta and C0.2 is enabled
+ */
+static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
+
+/* Set IA32_UMWAIT_CONTROL MSR on this CPU to the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	return 0;
+}
+
+/*
+ * On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
+ * is the only active CPU at this time. The MSR is set up on the APs via the
+ * CPU hotplug callback.
+ *
+ * This function is invoked on resume from suspend and hibernation. On
+ * resume from suspend the restore should be not required, but we neither
+ * trust the firmware nor does it matter if the same value is written
+ * again.
+ */
+static void umwait_syscore_resume(void)
+{
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+}
+
+static struct syscore_ops umwait_syscore_ops = {
+	.resume	= umwait_syscore_resume,
+};
+
+static int __init umwait_init(void)
+{
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
+		return -ENODEV;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait:online",
+				umwait_cpu_online, NULL);
+	if (ret < 0)
+		return ret;
+
+	register_syscore_ops(&umwait_syscore_ops);
+
+	return 0;
+}
+device_initcall(umwait_init);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip:x86/cpu] x86/umwait: Add sysfs interface to control umwait C0.2 state
  2019-06-20  1:33 ` [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
  2019-06-23 22:40   ` Thomas Gleixner
@ 2019-06-24  0:02   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 19+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-06-24  0:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, mingo, tglx, tony.luck, ravi.v.shankar, bp, fenghua.yu,
	hpa, linux-kernel, luto, ashok.raj

Commit-ID:  ff4b353f2ef9dc8e396d7cb9572801e34a8c7374
Gitweb:     https://git.kernel.org/tip/ff4b353f2ef9dc8e396d7cb9572801e34a8c7374
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Wed, 19 Jun 2019 18:33:56 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 24 Jun 2019 01:44:20 +0200

x86/umwait: Add sysfs interface to control umwait C0.2 state

C0.2 state in umwait and tpause instructions can be enabled or disabled
on a processor through IA32_UMWAIT_CONTROL MSR register.

By default, C0.2 is enabled and the user wait instructions results in
lower power consumption with slower wakeup time.

But in real time systems which require faster wakeup time although power
savings could be smaller, the administrator needs to disable C0.2 and all
umwait invocations from user applications use C0.1.

Create a sysfs interface which allows the administrator to control C0.2
state during run time.

Andy Lutomirski suggested to turn off local irqs before writing the MSR to
ensure the cached control value is not changed by a concurrent sysfs write
from a different CPU via IPI.

[ tglx: Simplified the update logic in the write function and got rid of
  	all the convoluted type casts. Added a shared update function and
	made the namespace consistent. Moved the sysfs create invocation.
	Massaged changelog ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Andy Lutomirski" <luto@kernel.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com

---
 arch/x86/kernel/cpu/umwait.c | 118 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 110 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 0a113c731df3..56149d630e35 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -7,8 +7,8 @@
 
 #define UMWAIT_C02_ENABLE	0
 
-#define UMWAIT_CTRL_VAL(maxtime, c02_disable)				\
-	(((maxtime) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
+#define UMWAIT_CTRL_VAL(max_time, c02_disable)				\
+	(((max_time) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) |		\
 	((c02_disable) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE))
 
 /*
@@ -17,10 +17,38 @@
  */
 static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
 
-/* Set IA32_UMWAIT_CONTROL MSR on this CPU to the current global setting. */
+/*
+ * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
+ * the sysfs write functions.
+ */
+static DEFINE_MUTEX(umwait_lock);
+
+static void umwait_update_control_msr(void * unused)
+{
+	lockdep_assert_irqs_disabled();
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
+}
+
+/*
+ * The CPU hotplug callback sets the control MSR to the global control
+ * value.
+ *
+ * Disable interrupts so the read of umwait_control_cached and the WRMSR
+ * are protected against a concurrent sysfs write. Otherwise the sysfs
+ * write could update the cached value after it had been read on this CPU
+ * and issue the IPI before the old value had been written. The IPI would
+ * interrupt, write the new value and after return from IPI the previous
+ * value would be written by this CPU.
+ *
+ * With interrupts disabled the upcoming CPU either sees the new control
+ * value or the IPI is updating this CPU to the new control value after
+ * interrupts have been reenabled.
+ */
 static int umwait_cpu_online(unsigned int cpu)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	local_irq_disable();
+	umwait_update_control_msr(NULL);
+	local_irq_enable();
 	return 0;
 }
 
@@ -36,15 +64,86 @@ static int umwait_cpu_online(unsigned int cpu)
  */
 static void umwait_syscore_resume(void)
 {
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, umwait_control_cached, 0);
+	umwait_update_control_msr(NULL);
 }
 
 static struct syscore_ops umwait_syscore_ops = {
 	.resume	= umwait_syscore_resume,
 };
 
+/* sysfs interface */
+
+/*
+ * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
+ * Otherwise, C0.2 is enabled.
+ */
+static inline bool umwait_ctrl_c02_enabled(u32 ctrl)
+{
+	return !(ctrl & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE);
+}
+
+static inline u32 umwait_ctrl_max_time(u32 ctrl)
+{
+	return ctrl & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
+}
+
+static inline void umwait_update_control(u32 maxtime, bool c02_enable)
+{
+	u32 ctrl = maxtime & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
+
+	if (!c02_enable)
+		ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;
+
+	WRITE_ONCE(umwait_control_cached, ctrl);
+	/* Propagate to all CPUs */
+	on_each_cpu(umwait_update_control_msr, NULL, 1);
+}
+
+static ssize_t
+enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	u32 ctrl = READ_ONCE(umwait_control_cached);
+
+	return sprintf(buf, "%d\n", umwait_ctrl_c02_enabled(ctrl));
+}
+
+static ssize_t enable_c02_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	bool c02_enable;
+	u32 ctrl;
+	int ret;
+
+	ret = kstrtobool(buf, &c02_enable);
+	if (ret)
+		return ret;
+
+	mutex_lock(&umwait_lock);
+
+	ctrl = READ_ONCE(umwait_control_cached);
+	if (c02_enable != umwait_ctrl_c02_enabled(ctrl))
+		umwait_update_control(ctrl, c02_enable);
+
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(enable_c02);
+
+static struct attribute *umwait_attrs[] = {
+	&dev_attr_enable_c02.attr,
+	NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+	.attrs = umwait_attrs,
+	.name = "umwait_control",
+};
+
 static int __init umwait_init(void)
 {
+	struct device *dev;
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
@@ -52,11 +151,14 @@ static int __init umwait_init(void)
 
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait:online",
 				umwait_cpu_online, NULL);
-	if (ret < 0)
-		return ret;
 
 	register_syscore_ops(&umwait_syscore_ops);
 
-	return 0;
+	/*
+	 * Add umwait control interface. Ignore failure, so at least the
+	 * default values are set up in case the machine manages to boot.
+	 */
+	dev = cpu_subsys.dev_root;
+	return sysfs_create_group(&dev->kobj, &umwait_attr_group);
 }
 device_initcall(umwait_init);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip:x86/cpu] x86/umwait: Add sysfs interface to control umwait maximum time
  2019-06-20  1:33 ` [PATCH v5 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
  2019-06-23 22:40   ` Thomas Gleixner
@ 2019-06-24  0:03   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 19+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-06-24  0:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ravi.v.shankar, hpa, mingo, bp, luto, fenghua.yu, ashok.raj,
	tony.luck, peterz, linux-kernel, tglx

Commit-ID:  bd9a0c97e53c3d7a56b2751179903ddc5da42683
Gitweb:     https://git.kernel.org/tip/bd9a0c97e53c3d7a56b2751179903ddc5da42683
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Wed, 19 Jun 2019 18:33:57 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 24 Jun 2019 01:44:20 +0200

x86/umwait: Add sysfs interface to control umwait maximum time

IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that processor can stay in C0.1 or C0.2. A zero value means no maximum
time.

Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.

The administrator can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. The lower two
bits of the value must be zero.

[ tglx: Simplify the write function. Massage changelog ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Andy Lutomirski" <luto@kernel.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com

---
 arch/x86/kernel/cpu/umwait.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 56149d630e35..6a204e7336c1 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -131,8 +131,44 @@ static ssize_t enable_c02_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(enable_c02);
 
+static ssize_t
+max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
+{
+	u32 ctrl = READ_ONCE(umwait_control_cached);
+
+	return sprintf(buf, "%u\n", umwait_ctrl_max_time(ctrl));
+}
+
+static ssize_t max_time_store(struct device *kobj,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	u32 max_time, ctrl;
+	int ret;
+
+	ret = kstrtou32(buf, 0, &max_time);
+	if (ret)
+		return ret;
+
+	/* bits[1:0] must be zero */
+	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_TIME_MASK)
+		return -EINVAL;
+
+	mutex_lock(&umwait_lock);
+
+	ctrl = READ_ONCE(umwait_control_cached);
+	if (max_time != umwait_ctrl_max_time(ctrl))
+		umwait_update_control(max_time, umwait_ctrl_c02_enabled(ctrl));
+
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(max_time);
+
 static struct attribute *umwait_attrs[] = {
 	&dev_attr_enable_c02.attr,
+	&dev_attr_max_time.attr,
 	NULL
 };
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip:x86/cpu] Documentation/ABI: Document umwait control sysfs interfaces
  2019-06-20  1:33 ` [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
  2019-06-23 22:42   ` Thomas Gleixner
@ 2019-06-24  0:03   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 19+ messages in thread
From: tip-bot for Fenghua Yu @ 2019-06-24  0:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, bp, peterz, ravi.v.shankar, ashok.raj, tony.luck,
	tglx, fenghua.yu, linux-kernel, luto

Commit-ID:  203dffacf592317e54480704f569a09f8b7ca380
Gitweb:     https://git.kernel.org/tip/203dffacf592317e54480704f569a09f8b7ca380
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Wed, 19 Jun 2019 18:33:58 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 24 Jun 2019 01:44:35 +0200

Documentation/ABI: Document umwait control sysfs interfaces

Since two new sysfs interface files are created for umwait control, add
an ABI document entry for the files:

   /sys/devices/system/cpu/umwait_control/enable_c02
   /sys/devices/system/cpu/umwait_control/max_time

[ tglx: Made the write value instructions readable ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Andy Lutomirski" <luto@kernel.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-6-git-send-email-fenghua.yu@intel.com
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 23 ++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..923fe2001472 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -538,3 +538,26 @@ Description:	Intel Energy and Performance Bias Hint (EPB)
 
 		This attribute is present for all online CPUs supporting the
 		Intel EPB feature.
+
+What:		/sys/devices/system/cpu/umwait_control
+		/sys/devices/system/cpu/umwait_control/enable_c02
+		/sys/devices/system/cpu/umwait_control/max_time
+Date:		May 2019
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Umwait control
+
+		enable_c02: Read/write interface to control umwait C0.2 state
+			Read returns C0.2 state status:
+				0: C0.2 is disabled
+				1: C0.2 is enabled
+
+			Write 'y' or '1'  or 'on' to enable C0.2 state.
+			Write 'n' or '0'  or 'off' to disable C0.2 state.
+
+			The interface is case insensitive.
+
+		max_time: Read/write interface to control umwait maximum time
+			  in TSC-quanta that the CPU can reside in either C0.1
+			  or C0.2 state. The time is an unsigned 32-bit number.
+			  Note that a value of zero means there is no limit.
+			  Low order two bits must be zero.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 2/5] x86/umwait: Initialize umwait control values
  2019-06-23 22:39   ` Thomas Gleixner
@ 2019-06-24 22:12     ` Fenghua Yu
  0 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2019-06-24 22:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Ashok Raj, Tony Luck, Ravi V Shankar,
	linux-kernel, x86

On Mon, Jun 24, 2019 at 12:39:05AM +0200, Thomas Gleixner wrote:
> On Wed, 19 Jun 2019, Fenghua Yu wrote:
> >  
> > +#define MSR_IA32_UMWAIT_CONTROL			0xe1
> > +#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLED	BIT(0)
> > +#define MSR_IA32_UMWAIT_CONTROL_MAX_TIME	0xfffffffc
> 
> Errm, no! That's not maxtime, that's the time field mask in the
> MSR. Throughout the code you use that as a mask, which is not really
> obvious.
> 
> > +	(((max_time) & MSR_IA32_UMWAIT_CONTROL_MAX_TIME) |		\
> 
> and later on:
> 
> 	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_MAX_TIME)
> 
> What? How is anyone supposed to understand that?
> 
> 	if (max_time & ~MSR_IA32_UMWAIT_CONTROL_TIME_MASK)
> 
> makes it entirely clear that the value is not allowed to have any bits
> outside of the mask set.
> 
> > +
> > +#define UMWAIT_C02_ENABLED	(0 & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED)
> 
> The AND is there for maximal confusion of the reader?
> 
> > +/*
> > + * On resume, set up IA32_UMWAIT_CONTROL MSR on BP which is the only active
> > + * CPU at this time. Setting up the MSR on APs when they are re-added later
> > + * using CPU hotplug.
> > + * The MSR on BP is supposed not to be changed during suspend and thus it's
> > + * unnecessary to set it again during resume from suspend. But at this point
> > + * we don't know resume is from suspend or hibernation. To simplify the
> > + * situation, just set up the MSR on resume from suspend.
> 
> We also do not trust any firmware by default whatever it is supposed to do.

Thank you very much for pointing out and fixing all the issues and merging
the patches into the tip tree!

I test the tip tree and everything works fine.

-Fenghua



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-06-24 22:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-20  1:33 [PATCH v5 0/5] x86/umwait: Enable user wait instructions Fenghua Yu
2019-06-20  1:33 ` [PATCH v5 1/5] x86/cpufeatures: Enumerate " Fenghua Yu
2019-06-24  0:01   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
2019-06-20  1:33 ` [PATCH v5 2/5] x86/umwait: Initialize umwait control values Fenghua Yu
2019-06-23 22:39   ` Thomas Gleixner
2019-06-24 22:12     ` Fenghua Yu
2019-06-24  0:01   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
2019-06-20  1:33 ` [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state Fenghua Yu
2019-06-23 22:40   ` Thomas Gleixner
2019-06-24  0:02   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
2019-06-20  1:33 ` [PATCH v5 4/5] x86/umwait: Add sysfs interface to control umwait maximum time Fenghua Yu
2019-06-23 22:40   ` Thomas Gleixner
2019-06-24  0:03   ` [tip:x86/cpu] " tip-bot for Fenghua Yu
2019-06-20  1:33 ` [PATCH v5 5/5] x86/umwait: Document umwait control sysfs interfaces Fenghua Yu
2019-06-23 22:42   ` Thomas Gleixner
2019-06-23 22:46     ` Thomas Gleixner
2019-06-24  0:03   ` [tip:x86/cpu] Documentation/ABI: " tip-bot for Fenghua Yu
2019-06-20 16:25 ` [PATCH v5 0/5] x86/umwait: Enable user wait instructions Andy Lutomirski
2019-06-20 23:28   ` Fenghua Yu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.