linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/8] x86: Enable a few new instructions
@ 2018-06-16  3:06 Fenghua Yu
  2018-06-16  3:06 ` [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

A few new instructions, including direct stores (movdiri and movdir64b)
and user wait (umwait, umonitor, and tpause), along with the
IA32_UMWAIT_CONTROL MSR that controls umwait/umonitor/tpause behavior,
will be available in Tremont and other future x86 processors.

This patch set enumerates the instructions, adds a sysfs interface that
lets the user configure the umwait/umonitor/tpause instructions, and
provides APIs for user space or the kernel to call the instructions.

The sysfs interface files are in /sys/devices/system/cpu/user_wait because
no existing location is an obvious fit for them.

The user libraries for the instructions are in arch/x86/include/uapi/asm/.
Hopefully this is the right place to keep the libraries.

Detailed information on the instructions and the MSR can be found in
the latest Intel Architecture Instruction Set Extensions and Future
Features Programming Reference at
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Fenghua Yu (8):
  x86/cpufeatures: Enumerate MOVDIRI instruction
  x86/cpufeatures: Enumerate MOVDIR64B instruction
  x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions
  cpuidle: Set up maximum umwait time and umwait states
  x86/umwait.c: Add sysfs interface to show tsc_khz
  x86/lib_direct_store.h: Add APIs for direct store instructions
  x86/lib_user_wait.h: Add APIs for user wait instructions
  selftests/x86: Self test for the APIs in lib_direct_store.h and
    lib_user_wait.h

 arch/x86/include/asm/cpufeatures.h               |   3 +
 arch/x86/include/asm/msr-index.h                 |   4 +
 arch/x86/include/uapi/asm/lib_direct_store.h     | 161 ++++++++++++++
 arch/x86/include/uapi/asm/lib_user_wait.h        | 255 +++++++++++++++++++++++
 arch/x86/power/Makefile                          |   1 +
 arch/x86/power/umwait.c                          | 123 +++++++++++
 tools/testing/selftests/x86/Makefile             |   5 +-
 tools/testing/selftests/x86/directstore_umwait.c | 202 ++++++++++++++++++
 8 files changed, 752 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/uapi/asm/lib_direct_store.h
 create mode 100644 arch/x86/include/uapi/asm/lib_user_wait.h
 create mode 100644 arch/x86/power/umwait.c
 create mode 100644 tools/testing/selftests/x86/directstore_umwait.c

-- 
2.5.0



* [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  2018-06-19  8:57   ` Thomas Gleixner
  2018-06-16  3:06 ` [RFC PATCH 2/8] x86/cpufeatures: Enumerate MOVDIR64B instruction Fenghua Yu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

MOVDIRI moves a doubleword or quadword from a register to memory through
a direct store, which is implemented by using write combining (WC) to
write data directly into memory without caching the data.

Availability of the MOVDIRI instruction is indicated by the presence of
the CPUID feature flag MOVDIRI (CPUID.0x07.0x0:ECX[bit 27]).
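
As an illustration only, the enumeration described above could be
checked from user space roughly as follows. This is a minimal sketch,
not code from this series: ecx_bit_set() and cpu_has_movdiri() are
hypothetical names, and it assumes a GCC or Clang toolchain that ships
<cpuid.h> with __get_cpuid_count().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper header, not part of this series */
#endif

#define MOVDIRI_ECX_BIT	27	/* CPUID.0x07.0x0:ECX[bit 27] */

/* Hypothetical helper: test one bit of an already-read ECX value. */
static inline bool ecx_bit_set(uint32_t ecx, unsigned int bit)
{
	assert(bit < 32);
	return (ecx >> bit) & 1u;
}

/* Hypothetical helper: read CPUID leaf 7, subleaf 0 and test the bit. */
static inline bool cpu_has_movdiri(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 if leaf 7 is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ecx_bit_set(ecx, MOVDIRI_ECX_BIT);
#else
	return false;	/* not an x86 build */
#endif
}
```

Keeping the bit test separate from the CPUID read makes the decoding
easy to check on any machine, including ones without the instruction.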

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the
MOVDIRI CPUID feature flag.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5701f5cecd31..92630c469675 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -329,6 +329,7 @@
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
+#define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
-- 
2.5.0



* [RFC PATCH 2/8] x86/cpufeatures: Enumerate MOVDIR64B instruction
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
  2018-06-16  3:06 ` [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  2018-06-16  3:06 ` [RFC PATCH 3/8] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions Fenghua Yu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

MOVDIR64B moves 64 bytes as a direct store with 64-byte write atomicity.
Direct store is implemented by using write combining (WC) to write
data directly into memory without caching the data.

Availability of the MOVDIR64B instruction is indicated by the
presence of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).
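
The API documentation later in this series states that the MOVDIR64B
destination must be 64-byte aligned. A caller could prepare and check
such a buffer along these lines. This is only a sketch under that
assumption: is_64b_aligned() and alloc_direct_store_dst() are
hypothetical helpers, and it relies on C11 aligned_alloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DIRECT_STORE_SIZE	64	/* bytes moved by one MOVDIR64B */

/* Hypothetical helper: is @p on a 64-byte boundary? */
static inline bool is_64b_aligned(const void *p)
{
	return ((uintptr_t)p % DIRECT_STORE_SIZE) == 0;
}

/*
 * Hypothetical helper: allocate @nblocks 64-byte blocks on a 64-byte
 * boundary. Release the result with free().
 */
static inline void *alloc_direct_store_dst(size_t nblocks)
{
	assert(nblocks > 0);
	/* aligned_alloc() wants a size that is a multiple of the alignment. */
	return aligned_alloc(DIRECT_STORE_SIZE, nblocks * DIRECT_STORE_SIZE);
}
```

Checking the alignment before issuing the instruction turns a fault at
run time into an ordinary error path in the caller.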

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the
MOVDIR64B CPUID feature flag.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 92630c469675..69f1137877b6 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -330,6 +330,7 @@
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
+#define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
-- 
2.5.0



* [RFC PATCH 3/8] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
  2018-06-16  3:06 ` [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
  2018-06-16  3:06 ` [RFC PATCH 2/8] x86/cpufeatures: Enumerate MOVDIR64B instruction Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  2018-06-16  3:06 ` [RFC PATCH 4/8] cpuidle: Set up maximum umwait time and umwait states Fenghua Yu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

UMONITOR, UMWAIT, and TPAUSE are a set of user wait instructions.

UMONITOR arms address monitoring hardware using an address. A store
to an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

UMWAIT instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1 state) or an improved power/performance optimized state
(C0.2 state).

UMONITOR and UMWAIT operate together to save power while a task is
idle.

TPAUSE instructs the processor to enter an implementation-dependent
optimized state (C0.1 or C0.2) and to wake up when the time-stamp
counter reaches the specified timeout.
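
For illustration, the wake-up point is an absolute time-stamp counter
value that the library code later in this series hands to the
instruction as two 32-bit halves in EDX:EAX. The split can be sketched
as below; struct tsc_deadline and make_tsc_deadline() are hypothetical
names, and the caller is assumed to supply the current counter value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two register halves of a deadline. */
struct tsc_deadline {
	uint32_t eax;	/* low 32 bits of the deadline */
	uint32_t edx;	/* high 32 bits of the deadline */
};

/* Build the deadline @ticks after the counter value @now. */
static inline struct tsc_deadline make_tsc_deadline(uint64_t now,
						    uint64_t ticks)
{
	uint64_t deadline;
	struct tsc_deadline d;

	/* Reject a deadline that would wrap around the 64-bit counter. */
	assert(ticks <= UINT64_MAX - now);
	deadline = now + ticks;

	d.eax = (uint32_t)deadline;
	d.edx = (uint32_t)(deadline >> 32);
	return d;
}
```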

The three instructions may be executed at any privilege level.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG (CPUID.0x07.0x0:ECX[bit 5]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the
instructions and the WAITPKG CPUID feature flag.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 69f1137877b6..70ed3087821d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -318,6 +318,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
-- 
2.5.0



* [RFC PATCH 4/8] cpuidle: Set up maximum umwait time and umwait states
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
                   ` (2 preceding siblings ...)
  2018-06-16  3:06 ` [RFC PATCH 3/8] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  2018-06-19  9:03   ` Thomas Gleixner
  2018-06-16  3:06 ` [RFC PATCH 5/8] x86/umwait.c: Add sysfs interface to show tsc_khz Fenghua Yu
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

UMWAIT or TPAUSE called by a user process makes the processor reside in
a light-weight power/performance optimized state (C0.1 state) or an
improved power/performance optimized state (C0.2 state).

The IA32_UMWAIT_CONTROL MSR allows the OS to set the maximum umwait
time and to disable C0.2 on the processor.

The maximum time value in IA32_UMWAIT_CONTROL[31-2] is set to zero, which
means there is no global time limit for the UMWAIT and TPAUSE instructions.
Each process sets its own umwait maximum time as the instruction's operand.
We don't set a non-zero global umwait maximum time to enforce a user
wait timeout because we couldn't find any use for it.
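
To make the register layout concrete, the low 32 bits of the MSR could
be composed as in the sketch below. This is not code from the patch:
umwait_control_value() and the two macros are hypothetical, and the
sketch assumes the time limit is written in place as a 32-bit value
whose low two bits are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UMWAIT_C02_DISABLE	(1u << 0)	/* bit 0: 1 disables C0.2 */
#define UMWAIT_MAX_TIME_MASK	(~0x3u)		/* bits 31-2: maximum time */

/* Compose the low 32 bits of the control MSR. */
static inline uint32_t umwait_control_value(bool disable_c02,
					    uint32_t max_time)
{
	/* The time field occupies bits 31-2, so bits 1-0 must be clear. */
	assert((max_time & ~UMWAIT_MAX_TIME_MASK) == 0);
	return max_time | (disable_c02 ? UMWAIT_C02_DISABLE : 0);
}
```

With max_time set to 0, as this patch does, only bit 0 ever changes, so
the value written is either 0 or 1.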

By default C0.2 is enabled, so user wait saves more power but wakeup
is slower. In some cases, e.g. real time, the user wants to disable C0.2
so that user wait saves less power but wakeup is faster.

A new "/sys/devices/system/cpu/user_wait/umwait_disable_c0_2" file is
created to let the user check whether C0.2 is enabled or disabled and
to enable or disable it. Value "1" in the file means C0.2 is
disabled. Value "0" means C0.2 is enabled.
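
A user-space reader of this file might look roughly like the sketch
below. It is illustrative only: parse_c02_disabled() and
read_c02_disabled() are hypothetical helpers, and the caller passes in
the path of the sysfs file rather than the sketch hard-coding it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: interpret the file content, "0" or "1" with an
 * optional trailing newline. Returns 0 on success, -1 otherwise.
 */
static inline int parse_c02_disabled(const char *text, bool *disabled)
{
	assert(text && disabled);
	if (text[0] != '0' && text[0] != '1')
		return -1;
	if (text[1] != '\0' && text[1] != '\n')
		return -1;
	*disabled = (text[0] == '1');
	return 0;
}

/* Hypothetical helper: read and parse the sysfs file at @path. */
static inline int read_c02_disabled(const char *path, bool *disabled)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;	/* e.g. the CPU has no WAITPKG support */
	if (fgets(buf, sizeof(buf), f))
		ret = parse_c02_disabled(buf, disabled);
	fclose(f);
	return ret;
}
```

A missing file is reported as an error rather than as "enabled", since
the interface is only created on CPUs that enumerate WAITPKG.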

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/msr-index.h |   4 ++
 arch/x86/power/Makefile          |   1 +
 arch/x86/power/umwait.c          | 106 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 68b2c3150de1..92ef30f0f62d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -58,6 +58,10 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL		0xe1
+#define UMWAIT_CONTROL_C02_BIT		0x0
+#define UMWAIT_CONTROL_C02_MASK		0x00000001
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index a4701389562c..d3dfa8a47983 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -8,3 +8,4 @@ CFLAGS_cpu.o	:= $(nostackp)
 
 obj-$(CONFIG_PM_SLEEP)		+= cpu.o
 obj-$(CONFIG_HIBERNATION)	+= hibernate_$(BITS).o hibernate_asm_$(BITS).o
+obj-y				+= umwait.o
diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
new file mode 100644
index 000000000000..fd7b18d9ed02
--- /dev/null
+++ b/arch/x86/power/umwait.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * umwait.c - control user wait
+ *
+ * Copyright (c) 2018, Intel Corporation.
+ * Fenghua Yu <fenghua.yu@intel.com>
+ */
+/*
+ * umwait.c adds control of user wait states that user enters through user wait
+ * instructions umwait or tpause.
+ */
+#include <linux/cpu.h>
+#include <asm/msr.h>
+
+static int umwait_disable_c0_2;
+static DEFINE_MUTEX(umwait_lock);
+
+static ssize_t umwait_disable_c0_2_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	return sprintf(buf, "%d\n", umwait_disable_c0_2);
+}
+
+static ssize_t umwait_disable_c0_2_store(struct device *dev,
+					 struct device_attribute *attr,
+					 const char *buf, size_t count)
+{
+	u32 disable_c0_2, msr_val;
+	int cpu, ret;
+
+	ret = kstrtou32(buf, 10, &disable_c0_2);
+	if (ret)
+		return ret;
+	if (disable_c0_2 != 1 && disable_c0_2 != 0)
+		return -EINVAL;
+
+	mutex_lock(&umwait_lock);
+	umwait_disable_c0_2 = disable_c0_2;
+	/*
+	 * No global umwait maximum time limit (0 in bits 31-2).
+	 * Enable or disable C0.2 based on global setting (bit 0) on all CPUs.
+	 */
+	msr_val = umwait_disable_c0_2 & UMWAIT_CONTROL_C02_MASK;
+	for_each_online_cpu(cpu)
+		wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(umwait_disable_c0_2);
+
+static struct attribute *umwait_attrs[] = {
+	&dev_attr_umwait_disable_c0_2.attr,
+	NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+	.attrs = umwait_attrs,
+	.name = "user_wait",
+};
+
+/* Keep the umwait control MSR on this CPU with the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	u32 msr_val;
+
+	mutex_lock(&umwait_lock);
+	/*
+	 * No global umwait maximum time limit (0 in bits 31-2).
+	 * Enable or disable C0.2 based on global setting (bit 0) on this CPU.
+	 */
+	msr_val = umwait_disable_c0_2 & UMWAIT_CONTROL_C02_MASK;
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
+	mutex_unlock(&umwait_lock);
+
+	return 0;
+}
+
+static int __init umwait_init(void)
+{
+	struct device *dev;
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
+		return -ENODEV;
+
+	/* Add CPU global user wait interface to control umwait C0.2. */
+	dev = cpu_subsys.dev_root;
+	ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
+	if (ret)
+		return ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
+				umwait_cpu_online, NULL);
+	if (ret < 0)
+		goto out_group;
+
+	return 0;
+out_group:
+	sysfs_remove_group(&dev->kobj, &umwait_attr_group);
+
+	return ret;
+}
+device_initcall(umwait_init);
-- 
2.5.0



* [RFC PATCH 5/8] x86/umwait.c: Add sysfs interface to show tsc_khz
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
                   ` (3 preceding siblings ...)
  2018-06-16  3:06 ` [RFC PATCH 4/8] cpuidle: Set up maximum umwait time and umwait states Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  2018-06-19  9:08   ` Thomas Gleixner
  2018-06-16  3:06 ` [RFC PATCH 6/8] x86/lib_direct_store.h: Add APIs for direct store instructions Fenghua Yu
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

A user wait process, or any other process, needs the TSC frequency to
convert seconds to TSC ticks. The kernel already holds the TSC frequency
in the internal variable tsc_khz. The sysfs interface
/sys/devices/system/cpu/user_wait/tsc_khz exposes tsc_khz in decimal to
user space.
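
As a worked example of that conversion: tsc_khz is the number of ticks
per millisecond, so ticks = tsc_khz * nsec / 10^6. A minimal integer
sketch follows; nsec_to_tsc_ticks() is a hypothetical name, not the
helper added later in this series.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC	1000000ULL

/* Convert @nsec nanoseconds to TSC ticks, rounded to nearest. */
static inline uint64_t nsec_to_tsc_ticks(uint64_t tsc_khz, uint64_t nsec)
{
	/* The intermediate product must fit in 64 bits. */
	assert(tsc_khz == 0 ||
	       nsec <= (UINT64_MAX - NSEC_PER_MSEC / 2) / tsc_khz);

	/* tsc_khz ticks per millisecond, 10^6 nanoseconds per millisecond. */
	return (tsc_khz * nsec + NSEC_PER_MSEC / 2) / NSEC_PER_MSEC;
}
```

For a 2 GHz TSC (tsc_khz = 2000000), a 1000 ns wait is
2000000 * 1000 / 1000000 = 2000 ticks.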

tsc_khz and the interface are available only on CPUs that support
X86_FEATURE_TSC_KNOWN_FREQ.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/power/umwait.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index fd7b18d9ed02..33b3ccb40cb9 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -7,7 +7,8 @@
  */
 /*
  * umwait.c adds control of user wait states that user enters through user wait
- * instructions umwait or tpause.
+ * instructions umwait or tpause. It also dumps tsc_khz to user so user process
+ * can convert seconds to tsc for umwait or other usages.
  */
 #include <linux/cpu.h>
 #include <asm/msr.h>
@@ -49,7 +50,14 @@ static ssize_t umwait_disable_c0_2_store(struct device *dev,
 	return count;
 }
 
+static ssize_t tsc_khz_show(struct device *dev, struct device_attribute *attr,
+			    char *buf)
+{
+	return sprintf(buf, "%u\n", tsc_khz);
+}
+
 static DEVICE_ATTR_RW(umwait_disable_c0_2);
+static DEVICE_ATTR_RO(tsc_khz);
 
 static struct attribute *umwait_attrs[] = {
 	&dev_attr_umwait_disable_c0_2.attr,
@@ -92,6 +100,15 @@ static int __init umwait_init(void)
 	if (ret)
 		return ret;
 
+	/* Only add the tsc_khz interface when the value is known. */
+	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
+		ret = sysfs_add_file_to_group(&dev->kobj,
+					      &dev_attr_tsc_khz.attr,
+					      umwait_attr_group.name);
+		if (ret)
+			goto out_group;
+	}
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
 				umwait_cpu_online, NULL);
 	if (ret < 0)
-- 
2.5.0



* [RFC PATCH 6/8] x86/lib_direct_store.h: Add APIs for direct store instructions
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
                   ` (4 preceding siblings ...)
  2018-06-16  3:06 ` [RFC PATCH 5/8] x86/umwait.c: Add sysfs interface to show tsc_khz Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  2018-06-19  8:47   ` Thomas Gleixner
  2018-06-16  3:06 ` [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions Fenghua Yu
  2018-06-16  3:06 ` [RFC PATCH 8/8] selftests/x86: Self test for the APIs in lib_direct_store.h and lib_user_wait.h Fenghua Yu
  7 siblings, 1 reply; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

Direct store instructions MOVDIRI and MOVDIR64B are published in the
latest Intel Instruction Set Extensions document.

Define the APIs for user or kernel to use the instructions.

If a GCC release that supports these instructions becomes available,
the implementation of the APIs will be changed to use its intrinsics.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/uapi/asm/lib_direct_store.h | 161 +++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 arch/x86/include/uapi/asm/lib_direct_store.h

diff --git a/arch/x86/include/uapi/asm/lib_direct_store.h b/arch/x86/include/uapi/asm/lib_direct_store.h
new file mode 100644
index 000000000000..95a676f170ff
--- /dev/null
+++ b/arch/x86/include/uapi/asm/lib_direct_store.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This library provides a set of APIs for user or kernel to use
+ * some new instructions:
+ * - Direct stores: movdiri and movdir64b
+ *
+ * Detailed information on the instructions can be found in
+ * Intel Architecture Instruction Set Extensions and Future Features
+ * Programming Reference.
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Author:
+ *     Fenghua Yu <fenghua.yu@intel.com>
+ */
+#ifndef _ASM_X86_LIB_DIRECT_STORES_H
+#define _ASM_X86_LIB_DIRECT_STORES_H
+
+#include <stdbool.h>
+
+/* CPUID.07H.0H:ECX[bit 27] */
+#define MOVDIRI_BIT		27
+/* CPUID.07H.0H:ECX[bit 28] */
+#define MOVDIR64B_BIT		28
+
+static bool _movdiri_supported, _movdiri_enumerated;
+static bool _movdir64b_supported, _movdir64b_enumerated;
+
+/**
+ * movdiri_supported() - Is movdiri instruction supported?
+ *
+ * Return:
+ * true: supported
+ *
+ * false: not supported
+ */
+static inline bool movdiri_supported(void)
+{
+	int eax, ebx, ecx, edx;
+	bool ret;
+
+	/*
+	 * If movdiri has been enumerated before, return cached movdiri
+	 * support info.
+	 */
+	if (_movdiri_enumerated)
+		return _movdiri_supported;
+
+	/* Otherwise, enumerate movdiri from CPUID. */
+	asm volatile("mov $7, %%eax\t\n"
+		     "mov $0, %%ecx\t\n"
+		     "cpuid\t\n"
+		     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx));
+
+	if (ecx & (1 << MOVDIRI_BIT))
+		ret = true;
+	else
+		ret = false;
+
+	/*
+	 * Cache movdiri support info so we can use it later without
+	 * calling CPUID.
+	 */
+	_movdiri_enumerated = true;
+	_movdiri_supported = ret;
+
+	return ret;
+}
+
+/**
+ * movdir64b_supported() - Is movdir64b instruction supported?
+ *
+ * Return:
+ * true: supported
+ *
+ * false: not supported
+ */
+static inline bool movdir64b_supported(void)
+{
+	int eax, ebx, ecx, edx;
+	bool ret;
+
+	/*
+	 * If movdir64b has been enumerated before, return cached movdir64b
+	 * support info.
+	 */
+	if (_movdir64b_enumerated)
+		return _movdir64b_supported;
+
+	/* Otherwise, enumerate movdir64b from CPUID. */
+	asm volatile("mov $7, %%eax\t\n"
+		     "mov $0, %%ecx\t\n"
+		     "cpuid\t\n"
+		     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx));
+
+	if (ecx & (1 << MOVDIR64B_BIT))
+		ret = true;
+	else
+		ret = false;
+
+	/*
+	 * Cache movdir64b support info so we can use it later without
+	 * calling CPUID.
+	 */
+	_movdir64b_enumerated = true;
+	_movdir64b_supported = ret;
+
+	return ret;
+}
+
+/**
+ * movdiri32() - Move doubleword using direct store.
+ * @dst: Destination address.
+ * @data: 32-bit data.
+ *
+ * Moves the doubleword integer in @data to the destination address @dst
+ * using a direct-store operation.
+ */
+static inline void movdiri32(int *dst, int data)
+{
+	/* movdiri eax, [rdx] */
+	asm volatile(".byte 0x0f, 0x38, 0xf9, 0x02"
+		     : "=m" (*dst)
+		     : "a" (data), "d" (dst));
+}
+
+/**
+ * movdiri64() - Move quadword using direct store
+ * @dst: Destination address
+ * @data: 64-bit data
+ *
+ * Moves the quadword integer in @data to the destination address @dst
+ * using a direct-store operation.
+ */
+static inline void movdiri64(long *dst, long data)
+{
+	/* movdiri rax, [rdx] */
+	asm volatile(".byte 0x48, 0x0f, 0x38, 0xf9, 0x02"
+		     : "=m" (*dst)
+		     : "a" (data), "d" (dst));
+}
+
+/**
+ * movdir64b() - Move 64 bytes using direct store
+ * @dst: Destination address
+ * @src: Source address
+ *
+ * Moves 64 bytes as direct store with 64 bytes write atomicity from
+ * source memory address @src to destination address @dst.
+ *
+ * @dst must be 64-byte aligned. No alignment requirement for @src.
+ */
+static inline void movdir64b(void *dst, void *src)
+{
+	/* movdir64b: rax holds @dst, the memory operand [rdx] is @src */
+	asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
+		     : "=m" (*(char (*)[64])dst)
+		     : "m" (*(const char (*)[64])src), "a" (dst), "d" (src));
+}
+
+#endif /* _ASM_X86_LIB_DIRECT_STORES_H */
-- 
2.5.0



* [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
                   ` (5 preceding siblings ...)
  2018-06-16  3:06 ` [RFC PATCH 6/8] x86/lib_direct_store.h: Add APIs for direct store instructions Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  2018-06-19  9:12   ` Thomas Gleixner
  2018-06-16  3:06 ` [RFC PATCH 8/8] selftests/x86: Self test for the APIs in lib_direct_store.h and lib_user_wait.h Fenghua Yu
  7 siblings, 1 reply; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

A few new user wait instructions UMONITOR, UMWAIT, and TPAUSE are
published in the latest Intel Instruction Set Extensions document.

Define the APIs for user or kernel to use the instructions.

If a GCC release that supports these instructions becomes available,
the implementation of the APIs will be changed to use its intrinsics.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/uapi/asm/lib_user_wait.h | 255 ++++++++++++++++++++++++++++++
 1 file changed, 255 insertions(+)
 create mode 100644 arch/x86/include/uapi/asm/lib_user_wait.h

diff --git a/arch/x86/include/uapi/asm/lib_user_wait.h b/arch/x86/include/uapi/asm/lib_user_wait.h
new file mode 100644
index 000000000000..027d45c1e383
--- /dev/null
+++ b/arch/x86/include/uapi/asm/lib_user_wait.h
@@ -0,0 +1,255 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This library provides a set of APIs for user or kernel to use
+ * some new user wait instructions:
+ * - tpause, umonitor, and umwait
+ *
+ * Detailed information on the instructions can be found in
+ * Intel Architecture Instruction Set Extensions and Future Features
+ * Programming Reference.
+ */
+
+#ifndef _ASM_X86_LIB_USER_WAIT_H
+#define _ASM_X86_LIB_USER_WAIT_H
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <math.h>
+
+/* CPUID.07H.0H:ECX[5] */
+#define WAITPKG_BIT		5
+
+static bool _waitpkg_supported, _waitpkg_enumerated;
+static unsigned long tsc_khz;
+
+/**
+ * waitpkg_supported() - Is CPU flag waitpkg supported?
+ *
+ * Return:
+ * true: supported
+ *
+ * false: not supported
+ */
+static inline int waitpkg_supported(void)
+{
+	int eax, ebx, ecx, edx, ret;
+
+	/*
+	 * If waitpkg has been enumerated before, return cached waitpkg
+	 * support info.
+	 */
+	if (_waitpkg_enumerated)
+		return _waitpkg_supported;
+
+	/* Otherwise, enumerate the feature from CPUID. */
+	asm volatile("mov $7, %%eax\t\n"
+		     "mov $0, %%ecx\t\n"
+		     "cpuid\t\n"
+		     : "=a"(eax), "=b" (ebx), "=c" (ecx), "=d" (edx));
+
+	if (ecx & (1 << WAITPKG_BIT))
+		ret = true;
+	else
+		ret = false;
+
+	/* Cache waitpkg support for future use. */
+	_waitpkg_enumerated = true;
+	_waitpkg_supported = ret;
+
+	return ret;
+}
+
+static inline int get_tsc_khz(unsigned long *tsc_khz_val)
+{
+	int fd, ret = 0;
+	char buf[32];
+
+	if (tsc_khz != 0) {
+		*tsc_khz_val = tsc_khz;
+		return 0;
+	}
+
+	fd = open("/sys/devices/system/cpu/user_wait/tsc_khz", O_RDONLY);
+	if (fd < 0)
+		return -1;
+	ret = read(fd, buf, sizeof(buf) - 1);
+	if (ret < 0)
+		goto out;
+	buf[ret] = '\0';
+	tsc_khz = atol(buf);
+	*tsc_khz_val = tsc_khz;
+	ret = 0;
+
+out:
+	close(fd);
+	return ret;
+}
+
+#define	USEC_PER_SEC	1000000
+
+static inline int nsec_to_tsc(unsigned long nsec, unsigned long *tsc)
+{
+	int ret;
+
+	/* Get tsc frequency in kHz */
+	ret = get_tsc_khz(&tsc_khz);
+	if (ret < 0)
+		return ret;
+
+	*tsc = (unsigned long)round((double)tsc_khz * nsec / USEC_PER_SEC);
+
+	return 0;
+}
+
+/**
+ * umonitor() - Set up monitoring address
+ * @addr: Monitored address
+ *
+ * This API sets up address monitoring hardware using address @addr.
+ * It can be executed at any privilege level.
+ */
+static inline void umonitor(void *addr)
+{
+	/* umonitor %rdi: the "D" constraint places @addr in rdi/edi */
+	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7\t\n"
+		     : : "D" (addr) : "memory");
+}
+
+static inline int _umwait(int state, unsigned long eax, unsigned long edx)
+{
+	bool cf;
+
+	/* umwait %edi: EDI selects the state, EDX:EAX holds the TSC deadline. */
+	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7\t\n"
+		     "setc %0\t\n"
+		     : "=q" (cf)
+		     : "d" (edx), "a" (eax), "D" (state)
+		     : "cc");
+
+	/*
+	 * If the processor wakes due to expiration of OS time-limit, the CF
+	 * flag is set. Otherwise, the flag is cleared.
+	 */
+	return cf;
+}
+
+static unsigned long rdtsc(void)
+{
+	unsigned int low, high;
+
+	asm volatile ("rdtsc\t\n"
+		      : "=a" (low), "=d" (high));
+
+	return (unsigned long)high << 32 | low;
+}
+
+/**
+ * umwait() - Monitor wait
+ * @state: State
+ * @nsec: Timeout in nanoseconds
+ *
+ * A hint that allows the processor to stop instruction execution and
+ * enter an implementation-dependent optimized state. The processor
+ * wakes up because of events such as store to the monitored address,
+ * timeout, NMI, SMI, machine check, debug exception, etc.
+ *
+ * State 0 allows the processor to enter the C0.2 state, which has larger
+ * power saving but slower wakeup time.
+ *
+ * State 1 allows the processor to enter the C0.1 state, which has smaller
+ * power saving but faster wakeup time.
+ *
+ * This function can be executed at any privilege level.
+ *
+ * Return:
+ * 1: the processor wakes due to expiration of OS time-limit
+ *
+ * 0: the processor wakes due to other reasons
+ *
+ * less than 0: error
+ */
+static inline int umwait(int state, unsigned long nsec)
+{
+	unsigned long tsc;
+	int ret;
+
+	if (state != 0 && state != 1)
+		return -1;
+
+	ret = nsec_to_tsc(nsec, &tsc);
+	if (ret)
+		return ret;
+
+	/* Get umwait deadline */
+	tsc += rdtsc();
+	ret = _umwait(state, tsc & 0xffffffff, tsc >> 32);
+
+	return ret;
+}
+
+static inline int _tpause(int state, unsigned long eax, unsigned long edx)
+{
+	bool cf;
+
+	/* tpause %edi: EDI selects the state, EDX:EAX holds the TSC deadline. */
+	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7\t\n"
+		     "setc %0\t\n"
+		     : "=q" (cf)
+		     : "d" (edx), "a" (eax), "D" (state)
+		     : "cc");
+
+	/*
+	 * If the processor wakes due to expiration of OS time-limit, the CF
+	 * flag is set. Otherwise, the flag is cleared.
+	 */
+	return cf;
+}
+
+/**
+ * tpause() - Timed pause
+ * @state: State
+ * @nsec: Timeout in nanoseconds
+ *
+ * tpause() allows the processor to stop instruction execution and
+ * enter an implementation-dependent optimized state. The processor
+ * wakes up because of events such as timeout, NMI, SMI, machine
+ * check, debug exception, etc. No address is monitored.
+ *
+ * State 0 allows the processor to enter the C0.2 state, which has larger
+ * power saving but slower wakeup time.
+ *
+ * State 1 allows the processor to enter the C0.1 state, which has smaller
+ * power saving but faster wakeup time.
+ *
+ * This function can be executed at any privilege level.
+ *
+ * Return:
+ * 1: the processor wakes due to expiration of OS time-limit
+ *
+ * 0: the processor wakes due to other reasons
+ *
+ * less than 0: error
+ */
+static inline int tpause(int state, unsigned long nsec)
+{
+	unsigned long tsc;
+	int ret;
+
+	if (state != 0 && state != 1)
+		return -1;
+
+	ret = nsec_to_tsc(nsec, &tsc);
+	if (ret)
+		return ret;
+
+	/* Get tpause deadline */
+	tsc += rdtsc();
+	ret = _tpause(state, tsc & 0xffffffff, tsc >> 32);
+
+	return ret;
+}
+
+#endif /* _ASM_X86_LIB_USER_WAIT_H */
-- 
2.5.0



* [RFC PATCH 8/8] selftests/x86: Self test for the APIs in lib_direct_store.h and lib_user_wait.h
  2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
                   ` (6 preceding siblings ...)
  2018-06-16  3:06 ` [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions Fenghua Yu
@ 2018-06-16  3:06 ` Fenghua Yu
  7 siblings, 0 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-16  3:06 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

The self test checks APIs defined in arch/x86/include/uapi/asm/
lib_direct_store.h and arch/x86/include/uapi/asm/lib_user_wait.h

Limited by the testing environment, this test suite only tests simple cases.
More test cases may be added later.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 tools/testing/selftests/x86/Makefile             |   5 +-
 tools/testing/selftests/x86/directstore_umwait.c | 202 +++++++++++++++++++++++
 2 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/x86/directstore_umwait.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 186520198de7..36cf86f3eeae 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -12,7 +12,8 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
 			check_initial_reg_state sigreturn iopl mpx-mini-test ioperm \
-			protection_keys test_vdso test_vsyscall mov_ss_trap
+			protection_keys test_vdso test_vsyscall mov_ss_trap \
+			directstore_umwait
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
@@ -73,7 +74,7 @@ $(BINARIES_32): $(OUTPUT)/%_32: %.c
 	$(CC) -m32 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl -lm
 
 $(BINARIES_64): $(OUTPUT)/%_64: %.c
-	$(CC) -m64 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl
+	$(CC) -m64 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl -lm
 
 # x86_64 users should be encouraged to install 32-bit libraries
 ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
diff --git a/tools/testing/selftests/x86/directstore_umwait.c b/tools/testing/selftests/x86/directstore_umwait.c
new file mode 100644
index 000000000000..d1bb1293d2ad
--- /dev/null
+++ b/tools/testing/selftests/x86/directstore_umwait.c
@@ -0,0 +1,202 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * directstore_umwait.c - Test APIs defined in lib_direct_store.h and
+ * lib_user_wait.h
+ *
+ * Copyright (c) 2018 Intel Corporation
+ * Fenghua Yu <fenghua.yu@intel.com>
+ */
+#define _GNU_SOURCE
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <time.h>
+#include <unistd.h>
+#include <string.h>
+#include <asm/lib_direct_store.h>
+#include <asm/lib_user_wait.h>
+
+void test_movdiri_32_bit(void)
+{
+	int __attribute((aligned(64))) dst[10];
+	int __attribute((aligned(64))) data;
+
+	if (!movdiri_supported()) {
+		printf("movdiri is not supported\n");
+
+		return;
+	}
+	dst[0] = 0;
+	data = 0x12345670;
+
+	movdiri32(dst, data);
+
+	if (dst[0] == data)
+		printf("movdiri 32-bit test passed\n");
+	else
+		printf("movdiri 32-bit test failed\n");
+}
+
+void test_movdiri_64_bit(void)
+{
+	long __attribute((aligned(64))) dst[10];
+	long __attribute((aligned(64))) data;
+
+	if (!movdiri_supported()) {
+		printf("movdiri is not supported\n");
+
+		return;
+	}
+	dst[0] = 0;
+	data = 0x123456789abcdef0;
+
+	movdiri64(dst, data);
+
+	if (dst[0] == data)
+		printf("movdiri 64-bit test passed\n");
+	else
+		printf("movdiri 64-bit test failed\n");
+}
+
+void test_movdiri(void)
+{
+	test_movdiri_32_bit();
+	test_movdiri_64_bit();
+}
+
+void test_movdir64b(void)
+{
+	char __attribute((aligned(64))) src[1024], dst[1024];
+
+	if (!movdir64b_supported()) {
+		printf("movdir64b is not supported\n");
+
+		return;
+	}
+	memset(src, 0, 1024);
+	memset(dst, 0, 1024);
+	for (int i = 0; i < 1024; i++)
+		dst[i] = i;
+
+	movdir64b(src, dst);
+	if (memcmp(src, dst, 64))
+		printf("movdir64b test failed\n");
+	else
+		printf("movdir64b test passed\n");
+}
+
+void test_timeout(char *test_name, int state, unsigned long timeout_ns,
+		  unsigned long overhead_ns)
+{
+	unsigned long tsc1, tsc2, real_tsc, real_ns, tsc_per_nsec;
+	int ret;
+
+	ret = nsec_to_tsc(1, &tsc_per_nsec);
+	if (ret) {
+		printf("umwait test failed: nsec cannot be converted to tsc.\n");
+		return;
+	}
+
+	if (waitpkg_supported()) {
+		if (!strcmp(test_name, "umwait")) {
+			tsc1 = rdtsc();
+			umwait(state, timeout_ns);
+			tsc2 = rdtsc();
+		} else {
+			tsc1 = rdtsc();
+			tpause(state, timeout_ns);
+			tsc2 = rdtsc();
+		}
+		real_tsc = tsc2 - tsc1;
+		real_ns = real_tsc / tsc_per_nsec;
+		/* Give enough time for overhead on slow running machine. */
+		if (labs((long)(real_ns - timeout_ns)) < (long)overhead_ns) {
+			printf("%s test passed\n", test_name);
+		} else {
+			printf("%s test failed:\n", test_name);
+			printf("real=%luns, expected=%luns. ",
+			       real_ns, timeout_ns);
+			printf("Likely due to slow machine. ");
+			printf("Please adjust overhead_ns or re-run test for a few more times.\n");
+		}
+	} else {
+		printf("%s is not supported\n", test_name);
+	}
+}
+
+void test_tpause_timeout(int state)
+{
+	/*
+	 * Timeout 100usec. Assume overhead of executing tpause is 10usec.
+	 * You can adjust the overhead number based on your machine.
+	 */
+	test_timeout("tpause", state, 100000, 10000);
+}
+
+void test_tpause(void)
+{
+	/* Test timeout in state 0 (C0.2). */
+	test_tpause_timeout(0);
+	/* Test timeout in state 1 (C0.1). */
+	test_tpause_timeout(1);
+	/* More tests ... */
+}
+
+char umonitor_range[1024];
+
+void test_umonitor_only(void)
+{
+	if (waitpkg_supported()) {
+		umonitor(umonitor_range);
+		printf("umonitor test passed\n");
+	} else {
+		printf("waitpkg not supported\n");
+	}
+}
+
+void show_basic_info(void)
+{
+	unsigned long tsc;
+	int ret;
+
+	ret = nsec_to_tsc(1, &tsc);
+	if (ret < 0)
+		printf("no tsc freq CPUID available\n");
+	else
+		printf("1 nsec = %lu tsc\n", tsc);
+}
+
+void test_umonitor(void)
+{
+	test_umonitor_only();
+}
+
+void test_umwait_timeout(int state)
+{
+	/*
+	 * Timeout 100usec. Assume overhead of executing umwait is 90usec.
+	 * You can adjust the overhead number based on your machine.
+	 */
+	test_timeout("umwait", state, 100000, 90000);
+}
+
+void test_umwait(void)
+{
+	/* Test timeout in state 0 (C0.2). */
+	test_umwait_timeout(0);
+	/* Test timeout in state 1 (C0.1). */
+	test_umwait_timeout(1);
+	/* More tests ... */
+}
+
+int main(void)
+{
+	show_basic_info();
+	test_movdiri();
+	test_movdir64b();
+	test_tpause();
+	test_umonitor();
+	test_umwait();
+
+	return 0;
+}
-- 
2.5.0


* Re: [RFC PATCH 6/8] x86/lib_direct_store.h: Add APIs for direct store instructions
  2018-06-16  3:06 ` [RFC PATCH 6/8] x86/lib_direct_store.h: Add APIs for direct store instructions Fenghua Yu
@ 2018-06-19  8:47   ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2018-06-19  8:47 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86, Peter Zijlstra, Borislav Petkov

On Fri, 15 Jun 2018, Fenghua Yu wrote:
> +static inline bool movdiri_supported(void)
> +{
> +	int eax, ebx, ecx, edx;
> +	bool ret;
> +
> +	/*
> +	 * If movdiri has been enumerated before, return cached movdiri
> +	 * support info.
> +	 */
> +	if (_movdiri_enumerated)
> +		return _movdiri_supported;
> +
> +	/* Otherwise, enumerate movdiri from CPUID. */
> +	asm volatile("mov $7, %%eax\t\n"
> +		     "mov $0, %%ecx\t\n"
> +		     "cpuid\t\n"
> +		     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx));

Why on earth do we need yet another machinery to figure out whether
something is enumerated in CPUID? We have feature bits and the whole set of
functions around it, including those which are run time patched.

Aside of that adding static booleans to every compilation unit which
includes that header file is just broken.
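To illustrate the alternative to per-header cached booleans: the decoding itself can be a stateless helper that every user shares. The sketch below is userspace C under stated assumptions. The helper `cpuid7_ecx_has()` and the `CPUID7_ECX_*` names are hypothetical, and the bit positions (WAITPKG 5, MOVDIRI 27, MOVDIR64B 28 in CPUID.(EAX=7,ECX=0):ECX) should be checked against the Intel reference cited in the cover letter. Inside the kernel the check would go through the feature bits that patches 1-3 enumerate instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0):ECX. The macro names are hypothetical. */
#define CPUID7_ECX_WAITPKG	5
#define CPUID7_ECX_MOVDIRI	27
#define CPUID7_ECX_MOVDIR64B	28

/*
 * Decode one feature bit from an ECX value the caller has already read,
 * for example with __get_cpuid_count(7, 0, ...) from the compiler's
 * <cpuid.h>. No static storage is needed in the including file.
 */
static inline bool cpuid7_ecx_has(uint32_t ecx, unsigned int bit)
{
	return (ecx >> bit) & 1u;
}
```

Because the helper keeps no state, including it from several compilation units adds no per-unit storage.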

Thanks,

	tglx

* Re: [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction
  2018-06-16  3:06 ` [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
@ 2018-06-19  8:57   ` Thomas Gleixner
  2018-06-19 21:36     ` Fenghua Yu
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-06-19  8:57 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86

On Fri, 15 Jun 2018, Fenghua Yu wrote:

> MOVDIRI moves doubleword or quadword from register to memory through
> direct store which is implemented by using write combining (WC) for
> writing data directly into memory without caching the data.

And that is useful for what?

Thanks,

	tglx 

* Re: [RFC PATCH 4/8] cpuidle: Set up maximum umwait time and umwait states
  2018-06-16  3:06 ` [RFC PATCH 4/8] cpuidle: Set up maximum umwait time and umwait states Fenghua Yu
@ 2018-06-19  9:03   ` Thomas Gleixner
  2018-06-19 15:46     ` Fenghua Yu
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-06-19  9:03 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86

On Fri, 15 Jun 2018, Fenghua Yu wrote:
> By default C0.2 is enabled so user wait can save more power but wakeup
> time is slower. In some cases e.g. real time, user wants to disable C0.2
> so that user wait saves less power but wakeup time is faster.

Why is this default enabled?

> A new "/sys/devices/system/cpu/cpuidle/umwait_disable_c0_2" file is
> created to allow user to check if C0.2 is enabled or disabled and also
> allow user to enable or disable C0.2. Value "1" in the file means C0.2 is
> disabled. Value "0" means C0.2 is enabled.

Can we please use straightforward positive logic and have an enable file?
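A sketch of what positive logic could look like, assuming (as the patch comment states) that bit 0 of MSR_IA32_UMWAIT_CONTROL disables C0.2 when set. The helper names are hypothetical and this is plain userspace C for illustration, not the kernel implementation:

```c
#include <stdbool.h>
#include <stdint.h>

#define UMWAIT_C02_DISABLE	(1u << 0)	/* bit 0 set: C0.2 is disabled */

/* Translate a user-visible "enable" flag into the low MSR word. */
static inline uint32_t umwait_ctl_from_enable(bool enable_c0_2)
{
	/* The hardware bit is a disable bit, so invert only at this boundary. */
	return enable_c0_2 ? 0 : UMWAIT_C02_DISABLE;
}

/* Inverse direction, e.g. for the show() side of a sysfs file. */
static inline bool umwait_enable_from_ctl(uint32_t msr_lo)
{
	return !(msr_lo & UMWAIT_C02_DISABLE);
}
```

Keeping the inversion in one place means the sysfs file, the variable, and the documentation can all speak of "enabled" without double negation.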

> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * umwait.c - control user wait

Please remove these pointless file references. They get stale before it's merged.

> + *
> + * Copyright (c) 2018, Intel Corporation.
> + * Fenghua Yu <fenghua.yu@intel.com>
> + */
> +/*
> + * umwait.c adds control of user wait states that user enters through user wait
> + * instructions umwait or tpause.

umwait.c adds something?

> + */
> +#include <linux/cpu.h>
> +#include <asm/msr.h>
> +
> +static int umwait_disable_c0_2;
> +static DEFINE_MUTEX(umwait_lock);
> +
> +static ssize_t umwait_disable_c0_2_show(struct device *dev,
> +					struct device_attribute *attr,
> +					char *buf)
> +{
> +	return sprintf(buf, "%d\n", umwait_disable_c0_2);
> +}
> +
> +static ssize_t umwait_disable_c0_2_store(struct device *dev,
> +					 struct device_attribute *attr,
> +					 const char *buf, size_t count)
> +{
> +	int disable_c0_2, cpu, ret;
> +	u32 msr_val;
> +
> +	ret = kstrtou32(buf, 10, &disable_c0_2);
> +	if (ret)
> +		return ret;
> +	if (disable_c0_2 != 1 && disable_c0_2 != 0)
> +		return -EINVAL;
> +
> +	mutex_lock(&umwait_lock);
> +	umwait_disable_c0_2 = disable_c0_2;
> +	/*
> +	 * No global umwait maximum time limit (0 in bits 31-0).
> +	 * Enable or disable C0.2 based on global setting (bit 0) on all CPUs.
> +	 */
> +	msr_val = umwait_disable_c0_2 & UMWAIT_CONTROL_C02_MASK;

That mask is there because the variable can only have 0 and 1 as content....

> +	for_each_online_cpu(cpu)
> +		wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);

This lacks protection against CPU hotplug.

> +	mutex_unlock(&umwait_lock);
> +
> +	return count;

Thanks,

	tglx

* Re: [RFC PATCH 5/8] x86/umwait.c: Add sysfs interface to show tsc_khz
  2018-06-16  3:06 ` [RFC PATCH 5/8] x86/umwait.c: Add sysfs interface to show tsc_khz Fenghua Yu
@ 2018-06-19  9:08   ` Thomas Gleixner
  2018-06-19 15:11     ` Fenghua Yu
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-06-19  9:08 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86

On Fri, 15 Jun 2018, Fenghua Yu wrote:
 
> +static ssize_t tsc_khz_show(struct device *dev, struct device_attribute *attr,
> +			    char *buf)
> +{
> +	return sprintf(buf, "%d\n", tsc_khz);
> +}
> +
>  static DEVICE_ATTR_RW(umwait_disable_c0_2);
> +static DEVICE_ATTR_RO(tsc_khz);

The right place to expose that information is the VDSO and a helper
function which allows to convert from nsec to TSC.
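A minimal sketch of such a conversion helper, assuming the TSC frequency is available in kHz (as `tsc_khz` is in the patch). The function names are hypothetical, and this is ordinary C rather than actual VDSO code:

```c
#include <stdint.h>

#define NSEC_PER_MSEC	1000000ULL

/*
 * ticks = nsec * tsc_khz / 10^6, because tsc_khz is ticks per millisecond.
 * Whole milliseconds are converted separately so the intermediate product
 * stays small for realistic timeouts.
 */
static inline uint64_t nsec_to_tsc_ticks(uint64_t nsec, uint32_t tsc_khz)
{
	uint64_t msec = nsec / NSEC_PER_MSEC;
	uint64_t rem = nsec % NSEC_PER_MSEC;

	return msec * tsc_khz + rem * tsc_khz / NSEC_PER_MSEC;
}

/* Split a 64-bit TSC deadline into the EDX:EAX halves umwait/tpause take. */
static inline void tsc_deadline_split(uint64_t deadline,
				      uint32_t *eax, uint32_t *edx)
{
	*eax = (uint32_t)deadline;
	*edx = (uint32_t)(deadline >> 32);
}
```

Converting the whole timeout in one step avoids the rounding that a per-nanosecond factor introduces: with a 2.5 GHz TSC, a factor computed for 1 nsec truncates to 2 ticks.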

>  static struct attribute *umwait_attrs[] = {
>  	&dev_attr_umwait_disable_c0_2.attr,
> @@ -92,6 +100,15 @@ static int __init umwait_init(void)
>  	if (ret)
>  		return ret;
>  
> +	/* Only add the tsc_khz interface when the value is known. */

Why so? The only reason why you don't want to expose TSC frequency is when
it's not constant frequency.

> +	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {

Thanks,

	tglx

* Re: [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions
  2018-06-16  3:06 ` [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions Fenghua Yu
@ 2018-06-19  9:12   ` Thomas Gleixner
  2018-06-19 22:27     ` Fenghua Yu
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-06-19  9:12 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86, Peter Zijlstra, Borislav Petkov

On Fri, 15 Jun 2018, Fenghua Yu wrote:

> A few new user wait instructions UMONITOR, UMWAIT, and TPAUSE are
> published in the latest Intel Instruction Set Extensions document.
> 
> Define the APIs for user or kernel to use the instructions.

You're not defining APIs. You're adding a pile of misdesigned helper
functions which again add static storage per compilation unit and CPUID
fiddling.

If you want to add proper APIs then add the stuff to the VDSO and be done
with it.

Thanks,

	tglx

* Re: [RFC PATCH 5/8] x86/umwait.c: Add sysfs interface to show tsc_khz
  2018-06-19  9:08   ` Thomas Gleixner
@ 2018-06-19 15:11     ` Fenghua Yu
  0 siblings, 0 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-19 15:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86

On Tue, Jun 19, 2018 at 11:08:46AM +0200, Thomas Gleixner wrote:
> On Fri, 15 Jun 2018, Fenghua Yu wrote:
>  
> > +static ssize_t tsc_khz_show(struct device *dev, struct device_attribute *attr,
> > +			    char *buf)
> > +{
> > +	return sprintf(buf, "%d\n", tsc_khz);
> > +}
> > +
> >  static DEVICE_ATTR_RW(umwait_disable_c0_2);
> > +static DEVICE_ATTR_RO(tsc_khz);
> 
> The right place to expose that information is the VDSO and a helper
> function which allows to convert from nsec to TSC.

Sure. I will do that.

> 
> >  static struct attribute *umwait_attrs[] = {
> >  	&dev_attr_umwait_disable_c0_2.attr,
> > @@ -92,6 +100,15 @@ static int __init umwait_init(void)
> >  	if (ret)
> >  		return ret;
> >  
> > +	/* Only add the tsc_khz interface when the value is known. */
> 
> Why so? The only reason why you don't want to expose TSC frequency is when
> it's not constant frequency.

> > +	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
> 

Thanks.

-Fenghua

* Re: [RFC PATCH 4/8] cpuidle: Set up maximum umwait time and umwait states
  2018-06-19  9:03   ` Thomas Gleixner
@ 2018-06-19 15:46     ` Fenghua Yu
  0 siblings, 0 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-19 15:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86

On Tue, Jun 19, 2018 at 11:03:22AM +0200, Thomas Gleixner wrote:
> On Fri, 15 Jun 2018, Fenghua Yu wrote:
> > By default C0.2 is enabled so user wait can save more power but wakeup
> > time is slower. In some cases e.g. real time, user wants to disable C0.2
> > so that user wait saves less power but wakeup time is faster.
> 
> Why is this default enabled?

The current hardware implementation enables C0.2 by default. But I'm
not sure that all future hardware will enable it by default.

In the kernel, init code enables C0.2 globally by default.
Each umwait or tpause instruction specifies which state (C0.1 or C0.2)
to use during the wait, depending on the user's decision. And a sysfs
interface allows the user to disable C0.2 globally at run time.

Is this the right arrangement?

> 
> > A new "/sys/devices/system/cpu/cpuidle/umwait_disable_c0_2" file is
> > created to allow user to check if C0.2 is enabled or disabled and also
> > allow user to enable or disable C0.2. Value "1" in the file means C0.2 is
> > disabled. Value "0" means C0.2 is enabled.
> 
> Can we please use straightforward positive logic and have an enable file?

Sure. I will do that.

> 
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * umwait.c - control user wait
> 
> Please remove these pointless file references. They get stale before it's merged.
Sure. I will remove these.

> 
> > + *
> > + * Copyright (c) 2018, Intel Corporation.
> > + * Fenghua Yu <fenghua.yu@intel.com>
> > + */
> > +/*
> > + * umwait.c adds control of user wait states that user enters through user wait
> > + * instructions umwait or tpause.
> 
> umwait.c adds something?

I will change to "umwait.c controls user wait states that user enters..."?

> 
> > + */
> > +#include <linux/cpu.h>
> > +#include <asm/msr.h>
> > +
> > +static int umwait_disable_c0_2;
> > +static DEFINE_MUTEX(umwait_lock);
> > +
> > +static ssize_t umwait_disable_c0_2_show(struct device *dev,
> > +					struct device_attribute *attr,
> > +					char *buf)
> > +{
> > +	return sprintf(buf, "%d\n", umwait_disable_c0_2);
> > +}
> > +
> > +static ssize_t umwait_disable_c0_2_store(struct device *dev,
> > +					 struct device_attribute *attr,
> > +					 const char *buf, size_t count)
> > +{
> > +	int disable_c0_2, cpu, ret;
> > +	u32 msr_val;
> > +
> > +	ret = kstrtou32(buf, 10, &disable_c0_2);
> > +	if (ret)
> > +		return ret;
> > +	if (disable_c0_2 != 1 && disable_c0_2 != 0)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&umwait_lock);
> > +	umwait_disable_c0_2 = disable_c0_2;
> > +	/*
> > +	 * No global umwait maximum time limit (0 in bits 31-0).
> > +	 * Enable or disable C0.2 based on global setting (bit 0) on all CPUs.
> > +	 */
> > +	msr_val = umwait_disable_c0_2 & UMWAIT_CONTROL_C02_MASK;
> 
> That mask is there because the variable can only have 0 and 1 as content....

You are right, the variable can only have 0 or 1. Is it ok to have the mask?

> 
> > +	for_each_online_cpu(cpu)
> > +		wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> 
> This lacks protection against CPU hotplug.

You are right. I will add protection.

> 
> > +	mutex_unlock(&umwait_lock);
> > +
> > +	return count;
> 
> Thanks,
> 
> 	tglx

Thanks.

-Fenghua

* Re: [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction
  2018-06-19  8:57   ` Thomas Gleixner
@ 2018-06-19 21:36     ` Fenghua Yu
  2018-06-19 22:32       ` Thomas Gleixner
  2018-06-25 16:13       ` David Laight
  0 siblings, 2 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-19 21:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86

On Tue, Jun 19, 2018 at 10:57:44AM +0200, Thomas Gleixner wrote:
> On Fri, 15 Jun 2018, Fenghua Yu wrote:
> 
> > MOVDIRI moves doubleword or quadword from register to memory through
> > direct store which is implemented by using write combining (WC) for
> > writing data directly into memory without caching the data.
> 
> And that is useful for what?

Programmable agents can handle streaming offload (e.g. high speed packet
processing in network). Hardware implements a doorbell (tail pointer)
register that is updated by software when adding new work-elements to
the streaming offload work-queue.

MOVDIRI can be used as the doorbell write which is a 4-byte or 8-byte
uncachable write to MMIO. MOVDIRI has lower overhead than other ways
to write the doorbell.

In low latency offload (e.g. Non-Volatile Memory, etc), MOVDIR64B writes
work descriptors (and data in some cases) to device-hosted work-queues
with atomicity.
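To make the doorbell pattern concrete, here is a hedged userspace sketch. The ring layout, `struct work_queue`, and `wq_submit()` are invented for illustration. The doorbell write uses an ordinary volatile store as a stand-in; code targeting real MMIO would issue MOVDIRI at that point, for example through the compiler's `_directstoreu_u32()` intrinsic when built with MOVDIRI support.

```c
#include <stdint.h>

#define WQ_ENTRIES 8u	/* power of two so the index wraps with a mask */

struct work_queue {
	uint64_t desc[WQ_ENTRIES];	/* work descriptors read by the device */
	uint32_t tail;			/* software copy of the tail pointer */
	volatile uint32_t *doorbell;	/* device tail-pointer register */
};

/* Queue one descriptor, then tell the device about the new tail. */
static inline void wq_submit(struct work_queue *wq, uint64_t desc)
{
	wq->desc[wq->tail & (WQ_ENTRIES - 1)] = desc;
	wq->tail++;
	/* Stand-in for MOVDIRI: one 4-byte store to the doorbell. */
	*wq->doorbell = wq->tail;
}
```

A real driver would also have to make the descriptor stores visible to the device before the doorbell write; that ordering is omitted here.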

Thanks.

-Fenghua

* Re: [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions
  2018-06-19  9:12   ` Thomas Gleixner
@ 2018-06-19 22:27     ` Fenghua Yu
  2018-06-19 22:34       ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Fenghua Yu @ 2018-06-19 22:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86, Peter Zijlstra,
	Borislav Petkov

On Tue, Jun 19, 2018 at 11:12:05AM +0200, Thomas Gleixner wrote:
> On Fri, 15 Jun 2018, Fenghua Yu wrote:
> 
> > A few new user wait instructions UMONITOR, UMWAIT, and TPAUSE are
> > published in the latest Intel Instruction Set Extensions document.
> > 
> > Define the APIs for user or kernel to use the instructions.
> 
> You're not defining APIs. You're adding a pile of misdesigned helper
> functions which again add static storage per compilation unit and CPUID
> fiddling.
> 
> If you want to add proper APIs then add the stuff to the VDSO and be done
> with it.

The user wait instructions are mainly called by user apps; but they can
be used in kernel as well.

I'm planning to provide five APIs to user:
1. If user wait feature is supported
2. nsec to tsc translation
3. umonitor function that executes UMONITOR instruction
4. umwait function that executes UMWAIT instruction
5. tpause function that executes TPAUSE instruction

Seems 1-2 can be implemented in VDSO. But should I implement 3-5 in
VDSO/kernel as well?

Thanks.

-Fenghua

* Re: [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction
  2018-06-19 21:36     ` Fenghua Yu
@ 2018-06-19 22:32       ` Thomas Gleixner
  2018-06-19 22:35         ` Fenghua Yu
  2018-06-25 16:13       ` David Laight
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-06-19 22:32 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86

On Tue, 19 Jun 2018, Fenghua Yu wrote:

> On Tue, Jun 19, 2018 at 10:57:44AM +0200, Thomas Gleixner wrote:
> > On Fri, 15 Jun 2018, Fenghua Yu wrote:
> > 
> > > MOVDIRI moves doubleword or quadword from register to memory through
> > > direct store which is implemented by using write combining (WC) for
> > > writing data directly into memory without caching the data.
> > 
> > And that is useful for what?
> 
> Programmable agents can handle streaming offload (e.g. high speed packet
> processing in network). Hardware implements a doorbell (tail pointer)
> register that is updated by software when adding new work-elements to
> the streaming offload work-queue.
> 
> MOVDIRI can be used as the doorbell write which is a 4-byte or 8-byte
> uncachable write to MMIO. MOVDIRI has lower overhead than other ways
> to write the doorbell.
> 
> In low latency offload (e.g. Non-Volatile Memory, etc), MOVDIR64B writes
> work descriptors (and data in some cases) to device-hosted work-queues
> with atomicity.

Makes sense, but why is this not part of the changelog ?

Thanks,

	tglx

* Re: [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions
  2018-06-19 22:27     ` Fenghua Yu
@ 2018-06-19 22:34       ` Thomas Gleixner
  2018-06-19 22:36         ` Fenghua Yu
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-06-19 22:34 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86, Peter Zijlstra, Borislav Petkov

On Tue, 19 Jun 2018, Fenghua Yu wrote:
> On Tue, Jun 19, 2018 at 11:12:05AM +0200, Thomas Gleixner wrote:
> > On Fri, 15 Jun 2018, Fenghua Yu wrote:
> > 
> > > A few new user wait instructions UMONITOR, UMWAIT, and TPAUSE are
> > > published in the latest Intel Instruction Set Extensions document.
> > > 
> > > Define the APIs for user or kernel to use the instructions.
> > 
> > You're not defining APIs. You're adding a pile of misdesigned helper
> > functions which again add static storage per compilation unit and CPUID
> > fiddling.
> > 
> > If you want to add proper APIs then add the stuff to the VDSO and be done
> > with it.
> 
> The user wait instructions are mainly called by user apps; but they can
> be used in kernel as well.
> 
> I'm planning to provide five APIs to user:
> 1. If user wait feature is supported
> 2. nsec to tsc translation
> > 3. umonitor function that executes UMONITOR instruction
> 4. umwait function that executes UMWAIT instruction
> 5. tpause function that executes TPAUSE instruction
> 
> Seems 1-2 can be implemented in VDSO. But should I implement 3-5 in
> VDSO/kernel as well?

If you want a real API, then yes. If not, then an UAPI header is definitely
not the place for this.

Thanks

	tglx

* Re: [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction
  2018-06-19 22:32       ` Thomas Gleixner
@ 2018-06-19 22:35         ` Fenghua Yu
  0 siblings, 0 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-19 22:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86

On Wed, Jun 20, 2018 at 12:32:14AM +0200, Thomas Gleixner wrote:
> On Tue, 19 Jun 2018, Fenghua Yu wrote:
> 
> > On Tue, Jun 19, 2018 at 10:57:44AM +0200, Thomas Gleixner wrote:
> > > On Fri, 15 Jun 2018, Fenghua Yu wrote:
> > > 
> > > > MOVDIRI moves doubleword or quadword from register to memory through
> > > > direct store which is implemented by using write combining (WC) for
> > > > writing data directly into memory without caching the data.
> > > 
> > > And that is useful for what?
> > 
> > Programmable agents can handle streaming offload (e.g. high speed packet
> > processing in network). Hardware implements a doorbell (tail pointer)
> > register that is updated by software when adding new work-elements to
> > the streaming offload work-queue.
> > 
> > MOVDIRI can be used as the doorbell write which is a 4-byte or 8-byte
> > uncachable write to MMIO. MOVDIRI has lower overhead than other ways
> > to write the doorbell.
> > 
> > In low latency offload (e.g. Non-Volatile Memory, etc), MOVDIR64B writes
> > work descriptors (and data in some cases) to device-hosted work-queues
> > with atomicity.
> 
> Makes sense, but why is this not part of the changelog ?

Sorry. I forgot to put the usage info in the commit description. I will
add it in the next version.

> 
> Thanks,
> 
> 	tglx

Thanks.

-Fenghua

* Re: [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions
  2018-06-19 22:34       ` Thomas Gleixner
@ 2018-06-19 22:36         ` Fenghua Yu
  0 siblings, 0 replies; 23+ messages in thread
From: Fenghua Yu @ 2018-06-19 22:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86, Peter Zijlstra,
	Borislav Petkov

On Wed, Jun 20, 2018 at 12:34:32AM +0200, Thomas Gleixner wrote:
> On Tue, 19 Jun 2018, Fenghua Yu wrote:
> > On Tue, Jun 19, 2018 at 11:12:05AM +0200, Thomas Gleixner wrote:
> > > On Fri, 15 Jun 2018, Fenghua Yu wrote:
> > > 
> > > > A few new user wait instructions UMONITOR, UMWAIT, and TPAUSE are
> > > > published in the latest Intel Instruction Set Extensions document.
> > > > 
> > > > Define the APIs for user or kernel to use the instructions.
> > > 
> > > You're not defining APIs. You're adding a pile of misdesigned helper
> > > functions which again add static storage per compilation unit and CPUID
> > > fiddling.
> > > 
> > > If you want to add proper APIs then add the stuff to the VDSO and be done
> > > with it.
> > 
> > The user wait instructions are mainly called by user apps; but they can
> > be used in kernel as well.
> > 
> > I'm planning to provide five APIs to user:
> > 1. If user wait feature is supported
> > 2. nsec to tsc translation
> > > 3. umonitor function that executes UMONITOR instruction
> > 4. umwait function that executes UMWAIT instruction
> > 5. tpause function that executes TPAUSE instruction
> > 
> > Seems 1-2 can be implemented in VDSO. But should I implement 3-5 in
> > VDSO/kernel as well?
> 
> If you want a real API, then yes. If not, then an UAPI header is definitely
> not the place for this.

Sure. I will implement the APIs in VDSO.

Thanks.

-Fenghua

* RE: [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction
  2018-06-19 21:36     ` Fenghua Yu
  2018-06-19 22:32       ` Thomas Gleixner
@ 2018-06-25 16:13       ` David Laight
  1 sibling, 0 replies; 23+ messages in thread
From: David Laight @ 2018-06-25 16:13 UTC (permalink / raw)
  To: 'Fenghua Yu', Thomas Gleixner
  Cc: Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox, Ravi V Shankar,
	linux-kernel, x86

From: Fenghua Yu
> Sent: 19 June 2018 22:37
> To: Thomas Gleixner
> Cc: Fenghua Yu; Ingo Molnar; H Peter Anvin; Ashok Raj; Alan Cox; Ravi V Shankar; linux-kernel; x86
> Subject: Re: [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction
> 
> On Tue, Jun 19, 2018 at 10:57:44AM +0200, Thomas Gleixner wrote:
> > On Fri, 15 Jun 2018, Fenghua Yu wrote:
> >
> > > MOVDIRI moves doubleword or quadword from register to memory through
> > > direct store which is implemented by using write combining (WC) for
> > > writing data directly into memory without caching the data.
> >
> > And that is useful for what?
> 
> Programmable agents can handle streaming offload (e.g. high speed packet
> processing in network). Hardware implements a doorbell (tail pointer)
> register that is updated by software when adding new work-elements to
> the streaming offload work-queue.
> 
> MOVDIRI can be used as the doorbell write which is a 4-byte or 8-byte
> uncachable write to MMIO. MOVDIRI has lower overhead than other ways
> to write the doorbell.

I'd have thought that it wouldn't make any significant difference for
uncached accesses to device registers.

> In low latency offload (e.g. Non-Volatile Memory, etc), MOVDIR64B writes
> work descriptors (and data in some cases) to device-hosted work-queues
> with atomicity.

More likely it is useful for writing to memory without polluting the
data cache.
This might be because the programmer knows the data won't be read for
a long time (at least by the cpu in question).
It might also be useful to avoid a lot of cache snooping on data that
will be accessed by hardware - especially if the hardware is also
likely to be writing to the same cache line.

I can also just about imagine MOVDIR64B being useful for generating
64 byte PCIe TLP to optimise memcpy_to/fromio() without needing
an AVX512 register.

	David


Thread overview: 23+ messages
2018-06-16  3:06 [RFC PATCH 0/8] x86: Enable a few new instructions Fenghua Yu
2018-06-16  3:06 ` [RFC PATCH 1/8] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
2018-06-19  8:57   ` Thomas Gleixner
2018-06-19 21:36     ` Fenghua Yu
2018-06-19 22:32       ` Thomas Gleixner
2018-06-19 22:35         ` Fenghua Yu
2018-06-25 16:13       ` David Laight
2018-06-16  3:06 ` [RFC PATCH 2/8] x86/cpufeatures: Enumerate MOVDIR64B instruction Fenghua Yu
2018-06-16  3:06 ` [RFC PATCH 3/8] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions Fenghua Yu
2018-06-16  3:06 ` [RFC PATCH 4/8] cpuidle: Set up maximum umwait time and umwait states Fenghua Yu
2018-06-19  9:03   ` Thomas Gleixner
2018-06-19 15:46     ` Fenghua Yu
2018-06-16  3:06 ` [RFC PATCH 5/8] x86/umwait.c: Add sysfs interface to show tsc_khz Fenghua Yu
2018-06-19  9:08   ` Thomas Gleixner
2018-06-19 15:11     ` Fenghua Yu
2018-06-16  3:06 ` [RFC PATCH 6/8] x86/lib_direct_store.h: Add APIs for direct store instructions Fenghua Yu
2018-06-19  8:47   ` Thomas Gleixner
2018-06-16  3:06 ` [RFC PATCH 7/8] x86/lib_user_wait.h: Add APIs for user wait instructions Fenghua Yu
2018-06-19  9:12   ` Thomas Gleixner
2018-06-19 22:27     ` Fenghua Yu
2018-06-19 22:34       ` Thomas Gleixner
2018-06-19 22:36         ` Fenghua Yu
2018-06-16  3:06 ` [RFC PATCH 8/8] selftests/x86: Self test for the APIs in lib_direct_store.h and lib_user_wait.h Fenghua Yu
