linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/7] x86: Enable a few new instructions
@ 2018-07-23 12:55 Fenghua Yu
  2018-07-23 12:55 ` [PATCH 1/7] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

A few new instructions, including direct stores (movdiri and movdir64b)
and user wait (umwait, umonitor, and tpause), and the IA32_UMWAIT_CONTROL
MSR that controls umwait/umonitor/tpause behavior will be available in
Tremont and other future x86 processors.

This patch set enumerates the instructions, adds a sysfs interface for
users to configure the umwait/umonitor/tpause instructions, and provides
APIs for users to call the instructions.

The sysfs interface file is /sys/devices/system/cpu/umwait_control/
umwait_enable_c0_2 because it's hard to find an existing place to host
the file.

The user APIs for the instructions are implemented as vDSO functions.

Detailed information on the instructions and the MSR can be found in
the latest Intel Architecture Instruction Set Extensions and Future
Features Programming Reference at
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Changelog:

Based on comments from Thomas:
- Change user APIs to vDSO functions
- Change sysfs to positive logic and enable file name
- Change patch descriptions etc

Fenghua Yu (7):
  x86/cpufeatures: Enumerate MOVDIRI instruction
  x86/cpufeatures: Enumerate MOVDIR64B instruction
  x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions
  x86/umwait_control: Set global umwait maximum time limit and umwait
    C0.2 state
  x86/vdso: Add vDSO functions for direct store instructions
  x86/vdso: Add vDSO functions for user wait instructions
  selftests/vDSO: Add selftest to test vDSO functions for direct store
    and user wait instructions

 arch/x86/entry/vdso/Makefile                      |   2 +-
 arch/x86/entry/vdso/vdirectstore.c                | 152 ++++++++
 arch/x86/entry/vdso/vdso.lds.S                    |  20 ++
 arch/x86/entry/vdso/vma.c                         |  21 ++
 arch/x86/entry/vdso/vuserwait.c                   | 233 +++++++++++++
 arch/x86/include/asm/cpufeatures.h                |   3 +
 arch/x86/include/asm/msr-index.h                  |   4 +
 arch/x86/include/asm/vdso_funcs_data.h            |  20 ++
 arch/x86/include/asm/vvar.h                       |   1 +
 arch/x86/power/Makefile                           |   1 +
 arch/x86/power/umwait.c                           | 116 +++++++
 tools/testing/selftests/vDSO/Makefile             |   4 +-
 tools/testing/selftests/vDSO/vdso_inst_test_x86.c | 405 ++++++++++++++++++++++
 13 files changed, 980 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/entry/vdso/vdirectstore.c
 create mode 100644 arch/x86/entry/vdso/vuserwait.c
 create mode 100644 arch/x86/include/asm/vdso_funcs_data.h
 create mode 100644 arch/x86/power/umwait.c
 create mode 100644 tools/testing/selftests/vDSO/vdso_inst_test_x86.c

-- 
2.5.0



* [PATCH 1/7] x86/cpufeatures: Enumerate MOVDIRI instruction
  2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
@ 2018-07-23 12:55 ` Fenghua Yu
  2018-07-23 12:55 ` [PATCH 2/7] x86/cpufeatures: Enumerate MOVDIR64B instruction Fenghua Yu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

MOVDIRI moves a doubleword or quadword from a register to memory as a
direct store, which is implemented using write combining (WC): the data
is written directly into memory without being cached.

Programmable agents can handle streaming offload (e.g. high speed
packet processing in networking). Hardware implements a doorbell (tail
pointer) register that is updated by software when adding new work
elements to the streaming offload work queue.

MOVDIRI can be used for the doorbell write, which is a 4-byte or 8-byte
uncacheable write to MMIO. MOVDIRI has lower overhead than other ways
to write the doorbell.

Availability of the MOVDIRI instruction is indicated by the presence of
the CPUID feature flag MOVDIRI (CPUID.0x07.0x0:ECX[bit 27]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the CPUID
feature MOVDIRI flag.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5701f5cecd31..92630c469675 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -329,6 +329,7 @@
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
+#define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
-- 
2.5.0



* [PATCH 2/7] x86/cpufeatures: Enumerate MOVDIR64B instruction
  2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
  2018-07-23 12:55 ` [PATCH 1/7] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
@ 2018-07-23 12:55 ` Fenghua Yu
  2018-07-23 12:55 ` [PATCH 3/7] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions Fenghua Yu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

MOVDIR64B moves 64 bytes as a direct store with 64-byte write
atomicity. Direct store is implemented using write combining (WC):
the data is written directly into memory without being cached.

In low latency offload (e.g. to Non-Volatile Memory), MOVDIR64B writes
work descriptors (and data in some cases) to device-hosted work queues
atomically and without cache pollution.

Availability of the MOVDIR64B instruction is indicated by the
presence of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the CPUID
feature MOVDIR64B flag.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 92630c469675..69f1137877b6 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -330,6 +330,7 @@
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
+#define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
-- 
2.5.0



* [PATCH 3/7] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions
  2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
  2018-07-23 12:55 ` [PATCH 1/7] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
  2018-07-23 12:55 ` [PATCH 2/7] x86/cpufeatures: Enumerate MOVDIR64B instruction Fenghua Yu
@ 2018-07-23 12:55 ` Fenghua Yu
  2018-07-23 12:55 ` [PATCH 4/7] x86/umwait_control: Set global umwait maximum time limit and umwait C0.2 state Fenghua Yu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

UMONITOR, UMWAIT, and TPAUSE are a set of user wait instructions.

UMONITOR arms address monitoring hardware using an address. A store
to an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

UMWAIT instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(C0.1) or an improved power/performance optimized state (C0.2).

UMONITOR and UMWAIT operate together to provide power saving while
idle.

TPAUSE instructs the processor to enter an implementation-dependent
optimized state (C0.1 or C0.2) and wake up when the time-stamp counter
reaches the specified deadline.

The three instructions may be executed at any privilege level.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG (CPUID.0x07.0x0:ECX[bit 5]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the
instructions and CPUID feature WAITPKG flag.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 69f1137877b6..70ed3087821d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -318,6 +318,7 @@
 #define X86_FEATURE_UMIP		(16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU			(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
+#define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
-- 
2.5.0



* [PATCH 4/7] x86/umwait_control: Set global umwait maximum time limit and umwait C0.2 state
  2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
                   ` (2 preceding siblings ...)
  2018-07-23 12:55 ` [PATCH 3/7] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions Fenghua Yu
@ 2018-07-23 12:55 ` Fenghua Yu
  2018-07-24  1:41   ` Andy Lutomirski
  2018-07-23 12:55 ` [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions Fenghua Yu
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

UMWAIT or TPAUSE called by a user process makes the processor reside
in a light-weight power/performance optimized state (C0.1) or an
improved power/performance optimized state (C0.2).

The IA32_UMWAIT_CONTROL MSR allows the OS to set a global maximum
umwait time and to disable C0.2 on the processor.

The maximum time value in IA32_UMWAIT_CONTROL[31:2] is set to zero,
which means there is no global time limit for the UMWAIT and TPAUSE
instructions. Each process sets its own umwait maximum time via the
instruction's operand. We don't set a non-zero global umwait maximum
time value to enforce a user wait timeout because we couldn't find any
usage for it.

By default C0.2 is enabled, so user wait instructions can enter that
state when the user wants to save more power at the cost of a slower
wakeup. In some cases, e.g. real time, the user wants to disable C0.2,
in which case all C0.2 requests revert to C0.1.

A new "/sys/devices/system/cpu/umwait_control/umwait_enable_c0_2" file
is created to let the user check whether C0.2 is enabled and to enable
or disable it. Value "0" in the file means C0.2 is disabled. Value "1"
means C0.2 is enabled.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/msr-index.h |   4 ++
 arch/x86/power/Makefile          |   1 +
 arch/x86/power/umwait.c          | 116 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+)
 create mode 100644 arch/x86/power/umwait.c

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 68b2c3150de1..92ef30f0f62d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -58,6 +58,10 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_UMWAIT_CONTROL		0xe1
+#define UMWAIT_CONTROL_C02_BIT		0x0
+#define UMWAIT_CONTROL_C02_MASK		0x00000001
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index a4701389562c..d3dfa8a47983 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -8,3 +8,4 @@ CFLAGS_cpu.o	:= $(nostackp)
 
 obj-$(CONFIG_PM_SLEEP)		+= cpu.o
 obj-$(CONFIG_HIBERNATION)	+= hibernate_$(BITS).o hibernate_asm_$(BITS).o
+obj-y				+= umwait.o
diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
new file mode 100644
index 000000000000..7fb645c65f22
--- /dev/null
+++ b/arch/x86/power/umwait.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Sysfs interface for umwait control
+ *
+ * Copyright (C) 2018, Intel Corporation.
+ *
+ * Author: Fenghua Yu <fenghua.yu@intel.com>
+ */
+#include <linux/cpu.h>
+#include <asm/msr.h>
+
+static int umwait_enable_c0_2 = 1; /* 0: disable C0.2. 1: enable C0.2. */
+static DEFINE_MUTEX(umwait_lock);
+
+/* Return value that will be used to set umwait control MSR */
+static inline u32 umwait_control_val(void)
+{
+	/*
+	 * We do not set global umwait maximum time limit. So bits 31-2 are 0.
+	 *
+	 * Enable or disable C0.2 (bit 0) based on global setting on all CPUs.
+	 * When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled.
+	 * So value in bit 0 is opposite of umwait_enable_c0_2.
+	 */
+	return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK;
+}
+
+static ssize_t umwait_enable_c0_2_show(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	return sprintf(buf, "%d\n", umwait_enable_c0_2);
+}
+
+static ssize_t umwait_enable_c0_2_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	int enable_c0_2, cpu, ret;
+	u32 msr_val;
+
+	ret = kstrtoint(buf, 10, &enable_c0_2);
+	if (ret)
+		return ret;
+
+	if (enable_c0_2 != 1 && enable_c0_2 != 0)
+		return -EINVAL;
+
+	mutex_lock(&umwait_lock);
+
+	umwait_enable_c0_2 = enable_c0_2;
+	msr_val = umwait_control_val();
+	get_online_cpus();
+	/* All CPUs have same umwait control setting */
+	for_each_online_cpu(cpu)
+		wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
+	put_online_cpus();
+
+	mutex_unlock(&umwait_lock);
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(umwait_enable_c0_2);
+
+static struct attribute *umwait_attrs[] = {
+	&dev_attr_umwait_enable_c0_2.attr,
+	NULL
+};
+
+static struct attribute_group umwait_attr_group = {
+	.attrs = umwait_attrs,
+	.name = "umwait_control",
+};
+
+/* Set up umwait control MSR on this CPU using the current global setting. */
+static int umwait_cpu_online(unsigned int cpu)
+{
+	u32 msr_val;
+
+	mutex_lock(&umwait_lock);
+
+	msr_val = umwait_control_val();
+	wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
+
+	mutex_unlock(&umwait_lock);
+
+	return 0;
+}
+
+static int __init umwait_init(void)
+{
+	struct device *dev;
+	int ret;
+
+	if (!boot_cpu_has(X86_FEATURE_WAITPKG))
+		return -ENODEV;
+
+	/* Add CPU global user wait interface to control umwait C0.2 state. */
+	dev = cpu_subsys.dev_root;
+	ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
+	if (ret)
+		return ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
+				umwait_cpu_online, NULL);
+	if (ret < 0)
+		goto out_group;
+
+	return 0;
+out_group:
+	sysfs_remove_group(&dev->kobj, &umwait_attr_group);
+
+	return ret;
+}
+device_initcall(umwait_init);
-- 
2.5.0



* [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions
  2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
                   ` (3 preceding siblings ...)
  2018-07-23 12:55 ` [PATCH 4/7] x86/umwait_control: Set global umwait maximum time limit and umwait C0.2 state Fenghua Yu
@ 2018-07-23 12:55 ` Fenghua Yu
  2018-07-24  1:48   ` Andy Lutomirski
  2018-07-23 12:55 ` [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions Fenghua Yu
  2018-07-23 12:55 ` [PATCH 7/7] selftests/vDSO: Add selftest to test vDSO functions for direct store and " Fenghua Yu
  6 siblings, 1 reply; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

Users want to query whether the direct store instructions are supported
and to use the instructions. The vDSO functions provide a fast
interface for users to query the support and to call the instructions.

movdiri_supported and its alias __vdso_movdiri_supported check if the
movdiri instruction is supported.

movdir64b_supported and its alias __vdso_movdir64b_supported check if
the movdir64b instruction is supported.

movdiri32 and its alias __vdso_movdiri32 provide user APIs for calling
32-bit movdiri instruction.

movdiri64 and its alias __vdso_movdiri64 provide user APIs for calling
64-bit movdiri instruction.

movdir64b and its alias __vdso_movdir64b provide user APIs to move
64-byte data through movdir64b instruction.

The instructions may be implemented as intrinsic functions in future
GCC releases. But the vDSO interfaces are available to users without
intrinsic support in GCC, and the APIs movdiri_supported and
movdir64b_supported cannot be implemented as GCC functions.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/entry/vdso/Makefile           |   2 +-
 arch/x86/entry/vdso/vdirectstore.c     | 152 +++++++++++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso.lds.S         |  10 +++
 arch/x86/entry/vdso/vma.c              |  12 +++
 arch/x86/include/asm/vdso_funcs_data.h |  17 ++++
 arch/x86/include/asm/vvar.h            |   1 +
 6 files changed, 193 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/entry/vdso/vdirectstore.c
 create mode 100644 arch/x86/include/asm/vdso_funcs_data.h

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b9ed1aa53a26..af4fcae5de83 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -17,7 +17,7 @@ VDSO32-$(CONFIG_X86_32)		:= y
 VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 
 # files to link into the vdso
-vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdirectstore.o
 
 # files to link into kernel
 obj-y				+= vma.o
diff --git a/arch/x86/entry/vdso/vdirectstore.c b/arch/x86/entry/vdso/vdirectstore.c
new file mode 100644
index 000000000000..6f6d76ed8cde
--- /dev/null
+++ b/arch/x86/entry/vdso/vdirectstore.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * vDSO functions for direct store instructions
+ *
+ * Copyright 2018 Intel Corporation
+ *
+ * Author: Fenghua Yu <fenghua.yu@intel.com>
+ */
+#include <asm/vdso_funcs_data.h>
+
+notrace bool __vdso_movdiri_supported(void)
+{
+	return _vdso_funcs_data->movdiri_supported;
+}
+
+/**
+ * movdiri_supported() - vDSO function for checking movdiri instructions support
+ *
+ * This function checks if direct store instruction movdiri is supported
+ * on this machine.
+ *
+ * movdiri_supported() and its alias __vdso_movdiri_supported() are
+ * implemented as vDSO functions.
+ *
+ * Return:
+ * true: supported
+ *
+ * false: not supported
+ */
+bool movdiri_supported(void)
+	__attribute__((weak, alias("__vdso_movdiri_supported")));
+
+notrace bool __vdso_movdir64b_supported(void)
+{
+	return _vdso_funcs_data->movdir64b_supported;
+}
+
+/**
+ * movdir64b_supported() - vDSO function for checking movdir64b instruction
+ * support
+ *
+ * This function checks if direct store instruction movdir64b is supported
+ * on this machine.
+ *
+ * movdir64b_supported() and its alias __vdso_movdir64b_supported() are
+ * implemented as vDSO functions.
+ *
+ * Return:
+ * true: supported
+ *
+ * false: not supported
+ */
+bool movdir64b_supported(void)
+	__attribute__((weak, alias("__vdso_movdir64b_supported")));
+
+notrace int __vdso_movdiri32(int *dst, int data)
+{
+	if (!_vdso_funcs_data->movdiri_supported)
+		return -ENODEV;
+
+	/* movdiri [rdx], eax: data in eax, destination address in rdx */
+	asm volatile(".byte 0x0f, 0x38, 0xf9, 0x02"
+		     : "=m" (*dst)
+		     : "a" (data), "d" (dst));
+
+	return 0;
+}
+
+/**
+ * movdiri32() - vDSO function for moving doubleword using direct store.
+ * @dst: Destination address.
+ * @data: 32-bit data.
+ *
+ * Moves the doubleword in @data to the destination address @dst
+ * using direct-store operation.
+ *
+ * movdiri32() and its alias __vdso_movdiri32() are implemented as vDSO
+ * functions.
+ *
+ * Return:
+ * 0: Successful
+ *
+ * Less than zero: error code
+ */
+int movdiri32(int *dst, int data)
+	__attribute__((weak, alias("__vdso_movdiri32")));
+
+notrace int __vdso_movdiri64(long *dst, long data)
+{
+	if (!_vdso_funcs_data->movdiri_supported)
+		return -ENODEV;
+
+	/* movdiri [rdx], rax: data in rax, destination address in rdx */
+	asm volatile(".byte 0x48, 0x0f, 0x38, 0xf9, 0x02"
+		     : "=m" (*dst)
+		     : "a" (data), "d" (dst));
+
+	return 0;
+}
+
+/**
+ * movdiri64() - vDSO function for moving quadword using direct store
+ * @dst: Destination address
+ * @data: 64-bit data
+ *
+ * Moves the quadword in @data to the destination address @dst using
+ * direct-store operation.
+ *
+ * movdiri64() and its alias __vdso_movdiri64() are implemented as vDSO
+ * functions.
+ *
+ * Return:
+ * 0: Successful
+ *
+ * Less than zero: error code
+ */
+int movdiri64(long *dst, long data)
+	__attribute__((weak, alias("__vdso_movdiri64")));
+
+notrace int __vdso_movdir64b(char *dst, char *src)
+{
+	if (!_vdso_funcs_data->movdir64b_supported)
+		return -ENODEV;
+
+	/* movdir64b rax, [rdx]: destination address in rax, source in [rdx] */
+	asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
+		     : "=m" (*dst)
+		     : "a" (dst), "d" (src));
+
+	return 0;
+}
+
+/**
+ * movdir64b() - vDSO function for moving 64 bytes using direct store
+ * @dst: Destination address
+ * @src: Source address
+ *
+ * Moves 64 bytes as direct store with 64 bytes write atomicity from
+ * source memory address @src to destination address @dst.
+ *
+ * @dst must be 64-byte aligned. No alignment requirement for @src.
+ *
+ * movdir64b() and its alias __vdso_movdir64b() are implemented as vDSO
+ * functions.
+ *
+ * Return:
+ * 0: Successful
+ *
+ * Less than zero: error code
+ */
+int movdir64b(char *dst, char *src)
+	__attribute__((weak, alias("__vdso_movdir64b")));
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..097cdcda43a5 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -25,6 +25,16 @@ VERSION {
 		__vdso_getcpu;
 		time;
 		__vdso_time;
+		movdiri_supported;
+		__vdso_movdiri_supported;
+		movdiri32;
+		__vdso_movdiri32;
+		movdiri64;
+		__vdso_movdiri64;
+		movdir64b_supported;
+		__vdso_movdir64b_supported;
+		movdir64b;
+		__vdso_movdir64b;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5b8b556dbb12..edbe5e63e5c2 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -23,11 +23,14 @@
 #include <asm/desc.h>
 #include <asm/cpufeature.h>
 #include <asm/mshyperv.h>
+#include <asm/vdso_funcs_data.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
 #endif
 
+DEFINE_VVAR(struct vdso_funcs_data, vdso_funcs_data);
+
 void __init init_vdso_image(const struct vdso_image *image)
 {
 	BUG_ON(image->size % PAGE_SIZE != 0);
@@ -367,6 +370,14 @@ static int vgetcpu_online(unsigned int cpu)
 	return smp_call_function_single(cpu, vgetcpu_cpu_init, NULL, 1);
 }
 
+static void __init init_vdso_funcs_data(void)
+{
+	if (static_cpu_has(X86_FEATURE_MOVDIRI))
+		vdso_funcs_data.movdiri_supported = true;
+	if (static_cpu_has(X86_FEATURE_MOVDIR64B))
+		vdso_funcs_data.movdir64b_supported = true;
+}
+
 static int __init init_vdso(void)
 {
 	init_vdso_image(&vdso_image_64);
@@ -374,6 +385,7 @@ static int __init init_vdso(void)
 #ifdef CONFIG_X86_X32_ABI
 	init_vdso_image(&vdso_image_x32);
 #endif
+	init_vdso_funcs_data();
 
 	/* notifier priority > KVM */
 	return cpuhp_setup_state(CPUHP_AP_X86_VDSO_VMA_ONLINE,
diff --git a/arch/x86/include/asm/vdso_funcs_data.h b/arch/x86/include/asm/vdso_funcs_data.h
new file mode 100644
index 000000000000..b99a5685029e
--- /dev/null
+++ b/arch/x86/include/asm/vdso_funcs_data.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_VDSO_FUNCS_DATA_H
+#define _ASM_X86_VDSO_FUNCS_DATA_H
+
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <asm/vvar.h>
+
+/* This structure contains data used by vDSO functions. */
+struct vdso_funcs_data {
+	bool movdiri_supported;   /* if movdiri instruction is supported */
+	bool movdir64b_supported; /* if movdir64b instruction is supported */
+};
+
+#define _vdso_funcs_data (&VVAR(vdso_funcs_data))
+
+#endif /* _ASM_X86_VDSO_FUNCS_DATA_H */
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 3f32dfc2ab73..74047c0490fe 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -45,6 +45,7 @@ extern char __vvar_page;
 /* DECLARE_VVAR(offset, type, name) */
 
 DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
+DECLARE_VVAR(256, struct vdso_funcs_data, vdso_funcs_data)
 
 #undef DECLARE_VVAR
 
-- 
2.5.0



* [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions
  2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
                   ` (4 preceding siblings ...)
  2018-07-23 12:55 ` [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions Fenghua Yu
@ 2018-07-23 12:55 ` Fenghua Yu
  2018-07-24  2:11   ` Andy Lutomirski
  2018-07-23 12:55 ` [PATCH 7/7] selftests/vDSO: Add selftest to test vDSO functions for direct store and " Fenghua Yu
  6 siblings, 1 reply; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

Users want to query whether the user wait instructions (umonitor,
umwait, and tpause) are supported and to use the instructions. The
vDSO functions provide a fast interface for users to check the support
and to call the instructions.

waitpkg_supported and its alias __vdso_waitpkg_supported check if the
user wait instructions (a.k.a. the wait package feature) are supported.

umonitor and its alias __vdso_umonitor provide user APIs for calling
umonitor instruction.

umwait and its alias __vdso_umwait provide user APIs for calling
umwait instruction.

tpause and its alias __vdso_tpause provide user APIs for calling
tpause instruction.

nsec_to_tsc and its alias __vdso_nsec_to_tsc convert nanoseconds to a
TSC counter value if the TSC frequency is known. They fail if the TSC
frequency is unknown.

The instructions may be implemented as intrinsic functions in future
GCC releases. But the vDSO interfaces are available to users without
intrinsic support in GCC, and the APIs waitpkg_supported and
nsec_to_tsc cannot be implemented as GCC functions.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/entry/vdso/Makefile           |   2 +-
 arch/x86/entry/vdso/vdso.lds.S         |  10 ++
 arch/x86/entry/vdso/vma.c              |   9 ++
 arch/x86/entry/vdso/vuserwait.c        | 233 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/vdso_funcs_data.h |   3 +
 5 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/entry/vdso/vuserwait.c

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index af4fcae5de83..fb0062b09b3c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -17,7 +17,7 @@ VDSO32-$(CONFIG_X86_32)		:= y
 VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 
 # files to link into the vdso
-vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdirectstore.o
+vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdirectstore.o vuserwait.o
 
 # files to link into kernel
 obj-y				+= vma.o
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index 097cdcda43a5..0942710608bf 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -35,6 +35,16 @@ VERSION {
 		__vdso_movdir64b_supported;
 		movdir64b;
 		__vdso_movdir64b;
+		waitpkg_supported;
+		__vdso_waitpkg_supported;
+		umonitor;
+		__vdso_umonitor;
+		umwait;
+		__vdso_umwait;
+		tpause;
+		__vdso_tpause;
+		nsec_to_tsc;
+		__vdso_nsec_to_tsc;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index edbe5e63e5c2..006dfb5e5003 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -372,10 +372,19 @@ static int vgetcpu_online(unsigned int cpu)
 
 static void __init init_vdso_funcs_data(void)
 {
+	struct system_counterval_t sys_counterval;
+
 	if (static_cpu_has(X86_FEATURE_MOVDIRI))
 		vdso_funcs_data.movdiri_supported = true;
 	if (static_cpu_has(X86_FEATURE_MOVDIR64B))
 		vdso_funcs_data.movdir64b_supported = true;
+	if (static_cpu_has(X86_FEATURE_WAITPKG))
+		vdso_funcs_data.waitpkg_supported = true;
+	if (static_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
+		vdso_funcs_data.tsc_known_freq = true;
+		sys_counterval = convert_art_ns_to_tsc(1);
+		vdso_funcs_data.tsc_per_nsec = sys_counterval.cycles;
+	}
 }
 
 static int __init init_vdso(void)
diff --git a/arch/x86/entry/vdso/vuserwait.c b/arch/x86/entry/vdso/vuserwait.c
new file mode 100644
index 000000000000..17ff564aef7e
--- /dev/null
+++ b/arch/x86/entry/vdso/vuserwait.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * vDSO functions for user wait instructions
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Author: Fenghua Yu <fenghua.yu@intel.com>
+ */
+#include <linux/timer.h>
+#include <asm/vdso_funcs_data.h>
+
+notrace bool __vdso_waitpkg_supported(void)
+{
+	return _vdso_funcs_data->waitpkg_supported;
+}
+
+/**
+ * waitpkg_supported() - vDSO function for checking if user wait instructions
+ * are supported
+ *
+ * The waitpkg (a.k.a. user wait) instruction group has three
+ * instructions: umonitor, umwait, and tpause. This vDSO API tells the
+ * user whether the instruction group is supported on the machine.
+ *
+ * waitpkg_supported() and its alias __vdso_waitpkg_supported() are implemented
+ * as vDSO functions.
+ *
+ * Return:
+ * true: supported
+ *
+ * false: not supported
+ */
+bool waitpkg_supported(void)
+	__attribute__((weak, alias("__vdso_waitpkg_supported")));
+
+notrace int __vdso_nsec_to_tsc(unsigned long nsec, unsigned long *tsc)
+{
+	if (!_vdso_funcs_data->tsc_known_freq)
+		return -ENODEV;
+
+	*tsc = _vdso_funcs_data->tsc_per_nsec * nsec;
+
+	return 0;
+}
+
+/**
+ * nsec_to_tsc() - vDSO function for converting nanoseconds to TSC counter
+ * @nsec: nanoseconds
+ * @tsc: Returned TSC counter
+ *
+ * If the TSC frequency is known (i.e. X86_FEATURE_TSC_KNOWN_FREQ is set),
+ * convert nanoseconds to a TSC counter value.
+ *
+ * nsec_to_tsc() and its alias __vdso_nsec_to_tsc() are implemented
+ * as vDSO functions.
+ *
+ * Return:
+ * 0: Successful
+ *
+ * Less than zero: error code
+ */
+int nsec_to_tsc(unsigned long nsec, unsigned long *tsc)
+	__attribute__((weak, alias("__vdso_nsec_to_tsc")));
+
+notrace int __vdso_umonitor(void *addr)
+{
+	if (!_vdso_funcs_data->waitpkg_supported)
+		return -ENODEV;
+
+	/* umonitor %rdi: set up address monitoring on @addr */
+	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7"
+		     : : "D" (addr));
+
+	return 0;
+}
+
+/**
+ * umonitor() - vDSO function for setting up monitoring address
+ * @addr: Monitored address
+ *
+ * The vDSO function sets up address monitoring hardware using address @addr.
+ * It can be executed at any privilege level.
+ *
+ * umonitor() and its alias __vdso_umonitor() are implemented
+ * as vDSO functions.
+ *
+ * Return:
+ * 0: Successful
+ *
+ * Less than zero: error code
+ */
+int umonitor(void *addr)
+	__attribute__((weak, alias("__vdso_umonitor")));
+
+static inline int _umwait(int state, unsigned long eax, unsigned long edx)
+{
+	unsigned long cflags;
+
+	/* umwait %edi, with the TSC deadline in edx:eax */
+	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7\n\t"
+		     "pushf\n\t"
+		     "pop %0"
+		     : "=r" (cflags)
+		     : "d" (edx), "a" (eax), "D" (state));
+
+	/*
+	 * If the processor wakes due to expiration of OS time-limit, the CF
+	 * flag is set. Otherwise, the flag is cleared.
+	 */
+	return cflags & 1;
+}
+
+notrace int __vdso_umwait(int state, unsigned long nsec)
+{
+	unsigned long tsc;
+	int ret;
+
+	if (!_vdso_funcs_data->waitpkg_supported)
+		return -ENODEV;
+
+	if (state != 0 && state != 1)
+		return -EINVAL;
+
+	ret = nsec_to_tsc(nsec, &tsc);
+	if (ret)
+		return ret;
+
+	/* Get umwait deadline */
+	tsc += rdtsc();
+	ret = _umwait(state, tsc & 0xffffffff, tsc >> 32);
+
+	return ret;
+}
+
+/**
+ * umwait() - vDSO function for user monitor wait
+ * @state: State
+ * @nsec: Timeout in nanoseconds
+ *
+ * A hint that allows the processor to stop instruction execution and
+ * enter an implementation-dependent optimized state. The processor
+ * wakes up because of events such as store to the monitored address,
+ * timeout, NMI, SMI, machine check, debug exception, etc.
+ *
+ * State 0 is the light-weight power-optimized state. It allows the
+ * processor to enter C0.2, which has larger power savings but slower
+ * wakeup time.
+ *
+ * State 1 is the performance-optimized state. It allows the processor
+ * to enter C0.1, which has smaller power savings but faster wakeup time.
+ *
+ * This function can be executed at any privilege level.
+ *
+ * umwait() and its alias __vdso_umwait() are implemented as vDSO functions.
+ *
+ * Return:
+ * 1: the processor wakes due to expiration of OS time-limit
+ *
+ * 0: the processor wakes due to other reasons
+ *
+ * less than 0: error code
+ */
+int umwait(int state, unsigned long nsec)
+	__attribute__((weak, alias("__vdso_umwait")));
+
+static inline int _tpause(int state, unsigned long eax, unsigned long edx)
+{
+	unsigned long cflags;
+
+	/* tpause %edi, with the TSC deadline in edx:eax */
+	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7\n\t"
+		     "pushf\n\t"
+		     "pop %0"
+		     : "=r" (cflags)
+		     : "d" (edx), "a" (eax), "D" (state));
+
+	/*
+	 * If the processor wakes due to expiration of OS time-limit, the CF
+	 * flag is set. Otherwise, the flag is cleared.
+	 */
+	return cflags & 1;
+}
+
+notrace int __vdso_tpause(int state, unsigned long nsec)
+{
+	unsigned long tsc;
+	int ret;
+
+	if (!_vdso_funcs_data->waitpkg_supported)
+		return -ENODEV;
+
+	if (state != 0 && state != 1)
+		return -EINVAL;
+
+	ret = nsec_to_tsc(nsec, &tsc);
+	if (ret)
+		return ret;
+
+	/* Get tpause deadline */
+	tsc += rdtsc();
+	ret = _tpause(state, tsc & 0xffffffff, tsc >> 32);
+
+	return ret;
+}
+
+/**
+ * tpause() - vDSO function for timed pause
+ * @state: State
+ * @nsec: Timeout in nanoseconds
+ *
+ * tpause() allows the processor to stop instruction execution and
+ * enter an implementation-dependent optimized state until the timeout
+ * expires. Unlike umwait(), tpause() does not use a monitored address;
+ * the processor wakes up on events such as timeout, NMI, SMI, machine
+ * check, debug exception, etc.
+ *
+ * State 0 is the light-weight power-optimized state. It allows the
+ * processor to enter C0.2, which has larger power savings but slower
+ * wakeup time.
+ *
+ * State 1 is the performance-optimized state. It allows the processor
+ * to enter C0.1, which has smaller power savings but faster wakeup time.
+ *
+ * This function can be executed at any privilege level.
+ *
+ * tpause() and its alias __vdso_tpause() are implemented as vDSO functions.
+ *
+ * Return:
+ * 1: the processor wakes due to expiration of OS time-limit
+ *
+ * 0: the processor wakes due to other reasons
+ *
+ * less than 0: error code
+ */
+int tpause(int state, unsigned long nsec)
+	__attribute__((weak, alias("__vdso_tpause")));
diff --git a/arch/x86/include/asm/vdso_funcs_data.h b/arch/x86/include/asm/vdso_funcs_data.h
index b99a5685029e..a4caa64bbe8d 100644
--- a/arch/x86/include/asm/vdso_funcs_data.h
+++ b/arch/x86/include/asm/vdso_funcs_data.h
@@ -10,6 +10,9 @@
 struct vdso_funcs_data {
 	bool movdiri_supported;   /* if movdiri instruction is supported */
 	bool movdir64b_supported; /* if movdir64b instruction is supported */
+	bool waitpkg_supported;   /* if wait pkg instructions are supported */
+	bool tsc_known_freq;      /* if TSC has known frequency */
+	u64  tsc_per_nsec;        /* TSC counter per nanosecond */
 };
 
 #define _vdso_funcs_data (&VVAR(vdso_funcs_data))
-- 
2.5.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 7/7] selftests/vDSO: Add selftest to test vDSO functions for direct store and user wait instructions
  2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
                   ` (5 preceding siblings ...)
  2018-07-23 12:55 ` [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions Fenghua Yu
@ 2018-07-23 12:55 ` Fenghua Yu
  6 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2018-07-23 12:55 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86, Fenghua Yu

The selftest tests the vDSO functions for the new instructions, including
movdiri32, movdiri64, movdir64b, umonitor, umwait, and tpause, as well as
the corresponding support-check functions.

Due to limitations of the testing environment, the selftest does not
include some complex tests, e.g. waking up a process by writing to the
monitored address. Those tests will be added once the testing environment
is ready.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 tools/testing/selftests/vDSO/Makefile             |   4 +-
 tools/testing/selftests/vDSO/vdso_inst_test_x86.c | 405 ++++++++++++++++++++++
 2 files changed, 408 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vDSO/vdso_inst_test_x86.c

diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
index f5d7a7851e21..d83228714fbd 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -8,7 +8,7 @@ ifeq ($(CONFIG_X86_32),y)
 LDLIBS += -lgcc_s
 endif
 
-TEST_PROGS := $(OUTPUT)/vdso_test $(OUTPUT)/vdso_standalone_test_x86
+TEST_PROGS := $(OUTPUT)/vdso_test $(OUTPUT)/vdso_standalone_test_x86 $(OUTPUT)/vdso_inst_test_x86
 
 all: $(TEST_PROGS)
 $(OUTPUT)/vdso_test: parse_vdso.c vdso_test.c
@@ -17,5 +17,7 @@ $(OUTPUT)/vdso_standalone_test_x86: vdso_standalone_test_x86.c parse_vdso.c
 		vdso_standalone_test_x86.c parse_vdso.c \
 		-o $@
 
+$(OUTPUT)/vdso_inst_test_x86: parse_vdso.c vdso_inst_test_x86.c
+
 EXTRA_CLEAN := $(TEST_PROGS)
 endif
diff --git a/tools/testing/selftests/vDSO/vdso_inst_test_x86.c b/tools/testing/selftests/vDSO/vdso_inst_test_x86.c
new file mode 100644
index 000000000000..157afa7d56a1
--- /dev/null
+++ b/tools/testing/selftests/vDSO/vdso_inst_test_x86.c
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test vDSO APIs for direct store and user wait instructions
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Author: Fenghua Yu <fenghua.yu@intel.com>
+ */
+#include <stdint.h>
+#include <elf.h>
+#include <stdio.h>
+#include <sys/auxv.h>
+#include <sys/time.h>
+#include <stdbool.h>
+#include <string.h>
+#include <stdlib.h>
+
+extern void *vdso_sym(const char *version, const char *name);
+extern void vdso_init_from_sysinfo_ehdr(uintptr_t base);
+extern void vdso_init_from_auxv(void *auxv);
+
+const char *kernel_version = "LINUX_2.6";
+
+/*
+ * If the given instruction group is supported.
+ *   @vdso_name: vdso function name
+ *   @inst_name: instruction group name
+ *
+ * Return:
+ *   True: the instruction group is supported
+ *   False: the instruction group is not supported
+ */
+bool inst_supported(char *vdso_name, char *inst_name)
+{
+	typedef int (*inst_supported_t)(void);
+
+	inst_supported_t inst_supported = (inst_supported_t)
+					  vdso_sym(kernel_version, vdso_name);
+
+	if (!inst_supported) {
+		printf("Could not find %s\n", vdso_name);
+
+		return false;
+	}
+
+	return inst_supported();
+}
+
+void test_inst_support(char *vdso_name, char *inst_name)
+{
+	int ret;
+
+	ret = inst_supported(vdso_name, inst_name);
+	if (ret)
+		printf("%s supported\n", inst_name);
+	else
+		printf("%s not supported\n", inst_name);
+}
+
+void test_movdiri_support(void)
+{
+	test_inst_support("__vdso_movdiri_supported", "movdiri");
+}
+
+void test_movdir64b_support(void)
+{
+	test_inst_support("__vdso_movdir64b_supported", "movdir64b");
+}
+
+void test_waitpkg_support(void)
+{
+	test_inst_support("__vdso_waitpkg_supported", "waitpkg");
+}
+
+void test_insts_support(void)
+{
+	printf("==");
+	printf("features detection test:\n");
+	test_movdiri_support();
+	test_movdir64b_support();
+	test_waitpkg_support();
+}
+
+/* Test movdiri 32-bit API */
+void test_movdiri32(void)
+{
+	typedef void (*movdiri32_t)(int *, int);
+	char vdso_name[] = "__vdso_movdiri32", inst_name[] = "movdiri32";
+	int ret, data, dst __attribute((aligned(64)));
+
+	printf("==");
+
+	/* movdiri instructions should be supported */
+	ret = inst_supported("__vdso_movdiri_supported", "movdiri");
+	if (!ret) {
+		printf("movdiri not supported\n");
+
+		return;
+	}
+
+	/* movdiri32 API should exist */
+	movdiri32_t movdiri32 = (movdiri32_t)
+				vdso_sym(kernel_version, vdso_name);
+	if (!movdiri32) {
+		printf("Could not find %s\n", vdso_name);
+
+		return;
+	}
+
+	dst = 0;
+	data = 100;
+	/* Call movdiri32 API to move 100 to dst */
+	movdiri32(&dst, data);
+	if (dst == data)
+		printf("%s passes\n", inst_name);
+	else
+		printf("%s fails\n", inst_name);
+}
+
+/* Test movdiri 64-bit API */
+void test_movdiri64(void)
+{
+	typedef void (*movdiri64_t)(long *, long);
+	char vdso_name[] = "__vdso_movdiri64", inst_name[] = "movdiri64";
+	long dst __attribute((aligned(64))), data __attribute((aligned(64)));
+
+	printf("==");
+
+	/* movdiri instructions should be supported */
+	if (!inst_supported("__vdso_movdiri_supported", "movdiri")) {
+		printf("movdiri is not supported\n");
+
+		return;
+	}
+
+	/* movdiri64 API should exist */
+	movdiri64_t movdiri64 = (movdiri64_t)
+				vdso_sym(kernel_version, vdso_name);
+	if (!movdiri64) {
+		printf("Could not find %s\n", vdso_name);
+
+		return;
+	}
+
+	dst = 0;
+	data = 0x123456789abcdef0;
+
+	/* Call movdiri64 API to move 64-bit data to dst */
+	movdiri64(&dst, data);
+
+	if (dst == data)
+		printf("movdiri 64-bit test passed\n");
+	else
+		printf("movdiri 64-bit test failed\n");
+}
+
+void test_movdir64b(void)
+{
+	typedef void (*movdir64b_t)(void *, void *);
+	char __attribute((aligned(64))) dst[1024];
+	char vdso_name[] = "__vdso_movdir64b";
+	char inst_name[] = "movdir64b";
+	int  data_size = 64;
+	char src[1024];
+
+	printf("==");
+
+	/* movdir64b instruction should be supported */
+	if (!inst_supported("__vdso_movdir64b_supported", "movdir64b")) {
+		printf("movdir64b is not supported\n");
+
+		return;
+	}
+
+	/* movdir64b API should exist */
+	movdir64b_t movdir64b = (movdir64b_t)
+				vdso_sym(kernel_version, vdso_name);
+	if (!movdir64b) {
+		printf("Could not find %s\n", vdso_name);
+
+		return;
+	}
+
+	memset(src, 0, data_size);
+	memset(dst, 0, data_size);
+	for (int i = 0; i < data_size; i++)
+		src[i] = i;
+
+	/* Call movdir64b API to move 64 bytes of data from src to dst */
+	movdir64b(dst, src);
+
+	if (memcmp(src, dst, data_size))
+		printf("movdir64b test failed\n");
+	else
+		printf("movdir64b test passed\n");
+}
+
+bool waitpkg_supported(void)
+{
+	return inst_supported("__vdso_waitpkg_supported", "waitpkg_supported");
+}
+
+bool nsec_to_tsc(unsigned long nsec, unsigned long *tsc)
+{
+	typedef int (*nsec_to_tsc_t)(unsigned long, unsigned long *);
+	char vdso_name[] = "__vdso_nsec_to_tsc";
+	int ret;
+
+	/* nsec_to_tsc API should exist */
+	nsec_to_tsc_t nsec_to_tsc = (nsec_to_tsc_t)
+			    vdso_sym(kernel_version, "__vdso_nsec_to_tsc");
+	if (!nsec_to_tsc) {
+		printf("Could not find __vdso_nsec_to_tsc\n");
+
+		return false;
+	}
+
+	/* Call nsec_to_tsc API to convert nsec to tsc */
+	ret = nsec_to_tsc(nsec, tsc);
+	if (ret)
+		return false;
+
+	return true;
+}
+
+static unsigned long rdtsc(void)
+{
+	unsigned int low, high;
+
+	asm volatile ("rdtsc\t\n"
+		      : "=a" (low), "=d" (high));
+
+	return (unsigned long)high << 32 | low;
+}
+
+void test_timeout(char *test_name, int state, unsigned long timeout_ns,
+		  unsigned long overhead_ns)
+{
+	typedef int (*umwait_t)(int state, unsigned long nsec);
+	typedef int (*tpause_t)(int state, unsigned long nsec);
+	unsigned long tsc1, tsc2, real_tsc, real_ns, tsc_per_nsec;
+	tpause_t tpause;
+	umwait_t umwait;
+
+	if (!nsec_to_tsc(1, &tsc_per_nsec)) {
+		printf("timeout test failed: ns cannot be converted to tsc.\n");
+		return;
+	}
+
+	/* tpause API should exist */
+	tpause = (tpause_t)vdso_sym(kernel_version, "__vdso_tpause");
+	if (!tpause) {
+		printf("Could not find tpause\n");
+
+		return;
+	}
+
+	/* umwait API should exist */
+	umwait = (umwait_t)vdso_sym(kernel_version, "__vdso_umwait");
+	if (!umwait) {
+		printf("Could not find umwait\n");
+
+		return;
+	}
+
+	if (waitpkg_supported()) {
+		if (!strcmp(test_name, "umwait")) {
+			tsc1 = rdtsc();
+			umwait(state, timeout_ns);
+			tsc2 = rdtsc();
+		} else {
+			tsc1 = rdtsc();
+			tpause(state, timeout_ns);
+			tsc2 = rdtsc();
+		}
+		real_tsc = tsc2 - tsc1;
+		real_ns = real_tsc / tsc_per_nsec;
+		/* Give enough time for overhead on slow running machine. */
+		if (labs((long)(real_ns - timeout_ns)) < (long)overhead_ns) {
+			printf("%s C0.%d test passed\n", test_name, 2 - state);
+		} else {
+			printf("%s test failed:\n", test_name);
+			printf("real=%luns, expected=%luns. ",
+			       real_ns, timeout_ns);
+			printf("Likely due to slow machine. ");
+			printf("Please adjust overhead_ns or re-run test for ");
+			printf("a few more times.\n");
+		}
+	} else {
+		printf("%s is not supported\n", test_name);
+	}
+}
+
+void test_tpause_timeout(int state)
+{
+	/*
+	 * Timeout 100usec. Assume the overhead of executing tpause is 30usec.
+	 * You can adjust the overhead number based on your machine.
+	 */
+	test_timeout("tpause", state, 100000, 30000);
+}
+
+/* Test tpause API */
+void test_tpause(void)
+{
+	printf("==");
+	/* Test timeout in state 0 (C0.2). */
+	test_tpause_timeout(0);
+	/* Test timeout in state 1 (C0.1). */
+	test_tpause_timeout(1);
+	/* More tests ... */
+}
+
+char umonitor_range[1024];
+
+void test_umonitor_only(void)
+{
+	typedef void (*umonitor_t)(void *addr);
+
+	/* umonitor API should exist */
+	umonitor_t umonitor = (umonitor_t)
+			      vdso_sym(kernel_version, "__vdso_umonitor");
+	if (!umonitor) {
+		printf("Could not find umonitor\n");
+
+		return;
+	}
+
+	if (waitpkg_supported()) {
+		umonitor(umonitor_range);
+		printf("umonitor test passed\n");
+	} else {
+		printf("waitpkg not supported\n");
+	}
+}
+
+/* Test umonitor API */
+void test_umonitor(void)
+{
+	printf("==");
+	test_umonitor_only();
+}
+
+void test_umwait_timeout(int state)
+{
+	/*
+	 * Timeout 100usec. Assume the overhead of executing umwait is 90usec.
+	 * You can adjust the overhead number based on your machine.
+	 */
+	test_timeout("umwait", state, 100000, 90000);
+}
+
+/* Test umwait API */
+void test_umwait(void)
+{
+	printf("==");
+	/* Test timeout in state 0 (C0.2). */
+	test_umwait_timeout(0);
+	/* Test timeout in state 1 (C0.1). */
+	test_umwait_timeout(1);
+	/* To add more tests here ... */
+}
+
+void show_basic_info(void)
+{
+	unsigned long tsc;
+
+	if (!nsec_to_tsc(1, &tsc))
+		printf("TSC frequency is not available from CPUID\n");
+	else
+		printf("1 nsec = %lu tsc\n", tsc);
+}
+
+int main(int argc, char **argv)
+{
+	unsigned long sysinfo_ehdr = getauxval(AT_SYSINFO_EHDR);
+
+	if (!sysinfo_ehdr) {
+		printf("AT_SYSINFO_EHDR is not present!\n");
+		return 0;
+	}
+
+	vdso_init_from_sysinfo_ehdr(getauxval(AT_SYSINFO_EHDR));
+
+	show_basic_info();
+	test_insts_support();
+	test_movdiri32();
+	test_movdiri64();
+	test_movdir64b();
+	test_umonitor();
+	test_umwait();
+	test_tpause();
+
+	return 0;
+}
-- 
2.5.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] x86/umwait_contro: Set global umwait maximum time limit and umwait C0.2 state
  2018-07-23 12:55 ` [PATCH 4/7] x86/umwait_contro: Set global umwait maximum time limit and umwait C0.2 state Fenghua Yu
@ 2018-07-24  1:41   ` Andy Lutomirski
  2018-08-01  9:01     ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2018-07-24  1:41 UTC (permalink / raw)
  To: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86

On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> UMWAIT or TPAUSE called by user process makes processor to reside in
> a light-weight power/performance optimized state (C0.1 state) or an
> improved power/performance optimized state (C0.2 state).
> 
> IA32_UMWAIT_CONTROL MSR register allows OS to set global maximum umwait
> time and disable C0.2 on the processor.
> 
> The maximum time value in IA32_UMWAIT_CONTROL[31-2] is set as zero which
> means there is no global time limit for UMWAIT and TPAUSE instructions.
> Each process sets its own umwait maximum time as the instructions operand.
> We don't set a non-zero global umwait maximum time value to enforce user
> wait timeout because we couldn't find any usage for it.

Do you know what the instruction designers had in mind?  I assume they 
were thinking of *something*, but I'm seriously mystified by three things:

  - Why does CF work the way it does?  It seems like it would be 
genuinely useful for CF to indicate whether the in-register timeout has 
expired, but that's not what CF does.

  - Why does the global timeout apply even at CPL 0?

  - Why does the C0.2 control apply at CPL 0?

And I'm also a bit surprised that the instruction can't be turned off 
entirely for CPL 3.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions
  2018-07-23 12:55 ` [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions Fenghua Yu
@ 2018-07-24  1:48   ` Andy Lutomirski
  2018-07-24  3:42     ` Fenghua Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2018-07-24  1:48 UTC (permalink / raw)
  To: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86

On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> User wants to query if direct store instructions are supported and use
> the instructions. The vDSO functions provides fast interface for user
> to query the support and use the instructions.
> 
> movdiri_supported and its alias __vdso_movdiri_supported check if
> movdiri instructions are supported.
> 
> movdir64b_supported and its alias __vdso_movdir64b_supported checks
> if movdir64b instruction is supported.
> 
> movdiri32 and its alias __vdso_movdiri32 provide user APIs for calling
> 32-bit movdiri instruction.
> 
> movdiri64 and its alias __vdso_movdiri64 provide user APIs for calling
> 64-bit movdiri instruction.
> 
> movdir64b and its alias __vdso_movdir64b provide user APIs to move
> 64-byte data through movdir64b instruction.
> 
> The instructions can be implemented in intrinsic functions in future
> GCC. But the vDSO interfaces are available to user without the
> intrinsic functions support in GCC and the APIs movdiri_supported and
> movdir64b_supported cannot be implemented as GCC functions.

I'm not convinced that any of this belongs in the vDSO at all.  You 
could just add AT_HWCAP (or AT_HWCAP2) flags for the new instructions. 
Or user code could use CPUID just like for any other new instruction. 
But, if there really is some compelling reason to add this to the vDSO, 
then see below:


+
> +notrace bool __vdso_movdiri_supported(void)
> +{
> +	return _vdso_funcs_data->movdiri_supported;

return static_cpu_has(X86_FEATURE_MOVDIRI);

And all the VVAR stuff can be removed.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions
  2018-07-23 12:55 ` [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions Fenghua Yu
@ 2018-07-24  2:11   ` Andy Lutomirski
  2018-07-24 15:14     ` Andy Lutomirski
  2018-07-31 21:22     ` Thomas Gleixner
  0 siblings, 2 replies; 19+ messages in thread
From: Andy Lutomirski @ 2018-07-24  2:11 UTC (permalink / raw)
  To: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86

On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> User wants to query if user wait instructions (umonitor, umwait, and
> tpause) are supported and use the instructions. The vDSO functions
> provides fast interface for user to check the support and use the
> instructions.
> 
> waitpkg_supported and its alias __vdso_waitpkg_supported check if
> user wait instructions (a.k.a. wait package feature) are supported
> 
> umonitor and its alias __vdso_umonitor provide user APIs for calling
> umonitor instruction.
> 
> umwait and its alias __vdso_umwait provide user APIs for calling
> umwait instruction.
> 
> tpause and its alias __vdso_tpause provide user APIs for calling
> tpause instruction.
> 
> nsec_to_tsc and its alias __vdso_nsec_to_tsc converts nanoseconds
> to TSC counter if TSC frequency is known. It will fail if TSC frequency
> is unknown.
> 
> The instructions can be implemented in intrinsic functions in future
> GCC. But the vDSO interfaces are available to user without the
> intrinsic functions support in GCC and the API waitpkg_supported and
> nsec_to_tsc cannot be implemented as GCC functions.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>   arch/x86/entry/vdso/Makefile           |   2 +-
>   arch/x86/entry/vdso/vdso.lds.S         |  10 ++
>   arch/x86/entry/vdso/vma.c              |   9 ++
>   arch/x86/entry/vdso/vuserwait.c        | 233 +++++++++++++++++++++++++++++++++
>   arch/x86/include/asm/vdso_funcs_data.h |   3 +
>   5 files changed, 256 insertions(+), 1 deletion(-)
>   create mode 100644 arch/x86/entry/vdso/vuserwait.c
> 
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index af4fcae5de83..fb0062b09b3c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -17,7 +17,7 @@ VDSO32-$(CONFIG_X86_32)		:= y
>   VDSO32-$(CONFIG_IA32_EMULATION)	:= y
>   
>   # files to link into the vdso
> -vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdirectstore.o
> +vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdirectstore.o vuserwait.o
>   
>   # files to link into kernel
>   obj-y				+= vma.o
> diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
> index 097cdcda43a5..0942710608bf 100644
> --- a/arch/x86/entry/vdso/vdso.lds.S
> +++ b/arch/x86/entry/vdso/vdso.lds.S
> @@ -35,6 +35,16 @@ VERSION {
>   		__vdso_movdir64b_supported;
>   		movdir64b;
>   		__vdso_movdir64b;
> +		waitpkg_supported;
> +		__vdso_waitpkg_supported;
> +		umonitor;
> +		__vdso_umonitor;
> +		umwait;
> +		__vdso_umwait;
> +		tpause;
> +		__vdso_tpause;
> +		nsec_to_tsc;
> +		__vdso_nsec_to_tsc;
>   	local: *;
>   	};
>   }
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index edbe5e63e5c2..006dfb5e5003 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -372,10 +372,19 @@ static int vgetcpu_online(unsigned int cpu)
>   
>   static void __init init_vdso_funcs_data(void)
>   {
> +	struct system_counterval_t sys_counterval;
> +
>   	if (static_cpu_has(X86_FEATURE_MOVDIRI))
>   		vdso_funcs_data.movdiri_supported = true;
>   	if (static_cpu_has(X86_FEATURE_MOVDIR64B))
>   		vdso_funcs_data.movdir64b_supported = true;
> +	if (static_cpu_has(X86_FEATURE_WAITPKG))
> +		vdso_funcs_data.waitpkg_supported = true;
> +	if (static_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
> +		vdso_funcs_data.tsc_known_freq = true;
> +		sys_counterval = convert_art_ns_to_tsc(1);
> +		vdso_funcs_data.tsc_per_nsec = sys_counterval.cycles;
> +	}

You're losing a ton of precision here.  You might even be losing *all* 
of the precision and malfunctioning rather badly.

The correct way to do this is:

tsc_counts = ns * mul >> shift;

and the vclock code illustrates it.  convert_art_ns_to_tsc() is a bad 
example because it uses an expensive division operation for no good 
reason except that no one bothered to optimize it.

> +notrace int __vdso_nsec_to_tsc(unsigned long nsec, unsigned long *tsc)
> +{
> +	if (!_vdso_funcs_data->tsc_known_freq)
> +		return -ENODEV;
> +
> +	*tsc = _vdso_funcs_data->tsc_per_nsec * nsec;
> +
> +	return 0;
> +}

Please don't expose this one at all.  It would be nice for programs that 
use waitpkg to be migratable using CRIU-like tools, and this export 
actively harms any such effort.  If you omit this function, then the 
kernel could learn to abort an in-progress __vdso_umwait if preempted 
(rseq-style) and CRIU would just work.  It would be a bit of a hack, but 
it solves a real problem.

> +notrace int __vdso_umwait(int state, unsigned long nsec)

__vdso_umwait_relative(), please.  Because some day (possibly soon) 
someone will want __vdso_umwait_absolute() and its friend 
__vdso_read_art_ns() so they can do:

u64 start = __vdso_read_art_ns();
__vdso_umonitor(...);
... do something potentially slow or that might fault ...
__vdso_umwait_absolute(start + timeout);

Also, this patch appears to have a subtle but show-stopping race.  Consider:

1. Task A does UMONITOR on CPU 1
2. Task A is preempted.
3. Task B does UMONITOR on CPU 1 at a different address
4. Task A resumes
5. Task A does UMWAIT

Now task A hangs, at least until the next external wakeup happens.

It's not entirely clear to me how you're supposed to fix this without 
some abomination that's so bad that it torpedoes the entire feature. 
Except that there is no chicken bit to turn this thing off.  Sigh.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions
  2018-07-24  1:48   ` Andy Lutomirski
@ 2018-07-24  3:42     ` Fenghua Yu
  2018-07-24  5:27       ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Fenghua Yu @ 2018-07-24  3:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86

On Mon, Jul 23, 2018 at 06:48:00PM -0700, Andy Lutomirski wrote:
> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> >The instructions can be implemented in intrinsic functions in future
> >GCC. But the vDSO interfaces are available to user without the
> I'm not convinced that any of this belongs in the vDSO at all.  You could
> just add AT_HWCAP (or AT_HWCAP2) flags for the new instructions. Or user

Thomas asked to use vDSO. Please see the discussion thread:
https://lkml.org/lkml/2018/6/19/316

> code could use CPUID just like for any other new instruction. But, if there
> really is some compelling reason to add this to the vDSO, then see below:
> >+notrace bool __vdso_movdiri_supported(void)
> >+{
> >+	return _vdso_funcs_data->movdiri_supported;
> return static_cpu_has(X86_FEATURE_MOVDIRI);

But boot_cpu_data (used in static_cpu_has) cannot be accessed by user
unless mapped in VVAR. So this change cannot be compiled.

> And all the VVAR stuff can be removed.

The VVAR stuff needs to map the kernel data _vdso_funcs_data to user
space before user can access it.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions
  2018-07-24  3:42     ` Fenghua Yu
@ 2018-07-24  5:27       ` Andy Lutomirski
  2018-07-25 22:18         ` Fenghua Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2018-07-24  5:27 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86

On Mon, Jul 23, 2018 at 8:42 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> On Mon, Jul 23, 2018 at 06:48:00PM -0700, Andy Lutomirski wrote:
>> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
>> >The instructions can be implemented in intrinsic functions in future
>> >GCC. But the vDSO interfaces are available to user without the
>> I'm not convinced that any of this belongs in the vDSO at all.  You could
>> just add AT_HWCAP (or AT_HWCAP2) flags for the new instructions. Or user
>
> Thomas asked to use vDSO. Please see the discussion thread:
> https://lkml.org/lkml/2018/6/19/316

I think he meant that, if these helpers belong in the kernel at all,
then they belong in the vDSO.  But I think they mostly don't belong in
the kernel.

>
>> code could use CPUID just like for any other new instruction. But, if there
>> really is some compelling reason to add this to the vDSO, then see below:
>> >+notrace bool __vdso_movdiri_supported(void)
>> >+{
>> >+    return _vdso_funcs_data->movdiri_supported;
>> return static_cpu_has(X86_FEATURE_MOVDIRI);
>
> But boot_cpu_data (used in static_cpu_has) cannot be accessed by user
> unless mapped in VVAR. So this change cannot be compiled.

The underlying alternative infrastructure works in the vDSO.  You'd
need to introduce an alternate version of _static_cpu_has if
BUILD_VDSO that skips the boot_cpu_has fallback.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions
  2018-07-24  2:11   ` Andy Lutomirski
@ 2018-07-24 15:14     ` Andy Lutomirski
  2018-07-31 21:22     ` Thomas Gleixner
  1 sibling, 0 replies; 19+ messages in thread
From: Andy Lutomirski @ 2018-07-24 15:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86

On Mon, Jul 23, 2018 at 7:11 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
>>
>> User wants to query if user wait instructions (umonitor, umwait, and
>> tpause) are supported and use the instructions. The vDSO functions
>> provides fast interface for user to check the support and use the
>> instructions.
>>
>> waitpkg_supported and its alias __vdso_waitpkg_supported check if
>> user wait instructions (a.k.a. wait package feature) are supported
>>
>> umonitor and its alias __vdso_umonitor provide user APIs for calling
>> the umonitor instruction.
>>
>> umwait and its alias __vdso_umwait provide user APIs for calling
>> the umwait instruction.
>>
>> tpause and its alias __vdso_tpause provide user APIs for calling
>> the tpause instruction.
>>
>> nsec_to_tsc and its alias __vdso_nsec_to_tsc convert nanoseconds
>> to TSC cycles if the TSC frequency is known, and fail if the TSC
>> frequency is unknown.
>>
>> The instructions may be implemented as intrinsic functions in a future
>> GCC. But the vDSO interfaces are available to user space even without
>> intrinsics support in GCC, and the APIs waitpkg_supported and
>> nsec_to_tsc cannot be implemented as GCC intrinsics at all.
>>
>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
>> ---
>>   arch/x86/entry/vdso/Makefile           |   2 +-
>>   arch/x86/entry/vdso/vdso.lds.S         |  10 ++
>>   arch/x86/entry/vdso/vma.c              |   9 ++
>>   arch/x86/entry/vdso/vuserwait.c        | 233
>> +++++++++++++++++++++++++++++++++
>>   arch/x86/include/asm/vdso_funcs_data.h |   3 +
>>   5 files changed, 256 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/x86/entry/vdso/vuserwait.c
>>
>> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
>> index af4fcae5de83..fb0062b09b3c 100644
>> --- a/arch/x86/entry/vdso/Makefile
>> +++ b/arch/x86/entry/vdso/Makefile
>> @@ -17,7 +17,7 @@ VDSO32-$(CONFIG_X86_32)               := y
>>   VDSO32-$(CONFIG_IA32_EMULATION)       := y
>>     # files to link into the vdso
>> -vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdirectstore.o
>> +vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdirectstore.o
>> vuserwait.o
>>     # files to link into kernel
>>   obj-y                         += vma.o
>> diff --git a/arch/x86/entry/vdso/vdso.lds.S
>> b/arch/x86/entry/vdso/vdso.lds.S
>> index 097cdcda43a5..0942710608bf 100644
>> --- a/arch/x86/entry/vdso/vdso.lds.S
>> +++ b/arch/x86/entry/vdso/vdso.lds.S
>> @@ -35,6 +35,16 @@ VERSION {
>>                 __vdso_movdir64b_supported;
>>                 movdir64b;
>>                 __vdso_movdir64b;
>> +               waitpkg_supported;
>> +               __vdso_waitpkg_supported;
>> +               umonitor;
>> +               __vdso_umonitor;
>> +               umwait;
>> +               __vdso_umwait;
>> +               tpause;
>> +               __vdso_tpause;
>> +               nsec_to_tsc;
>> +               __vdso_nsec_to_tsc;
>>         local: *;
>>         };
>>   }
>> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
>> index edbe5e63e5c2..006dfb5e5003 100644
>> --- a/arch/x86/entry/vdso/vma.c
>> +++ b/arch/x86/entry/vdso/vma.c
>> @@ -372,10 +372,19 @@ static int vgetcpu_online(unsigned int cpu)
>>     static void __init init_vdso_funcs_data(void)
>>   {
>> +       struct system_counterval_t sys_counterval;
>> +
>>         if (static_cpu_has(X86_FEATURE_MOVDIRI))
>>                 vdso_funcs_data.movdiri_supported = true;
>>         if (static_cpu_has(X86_FEATURE_MOVDIR64B))
>>                 vdso_funcs_data.movdir64b_supported = true;
>> +       if (static_cpu_has(X86_FEATURE_WAITPKG))
>> +               vdso_funcs_data.waitpkg_supported = true;
>> +       if (static_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
>> +               vdso_funcs_data.tsc_known_freq = true;
>> +               sys_counterval = convert_art_ns_to_tsc(1);
>> +               vdso_funcs_data.tsc_per_nsec = sys_counterval.cycles;
>> +       }
>
>
> You're losing a ton of precision here.  You might even be losing *all* of
> the precision and malfunctioning rather badly.
>
> The correct way to do this is:
>
> tsc_counts = ns * mul >> shift;
>
> and the vclock code illustrates it.  convert_art_ns_to_tsc() is a bad
> example because it uses an expensive division operation for no good reason
> except that no one bothered to optimize it.
>
>> +notrace int __vdso_nsec_to_tsc(unsigned long nsec, unsigned long *tsc)
>> +{
>> +       if (!_vdso_funcs_data->tsc_known_freq)
>> +               return -ENODEV;
>> +
>> +       *tsc = _vdso_funcs_data->tsc_per_nsec * nsec;
>> +
>> +       return 0;
>> +}
>
>
> Please don't expose this one at all.  It would be nice for programs that use
> waitpkg to be migratable using CRIU-like tools, and this export actively
> harms any such effort.  If you omit this function, then the kernel could
> learn to abort an in-progress __vdso_umwait if preempted (rseq-style) and
> CRIU would just work.  It would be a bit of a hack, but it solves a real
> problem.
>
>> +notrace int __vdso_umwait(int state, unsigned long nsec)
>
>
> __vdso_umwait_relative(), please.  Because some day (possibly soon) someone
> will want __vdso_umwait_absolute() and its friend __vdso_read_art_ns() so
> they can do:
>
> u64 start = __vdso_read_art_ns();
> __vdso_umonitor(...);
> ... do something potentially slow or that might fault ...
> __vdso_umwait_absolute(start + timeout);
>
> Also, this patch appears to have a subtle but show-stopping race.  Consider:
>
> 1. Task A does UMONITOR on CPU 1
> 2. Task A is preempted.
> 3. Task B does UMONITOR on CPU 1 at a different address
> 4. Task A resumes
> 5. Task A does UMWAIT
>
> Now task A hangs, at least until the next external wakeup happens.
>
> It's not entirely clear to me how you're supposed to fix this without some
> abomination that's so bad that it torpedoes the entire feature. Except that
> there is no chicken bit to turn this thing off.  Sigh.

The UMWAIT mechanism also looks like it will work incorrectly under a
VM.  How do you (or, more generally, Intel) plan to handle that?


* Re: [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions
  2018-07-24  5:27       ` Andy Lutomirski
@ 2018-07-25 22:18         ` Fenghua Yu
  0 siblings, 0 replies; 19+ messages in thread
From: Fenghua Yu @ 2018-07-25 22:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H Peter Anvin,
	Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86,
	Hongjiu Lu

On Mon, Jul 23, 2018 at 10:27:34PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 23, 2018 at 8:42 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > On Mon, Jul 23, 2018 at 06:48:00PM -0700, Andy Lutomirski wrote:
> >> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> >> >The instructions can be implemented in intrinsic functions in future
> >> >GCC. But the vDSO interfaces are available to user without the
> >> I'm not convinced that any of this belongs in the vDSO at all.  You could
> >> just add AT_HWCAP (or AT_HWCAP2) flags for the new instructions. Or user
> >
> > Thomas asked to use vDSO. Please see the discussion thread:
> > https://lkml.org/lkml/2018/6/19/316
> 
> I think he meant that, if these helpers belong in the kernel at all,
> then they belong in the vDSO.  But I think they mostly don't belong in
> the kernel.

I think both you and Thomas are right. I misunderstood Thomas's comments.

I will remove the APIs/functions from the next version of the patch set.
They don't belong in the kernel and will be implemented in either GCC or
glibc.

GCC 8 already implements movdiri and movdir64b instruction intrinsics.
GCC 9 will implement the user wait instruction intrinsics.
If users want to use those intrinsics now, the GCC development tree has them:
https://gcc.gnu.org/git/?p=gcc.git;a=summary

To use the intrinsics in native code, users need to check availability
of the features (i.e. movdiri, movdir64b, waitpkg) by executing cpuid or
through a glibc interface that is going to be available in newer glibc.

Thank you for your comments!

-Fenghua


* Re: [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions
  2018-07-24  2:11   ` Andy Lutomirski
  2018-07-24 15:14     ` Andy Lutomirski
@ 2018-07-31 21:22     ` Thomas Gleixner
  2018-07-31 21:38       ` Andy Lutomirski
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2018-07-31 21:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86

On Mon, 23 Jul 2018, Andy Lutomirski wrote:
> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> >     static void __init init_vdso_funcs_data(void)
> >   {
> > +	struct system_counterval_t sys_counterval;
> > +
> >   	if (static_cpu_has(X86_FEATURE_MOVDIRI))
> >   		vdso_funcs_data.movdiri_supported = true;
> >   	if (static_cpu_has(X86_FEATURE_MOVDIR64B))
> >   		vdso_funcs_data.movdir64b_supported = true;
> > +	if (static_cpu_has(X86_FEATURE_WAITPKG))
> > +		vdso_funcs_data.waitpkg_supported = true;
> > +	if (static_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
> > +		vdso_funcs_data.tsc_known_freq = true;
> > +		sys_counterval = convert_art_ns_to_tsc(1);
> > +		vdso_funcs_data.tsc_per_nsec = sys_counterval.cycles;
> > +	}
> 
> You're losing a ton of precision here.  You might even be losing *all* of the
> precision and malfunctioning rather badly.

Indeed.

> The correct way to do this is:
> 
> tsc_counts = ns * mul >> shift;

> and the vclock code illustrates it.

Albeit you cannot use the TSC mult/shift pair as that is for the TSC to
nsec conversion.

To get the proper mult/shift pair use clocks_calc_mult_shift(). Note that
the scaled math has an upper limit when using 64 bit variables. You might
need 128bit scaled math to make it work correctly.

> convert_art_ns_to_tsc() is a bad example because it uses an expensive
> division operation for no good reason except that no one bothered to
> optimize it.

Right. It's not a hot path function and it does the job and we would need
128bit scaled math to avoid mult overflows.

Aside from that, I have no idea why anyone would use convert_art_ns_to_tsc()
for anything other than converting ART nanoseconds to TSC.

> > +notrace int __vdso_umwait(int state, unsigned long nsec)
> 
> __vdso_umwait_relative(), please.  Because some day (possibly soon) someone
> will want __vdso_umwait_absolute() and its friend __vdso_read_art_ns() so they
> can do:
> 
> u64 start = __vdso_read_art_ns();

Errm. No. You can't read ART. ART is only used by devices to which it is
distributed. You can only read TSC here and convert that to nsecs.

> __vdso_umonitor(...);
> ... do something potentially slow or that might fault ...
> __vdso_umwait_absolute(start + timeout);

That definitely requires 128bit scaled math to work correctly, unless you
make the timeout relative before conversion.

But I really think we should avoid creating yet another interface to
retrieve TSC time in nsecs. We have enough of these things already.

Ideally we'd use CLOCK_MONOTONIC here, but that needs more thought as:

  1) TSC might be disabled as the timekeeping clocksource

  2) The mult/shift pair for converting to nanoseconds is affected by
     NTP/PTP so it can be different from the initial mult/shift pair for
     converting nanoseconds to TSC.

A possible solution would be to use CLOCK_MONOTONIC_RAW which is not affected
by NTP/PTP adjustments. But that still has the issue of TSC not being the
timekeeping clocksource. Bah, the whole TSC deadline mode sucks. I have no
idea what's wrong with simple down counters. They Just Work.

> Also, this patch appears to have a subtle but show-stopping race.  Consider:

It's not the patch which has this issue. It's the hardware ....

> 1. Task A does UMONITOR on CPU 1
> 2. Task A is preempted.
> 3. Task B does UMONITOR on CPU 1 at a different address
> 4. Task A resumes
> 5. Task A does UMWAIT
> 
> Now task A hangs, at least until the next external wakeup happens.
> 
> It's not entirely clear to me how you're supposed to fix this without some
> abomination that's so bad that it torpedoes the entire feature. Except that
> there is no chicken bit to turn this thing off.  Sigh.

That sounds similar to the ARM monitor which is used for implementing
cmpxchg. They had to sprinkle monitor aborts all over the place....

So yes, unless there is undocumented magic which aborts the monitor under
certain circumstances, this thing is broken beyond repair.

Thanks,

	tglx


* Re: [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions
  2018-07-31 21:22     ` Thomas Gleixner
@ 2018-07-31 21:38       ` Andy Lutomirski
  2018-08-01  8:55         ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2018-07-31 21:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Fenghua Yu, Ingo Molnar, H Peter Anvin,
	Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86

On Tue, Jul 31, 2018 at 2:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 23 Jul 2018, Andy Lutomirski wrote:
>> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
>> >     static void __init init_vdso_funcs_data(void)
>> >   {
>> > +   struct system_counterval_t sys_counterval;
>> > +
>> >     if (static_cpu_has(X86_FEATURE_MOVDIRI))
>> >             vdso_funcs_data.movdiri_supported = true;
>> >     if (static_cpu_has(X86_FEATURE_MOVDIR64B))
>> >             vdso_funcs_data.movdir64b_supported = true;
>> > +   if (static_cpu_has(X86_FEATURE_WAITPKG))
>> > +           vdso_funcs_data.waitpkg_supported = true;
>> > +   if (static_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
>> > +           vdso_funcs_data.tsc_known_freq = true;
>> > +           sys_counterval = convert_art_ns_to_tsc(1);
>> > +           vdso_funcs_data.tsc_per_nsec = sys_counterval.cycles;
>> > +   }
>>
>> You're losing a ton of precision here.  You might even be losing *all* of the
>> precision and malfunctioning rather badly.
>
> Indeed.
>
>> The correct way to do this is:
>>
>> tsc_counts = ns * mul >> shift;
>
>> and the vclock code illustrates it.
>
> Albeit you cannot use the TSC mult/shift pair as that is for the TSC to
> nsec conversion.
>
> To get the proper mult/shift pair use clocks_calc_mult_shift(). Note that
> the scaled math has an upper limit when using 64 bit variables. You might
> need 128bit scaled math to make it work correctly.
>
>> convert_art_ns_to_tsc() is a bad example because it uses an expensive
>> division operation for no good reason except that no one bothered to
>> optimize it.
>
> Right. It's not a hot path function and it does the job and we would need
> 128bit scaled math to avoid mult overflows.
>
> Aside from that, I have no idea why anyone would use convert_art_ns_to_tsc()
> for anything other than converting ART nanoseconds to TSC.
>
>> > +notrace int __vdso_umwait(int state, unsigned long nsec)
>>
>> __vdso_umwait_relative(), please.  Because some day (possibly soon) someone
>> will want __vdso_umwait_absolute() and its friend __vdso_read_art_ns() so they
>> can do:
>>
>> u64 start = __vdso_read_art_ns();
>
> Errm. No. You can't read ART. ART is only used by devices to which it is
> distributed. You can only read TSC here and convert that to nsecs.

Bah.

But my point remains -- I think that the user (non-vDSO) code should
think in nanoseconds, not TSC ticks.  That way we have a much better
chance of getting migration right.

>
>> __vdso_umonitor(...);
>> ... do something potentially slow or that might fault ...
>> __vdso_umwait_absolute(start + timeout);
>
> That definitely requires 128bit scaled math to work correctly, unless you
> make the timeout relative before conversion.
>
> But I really think we should avoid creating yet another interface to
> retrieve TSC time in nsecs. We have enough of these things already.
>
> Ideally we'd use CLOCK_MONOTONIC here, but that needs more thought as:
>
>   1) TSC might be disabled as the timekeeping clocksource
>
>   2) The mult/shift pair for converting to nanoseconds is affected by
>      NTP/PTP so it can be different from the initial mult/shift pair for
>      converting nanoseconds to TSC.
>
> A possible solution would be to use CLOCK_MONOTONIC_RAW which is not affected
> by NTP/PTP adjustments. But that still has the issue of TSC not being the
> timekeeping clocksource. Bah, the whole TSC deadline mode sucks. I have no
> idea what's wrong with simple down counters. They Just Work.


I think it's not totally crazy to declare UMWAIT on a system with a
non-TSC clocksource to be unsupported.


* Re: [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions
  2018-07-31 21:38       ` Andy Lutomirski
@ 2018-08-01  8:55         ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2018-08-01  8:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86

On Tue, 31 Jul 2018, Andy Lutomirski wrote:
> On Tue, Jul 31, 2018 at 2:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> u64 start = __vdso_read_art_ns();
> >
> > Errm. No. You can't read ART. ART is only used by devices to which it is
> > distributed. You can only read TSC here and convert that to nsecs.
> 
> Bah.
> 
> But my point remains -- I think that the user (non-vDSO) code should
> think in nanoseconds, not TSC ticks.  That way we have a much better
> chance of getting migration right.

Agreed. And we should not create new interfaces for that. We already have
clock_gettime() which is the right thing to use.

> > A possible solution would be to use CLOCK_MONOTONIC_RAW which is not affected
> > by NTP/PTP adjustments. But that still has the issue of TSC not being the
> > timekeeping clocksource. Bah, the whole TSC deadline mode sucks. I have no
> > idea what's wrong with simple down counters. They Just Work.
> 
> I think it's not totally crazy to declare UMWAIT on a system with a
> non-TSC clocksource to be unsupported.

Sure, the information is already available in the VDSO gtod data.

Thanks,

	tglx



* Re: [PATCH 4/7] x86/umwait_contro: Set global umwait maximum time limit and umwait C0.2 state
  2018-07-24  1:41   ` Andy Lutomirski
@ 2018-08-01  9:01     ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2018-08-01  9:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj, Alan Cox,
	Ravi V Shankar, linux-kernel, x86

On Mon, 23 Jul 2018, Andy Lutomirski wrote:
> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> > UMWAIT or TPAUSE called by a user process makes the processor reside in
> > a light-weight power/performance optimized state (C0.1 state) or an
> > improved power/performance optimized state (C0.2 state).
> > 
> > IA32_UMWAIT_CONTROL MSR register allows OS to set global maximum umwait
> > time and disable C0.2 on the processor.
> > 
> > The maximum time value in IA32_UMWAIT_CONTROL[31:2] is set to zero, which
> > means there is no global time limit for the UMWAIT and TPAUSE instructions.
> > Each process sets its own umwait maximum time as the instruction's operand.
> > We don't set a non-zero global umwait maximum time value to enforce a user
> > wait timeout because we couldn't find any use case for it.
> 
> Do you know what the instruction designers had in mind?  I assume they were
> thinking of *something*, but I'm seriously mystified by three things:
> 
>  - Why does CF work the way it does?  It seems like it would be genuinely
> useful for CF to indicate whether the in-register timeout has expired, but
> that's not what CF does.

Right, it would be useful to see whether the timeout caused the exit or
something else.

Thanks,

	tglx


Thread overview: 19+ messages
2018-07-23 12:55 [PATCH 0/7] x86: Enable a few new instructions Fenghua Yu
2018-07-23 12:55 ` [PATCH 1/7] x86/cpufeatures: Enumerate MOVDIRI instruction Fenghua Yu
2018-07-23 12:55 ` [PATCH 2/7] x86/cpufeatures: Enumerate MOVDIR64B instruction Fenghua Yu
2018-07-23 12:55 ` [PATCH 3/7] x86/cpufeatures: Enumerate UMONITOR, UMWAIT, and TPAUSE instructions Fenghua Yu
2018-07-23 12:55 ` [PATCH 4/7] x86/umwait_contro: Set global umwait maximum time limit and umwait C0.2 state Fenghua Yu
2018-07-24  1:41   ` Andy Lutomirski
2018-08-01  9:01     ` Thomas Gleixner
2018-07-23 12:55 ` [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions Fenghua Yu
2018-07-24  1:48   ` Andy Lutomirski
2018-07-24  3:42     ` Fenghua Yu
2018-07-24  5:27       ` Andy Lutomirski
2018-07-25 22:18         ` Fenghua Yu
2018-07-23 12:55 ` [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions Fenghua Yu
2018-07-24  2:11   ` Andy Lutomirski
2018-07-24 15:14     ` Andy Lutomirski
2018-07-31 21:22     ` Thomas Gleixner
2018-07-31 21:38       ` Andy Lutomirski
2018-08-01  8:55         ` Thomas Gleixner
2018-07-23 12:55 ` [PATCH 7/7] selftests/vDSO: Add selftest to test vDSO functions for direct store and " Fenghua Yu
