* [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Hi,

Following a thorough coding style and patch changelog review from
Thomas Gleixner and Peter Zijlstra, I'm respinning this series as
another RFC.

This series contains:

- Restartable sequences system call (x86 32/64, powerpc 32/64, arm 32),
- CPU operation vector system call (x86 32/64, powerpc 32/64, arm 32),
- membarrier shared expedited command.

Compared to v11, I've removed the "sync core" membarrier command,
now queued for 4.16.

I have also fixed a missing page fault-in in cpu_opv, and added
a selftest case to cover it.

This series applies on top of Linus' current master as of
commit e1d1ea549b57 "Merge tag 'fbdev-v4.15' of git://github.com/bzolnier/linux"

The git tag including this series can be found at
https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git
tag: v4.14+-rseq-20171121

Thanks,

Mathieu

Boqun Feng (2):
  powerpc: Add support for restartable sequences
  powerpc: Wire up restartable sequences system call

Mathieu Desnoyers (20):
  uapi headers: Provide types_32_64.h
  rseq: Introduce restartable sequences system call (v12)
  arm: Add restartable sequences support
  arm: Wire up restartable sequences system call
  x86: Add support for restartable sequences
  x86: Wire up restartable sequence system call
  sched: Implement push_task_to_cpu
  cpu_opv: Provide cpu_opv system call (v4)
  x86: Wire up cpu_opv system call
  powerpc: Wire up cpu_opv system call
  arm: Wire up cpu_opv system call
  cpu_opv: selftests: Implement selftests (v3)
  rseq: selftests: Provide self-tests (v3)
  rseq: selftests: arm: workaround gcc asm size guess
  Fix: membarrier: add missing preempt off around smp_call_function_many
  membarrier: selftest: Test private expedited cmd (v2)
  powerpc: membarrier: Skip memory barrier in switch_mm() (v7)
  membarrier: Document scheduler barrier requirements (v5)
  membarrier: provide SHARED_EXPEDITED command (v2)
  membarrier: selftest: Test shared expedited cmd

 MAINTAINERS                                        |   21 +
 arch/Kconfig                                       |    7 +
 arch/arm/Kconfig                                   |    1 +
 arch/arm/kernel/signal.c                           |    7 +
 arch/arm/tools/syscall.tbl                         |    2 +
 arch/powerpc/Kconfig                               |    2 +
 arch/powerpc/include/asm/membarrier.h              |   26 +
 arch/powerpc/include/asm/systbl.h                  |    2 +
 arch/powerpc/include/asm/unistd.h                  |    2 +-
 arch/powerpc/include/uapi/asm/unistd.h             |    2 +
 arch/powerpc/kernel/signal.c                       |    3 +
 arch/powerpc/mm/mmu_context.c                      |    7 +
 arch/x86/Kconfig                                   |    1 +
 arch/x86/entry/common.c                            |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl             |    2 +
 arch/x86/entry/syscalls/syscall_64.tbl             |    2 +
 arch/x86/kernel/signal.c                           |    6 +
 arch/x86/mm/tlb.c                                  |    5 +
 fs/exec.c                                          |    1 +
 include/linux/sched.h                              |  102 ++
 include/linux/sched/mm.h                           |   21 +-
 include/linux/syscalls.h                           |    6 +
 include/trace/events/rseq.h                        |   56 +
 include/uapi/linux/cpu_opv.h                       |  114 ++
 include/uapi/linux/membarrier.h                    |   34 +-
 include/uapi/linux/rseq.h                          |  141 +++
 include/uapi/linux/types_32_64.h                   |   67 +
 init/Kconfig                                       |   31 +
 kernel/Makefile                                    |    2 +
 kernel/cpu_opv.c                                   | 1060 ++++++++++++++++
 kernel/fork.c                                      |    2 +
 kernel/rseq.c                                      |  338 +++++
 kernel/sched/core.c                                |   88 +-
 kernel/sched/membarrier.c                          |  125 +-
 kernel/sched/sched.h                               |    9 +
 kernel/sys_ni.c                                    |    4 +
 tools/testing/selftests/Makefile                   |    2 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 tools/testing/selftests/lib.mk                     |    4 +
 .../testing/selftests/membarrier/membarrier_test.c |  162 ++-
 tools/testing/selftests/rseq/.gitignore            |    4 +
 tools/testing/selftests/rseq/Makefile              |   23 +
 .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
 tools/testing/selftests/rseq/basic_test.c          |   55 +
 tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq-arm.h            |  568 +++++++++
 tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
 tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
 tools/testing/selftests/rseq/rseq.c                |  116 ++
 tools/testing/selftests/rseq/rseq.h                |  154 +++
 tools/testing/selftests/rseq/run_param_test.sh     |  124 ++
 55 files changed, 8166 insertions(+), 52 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h
 create mode 100644 include/trace/events/rseq.h
 create mode 100644 include/uapi/linux/cpu_opv.h
 create mode 100644 include/uapi/linux/rseq.h
 create mode 100644 include/uapi/linux/types_32_64.h
 create mode 100644 kernel/cpu_opv.c
 create mode 100644 kernel/rseq.c
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h
 create mode 100644 tools/testing/selftests/rseq/.gitignore
 create mode 100644 tools/testing/selftests/rseq/Makefile
 create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
 create mode 100644 tools/testing/selftests/rseq/basic_test.c
 create mode 100644 tools/testing/selftests/rseq/param_test.c
 create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
 create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
 create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
 create mode 100644 tools/testing/selftests/rseq/rseq.c
 create mode 100644 tools/testing/selftests/rseq/rseq.h
 create mode 100755 tools/testing/selftests/rseq/run_param_test.sh

-- 
2.11.0

* [RFC PATCH for 4.15 01/22] uapi headers: Provide types_32_64.h
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Provide helper macros for fields which represent pointers in the
kernel-userspace ABI. This facilitates handling of 32-bit user-space
by 64-bit kernels: on 32-bit architectures, such a field is defined
as a 32-bit zero padding word plus a 32-bit integer, which allows the
kernel to treat it as a 64-bit integer. The order of the padding and
the 32-bit integer depends on the endianness.
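
For illustration (not part of this patch), user-space code could use
these macros roughly as follows; the structure, function, and field
names below are hypothetical:

#include <stdint.h>
#include <linux/types_32_64.h>

/* Hypothetical structure sharing a pointer-sized field between
 * 32-bit user-space and a 64-bit kernel. The layout is identical
 * on both sides.
 */
struct example_abi {
	uint32_t version;
	uint32_t flags;
	LINUX_FIELD_u32_u64(data_ptr);	/* uint64_t on LP64, u32 pair on 32-bit */
};

static void example_init(struct example_abi *abi, void *buf)
{
	abi->version = 0;
	abi->flags = 0;
	/* On 32-bit, this also explicitly zeroes the padding word. */
	LINUX_FIELD_u32_u64_INIT_ONSTACK(abi->data_ptr, buf);
}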

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 include/uapi/linux/types_32_64.h | 67 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 include/uapi/linux/types_32_64.h

diff --git a/include/uapi/linux/types_32_64.h b/include/uapi/linux/types_32_64.h
new file mode 100644
index 000000000000..18dc8808d026
--- /dev/null
+++ b/include/uapi/linux/types_32_64.h
@@ -0,0 +1,67 @@
+#ifndef _UAPI_LINUX_TYPES_32_64_H
+#define _UAPI_LINUX_TYPES_32_64_H
+
+/*
+ * linux/types_32_64.h
+ *
+ * Integer type declaration for pointers across 32-bit and 64-bit systems.
+ *
+ * Copyright (c) 2015-2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifdef __KERNEL__
+# include <linux/types.h>
+#else
+# include <stdint.h>
+#endif
+
+#include <asm/byteorder.h>
+
+#ifdef __BYTE_ORDER
+# if (__BYTE_ORDER == __BIG_ENDIAN)
+#  define LINUX_BYTE_ORDER_BIG_ENDIAN
+# else
+#  define LINUX_BYTE_ORDER_LITTLE_ENDIAN
+# endif
+#else
+# ifdef __BIG_ENDIAN
+#  define LINUX_BYTE_ORDER_BIG_ENDIAN
+# else
+#  define LINUX_BYTE_ORDER_LITTLE_ENDIAN
+# endif
+#endif
+
+#ifdef __LP64__
+# define LINUX_FIELD_u32_u64(field)			uint64_t field
+# define LINUX_FIELD_u32_u64_INIT_ONSTACK(field, v)	field = (intptr_t)v
+#else
+# ifdef LINUX_BYTE_ORDER_BIG_ENDIAN
+#  define LINUX_FIELD_u32_u64(field)	uint32_t field ## _padding, field
+#  define LINUX_FIELD_u32_u64_INIT_ONSTACK(field, v)	\
+	field ## _padding = 0, field = (intptr_t)v
+# else
+#  define LINUX_FIELD_u32_u64(field)	uint32_t field, field ## _padding
+#  define LINUX_FIELD_u32_u64_INIT_ONSTACK(field, v)	\
+	field = (intptr_t)v, field ## _padding = 0
+# endif
+#endif
+
+#endif /* _UAPI_LINUX_TYPES_32_64_H */
-- 
2.11.0


* [RFC PATCH for 4.15 v12 02/22] rseq: Introduce restartable sequences system call
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Alexander Viro

Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartable sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work was started
by Paul Turner and Andrew Hunter. It lets the kernel handle restarts of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitate porting to other
architectures and speed up the user-space fast path. A second system
call, cpu_opv(), is proposed as a fallback to deal with debugger
single-stepping. cpu_opv() executes a sequence of operations on behalf
of user-space with preemption disabled.
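
For illustration, the intended usage pattern is roughly the following
user-space sketch. The helpers read_rseq_cpu_id(), rseq_addv() and
cpu_op_addv() are hypothetical stand-ins for the helpers provided by
the selftests:

#include <stdint.h>

/* Hypothetical helpers: rseq asm fast path and cpu_opv fallback. */
int read_rseq_cpu_id(void);
int rseq_addv(intptr_t *v, intptr_t count, int cpu);
int cpu_op_addv(intptr_t *v, intptr_t count, int cpu);

/* Per-CPU counter increment: rseq fast path, cpu_opv slow path. */
int percpu_counter_inc(intptr_t *counters /* one slot per possible CPU */)
{
	int cpu = read_rseq_cpu_id();	/* cached cpu_id from the rseq TLS area */
	int ret;

	/* Fast path: rseq critical section, aborted on preemption,
	 * migration or signal delivery.
	 */
	ret = rseq_addv(&counters[cpu], 1, cpu);
	if (ret)
		/* Slow path (e.g. under single-stepping): the kernel
		 * performs the addition with preemption disabled.
		 */
		ret = cpu_op_addv(&counters[cpu], 1, cpu);
	return ret;
}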

Here are benchmarks of various rseq use-cases.

Test hardware:

arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

The following benchmarks were all performed on a single thread.

* Per-CPU statistic counter increment

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                344.0                 31.4          11.0
x86-64:                15.3                  2.0           7.7

* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
             per-cpu buffer

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:               2502.0                 2250.0         1.1
x86-64:               117.4                   98.0         1.2

* liburcu percpu: lock-unlock pair, dereference, read/compare word

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                751.0                 128.5          5.8
x86-64:                53.4                  28.6          1.9

* jemalloc memory allocator adapted to use rseq

Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):

The production workload response time gains 1-2% in average latency,
and the P99 overall latency drops by 2-3%.

* Reading the current CPU number

Reading the current CPU number on which the caller thread is running
is sped up by keeping the current CPU number up to date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.
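
For illustration, reading the cached CPU number from user-space could
look like the following sketch. The __rseq_abi TLS variable name is an
assumption; the cpu_id field and RSEQ_CPU_ID_UNINITIALIZED come from
the uapi header added by this series:

#include <stdint.h>
#include <linux/rseq.h>

/* Assumed to have been registered with the rseq() system call. */
extern __thread struct rseq __rseq_abi;

static inline int32_t rseq_current_cpu_raw(void)
{
	/* Single-copy atomic read; the kernel refreshes this field in
	 * the notify-resume handler on return to user-space. Returns
	 * RSEQ_CPU_ID_UNINITIALIZED (-1) if rseq is not registered.
	 */
	return (int32_t)__atomic_load_n(&__rseq_abi.cpu_id, __ATOMIC_RELAXED);
}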

Keeping the current cpu id in a memory area shared between kernel and
user-space has the following benefits over the mechanisms currently
available for reading the current CPU number:

- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
  executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from inline
  assembly, which makes it a useful building block for restartable
  sequences.
- The approach of reading the cpu id through memory mapping shared
  between kernel and user-space is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.

On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop):                                    8.4 ns
- Read CPU from rseq cpu_id:                               16.7 ns
- Read CPU from rseq cpu_id (lazy register):               19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
- getcpu system call:                                     234.9 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                                    0.8 ns
- Read CPU from rseq cpu_id:                                0.8 ns
- Read CPU from rseq cpu_id (lazy register):                0.8 ns
- Read using gs segment selector:                           0.8 ns
- "lsl" inline assembly:                                   13.0 ns
- glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
- getcpu system call:                                      53.9 ns

- Speed (benchmark taken on v8 of patchset)

Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:

Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.

* CONFIG_RSEQ=n

avg.:      41.37 s
std.dev.:   0.36 s

* CONFIG_RSEQ=y

avg.:      40.46 s
std.dev.:   0.33 s

- Size

On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.

On x86-64, between CONFIG_CPU_OPV=n/y, the text size increase of vmlinux is
5576 bytes, and the data size increase of vmlinux is 6164 bytes.

[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-api@vger.kernel.org
---

Changes since v1:
- Return -1, errno=EINVAL if cpu_cache pointer is not aligned on
  sizeof(int32_t).
- Update man page to describe the pointer alignment requirements and
  update atomicity guarantees.
- Add MAINTAINERS file GETCPU_CACHE entry.
- Remove dynamic memory allocation: go back to having a single
  getcpu_cache entry per thread. Update documentation accordingly.
- Rebased on Linux 4.4.

Changes since v2:
- Introduce a "cmd" argument, along with an enum with GETCPU_CACHE_GET
  and GETCPU_CACHE_SET. Introduce a uapi header linux/getcpu_cache.h
  defining this enumeration.
- Split resume notifier architecture implementation from the system call
  wire up in the following arch-specific patches.
- Man pages updates.
- Handle 32-bit compat pointers.
- Simplify handling of getcpu_cache GETCPU_CACHE_SET compiler barrier:
  set the current cpu cache pointer before doing the cache update, and
  set it back to NULL if the update fails. Setting it back to NULL on
  error ensures that no resume notifier will trigger a SIGSEGV if a
  migration happened concurrently.

Changes since v3:
- Fix __user annotations in compat code,
- Update memory ordering comments.
- Rebased on kernel v4.5-rc5.

Changes since v4:
- Inline getcpu_cache_fork, getcpu_cache_execve, and getcpu_cache_exit.
- Add new line between if() and switch() to improve readability.
- Added sched switch benchmarks (hackbench) and size overhead comparison
  to change log.

Changes since v5:
- Rename "getcpu_cache" to "thread_local_abi", allowing to extend
  this system call to cover future features such as restartable critical
  sections. Generalizing this system call ensures that we can add
  features similar to the cpu_id field within the same cache-line
  without having to track one pointer per feature within the task
  struct.
- Add a tlabi_nr parameter to the system call, thus allowing the ABI
  to be extended beyond the initial 64-byte structure by registering structures
  with tlabi_nr greater than 0. The initial ABI structure is associated
  with tlabi_nr 0.
- Rebased on kernel v4.5.

Changes since v6:
- Integrate "restartable sequences" v2 patchset from Paul Turner.
- Add handling of single-stepping purely in user-space, with a
  fallback to locking after 2 rseq failures to ensure progress, and
  by exposing a __rseq_table section to debuggers so they know where
  to put breakpoints when dealing with rseq assembly blocks which
  can be aborted at any point.
- make the code and ABI generic: porting the kernel implementation
  simply requires wiring up the signal handler and return-to-user-space
  hooks, and allocating the syscall number.
- extend testing with a fully configurable test program. See
  param_spinlock_test -h for details.
- handling of rseq ENOSYS in user-space, also with a fallback
  to locking.
- modify Paul Turner's rseq ABI to only require a single TLS store on
  the user-space fast-path, removing the need to populate two additional
  registers. This is made possible by introducing struct rseq_cs into
  the ABI to describe a critical section start_ip, post_commit_ip, and
  abort_ip.
- Rebased on kernel v4.7-rc7.

Changes since v7:
- Documentation updates.
- Integrated powerpc architecture support.
- Compare rseq critical section start_ip, which allows shrinking the
  user-space fast-path code size.
- Added Peter Zijlstra, Paul E. McKenney and Boqun Feng as
  co-maintainers.
- Added do_rseq2 and do_rseq_memcpy to test program helper library.
- Code cleanup based on review from Peter Zijlstra, Andy Lutomirski and
  Boqun Feng.
- Rebase on kernel v4.8-rc2.

Changes since v8:
- clear rseq_cs even if non-nested. Speeds up user-space fast path by
  removing the final "rseq_cs=NULL" assignment.
- add enum rseq_flags: critical sections and threads can set migration,
  preemption and signal "disable" flags to inhibit rseq behavior.
- rseq_event_counter needs to be updated with a pre-increment: otherwise
  it misses an increment after exec (when TLS and in-kernel states are
  initially 0).

Changes since v9:
- Update changelog.
- Fold instrumentation patch.
- check abort-ip signature: Add a signature before the abort-ip landing
  address. This signature is also received as a new parameter to the
  rseq system call. The kernel uses it to ensure that rseq cannot be used
  as an exploit vector to redirect execution to arbitrary code.
- Use rseq pointer for both register and unregister. This is more
  symmetric, and eventually allows supporting a linked list of rseq
  structs per thread if needed in the future.
- Unregistration of a rseq structure is now done with
  RSEQ_FLAG_UNREGISTER.
- Remove reference counting. Return "EBUSY" to the caller if rseq is
  already registered for the current thread. This simplifies
  implementation while still allowing user-space to perform lazy
  registration in multi-lib use-cases. (suggested by Ben Maurer)
- Clear rseq_cs upon unregister.
- Set cpu_id back to -1 on unregister, so that rseq user libraries
  which expect to lazily register rseq can do so after an unregister.
- Document rseq_cs clear requirement: JIT should reset the rseq_cs
  pointer before reclaiming memory of rseq_cs structure.
- Introduce rseq_len syscall parameter and rseq_cs version field:
  Allow keeping track of the registered rseq struct length, and add
  rseq_cs version as the first field, both for future extensions.
- Use offset and unsigned arithmetic to save a branch: Save a
  conditional branch when comparing the instruction pointer against a
  rseq_cs descriptor's address range by having post_commit_ip as an
  offset from start_ip, and using unsigned integer comparison.
  Suggested by Ben Maurer.
- Remove event counter from ABI. Suggested by Andy Lutomirski.
- Add INIT_ONSTACK macro: Introduce the
  RSEQ_FIELD_u32_u64_INIT_ONSTACK() macros to ensure that users
  correctly initialize the upper bits of RSEQ_FIELD_u32_u64() on their
  stack to 0 on 32-bit architectures.
- Select MEMBARRIER: Allows user-space rseq fast-paths to use the value
  of cpu_id field (inherently required by the rseq algorithm) to figure
  out whether membarrier can be expected to be available.
  This effectively allows user-space fast-paths to remove extra
  comparisons and branch testing whether membarrier is enabled, and thus
  whether a full barrier is required (e.g. in userspace RCU
  implementation after rcu_read_lock/before rcu_read_unlock).
- Expose cpu_id_start field: Checking whether (cpu_id < 0) in the C
  preparation part of the rseq fast-path brings significant overhead, at
  least on arm32. We can remove this extra comparison by exposing two
  distinct cpu_id fields in the rseq TLS:

  The field cpu_id_start always contains a *possible* cpu number, although
  it may not be the current one if, for instance, rseq is not initialized
  for the current thread. cpu_id_start is meant to be used in the C code
  for the pointer chasing to figure out which per-cpu data structure
  should be passed to the rseq asm sequence.

  In the field cpu_id, the value -1 means rseq is not initialized, and -2 means
  initialization failed. That field is used in the rseq asm sequence to
  confirm that the cpu_id_start value was indeed the current cpu number.
  It also ends up confirming that rseq is initialized for the current
  thread, because values -1 and -2 will never match the cpu_id_start
  possible cpu number values.

  This allows checking the current CPU number and rseq initialization
  state with a single comparison on the fast-path.
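
  As a rough C-level illustration (the confirmation against cpu_id is
  performed inside the rseq assembly sequence in the real fast-path,
  and the __rseq_abi TLS variable name is an assumption):

  #include <stdbool.h>
  #include <stdint.h>
  #include <linux/rseq.h>

  extern __thread struct rseq __rseq_abi;   /* registered elsewhere */

  static inline bool rseq_cpu_confirm(uint32_t *cpu)
  {
          /* Always a possible CPU number, even if rseq is unregistered. */
          *cpu = __rseq_abi.cpu_id_start;

          /* A single comparison confirms both "still running on that CPU"
           * and "rseq is registered": cpu_id holds -1/-2 when unusable,
           * which never matches a possible CPU number from cpu_id_start.
           */
          return __rseq_abi.cpu_id == *cpu;
  }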

Changes since v10:

- Update rseq.c comment, removing reference to event_counter.

Changes since v11:

- Replace task struct rseq_preempt, rseq_signal, and rseq_migrate
  bool by u32 rseq_event_mask.
- Add missing sys_rseq() asmlinkage declaration to
  include/linux/syscalls.h.
- copy event mask on process fork, set to 0 on exec and thread-fork.
- Cleanups based on review from Peter Zijlstra.
- Cleanups based on review from Thomas Gleixner.

Man page associated:

RSEQ(2)                Linux Programmer's Manual               RSEQ(2)

NAME
       rseq - Restartable sequences and cpu number cache

SYNOPSIS
       #include <linux/rseq.h>

       int rseq(struct rseq * rseq, uint32_t rseq_len, int flags, uint32_t sig);

DESCRIPTION
       The  rseq()  ABI  accelerates  user-space operations on per-cpu
       data by defining a shared data structure ABI between each user-
       space thread and the kernel.

       It  allows  user-space  to perform update operations on per-cpu
       data without requiring heavy-weight atomic operations.

       Restartable sequences are atomic  with  respect  to  preemption
       (making them atomic with respect to other threads running on the
       same CPU), as well as  signal  delivery  (user-space  execution
       contexts nested over the same thread).

       It is suited for update operations on per-cpu data.

       It can be used on data structures shared between threads within
       a process, and on data structures shared between threads across
       different processes.

       Some examples of operations that can be accelerated or improved
       by this ABI:

       · Memory allocator per-cpu free-lists,

       · Querying the current CPU number,

       · Incrementing per-CPU counters,

       · Modifying data protected by per-CPU spinlocks,

       · Inserting/removing elements in per-CPU linked-lists,

       · Writing/reading per-CPU ring buffers content.

       · Accurately reading performance monitoring unit counters  with
         respect to thread migration.

       The  rseq argument is a pointer to the thread-local rseq struc‐
       ture to be shared between kernel and user-space.  A  NULL  rseq
       value unregisters the current thread rseq structure.

       The layout of struct rseq is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This  structure  is  extensible.  Its  size is passed as
              parameter to the rseq system call.

       Fields

           cpu_id_start
              Optimistic cache of the CPU number on which the  current
              thread  is running. Its value is guaranteed to always be
              a possible CPU number, even when rseq  is  not  initial‐
              ized.  The  value it contains should always be confirmed
              by reading the cpu_id field.

           cpu_id
              Cache of the CPU number on which the current  thread  is
              running.  -1 if uninitialized.

           rseq_cs
              The  rseq_cs  field is a pointer to a struct rseq_cs. It
              is NULL when no rseq assembly block critical section  is
              active for the current thread.  Setting it to point to a
              critical section descriptor (struct rseq_cs)  marks  the
              beginning of the critical section.

           flags
              Flags  indicating  the  restart behavior for the current
              thread. This is mainly used for debugging purposes.  Can
              be either:

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

       The layout of struct rseq_cs version 0 is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This structure has a fixed size of 32 bytes.

       Fields

           version
              Version of this structure.

           flags
              Flags indicating the restart behavior of this structure.
              Can be either:

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

           start_ip
              Instruction pointer address of the first instruction  of
              the sequence of consecutive assembly instructions.

           post_commit_offset
              Offset  (from start_ip address) of the address after the
              last instruction of the sequence of consecutive assembly
              instructions.

           abort_ip
              Instruction pointer address to which the execution flow
              is moved in case of abort of the sequence of consecutive
              assembly instructions.
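
       As an illustration (a sketch, not part of the manual page text),
       encoding post_commit_offset as an offset from start_ip lets the
       kernel test whether an instruction pointer lies within the
       critical section with a single unsigned comparison:

           #include <stdbool.h>
           #include <stdint.h>

           /* True when ip is in [start_ip, start_ip + post_commit_offset). */
           static bool ip_in_rseq_cs(uint64_t ip, uint64_t start_ip,
                                     uint64_t post_commit_offset)
           {
                   /* If ip < start_ip, the unsigned subtraction wraps to
                    * a huge value, so one comparison covers both bounds.
                    */
                   return ip - start_ip < post_commit_offset;
           }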

       The  rseq_len argument is the size of the struct rseq to regis‐
       ter.

       The flags argument is 0 for registration, and  RSEQ_FLAG_UNREG‐
       ISTER for unregistration.

       The  sig argument is the 32-bit signature to be expected before
       the abort handler code.

       A single library per process should keep the rseq structure  in
       a  thread-local  storage  variable.  The cpu_id field should be
       initialized to -1, and the cpu_id_start field  should  be  ini‐
       tialized to a possible CPU value (typically 0).

       Each  thread  is  responsible for registering and unregistering
       its rseq structure. No more than one rseq structure address can
       be registered per thread at a given time.

       In  a  typical  usage scenario, the thread registering the rseq
       structure will be performing  loads  and  stores  from/to  that
       structure.  It  is  however also allowed to read that structure
       from other threads.  The rseq field updates  performed  by  the
       kernel  provide  relaxed  atomicity  semantics, which guarantee
       that other threads performing relaxed atomic reads of  the  cpu
       number cache will always observe a consistent value.
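
       For illustration (not part of the original manual page), regis-
       tering the rseq area from C could look like the sketch below.
       The __NR_rseq number depends on the architecture wire-up in
       later patches, and the variable name and signature value are
       assumptions:

           #include <stdint.h>
           #include <sys/syscall.h>
           #include <unistd.h>
           #include <linux/rseq.h>

           #define RSEQ_SIG 0x53053053  /* example signature, must match the asm */

           /* cpu_id starts at -1; cpu_id_start zero-inits to a possible CPU. */
           static __thread struct rseq rseq_abi = {
                   .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
           };

           static int rseq_register_current_thread(void)
           {
                   /* flags = 0 registers; RSEQ_FLAG_UNREGISTER unregisters. */
                   return syscall(__NR_rseq, &rseq_abi, sizeof(rseq_abi),
                                  0, RSEQ_SIG);
           }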

RETURN VALUE
       A  return  value  of  0  indicates  success.  On  error,  -1 is
       returned, and errno is set appropriately.

ERRORS
       EINVAL Either flags contains an invalid value, or rseq contains
              an  address  which  is  not  appropriately  aligned,  or
              rseq_len contains a size that does not  match  the  size
              received on registration.

       ENOSYS The  rseq()  system call is not implemented by this ker‐
              nel.

       EFAULT rseq is an invalid address.

       EBUSY  Restartable sequence  is  already  registered  for  this
              thread.

       EPERM  The  sig  argument  on unregistration does not match the
              signature received on registration.

VERSIONS
       The rseq() system call was added in Linux 4.X (TODO).

CONFORMING TO
       rseq() is Linux-specific.

SEE ALSO
       sched_getcpu(3)

Linux                         2017-11-06                       RSEQ(2)
---
 MAINTAINERS                 |  11 ++
 arch/Kconfig                |   7 +
 fs/exec.c                   |   1 +
 include/linux/sched.h       | 102 +++++++++++++
 include/linux/syscalls.h    |   3 +
 include/trace/events/rseq.h |  56 ++++++++
 include/uapi/linux/rseq.h   | 141 ++++++++++++++++++
 init/Kconfig                |  14 ++
 kernel/Makefile             |   1 +
 kernel/fork.c               |   2 +
 kernel/rseq.c               | 338 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c         |   4 +
 kernel/sys_ni.c             |   3 +
 13 files changed, 683 insertions(+)
 create mode 100644 include/trace/events/rseq.h
 create mode 100644 include/uapi/linux/rseq.h
 create mode 100644 kernel/rseq.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 44512c346206..b8f6a99005b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11616,6 +11616,17 @@ F:	include/dt-bindings/reset/
 F:	include/linux/reset.h
 F:	include/linux/reset-controller.h
 
+RESTARTABLE SEQUENCES SUPPORT
+M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+M:	Peter Zijlstra <peterz@infradead.org>
+M:	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
+M:	Boqun Feng <boqun.feng@gmail.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	kernel/rseq.c
+F:	include/uapi/linux/rseq.h
+F:	include/trace/events/rseq.h
+
 RFKILL
 M:	Johannes Berg <johannes@sipsolutions.net>
 L:	linux-wireless@vger.kernel.org
diff --git a/arch/Kconfig b/arch/Kconfig
index 400b9e1b2f27..2d7f54a5784b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -258,6 +258,13 @@ config HAVE_REGS_AND_STACK_ACCESS_API
 	  declared in asm/ptrace.h
 	  For example the kprobes-based event tracer needs this API.
 
+config HAVE_RSEQ
+	bool
+	depends on HAVE_REGS_AND_STACK_ACCESS_API
+	help
+	  This symbol should be selected by an architecture if it
+	  supports an implementation of restartable sequences.
+
 config HAVE_CLK
 	bool
 	help
diff --git a/fs/exec.c b/fs/exec.c
index 1d6243d9f2b6..0caa4e1f1ce8 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1803,6 +1803,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
 	membarrier_execve(current);
+	rseq_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a5dc7c98b0a2..39c42fce56e4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -27,6 +27,7 @@
 #include <linux/signal_types.h>
 #include <linux/mm_types_task.h>
 #include <linux/task_io_accounting.h>
+#include <linux/rseq.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -978,6 +979,13 @@ struct task_struct {
 	unsigned long			numa_pages_migrated;
 #endif /* CONFIG_NUMA_BALANCING */
 
+#ifdef CONFIG_RSEQ
+	struct rseq __user *rseq;
+	u32 rseq_len;
+	u32 rseq_sig;
+	u32 rseq_event_mask;
+#endif
+
 	struct tlbflush_unmap_batch	tlb_ubc;
 
 	struct rcu_head			rcu;
@@ -1668,4 +1676,98 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_RSEQ
+/*
+ * Map the event mask on the user-space ABI enum rseq_cs_flags
+ * for direct mask checks.
+ */
+enum rseq_event_mask {
+	RSEQ_EVENT_PREEMPT	= RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT,
+	RSEQ_EVENT_SIGNAL	= RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL,
+	RSEQ_EVENT_MIGRATE	= RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE,
+};
+
+static inline void rseq_set_notify_resume(struct task_struct *t)
+{
+	if (t->rseq)
+		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+}
+void __rseq_handle_notify_resume(struct pt_regs *regs);
+static inline void rseq_handle_notify_resume(struct pt_regs *regs)
+{
+	if (current->rseq)
+		__rseq_handle_notify_resume(regs);
+}
+/*
+ * If parent process has a registered restartable sequences area, the
+ * child inherits. Only applies when forking a process, not a thread. In
+ * case a parent calls fork() in the middle of a restartable sequence, set the
+ * resume notifier to force the child to retry.
+ */
+static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
+{
+	if (clone_flags & CLONE_THREAD) {
+		t->rseq = NULL;
+		t->rseq_len = 0;
+		t->rseq_sig = 0;
+		t->rseq_event_mask = 0;
+	} else {
+		t->rseq = current->rseq;
+		t->rseq_len = current->rseq_len;
+		t->rseq_sig = current->rseq_sig;
+		t->rseq_event_mask = current->rseq_event_mask;
+		rseq_set_notify_resume(t);
+	}
+}
+static inline void rseq_execve(struct task_struct *t)
+{
+	t->rseq = NULL;
+	t->rseq_len = 0;
+	t->rseq_sig = 0;
+	t->rseq_event_mask = 0;
+}
+static inline void rseq_sched_out(struct task_struct *t)
+{
+	rseq_set_notify_resume(t);
+}
+static inline void rseq_signal_deliver(struct pt_regs *regs)
+{
+	current->rseq_event_mask |= RSEQ_EVENT_SIGNAL;
+	rseq_handle_notify_resume(regs);
+}
+static inline void rseq_preempt(struct task_struct *t)
+{
+	t->rseq_event_mask |= RSEQ_EVENT_PREEMPT;
+}
+static inline void rseq_migrate(struct task_struct *t)
+{
+	t->rseq_event_mask |= RSEQ_EVENT_MIGRATE;
+}
+#else
+static inline void rseq_set_notify_resume(struct task_struct *t)
+{
+}
+static inline void rseq_handle_notify_resume(struct pt_regs *regs)
+{
+}
+static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
+{
+}
+static inline void rseq_execve(struct task_struct *t)
+{
+}
+static inline void rseq_sched_out(struct task_struct *t)
+{
+}
+static inline void rseq_signal_deliver(struct pt_regs *regs)
+{
+}
+static inline void rseq_preempt(struct task_struct *t)
+{
+}
+static inline void rseq_migrate(struct task_struct *t)
+{
+}
+#endif
+
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..340650b4ec54 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -66,6 +66,7 @@ struct old_linux_dirent;
 struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
+struct rseq;
 union bpf_attr;
 
 #include <linux/types.h>
@@ -940,5 +941,7 @@ asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
 asmlinkage long sys_pkey_free(int pkey);
 asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
+asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
+			int flags, uint32_t sig);
 
 #endif
diff --git a/include/trace/events/rseq.h b/include/trace/events/rseq.h
new file mode 100644
index 000000000000..c4609a3f5008
--- /dev/null
+++ b/include/trace/events/rseq.h
@@ -0,0 +1,56 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM rseq
+
+#if !defined(_TRACE_RSEQ_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_RSEQ_H
+
+#include <linux/tracepoint.h>
+#include <linux/types.h>
+
+TRACE_EVENT(rseq_update,
+
+	TP_PROTO(struct task_struct *t),
+
+	TP_ARGS(t),
+
+	TP_STRUCT__entry(
+		__field(s32, cpu_id)
+	),
+
+	TP_fast_assign(
+		__entry->cpu_id = raw_smp_processor_id();
+	),
+
+	TP_printk("cpu_id=%d", __entry->cpu_id)
+);
+
+TRACE_EVENT(rseq_ip_fixup,
+
+	TP_PROTO(unsigned long regs_ip, unsigned long start_ip,
+		unsigned long post_commit_offset, unsigned long abort_ip),
+
+	TP_ARGS(regs_ip, start_ip, post_commit_offset, abort_ip),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, regs_ip)
+		__field(unsigned long, start_ip)
+		__field(unsigned long, post_commit_offset)
+		__field(unsigned long, abort_ip)
+	),
+
+	TP_fast_assign(
+		__entry->regs_ip = regs_ip;
+		__entry->start_ip = start_ip;
+		__entry->post_commit_offset = post_commit_offset;
+		__entry->abort_ip = abort_ip;
+	),
+
+	TP_printk("regs_ip=0x%lx start_ip=0x%lx post_commit_offset=%lu abort_ip=0x%lx",
+		__entry->regs_ip, __entry->start_ip,
+		__entry->post_commit_offset, __entry->abort_ip)
+);
+
+#endif /* _TRACE_RSEQ_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
new file mode 100644
index 000000000000..ff85e0795b50
--- /dev/null
+++ b/include/uapi/linux/rseq.h
@@ -0,0 +1,141 @@
+#ifndef _UAPI_LINUX_RSEQ_H
+#define _UAPI_LINUX_RSEQ_H
+
+/*
+ * linux/rseq.h
+ *
+ * Restartable sequences system call API
+ *
+ * Copyright (c) 2015-2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifdef __KERNEL__
+# include <linux/types.h>
+#else
+# include <stdint.h>
+#endif
+
+#include <linux/types_32_64.h>
+
+enum rseq_cpu_id_state {
+	RSEQ_CPU_ID_UNINITIALIZED		= -1,
+	RSEQ_CPU_ID_REGISTRATION_FAILED		= -2,
+};
+
+enum rseq_flags {
+	RSEQ_FLAG_UNREGISTER = (1 << 0),
+};
+
+enum rseq_cs_flags {
+	RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT	= (1U << 0),
+	RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL	= (1U << 1),
+	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE	= (1U << 2),
+};
+
+/*
+ * struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always
+ * contained within a single cache-line. It is usually declared as
+ * link-time constant data.
+ */
+struct rseq_cs {
+	/* Version of this structure. */
+	uint32_t version;
+	/* enum rseq_cs_flags */
+	uint32_t flags;
+	LINUX_FIELD_u32_u64(start_ip);
+	/* Offset from start_ip. */
+	LINUX_FIELD_u32_u64(post_commit_offset);
+	LINUX_FIELD_u32_u64(abort_ip);
+} __attribute__((aligned(4 * sizeof(uint64_t))));
+
+/*
+ * struct rseq is aligned on 4 * 8 bytes to ensure it is always
+ * contained within a single cache-line.
+ *
+ * A single struct rseq per thread is allowed.
+ */
+struct rseq {
+	/*
+	 * Restartable sequences cpu_id_start field. Updated by the
+	 * kernel, and read by user-space with single-copy atomicity
+	 * semantics. Aligned on 32-bit. Always contains a value in the
+	 * range of possible CPUs, although the value may not be the
+	 * actual current CPU (e.g. if rseq is not initialized). This
+	 * CPU number value should always be compared against the value
+	 * of the cpu_id field before performing a rseq commit or
+	 * returning a value read from a data structure indexed using
+	 * the cpu_id_start value.
+	 */
+	uint32_t cpu_id_start;
+	/*
+	 * Restartable sequences cpu_id field. Updated by the kernel,
+	 * and read by user-space with single-copy atomicity semantics.
+	 * Aligned on 32-bit. Values RSEQ_CPU_ID_UNINITIALIZED and
+	 * RSEQ_CPU_ID_REGISTRATION_FAILED have a special semantic: the
+	 * former means "rseq uninitialized", and the latter means "rseq
+	 * initialization failed". This value is meant to be read within
+	 * rseq critical sections and compared with the cpu_id_start
+	 * value previously read, before performing the commit instruction,
+	 * or read and compared with the cpu_id_start value before returning
+	 * a value loaded from a data structure indexed using the
+	 * cpu_id_start value.
+	 */
+	uint32_t cpu_id;
+	/*
+	 * Restartable sequences rseq_cs field.
+	 *
+	 * Contains NULL when no critical section is active for the current
+	 * thread, or holds a pointer to the currently active struct rseq_cs.
+	 *
+	 * Updated by user-space, which sets the address of the currently
+	 * active rseq_cs at the beginning of assembly instruction sequence
+	 * block, and set to NULL by the kernel when it restarts an assembly
+	 * instruction sequence block, as well as when the kernel detects that
+	 * it is preempting or delivering a signal outside of the range
+	 * targeted by the rseq_cs. Also needs to be set to NULL by user-space
+	 * before reclaiming memory that contains the targeted struct rseq_cs.
+	 *
+	 * Read and set by the kernel with single-copy atomicity semantics.
+	 * Set by user-space with single-copy atomicity semantics. Aligned
+	 * on 64-bit.
+	 */
+	LINUX_FIELD_u32_u64(rseq_cs);
+	/*
+	 * - RSEQ_DISABLE flag:
+	 *
+	 * Fallback fast-track flag for single-stepping.
+	 * Set by user-space if lack of progress is detected.
+	 * Cleared by user-space after rseq finish.
+	 * Read by the kernel.
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+	 *     Inhibit instruction sequence block restart and event
+	 *     counter increment on preemption for this thread.
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+	 *     Inhibit instruction sequence block restart and event
+	 *     counter increment on signal delivery for this thread.
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+	 *     Inhibit instruction sequence block restart and event
+	 *     counter increment on migration for this thread.
+	 */
+	uint32_t flags;
+} __attribute__((aligned(4 * sizeof(uint64_t))));
+
+#endif /* _UAPI_LINUX_RSEQ_H */
diff --git a/init/Kconfig b/init/Kconfig
index 2934249fba46..88e36395390f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1400,6 +1400,20 @@ config USERFAULTFD
 	  Enable the userfaultfd() system call that allows to intercept and
 	  handle page faults in userland.
 
+config RSEQ
+	bool "Enable rseq() system call" if EXPERT
+	default y
+	depends on HAVE_RSEQ
+	select MEMBARRIER
+	help
+	  Enable the restartable sequences system call. It provides a
+	  user-space cache for the current CPU number value, which
+	  speeds up getting the current CPU number from user-space,
+	  as well as an ABI to speed up user-space operations on
+	  per-CPU data.
+
+	  If unsure, say Y.
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/Makefile b/kernel/Makefile
index 172d151d429c..3574669dafd9 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -112,6 +112,7 @@ obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
 obj-$(CONFIG_TORTURE_TEST) += torture.o
 
 obj-$(CONFIG_HAS_IOMEM) += memremap.o
+obj-$(CONFIG_RSEQ) += rseq.o
 
 $(obj)/configs.o: $(obj)/config_data.h
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 432eadf6b58c..e903ee4f21ba 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1858,6 +1858,8 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	copy_seccomp(p);
 
+	rseq_fork(p, clone_flags);
+
 	/*
 	 * Process group and session signals need to be delivered to just the
 	 * parent before the fork or both the parent and the child after the
diff --git a/kernel/rseq.c b/kernel/rseq.c
new file mode 100644
index 000000000000..e076a3acb454
--- /dev/null
+++ b/kernel/rseq.c
@@ -0,0 +1,338 @@
+/*
+ * Restartable sequences system call
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2015, Google, Inc.,
+ * Paul Turner <pjt@google.com> and Andrew Hunter <ahh@google.com>
+ * Copyright (C) 2015-2016, EfficiOS Inc.,
+ * Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+#include <linux/rseq.h>
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/rseq.h>
+
+/*
+ *
+ * Restartable sequences are a lightweight interface that allows
+ * user-level code to be executed atomically relative to scheduler
+ * preemption and signal delivery. Typically used for implementing
+ * per-cpu operations.
+ *
+ * It allows user-space to perform update operations on per-cpu data
+ * without requiring heavy-weight atomic operations.
+ *
+ * Detailed algorithm of rseq user-space assembly sequences:
+ *
+ *                     init(rseq_cs)
+ *                     cpu = TLS->rseq::cpu_id_start
+ *   [1]               TLS->rseq::rseq_cs = rseq_cs
+ *   [start_ip]        ----------------------------
+ *   [2]               if (cpu != TLS->rseq::cpu_id)
+ *                             goto abort_ip;
+ *   [3]               <last_instruction_in_cs>
+ *   [post_commit_ip]  ----------------------------
+ *
+ *   The address of jump target abort_ip must be outside the critical
+ *   region, i.e.:
+ *
+ *     [abort_ip] < [start_ip]  || [abort_ip] >= [post_commit_ip]
+ *
+ *   Steps [2]-[3] (inclusive) need to be a sequence of instructions in
+ *   userspace that can handle being interrupted between any of those
+ *   instructions, and then resumed to the abort_ip.
+ *
+ *   1.  Userspace stores the address of the struct rseq_cs assembly
+ *       block descriptor into the rseq_cs field of the registered
+ *       struct rseq TLS area. This update is performed through a single
+ *       store within the inline assembly instruction sequence.
+ *       [start_ip]
+ *
+ *   2.  Userspace tests to check whether the current cpu_id field matches
+ *       the cpu number loaded before start_ip, branching to abort_ip
+ *       in case of a mismatch.
+ *
+ *       If the sequence is preempted or interrupted by a signal
+ *       at or after start_ip and before post_commit_ip, then the kernel
+ *       clears TLS->__rseq_abi::rseq_cs, and sets the user-space return
+ *       ip to abort_ip before returning to user-space, so the preempted
+ *       execution resumes at abort_ip.
+ *
+ *   3.  Userspace critical section final instruction before
+ *       post_commit_ip is the commit. The critical section is
+ *       self-terminating.
+ *       [post_commit_ip]
+ *
+ *   4.  <success>
+ *
+ *   On failure at [2], or if interrupted by preempt or signal delivery
+ *   between [1] and [3]:
+ *
+ *       [abort_ip]
+ *   F1. <failure>
+ */
+
+static int rseq_update_cpu_id(struct task_struct *t)
+{
+	uint32_t cpu_id = raw_smp_processor_id();
+
+	if (__put_user(cpu_id, &t->rseq->cpu_id_start))
+		return -EFAULT;
+	if (__put_user(cpu_id, &t->rseq->cpu_id))
+		return -EFAULT;
+	trace_rseq_update(t);
+	return 0;
+}
+
+static int rseq_reset_rseq_cpu_id(struct task_struct *t)
+{
+	uint32_t cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
+
+	/*
+	 * Reset cpu_id_start to its initial state (0).
+	 */
+	if (__put_user(cpu_id_start, &t->rseq->cpu_id_start))
+		return -EFAULT;
+	/*
+	 * Reset cpu_id to RSEQ_CPU_ID_UNINITIALIZED, so any user coming
+	 * in after unregistration can figure out that rseq needs to be
+	 * registered again.
+	 */
+	if (__put_user(cpu_id, &t->rseq->cpu_id))
+		return -EFAULT;
+	return 0;
+}
+
+static int rseq_get_rseq_cs(struct task_struct *t,
+			    unsigned long *start_ip,
+			    unsigned long *post_commit_offset,
+			    unsigned long *abort_ip,
+			    uint32_t *cs_flags)
+{
+	struct rseq_cs __user *urseq_cs;
+	struct rseq_cs rseq_cs;
+	unsigned long ptr;
+	u32 __user *usig;
+	u32 sig;
+	int ret;
+
+	ret = __get_user(ptr, &t->rseq->rseq_cs);
+	if (ret)
+		return ret;
+	if (!ptr)
+		return 0;
+	urseq_cs = (struct rseq_cs __user *)ptr;
+	if (copy_from_user(&rseq_cs, urseq_cs, sizeof(rseq_cs)))
+		return -EFAULT;
+	/*
+	 * The rseq_cs field is cleared whenever the kernel handles
+	 * preemption or signal delivery while it is set, whether the
+	 * interrupted instruction pointer lies within the rseq assembly
+	 * block or in code outside of it. This performs a lazy clear
+	 * of the rseq_cs field.
+	 *
+	 * Set rseq_cs to NULL with single-copy atomicity.
+	 */
+	ptr = 0;
+	ret = __put_user(ptr, &t->rseq->rseq_cs);
+	if (ret)
+		return ret;
+	if (rseq_cs.version > 0)
+		return -EINVAL;
+
+	/* Ensure that abort_ip is not in the critical section. */
+	if (rseq_cs.abort_ip - rseq_cs.start_ip < rseq_cs.post_commit_offset)
+		return -EINVAL;
+
+	*cs_flags = rseq_cs.flags;
+	*start_ip = rseq_cs.start_ip;
+	*post_commit_offset = rseq_cs.post_commit_offset;
+	*abort_ip = rseq_cs.abort_ip;
+
+	usig = (u32 __user *)(rseq_cs.abort_ip - sizeof(u32));
+	ret = get_user(sig, usig);
+	if (ret)
+		return ret;
+
+	if (current->rseq_sig != sig) {
+		printk_ratelimited(KERN_WARNING
+			"Possible attack attempt. Unexpected rseq signature 0x%x, expecting 0x%x (pid=%d, addr=%p).\n",
+			sig, current->rseq_sig, current->pid, usig);
+		return -EPERM;
+	}
+	return 0;
+}
+
+static int rseq_need_restart(struct task_struct *t, uint32_t cs_flags)
+{
+	uint32_t flags, event_mask;
+	int ret;
+
+	/* Get thread flags. */
+	ret = __get_user(flags, &t->rseq->flags);
+	if (ret)
+		return ret;
+
+	/* Take critical section flags into account. */
+	flags |= cs_flags;
+
+	/*
+	 * Restart on signal can only be inhibited when restart on
+	 * preempt and restart on migrate are inhibited too. Otherwise,
+	 * a preempted signal handler could fail to restart the prior
+	 * execution context on sigreturn.
+	 */
+	if (unlikely(flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL)) {
+		if ((flags & (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+		    | RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT)) !=
+		    (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+		     | RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT))
+			return -EINVAL;
+	}
+	event_mask = t->rseq_event_mask;
+	t->rseq_event_mask = 0;
+	event_mask &= ~flags;
+	if (event_mask)
+		return 1;
+	return 0;
+}
+
+static int rseq_ip_fixup(struct pt_regs *regs)
+{
+	unsigned long ip = instruction_pointer(regs), start_ip = 0,
+		post_commit_offset = 0, abort_ip = 0;
+	struct task_struct *t = current;
+	uint32_t cs_flags = 0;
+	int ret;
+
+	ret = rseq_get_rseq_cs(t, &start_ip, &post_commit_offset, &abort_ip,
+			&cs_flags);
+	if (ret)
+		return ret;
+
+	ret = rseq_need_restart(t, cs_flags);
+	if (ret <= 0)
+		return ret;
+	/*
+	 * Handle potentially not being within a critical section.
+	 * Unsigned comparison will be true when
+	 * ip < start_ip (wrap-around to large values), and when
+	 * ip >= start_ip + post_commit_offset.
+	 */
+	if (ip - start_ip >= post_commit_offset)
+		return 1;
+
+	trace_rseq_ip_fixup(ip, start_ip, post_commit_offset, abort_ip);
+	instruction_pointer_set(regs, (unsigned long)abort_ip);
+	return 1;
+}
+
+/*
+ * This resume handler must always be executed between any of:
+ * - preemption,
+ * - signal delivery,
+ * and return to user-space.
+ *
+ * This is how we can ensure that the entire rseq critical section,
+ * consisting of both the C part and the assembly instruction sequence,
+ * will issue the commit instruction only if executed atomically with
+ * respect to other threads scheduled on the same CPU, and with respect
+ * to signal handlers.
+ */
+void __rseq_handle_notify_resume(struct pt_regs *regs)
+{
+	struct task_struct *t = current;
+	int ret;
+
+	if (unlikely(t->flags & PF_EXITING))
+		return;
+	if (unlikely(!access_ok(VERIFY_WRITE, t->rseq, sizeof(*t->rseq))))
+		goto error;
+	ret = rseq_ip_fixup(regs);
+	if (unlikely(ret < 0))
+		goto error;
+	if (unlikely(rseq_update_cpu_id(t)))
+		goto error;
+	return;
+
+error:
+	force_sig(SIGSEGV, t);
+}
+
+/*
+ * sys_rseq - setup restartable sequences for caller thread.
+ */
+SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, uint32_t, rseq_len,
+		int, flags, uint32_t, sig)
+{
+	int ret;
+
+	if (flags & RSEQ_FLAG_UNREGISTER) {
+		/* Unregister rseq for current thread. */
+		if (current->rseq != rseq || !current->rseq)
+			return -EINVAL;
+		if (current->rseq_len != rseq_len)
+			return -EINVAL;
+		if (current->rseq_sig != sig)
+			return -EPERM;
+		ret = rseq_reset_rseq_cpu_id(current);
+		if (ret)
+			return ret;
+		current->rseq = NULL;
+		current->rseq_len = 0;
+		current->rseq_sig = 0;
+		return 0;
+	}
+
+	if (unlikely(flags))
+		return -EINVAL;
+
+	if (current->rseq) {
+		/*
+		 * If rseq is already registered, check whether
+		 * the provided address differs from the prior
+		 * one.
+		 */
+		if (current->rseq != rseq || current->rseq_len != rseq_len)
+			return -EINVAL;
+		if (current->rseq_sig != sig)
+			return -EPERM;
+		/* Already registered. */
+		return -EBUSY;
+	}
+
+	/*
+	 * If there was no rseq previously registered,
+	 * ensure the provided rseq is properly aligned and valid.
+	 */
+	if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
+	    rseq_len != sizeof(*rseq))
+		return -EINVAL;
+	if (!access_ok(VERIFY_WRITE, rseq, rseq_len))
+		return -EFAULT;
+	current->rseq = rseq;
+	current->rseq_len = rseq_len;
+	current->rseq_sig = sig;
+	/*
+	 * If rseq was previously inactive, and has just been
+	 * registered, ensure the cpu_id_start and cpu_id fields
+	 * are updated before returning to user-space.
+	 */
+	rseq_set_notify_resume(current);
+
+	return 0;
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75554f366fd3..317136421ac7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1188,6 +1188,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	WARN_ON_ONCE(!cpu_online(new_cpu));
 #endif
 
+	rseq_migrate(p);
+
 	trace_sched_migrate_task(p, new_cpu);
 
 	if (task_cpu(p) != new_cpu) {
@@ -2590,6 +2592,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 {
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
+	rseq_sched_out(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
@@ -3350,6 +3353,7 @@ static void __sched notrace __schedule(bool preempt)
 	clear_preempt_need_resched();
 
 	if (likely(prev != next)) {
+		rseq_preempt(prev);
 		rq->nr_switches++;
 		rq->curr = next;
 		/*
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index b5189762d275..bfa1ee1bf669 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -259,3 +259,6 @@ cond_syscall(sys_membarrier);
 cond_syscall(sys_pkey_mprotect);
 cond_syscall(sys_pkey_alloc);
 cond_syscall(sys_pkey_free);
+
+/* restartable sequence */
+cond_syscall(sys_rseq);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 03/22] arm: Add restartable sequences support
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  (?)
@ 2017-11-21 14:18 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Call the rseq_handle_notify_resume() function on return to
userspace if TIF_NOTIFY_RESUME thread flag is set.

Increment the event counter and perform fixup on the pre-signal frame
when a signal is delivered on top of a restartable sequence critical
section.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/arm/Kconfig         | 1 +
 arch/arm/kernel/signal.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 51c8df561077..556e9b2225c6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -88,6 +88,7 @@ config ARM
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
 	select HAVE_VIRT_CPU_ACCOUNTING_GEN
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index bd8810d4acb3..5879ab3f53c1 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -541,6 +541,12 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	int ret;
 
 	/*
+	 * Increment event counter and perform fixup for the pre-signal
+	 * frame.
+	 */
+	rseq_signal_deliver(regs);
+
+	/*
 	 * Set up the stack frame
 	 */
 	if (ksig->ka.sa.sa_flags & SA_SIGINFO)
@@ -660,6 +666,7 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 			} else {
 				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
+				rseq_handle_notify_resume(regs);
 			}
 		}
 		local_irq_disable();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 04/22] arm: Wire up restartable sequences system call
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  (?)
@ 2017-11-21 14:18 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Wire up the rseq system call on 32-bit ARM.

This provides an ABI improving the speed of a user-space getcpu
operation on ARM by skipping the getcpu system call on the fast path, as
well as improving the speed of user-space operations on per-cpu data
compared to using load-linked/store-conditional.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/arm/tools/syscall.tbl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 0bb0e9c6376c..fbc74b5fa3ed 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -412,3 +412,4 @@
 395	common	pkey_alloc		sys_pkey_alloc
 396	common	pkey_free		sys_pkey_free
 397	common	statx			sys_statx
+398	common	rseq			sys_rseq
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 05/22] x86: Add support for restartable sequences
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Increment the event counter and perform fixup on the pre-signal frame
when a signal is delivered on top of a restartable sequence critical
section.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/x86/Kconfig         | 1 +
 arch/x86/entry/common.c  | 1 +
 arch/x86/kernel/signal.c | 6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index df3276d6bfe3..4799a440e39c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -174,6 +174,7 @@ config X86
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
 	select HAVE_STACK_VALIDATION		if X86_64
+	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index d7d3cc24baf4..d65595b201f4 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -160,6 +160,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
 		if (cached_flags & _TIF_NOTIFY_RESUME) {
 			clear_thread_flag(TIF_NOTIFY_RESUME);
 			tracehook_notify_resume(regs);
+			rseq_handle_notify_resume(regs);
 		}
 
 		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index b9e00e8f1c9b..991017d26d00 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -687,6 +687,12 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 	sigset_t *set = sigmask_to_save();
 	compat_sigset_t *cset = (compat_sigset_t *) set;
 
+	/*
+	 * Increment event counter and perform fixup for the pre-signal
+	 * frame.
+	 */
+	rseq_signal_deliver(regs);
+
 	/* Set up the stack frame */
 	if (is_ia32_frame(ksig)) {
 		if (ksig->ka.sa.sa_flags & SA_SIGINFO)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 06/22] x86: Wire up restartable sequence system call
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Wire up the rseq system call on x86 32/64.

This provides an ABI improving the speed of a user-space getcpu
operation on x86 by removing the need to perform a function call, "lsl"
instruction, or system call on the fast path, as well as improving the
speed of user-space operations on per-cpu data.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..ba43ee75e425 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382	i386	pkey_free		sys_pkey_free
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
+385	i386	rseq			sys_rseq
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..3ad03495bbb9 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	common	rseq			sys_rseq
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 07/22] powerpc: Add support for restartable sequences
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

From: Boqun Feng <boqun.feng@gmail.com>

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Increment the event counter and perform fixup on the pre-signal frame when a
signal is delivered on top of a restartable sequence critical section.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/Kconfig         | 1 +
 arch/powerpc/kernel/signal.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c51e6ce42e7a..e9992f80819c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -221,6 +221,7 @@ config PPC
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
 	select HAVE_IRQ_TIME_ACCOUNTING
+	select HAVE_RSEQ
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
 	select MODULES_USE_ELF_RELA
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 3d7539b90010..a7b95f7bcaf1 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
 	/* Re-enable the breakpoints for the signal stack */
 	thread_change_pc(tsk, tsk->thread.regs);
 
+	rseq_signal_deliver(tsk->thread.regs);
+
 	if (is32) {
         	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
 			ret = handle_rt_signal32(&ksig, oldset, tsk);
@@ -161,6 +163,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
+		rseq_handle_notify_resume(regs);
 	}
 
 	if (thread_info_flags & _TIF_PATCH_PENDING)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 08/22] powerpc: Wire up restartable sequences system call
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

From: Boqun Feng <boqun.feng@gmail.com>

Wire up the rseq system call on powerpc.

This provides an ABI improving the speed of a user-space getcpu
operation on powerpc by skipping the getcpu system call on the fast
path, as well as improving the speed of user-space operations on per-cpu
data compared to using load-reservation/store-conditional atomics.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h      | 1 +
 arch/powerpc/include/asm/unistd.h      | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 449912f057f6..964321a5799c 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -389,3 +389,4 @@ COMPAT_SYS_SPU(preadv2)
 COMPAT_SYS_SPU(pwritev2)
 SYSCALL(kexec_file_load)
 SYSCALL(statx)
+SYSCALL(rseq)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index 9ba11dbcaca9..e76bd5601ea4 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include <uapi/asm/unistd.h>
 
 
-#define NR_syscalls		384
+#define NR_syscalls		385
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index df8684f31919..b1980fcd56d5 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -395,5 +395,6 @@
 #define __NR_pwritev2		381
 #define __NR_kexec_file_load	382
 #define __NR_statx		383
+#define __NR_rseq		384
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 09/22] sched: Implement push_task_to_cpu
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  (?)
@ 2017-11-21 14:18 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Implement push_task_to_cpu(), which moves the task received as argument
to the destination cpu's runqueue. It only does so if the CPU is within
the CPU allowed mask of the task, else it returns -EINVAL.

It does not change the CPU allowed mask, and can therefore be used
within applications which rely on owning the sched_setaffinity() state.

It does not pin the task to the destination CPU, which means that the
scheduler may choose to move the task away from that CPU before the
task executes. Code invoking push_task_to_cpu() must be prepared to
retry in that case.
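
For illustration only (this sketch is not part of the patch), a
kernel-side caller needing its work to actually run on a given CPU is
expected to retry along these lines; dest_cpu and the preempt-off
region stand for whatever work must be pinned to that CPU:

	int ret;

	for (;;) {
		ret = push_task_to_cpu(current, dest_cpu);
		if (ret)
			return ret;	/* dest_cpu not in cpus_allowed */
		preempt_disable();
		if (smp_processor_id() == dest_cpu)
			break;		/* still on dest_cpu: do the work */
		/* The scheduler moved us away again: retry. */
		preempt_enable();
	}
	/* ... work guaranteed to execute on dest_cpu, preemption off ... */
	preempt_enable();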

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 kernel/sched/core.c  | 37 +++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  9 +++++++++
 2 files changed, 46 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 317136421ac7..4bbe297574b5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1061,6 +1061,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		set_curr_task(rq, p);
 }
 
+int push_task_to_cpu(struct task_struct *p, unsigned int dest_cpu)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+	int ret = 0;
+
+	rq = task_rq_lock(p, &rf);
+	update_rq_clock(rq);
+
+	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (task_cpu(p) == dest_cpu)
+		goto out;
+
+	if (task_running(rq, p) || p->state == TASK_WAKING) {
+		struct migration_arg arg = { p, dest_cpu };
+		/* Need help from migration thread: drop lock and wait. */
+		task_rq_unlock(rq, p, &rf);
+		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+		tlb_migrate_finish(p->mm);
+		return 0;
+	} else if (task_on_rq_queued(p)) {
+		/*
+		 * OK, since we're going to drop the lock immediately
+		 * afterwards anyway.
+		 */
+		rq = move_queued_task(rq, &rf, p, dest_cpu);
+	}
+out:
+	task_rq_unlock(rq, p, &rf);
+
+	return ret;
+}
+
 /*
  * Change a given task's CPU affinity. Migrate the thread to a
  * proper CPU and schedule it away if the CPU it's executing on
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b19552a212de..8d262d732d35 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1223,6 +1223,15 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 #endif
 }
 
+#ifdef CONFIG_SMP
+int push_task_to_cpu(struct task_struct *p, unsigned int dest_cpu);
+#else
+static inline int push_task_to_cpu(struct task_struct *p, unsigned int dest_cpu)
+{
+	return 0;
+}
+#endif
+
 /*
  * Tunables that become constants when CONFIG_SCHED_DEBUG is off:
  */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 v4 10/22] cpu_opv: Provide cpu_opv system call
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (9 preceding siblings ...)
  (?)
@ 2017-11-21 14:18 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

The cpu_opv system call executes a vector of operations on behalf of
user-space on a specific CPU with preemption disabled. It is inspired
by the readv() and writev() system calls, which take a "struct iovec"
array as argument.

The operations available are: comparison, memcpy, add, or, and, xor,
left shift, right shift, and memory barrier. The system call receives
a CPU number from user-space as argument, which is the CPU on which
those operations need to be performed.  All pointers in the ops must
have been set up to point to the per CPU memory of the CPU on which
the operations should be executed. The "comparison" operation can be
used to check that the data used in the preparation step did not
change between preparation of system call inputs and operation
execution within the preempt-off critical section.

The reason why we require all pointer offsets to be calculated by
user-space beforehand is that we need to use get_user_pages_fast()
to first pin all pages touched by each operation. This takes care of
faulting-in the pages. Then, preemption is disabled, and the
operations are performed atomically with respect to other thread
execution on that CPU, without generating any page fault.

An overall maximum of 4216 bytes is enforced on the sum of operation
lengths within an operation vector, so user-space cannot generate a
too long preempt-off critical section (cache-cold critical section
duration measured as 4.7µs on x86-64). Each operation is also limited
to a length of 4096 bytes, meaning that an operation can touch a
maximum of 4 pages (memcpy: 2 pages for source, 2 pages for
destination if addresses are not aligned on page boundaries).

If the thread is not running on the requested CPU, it is migrated to
it.

**** Justification for rseq ****

Here are a few reasons justifying why the cpu_opv system call is
needed in addition to rseq:

1) Handling single-stepping from tools

Tools like debuggers, and simulators use single-stepping to run through
existing programs. If core libraries start to use restartable sequences
for e.g. memory allocation, this means pre-existing programs cannot be
single-stepped, simply because the underlying glibc or jemalloc has
changed.

The rseq user-space implementation does expose a __rseq_table section
for the sake of debuggers, so they can skip over the rseq critical
sections if they want.  However, this requires upgrading tools, and
still breaks single-stepping in cases where glibc or jemalloc is
updated, but not the tooling.

Having a performance-related library improvement break tooling is likely
to cause a big push-back against wide adoption of rseq.

2) Forward-progress guarantee

Having a piece of user-space code that stops progressing due to external
conditions is pretty bad. Developers are used to thinking of fast-path
and slow-path (e.g. for locking), where the contended vs uncontended
cases have different performance characteristics, but each needs to
provide some level of progress guarantees.

There are concerns about proposing just "rseq" without the associated
slow-path (cpu_opv) that guarantees progress. It's just asking for
trouble when real life happens: page faults, uprobes, and other
unforeseen conditions that can, however seldom, cause an rseq
fast-path to never progress.

3) Handling page faults

It's pretty easy to come up with corner-case scenarios where rseq does
not progress without the help from cpu_opv. For instance, a system with
swap enabled which is under high memory pressure could trigger page
faults at pretty much every rseq attempt. Although this scenario
is extremely unlikely, rseq becomes the weak link of the chain.

4) Comparison with LL/SC

The layman versed in the load-link/store-conditional instructions in
RISC architectures will notice the similarity between rseq and LL/SC
critical sections. The comparison can even be pushed further: since
debuggers can handle those LL/SC critical sections, they should be
able to handle rseq c.s. in the same way.

First, the way gdb recognises LL/SC c.s. patterns is very fragile:
it's limited to specific common patterns, and will miss the pattern
in all other cases. But fear not, having the rseq c.s. expose a
__rseq_table to debuggers removes that guessing part.

The main difference between LL/SC and rseq is that debuggers had
to support single-stepping through LL/SC critical sections from the
get go in order to support a given architecture. For rseq, we're
adding critical sections into pre-existing applications/libraries,
so the user expectation is that tools don't break due to a library
optimization.

5) Perform maintenance operations on per-cpu data

rseq c.s. are quite limited feature-wise: they need to end with a
*single* commit instruction that updates a memory location. On the other
hand, the cpu_opv system call can combine a sequence of operations that
need to be executed with preemption disabled. While slower than rseq,
this allows for more complex maintenance operations to be performed on
per-cpu data concurrently with rseq fast-paths, in cases where it's not
possible to map those sequences of ops to a rseq.

6) Use cpu_opv as generic implementation for architectures not
   implementing rseq assembly code

rseq critical sections require architecture-specific user-space code to
be crafted in order to port an algorithm to a given architecture.  In
addition, it requires that the kernel architecture implementation adds
hooks into signal delivery and resume to user-space.

In order to facilitate integration of rseq into user-space, cpu_opv can
provide a (relatively slower) architecture-agnostic implementation of
rseq. This means that user-space code can be ported to all architectures
through use of cpu_opv initially, and have the fast-path use rseq
whenever the asm code is implemented.

7) Allow libraries with multi-part algorithms to work on the same
   per-cpu data without affecting the allowed cpu mask

The lttng-ust tracer presents an interesting use-case for per-cpu
buffers: the algorithm needs to update a "reserve" counter, serialize
data into the buffer, and then update a "commit" counter _on the same
per-cpu buffer_. Using rseq for both reserve and commit can bring
significant performance benefits.

Clearly, if rseq reserve fails, the algorithm can retry on a different
per-cpu buffer. However, it's not that easy for the commit. It needs to
be performed on the same per-cpu buffer as the reserve.

The cpu_opv system call solves that problem by receiving the cpu number
on which the operation needs to be performed as argument. It can push
the task to the right CPU if needed, and perform the operations there
with preemption disabled.

Changing the allowed cpu mask for the current thread is not an
acceptable alternative for a tracing library, because the application
being traced does not expect that mask to be changed by libraries.

8) Ensure that data structures don't need store-release/load-acquire
   semantic to handle fall-back

cpu_opv performs the fall-back on the requested CPU by migrating the
task to that CPU. Executing the slow-path on the right CPU ensures that
store-release/load-acquire semantics are required neither on the
fast-path nor on the slow-path.

**** rseq and cpu_opv use-cases ****

1) per-cpu spinlock

A per-cpu spinlock can be implemented as a rseq consisting of a
comparison operation (== 0) on a word, and a word store (1), followed
by an acquire barrier after control dependency. The unlock path can be
performed with a simple store-release of 0 to the word, which does
not require rseq.

The cpu_opv fallback requires a single-word comparison (== 0) and a
single-word store (1).
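
As an illustrative sketch only (not part of this patch), the rseq
fast-path lock can be expressed in C with helpers in the style of the
rseq selftests; rseq_cmpeqv_storev(), rseq_smp_acquire__after_ctrl_dep()
and rseq_smp_store_release() are assumed to be provided by such a
helper header:

	#include <stdint.h>
	#include "rseq.h"	/* assumed selftests-style helpers */

	struct percpu_lock {
		intptr_t v;	/* 0: unlocked, 1: locked */
	} __attribute__((aligned(128)));

	static void rseq_percpu_lock(struct percpu_lock *lock, int cpu)
	{
		for (;;) {
			/* rseq: compare word == 0, commit store of 1. */
			if (!rseq_cmpeqv_storev(&lock[cpu].v, 0, 1, cpu))
				break;
			/* Retry if the lock is taken or the rseq aborts. */
		}
		/* Acquire semantic after control dependency. */
		rseq_smp_acquire__after_ctrl_dep();
	}

	static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
	{
		/* Unlock does not need rseq: plain store-release of 0. */
		rseq_smp_store_release(&lock[cpu].v, 0);
	}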

2) per-cpu statistics counters

A per-cpu statistics counter can be implemented as a rseq consisting
of a final "add" instruction on a word as commit.

The cpu_opv fallback can be implemented as a "ADD" operation.

Besides statistics tracking, these counters can be used to implement
user-space RCU per-cpu grace period tracking for both single and
multi-process user-space RCU.
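
A minimal sketch of the counter increment, again assuming
selftests-style helpers (rseq_addv() and rseq_cpu_start() are
assumptions here, not part of this patch):

	#include <stdint.h>
	#include "rseq.h"	/* assumed selftests-style helpers */

	struct percpu_count {
		intptr_t count;
	} __attribute__((aligned(128)));

	static void percpu_count_inc(struct percpu_count *c)
	{
		for (;;) {
			int cpu = rseq_cpu_start();	/* cpu_id_start from TLS */

			/* rseq commit: single word "add" on this CPU's slot. */
			if (!rseq_addv(&c[cpu].count, 1, cpu))
				return;
			/* Aborted: retry (or fall back to a cpu_opv ADD op). */
		}
	}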

3) per-cpu LIFO linked-list (unlimited size stack)

A per-cpu LIFO linked-list has a "push" and "pop" operation,
which respectively adds an item to the list, and removes an
item from the list.

The "push" operation can be implemented as a rseq consisting of
a word comparison instruction against head followed by a word store
(commit) to head. Its cpu_opv fallback can be implemented as a
word-compare followed by word-store as well.

The "pop" operation can be implemented as a rseq consisting of
loading head, comparing it against NULL, loading the next pointer
at the right offset within the head item, and the next pointer as
a new head, returning the old head on success.

The cpu_opv fallback for "pop" differs from its rseq algorithm:
considering that cpu_opv needs to know all pointers at system
call entry so it can pin all pages, cpu_opv cannot simply load
head and then load the head->next address within the preempt-off
critical section. User-space needs to pass the head and head->next
addresses to the kernel, and the kernel needs to check that the
head address is unchanged since it has been loaded by user-space.
However, when accessing head->next in an ABA situation, it's
possible that head is unchanged, but loading head->next can
result in a page fault due to a concurrently freed head object.
This is why the "expect_fault" operation field is introduced: if a
fault is triggered by this access, "-EAGAIN" will be returned by
cpu_opv rather than -EFAULT, thus indicating that the operation
vector should be attempted again. The "pop" operation can thus be
implemented as a word comparison of head against the head loaded
by user-space, followed by a load of the head->next pointer (which
may fault), and a store of that pointer as a new head.

4) per-cpu LIFO ring buffer with pointers to objects (fixed-sized stack)

This structure is useful for passing around allocated objects
by passing pointers through a per-cpu fixed-sized stack.

The "push" side can be implemented with a check of the current
offset against the maximum buffer length, followed by a rseq
consisting of a comparison of the previously loaded offset
against the current offset, a word "try store" operation into the
next ring buffer array index (it's OK to abort after a try-store,
since it's not the commit, and its side-effect can be overwritten),
then followed by a word-store to increment the current offset (commit).

The "push" cpu_opv fallback can be done with the comparison, and
two consecutive word stores, all within the preempt-off section.

The "pop" side can be implemented with a check that offset is not
0 (whether the buffer is empty), a load of the "head" pointer before the
offset array index, followed by a rseq consisting of a word
comparison checking that the offset is unchanged since previously
loaded, another check ensuring that the "head" pointer is unchanged,
followed by a store decrementing the current offset.

The cpu_opv "pop" can be implemented with the same algorithm
as the rseq fast-path (compare, compare, store).
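
For illustration, a possible shape of that cpu_opv "pop" fallback, under
the same assumptions as the earlier sketches (structure layout and names
are made up):

#define STACK_NR_SLOTS	128	/* hypothetical capacity */

struct percpu_ptr_stack {
	intptr_t offset;		/* number of items currently pushed */
	void *slots[STACK_NR_SLOTS];
};

/*
 * Hypothetical sketch: "pop" fallback. "expect_offset" (already checked
 * to be non-zero) and "expect_head" (the pointer found at
 * slots[expect_offset - 1]) were loaded in user-space before the call.
 */
static int percpu_stack_pop_slowpath(struct percpu_ptr_stack *s,
				     intptr_t expect_offset,
				     void *expect_head, int cpu)
{
	intptr_t new_offset = expect_offset - 1;
	struct cpu_op opv[3] = {
		[0] = { .op = CPU_COMPARE_EQ_OP, .len = sizeof(s->offset) },
		[1] = { .op = CPU_COMPARE_EQ_OP, .len = sizeof(void *) },
		[2] = { .op = CPU_MEMCPY_OP,     .len = sizeof(s->offset) },
	};

	opv[0].u.compare_op.a = (uintptr_t)&s->offset;		/* offset unchanged? */
	opv[0].u.compare_op.b = (uintptr_t)&expect_offset;
	opv[1].u.compare_op.a = (uintptr_t)&s->slots[new_offset]; /* head unchanged? */
	opv[1].u.compare_op.b = (uintptr_t)&expect_head;
	opv[2].u.memcpy_op.dst = (uintptr_t)&s->offset;		/* commit: decrement */
	opv[2].u.memcpy_op.src = (uintptr_t)&new_offset;
	return syscall(__NR_cpu_opv, opv, 3, cpu, 0);
}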

5) per-cpu LIFO ring buffer with pointers to objects (fixed-sized stack)
   supporting "peek" from remote CPU

In order to implement work queues with work-stealing between CPUs, it is
useful to ensure the offset "commit" in scenario 4) "push" has a
store-release semantic, thus allowing a remote CPU to load the offset
with acquire semantic, and load the top pointer, in order to check if
work-stealing should be performed. The task (work queue item) existence
should be protected by other means, e.g. RCU.

If the peek operation notices that work-stealing should indeed be
performed, a thread can use cpu_opv to move the task between per-cpu
workqueues, by first invoking cpu_opv passing the remote work queue
cpu number as argument to pop the task, and then again as "push" with
the target work queue CPU number.

6) per-cpu LIFO ring buffer with data copy (fixed-sized stack)
   (with and without acquire-release)

This structure is useful for passing around data without requiring
memory allocation, by copying the data content into a per-cpu
fixed-sized stack.

The "push" operation is performed with an offset comparison against
the buffer size (figuring out if the buffer is full), followed by
a rseq consisting of a comparison of the offset, a try-memcpy attempting
to copy the data content into the buffer (which can be aborted and
overwritten), and a final store incrementing the offset.

The cpu_opv fallback needs the same operations, except that the memcpy
is guaranteed to complete, given that it is performed with preemption
disabled. This requires a memcpy operation supporting length up to 4kB.
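
A hedged sketch of that "push" fallback, under the same assumptions as
the earlier sketches (structure layout and names are made up; the
CPU_MB_OP is only needed for the acquire-release variant):

#define DATABUF_SIZE	4096	/* hypothetical capacity */

struct percpu_databuf {
	intptr_t offset;		/* bytes currently filled */
	char data[DATABUF_SIZE];
};

/*
 * Hypothetical sketch: copy "len" bytes (len <= 4096) of "data" into
 * the per-cpu buffer of CPU "cpu". "expect_offset" was loaded in
 * user-space, where expect_offset + len <= DATABUF_SIZE was checked.
 */
static int percpu_databuf_push_slowpath(struct percpu_databuf *buf,
					intptr_t expect_offset,
					const void *data, uint32_t len,
					int cpu)
{
	struct cpu_op opv[4] = {
		[0] = { .op = CPU_COMPARE_EQ_OP, .len = sizeof(buf->offset) },
		[1] = { .op = CPU_MEMCPY_OP,     .len = len },
		[2] = { .op = CPU_MB_OP,         .len = 0 },
		[3] = { .op = CPU_ADD_OP,        .len = sizeof(buf->offset) },
	};

	opv[0].u.compare_op.a = (uintptr_t)&buf->offset;	/* offset unchanged? */
	opv[0].u.compare_op.b = (uintptr_t)&expect_offset;
	opv[1].u.memcpy_op.dst = (uintptr_t)&buf->data[expect_offset];
	opv[1].u.memcpy_op.src = (uintptr_t)data;
	opv[3].u.arithmetic_op.p = (uintptr_t)&buf->offset;	/* commit */
	opv[3].u.arithmetic_op.count = len;
	return syscall(__NR_cpu_opv, opv, 4, cpu, 0);
}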

The "pop" operation is similar to the "push, except that the offset
is first compared to 0 to ensure the buffer is not empty. The
copy source is the ring buffer, and the destination is an output
buffer.

7) per-cpu FIFO ring buffer (fixed-sized queue)

This structure is useful wherever a FIFO behavior (queue) is needed.
One major use-case is tracer ring buffer.

An implementation of this ring buffer has a "reserve", followed by
serialization of multiple bytes into the buffer, ended by a "commit".
The "reserve" can be implemented as a rseq consisting of a word
comparison followed by a word store. The reserve operation moves the
producer "head". The multi-byte serialization can be performed
non-atomically. Finally, the "commit" update can be performed with
a rseq "add" commit instruction with store-release semantic. The
ring buffer consumer reads the commit value with load-acquire
semantic to know when it is safe to read from the ring buffer.

This use-case requires that both "reserve" and "commit" operations
be performed on the same per-cpu ring buffer, even if a migration
happens between those operations. In the typical case, both operations
will happen on the same CPU and use rseq. In the unlikely event of a
migration, the cpu_opv system call will ensure the commit can be
performed on the right CPU by migrating the task to that CPU.
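
For the commit slow path specifically, a hedged sketch under the same
assumptions as the earlier examples (names are made up); per the
CPU_MB_OP changelog note below, a full barrier before the final add
stands in for the store-release of the fast path:

/*
 * Hypothetical sketch: "commit" a previously reserved slot of "len"
 * bytes on the per-cpu ring buffer of CPU "cpu" after the rseq
 * fast-path aborted (e.g. due to migration). The consumer pairs its
 * load-acquire of the commit counter with the barrier issued here.
 */
static int ringbuf_commit_slowpath(uint64_t *commit_count, uint64_t len,
				   int cpu)
{
	struct cpu_op opv[2] = {
		[0] = { .op = CPU_MB_OP,  .len = 0 },
		[1] = { .op = CPU_ADD_OP, .len = sizeof(*commit_count) },
	};

	opv[1].u.arithmetic_op.p = (uintptr_t)commit_count;
	opv[1].u.arithmetic_op.count = len;
	return syscall(__NR_cpu_opv, opv, 2, cpu, 0);
}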

On the consumer side, an alternative to using store-release and
load-acquire on the commit counter would be to use cpu_opv to
ensure the commit counter load is performed on the right CPU. This
effectively allows moving a consumer thread between CPUs to execute
close to the ring buffer cache lines it will read.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
Changes since v1:
- handle CPU hotplug,
- cleanup implementation using function pointers: We can use function
  pointers to implement the operations rather than duplicating all the
  user-access code.
- refuse device pages: Performing cpu_opv operations on I/O-mapped pages
  with preemption disabled could generate long preempt-off critical
  sections, which would lead to unwanted scheduler latency. Return EFAULT
  if a device page is received as parameter.
- restrict op vector to 4216 bytes length sum: Restrict the operation
  vector to a length sum of:
  - 4096 bytes (typical page size on most architectures, should be
    enough for a string, or structures)
  - 15 * 8 bytes (typical operations on integers or pointers).
  The goal here is to keep the duration of preempt off critical section
  short, so we don't add significant scheduler latency.
- Add INIT_ONSTACK macro: Introduce the
  CPU_OP_FIELD_u32_u64_INIT_ONSTACK() macros to ensure that users
  correctly initialize the upper bits of CPU_OP_FIELD_u32_u64() on their
  stack to 0 on 32-bit architectures.
- Add CPU_MB_OP operation:
  Use-cases with:
  - two consecutive stores,
  - a memcpy followed by a store,
  require a memory barrier before the final store operation. A typical
  use-case is a store-release on the final store. Given that this is a
  slow path, just providing an explicit full barrier instruction should
  be sufficient.
- Add expect fault field:
  The use-case of list_pop brings interesting challenges. With rseq, we
  can use rseq_cmpnev_storeoffp_load(), and therefore load a pointer,
  compare it against NULL, add an offset, and load the target "next"
  pointer from the object, all within a single rseq critical section.

  Life is not so easy for cpu_opv in this use-case, mainly because we
  need to pin all pages we are going to touch in the preempt-off
  critical section beforehand. So we need to know the target object (in
  which we apply an offset to fetch the next pointer) when we pin pages
  before disabling preemption.

  So the approach is to load the head pointer and compare it against
  NULL in user-space, before doing the cpu_opv syscall. User-space can
  then compute the address of the head->next field, *without loading it*.

  The cpu_opv system call will first need to pin all pages associated
  with input data. This includes the page backing the head->next object,
  which may have been concurrently deallocated and unmapped. Therefore,
  in this case, getting -EFAULT when trying to pin those pages may
  happen: it just means they have been concurrently unmapped. This is
  an expected situation, and should just return -EAGAIN to user-space,
  so user-space can distinguish between "should retry" type of
  situations and actual errors that should be handled with extreme
  prejudice to the program (e.g. abort()).

  Therefore, add "expect_fault" fields along with op input address
  pointers, so user-space can identify whether a fault when getting a
  field should return EAGAIN rather than EFAULT.
- Add compiler barrier between operations: Adding a compiler barrier
  between store operations in a cpu_opv sequence can be useful when
  paired with membarrier system call.

  An algorithm with a paired slow path and fast path can use
  sys_membarrier on the slow path to replace fast-path memory barriers
  by compiler barriers.

  Adding an explicit compiler barrier between operations allows
  cpu_opv to be used as fallback for operations meant to match
  the membarrier system call.

Changes since v2:

- Fix memory leak by introducing struct cpu_opv_pinned_pages.
  Suggested by Boqun Feng.
- Cast argument 1 passed to access_ok from integer to void __user *,
  fixing sparse warning.

Changes since v3:

- Fix !SMP by adding push_task_to_cpu() empty static inline.
- Add missing sys_cpu_opv() asmlinkage declaration to
  include/linux/syscalls.h.

Changes since v4:

- Cleanup based on Thomas Gleixner's feedback.
- Fault-in pages which are not faulted in yet (e.g. zero pages).
- Handle retry in case where the scheduler migrates the thread away
  from the target CPU after migration within the syscall rather than
  returning EAGAIN to user-space.
- Move push_task_to_cpu() to its own patch.

---
Man page associated:

CPU_OPV(2)              Linux Programmer's Manual             CPU_OPV(2)

NAME
       cpu_opv - CPU preempt-off operation vector system call

SYNOPSIS
       #include <linux/cpu_opv.h>

       int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags);

DESCRIPTION
       The cpu_opv system call executes a vector of operations on behalf
       of user-space on a specific CPU with preemption disabled.

       The operations available are: comparison, memcpy, add,  or,  and,
       xor, left shift, right shift, and memory barrier. The system call
       receives a CPU number from user-space as argument, which  is  the
       CPU on which those operations need to be performed.  All pointers
       in the ops must have been set up to point to the per  CPU  memory
       of  the CPU on which the operations should be executed. The "com‐
       parison" operation can be used to check that the data used in the
       preparation  step  did  not  change between preparation of system
       call inputs and operation execution within the preempt-off criti‐
       cal section.

       An overall maximum of 4216 bytes is enforced on the sum of
       operation lengths within an operation vector, so user-space cannot
       generate an overly long preempt-off critical section. Each
       operation is also limited to a length of 4096 bytes. A maximum
       limit of 16 operations per cpu_opv syscall invocation is enforced.

       If the thread is not running on the requested CPU, it is migrated
       to it.

       The layout of struct cpu_op is as follows:

       Fields

           op Operation of type enum cpu_op_type to perform. This opera‐
              tion type selects the associated "u" union field.

           len
              Length (in bytes) of data to consider for this operation.

           u.compare_op
              For a CPU_COMPARE_EQ_OP , and CPU_COMPARE_NE_OP , contains
              the  a  and  b pointers to compare. The expect_fault_a and
              expect_fault_b fields indicate whether a page fault should
              be expected for each of those pointers.  If expect_fault_a
              , or expect_fault_b is set, EAGAIN is returned  on  fault,
              else  EFAULT is returned. The len field is allowed to take
              values from 0 to 4096 for comparison operations.

           u.memcpy_op
              For a CPU_MEMCPY_OP , contains the dst and  src  pointers,
              expressing  a  copy  of src into dst. The expect_fault_dst
              and expect_fault_src fields indicate whether a page  fault
              should  be  expected  for  each  of  those  pointers.   If
              expect_fault_dst , or expect_fault_src is set,  EAGAIN  is
              returned  on fault, else EFAULT is returned. The len field
              is allowed to take values from 0 to 4096 for memcpy opera‐
              tions.

           u.arithmetic_op
              For   a  CPU_ADD_OP  ,  contains  the  p  ,  count  ,  and
              expect_fault_p fields, which are respectively a pointer to
              the  memory location to increment, the 64-bit signed inte‐
              ger value to add, and  whether  a  page  fault  should  be
              expected  for  p  .   If  expect_fault_p is set, EAGAIN is
              returned on fault, else EFAULT is returned. The len  field
              is  allowed  to take values of 1, 2, 4, 8 bytes for arith‐
              metic operations.

           u.bitwise_op
              For a CPU_OR_OP , CPU_AND_OP , and CPU_XOR_OP  ,  contains
              the  p  ,  mask  ,  and  expect_fault_p  fields, which are
              respectively a pointer to the memory location  to  target,
              the  mask  to  apply,  and  whether a page fault should be
              expected for p .  If  expect_fault_p  is  set,  EAGAIN  is
              returned  on fault, else EFAULT is returned. The len field
              is allowed to take values of 1, 2, 4, 8 bytes for  bitwise
              operations.

           u.shift_op
              For a CPU_LSHIFT_OP , and CPU_RSHIFT_OP , contains the p ,
              bits , and expect_fault_p fields, which are respectively a
              pointer  to  the  memory location to target, the number of
              bits to shift either left of right,  and  whether  a  page
              fault  should  be  expected  for p .  If expect_fault_p is
              set, EAGAIN is returned on fault, else EFAULT is returned.
              The  len  field  is  allowed  to take values of 1, 2, 4, 8
              bytes for shift operations. The bits field is  allowed  to
              take values between 0 and 63.

       The enum cpu_op_type contains the following operations:

       · CPU_COMPARE_EQ_OP:  Compare  whether  two  memory locations are
         equal,

       · CPU_COMPARE_NE_OP: Compare whether two memory locations differ,

       · CPU_MEMCPY_OP: Copy a source memory location  into  a  destina‐
         tion,

       · CPU_ADD_OP: Increment a target memory location by a given
         count,

       · CPU_OR_OP: Apply an "or" mask to a memory location,

       · CPU_AND_OP: Apply an "and" mask to a memory location,

       · CPU_XOR_OP: Apply an "xor" mask to a memory location,

       · CPU_LSHIFT_OP: Shift a memory location left by a given number
         of bits,

       · CPU_RSHIFT_OP: Shift a memory location right by a given number
         of bits.

       · CPU_MB_OP: Issue a memory barrier.

         All of the operations above provide single-copy atomicity guar‐
         antees  for  word-sized, word-aligned target pointers, for both
         loads and stores.

       The cpuopcnt argument is the number of elements  in  the  cpu_opv
       array. It can take values from 0 to 16.

       The  cpu  argument  is  the  CPU  number  on  which the operation
       sequence needs to be executed.

       The flags argument is expected to be 0.

RETURN VALUE
       A return value of 0 indicates success. On error, -1 is  returned,
       and  errno is set appropriately. If a comparison operation fails,
       execution of the operation vector  is  stopped,  and  the  return
       value is the index after the comparison operation (values between
       1 and 16).

ERRORS
       EAGAIN cpu_opv() system call should be attempted again.

       EINVAL Either flags contains an invalid value, or cpu contains an
              invalid  value  or  a  value  not  allowed  by the current
              thread's allowed cpu mask, or cpuopcnt contains an invalid
              value, or the cpu_opv operation vector contains an invalid
              op value, or the  cpu_opv  operation  vector  contains  an
              invalid  len value, or the cpu_opv operation vector sum of
              len values is too large.

       ENOSYS The cpu_opv() system call is not implemented by this  ker‐
              nel.

       EFAULT cpu_opv  is  an  invalid  address,  or a pointer contained
              within an  operation  is  invalid  (and  a  fault  is  not
              expected for that pointer).

VERSIONS
       The cpu_opv() system call was added in Linux 4.X (TODO).

CONFORMING TO
       cpu_opv() is Linux-specific.

SEE ALSO
       membarrier(2), rseq(2)

Linux                          2017-11-10                     CPU_OPV(2)
---
 MAINTAINERS                  |    7 +
 include/linux/syscalls.h     |    3 +
 include/uapi/linux/cpu_opv.h |  114 +++++
 init/Kconfig                 |   14 +
 kernel/Makefile              |    1 +
 kernel/cpu_opv.c             | 1060 ++++++++++++++++++++++++++++++++++++++++++
 kernel/sys_ni.c              |    1 +
 7 files changed, 1200 insertions(+)
 create mode 100644 include/uapi/linux/cpu_opv.h
 create mode 100644 kernel/cpu_opv.c

diff --git a/MAINTAINERS b/MAINTAINERS
index b8f6a99005b4..0b4e504f5003 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3728,6 +3728,13 @@ B:	https://bugzilla.kernel.org
 F:	drivers/cpuidle/*
 F:	include/linux/cpuidle.h
 
+CPU NON-PREEMPTIBLE OPERATION VECTOR SUPPORT
+M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	kernel/cpu_opv.c
+F:	include/uapi/linux/cpu_opv.h
+
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico@linaro.org>
 S:	Maintained
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 340650b4ec54..32d289f41f62 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -67,6 +67,7 @@ struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
 struct rseq;
+struct cpu_op;
 union bpf_attr;
 
 #include <linux/types.h>
@@ -943,5 +944,7 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			int flags, uint32_t sig);
+asmlinkage long sys_cpu_opv(struct cpu_op __user *ucpuopv, int cpuopcnt,
+			int cpu, int flags);
 
 #endif
diff --git a/include/uapi/linux/cpu_opv.h b/include/uapi/linux/cpu_opv.h
new file mode 100644
index 000000000000..ccd8167fc189
--- /dev/null
+++ b/include/uapi/linux/cpu_opv.h
@@ -0,0 +1,114 @@
+#ifndef _UAPI_LINUX_CPU_OPV_H
+#define _UAPI_LINUX_CPU_OPV_H
+
+/*
+ * linux/cpu_opv.h
+ *
+ * CPU preempt-off operation vector system call API
+ *
+ * Copyright (c) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifdef __KERNEL__
+# include <linux/types.h>
+#else
+# include <stdint.h>
+#endif
+
+#include <linux/types_32_64.h>
+
+#define CPU_OP_VEC_LEN_MAX		16
+#define CPU_OP_ARG_LEN_MAX		24
+/* Maximum data len per operation. */
+#define CPU_OP_DATA_LEN_MAX		4096
+/*
+ * Maximum data len for overall vector. Restrict the amount of user-space
+ * data touched by the kernel in non-preemptible context, so it does not
+ * introduce long scheduler latencies.
+ * This allows one copy of up to 4096 bytes, and 15 operations touching 8
+ * bytes each.
+ * This limit is applied to the sum of length specified for all operations
+ * in a vector.
+ */
+#define CPU_OP_MEMCPY_EXPECT_LEN	4096
+#define CPU_OP_EXPECT_LEN		8
+#define CPU_OP_VEC_DATA_LEN_MAX		\
+	(CPU_OP_MEMCPY_EXPECT_LEN +	\
+	 (CPU_OP_VEC_LEN_MAX - 1) * CPU_OP_EXPECT_LEN)
+
+enum cpu_op_type {
+	/* compare */
+	CPU_COMPARE_EQ_OP,
+	CPU_COMPARE_NE_OP,
+	/* memcpy */
+	CPU_MEMCPY_OP,
+	/* arithmetic */
+	CPU_ADD_OP,
+	/* bitwise */
+	CPU_OR_OP,
+	CPU_AND_OP,
+	CPU_XOR_OP,
+	/* shift */
+	CPU_LSHIFT_OP,
+	CPU_RSHIFT_OP,
+	/* memory barrier */
+	CPU_MB_OP,
+};
+
+/* Vector of operations to perform. Limited to 16. */
+struct cpu_op {
+	/* enum cpu_op_type. */
+	int32_t op;
+	/* data length, in bytes. */
+	uint32_t len;
+	union {
+		struct {
+			LINUX_FIELD_u32_u64(a);
+			LINUX_FIELD_u32_u64(b);
+			uint8_t expect_fault_a;
+			uint8_t expect_fault_b;
+		} compare_op;
+		struct {
+			LINUX_FIELD_u32_u64(dst);
+			LINUX_FIELD_u32_u64(src);
+			uint8_t expect_fault_dst;
+			uint8_t expect_fault_src;
+		} memcpy_op;
+		struct {
+			LINUX_FIELD_u32_u64(p);
+			int64_t count;
+			uint8_t expect_fault_p;
+		} arithmetic_op;
+		struct {
+			LINUX_FIELD_u32_u64(p);
+			uint64_t mask;
+			uint8_t expect_fault_p;
+		} bitwise_op;
+		struct {
+			LINUX_FIELD_u32_u64(p);
+			uint32_t bits;
+			uint8_t expect_fault_p;
+		} shift_op;
+		char __padding[CPU_OP_ARG_LEN_MAX];
+	} u;
+};
+
+#endif /* _UAPI_LINUX_CPU_OPV_H */
diff --git a/init/Kconfig b/init/Kconfig
index 88e36395390f..acf678e2363c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1404,6 +1404,7 @@ config RSEQ
 	bool "Enable rseq() system call" if EXPERT
 	default y
 	depends on HAVE_RSEQ
+	select CPU_OPV
 	select MEMBARRIER
 	help
 	  Enable the restartable sequences system call. It provides a
@@ -1414,6 +1415,19 @@ config RSEQ
 
 	  If unsure, say Y.
 
+config CPU_OPV
+	bool "Enable cpu_opv() system call" if EXPERT
+	default y
+	help
+	  Enable the CPU preempt-off operation vector system call.
+	  It allows user-space to perform a sequence of operations on
+	  per-cpu data with preemption disabled. Useful as
+	  single-stepping fall-back for restartable sequences, and for
+	  performing more complex operations on per-cpu data that would
+	  not be otherwise possible to do with restartable sequences.
+
+	  If unsure, say Y.
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/Makefile b/kernel/Makefile
index 3574669dafd9..cac8855196ff 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -113,6 +113,7 @@ obj-$(CONFIG_TORTURE_TEST) += torture.o
 
 obj-$(CONFIG_HAS_IOMEM) += memremap.o
 obj-$(CONFIG_RSEQ) += rseq.o
+obj-$(CONFIG_CPU_OPV) += cpu_opv.o
 
 $(obj)/configs.o: $(obj)/config_data.h
 
diff --git a/kernel/cpu_opv.c b/kernel/cpu_opv.c
new file mode 100644
index 000000000000..1b921ae35088
--- /dev/null
+++ b/kernel/cpu_opv.c
@@ -0,0 +1,1060 @@
+/*
+ * CPU preempt-off operation vector system call
+ *
+ * It allows user-space to perform a sequence of operations on per-cpu
+ * data with preemption disabled. Useful as single-stepping fall-back
+ * for restartable sequences, and for performing more complex operations
+ * on per-cpu data that would not be otherwise possible to do with
+ * restartable sequences.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2017, EfficiOS Inc.,
+ * Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+#include <linux/cpu_opv.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/pagemap.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/byteorder.h>
+
+#include "sched/sched.h"
+
+/*
+ * A typical invocation of cpu_opv needs only a few pages. Keep struct page
+ * pointers in an array on the stack of the cpu_opv system call up to
+ * this limit, beyond which the array is dynamically allocated.
+ */
+#define NR_PAGE_PTRS_ON_STACK		8
+
+/* Maximum pages per op. */
+#define CPU_OP_MAX_PAGES		4
+
+/* Temporary on-stack buffer size for memcpy and compare operations. */
+#define TMP_BUFLEN			64
+
+union op_fn_data {
+	uint8_t _u8;
+	uint16_t _u16;
+	uint32_t _u32;
+	uint64_t _u64;
+#if (BITS_PER_LONG < 64)
+	uint32_t _u64_split[2];
+#endif
+};
+
+struct cpu_opv_page_ptrs {
+	struct page **pages;
+	size_t nr;
+	bool is_kmalloc;
+};
+
+typedef int (*op_fn_t)(union op_fn_data *data, uint64_t v, uint32_t len);
+
+/*
+ * Provide mutual exclusion for threads executing a cpu_opv against an
+ * offline CPU.
+ */
+static DEFINE_MUTEX(cpu_opv_offline_lock);
+
+/*
+ * The cpu_opv system call executes a vector of operations on behalf of
+ * user-space on a specific CPU with preemption disabled. It is inspired
+ * by readv() and writev() system calls which take a "struct iovec"
+ * array as argument.
+ *
+ * The operations available are: comparison, memcpy, add, or, and, xor,
+ * left shift, right shift, and memory barrier. The system call receives
+ * a CPU number from user-space as argument, which is the CPU on which
+ * those operations need to be performed.  All pointers in the ops must
+ * have been set up to point to the per CPU memory of the CPU on which
+ * the operations should be executed. The "comparison" operation can be
+ * used to check that the data used in the preparation step did not
+ * change between preparation of system call inputs and operation
+ * execution within the preempt-off critical section.
+ *
+ * The reason why we require all pointer offsets to be calculated by
+ * user-space beforehand is because we need to use get_user_pages_fast()
+ * to first pin all pages touched by each operation. This takes care of
+ * faulting-in the pages. Then, preemption is disabled, and the
+ * operations are performed atomically with respect to other thread
+ * execution on that CPU, without generating any page fault.
+ *
+ * An overall maximum of 4216 bytes is enforced on the sum of operation
+ * lengths within an operation vector, so user-space cannot generate an
+ * overly long preempt-off critical section (cache-cold critical section
+ * duration measured as 4.7µs on x86-64). Each operation is also limited
+ * to a length of 4096 bytes, meaning that an operation can touch a
+ * maximum of 4 pages (memcpy: 2 pages for source, 2 pages for
+ * destination if addresses are not aligned on page boundaries).
+ *
+ * If the thread is not running on the requested CPU, it is migrated to
+ * it.
+ */
+
+static unsigned long cpu_op_range_nr_pages(unsigned long addr,
+					   unsigned long len)
+{
+	return ((addr + len - 1) >> PAGE_SHIFT) - (addr >> PAGE_SHIFT) + 1;
+}
+
+static int cpu_op_count_pages(unsigned long addr, unsigned long len)
+{
+	unsigned long nr_pages;
+
+	if (!len)
+		return 0;
+	nr_pages = cpu_op_range_nr_pages(addr, len);
+	if (nr_pages > 2) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+	return nr_pages;
+}
+
+static struct page **cpu_op_alloc_pages_vector(int nr_pages)
+{
+	return kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL);
+}
+
+/*
+ * Check operation types and length parameters. Count number of pages.
+ */
+static int cpu_opv_check_op(struct cpu_op *op, int *nr_pages, uint32_t *sum)
+{
+	int ret;
+
+	switch (op->op) {
+	case CPU_MB_OP:
+		break;
+	default:
+		*sum += op->len;
+	}
+
+	/* Validate inputs. */
+	switch (op->op) {
+	case CPU_COMPARE_EQ_OP:
+	case CPU_COMPARE_NE_OP:
+	case CPU_MEMCPY_OP:
+		if (op->len > CPU_OP_DATA_LEN_MAX)
+			return -EINVAL;
+		break;
+	case CPU_ADD_OP:
+	case CPU_OR_OP:
+	case CPU_AND_OP:
+	case CPU_XOR_OP:
+		switch (op->len) {
+		case 1:
+		case 2:
+		case 4:
+		case 8:
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
+	case CPU_LSHIFT_OP:
+	case CPU_RSHIFT_OP:
+		switch (op->len) {
+		case 1:
+			if (op->u.shift_op.bits > 7)
+				return -EINVAL;
+			break;
+		case 2:
+			if (op->u.shift_op.bits > 15)
+				return -EINVAL;
+			break;
+		case 4:
+			if (op->u.shift_op.bits > 31)
+				return -EINVAL;
+			break;
+		case 8:
+			if (op->u.shift_op.bits > 63)
+				return -EINVAL;
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
+	case CPU_MB_OP:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/* Count pages. */
+	switch (op->op) {
+	case CPU_COMPARE_EQ_OP:
+	case CPU_COMPARE_NE_OP:
+		ret = cpu_op_count_pages(op->u.compare_op.a, op->len);
+		if (ret < 0)
+			return ret;
+		*nr_pages += ret;
+		ret = cpu_op_count_pages(op->u.compare_op.b, op->len);
+		if (ret < 0)
+			return ret;
+		*nr_pages += ret;
+		break;
+	case CPU_MEMCPY_OP:
+		ret = cpu_op_count_pages(op->u.memcpy_op.dst, op->len);
+		if (ret < 0)
+			return ret;
+		*nr_pages += ret;
+		ret = cpu_op_count_pages(op->u.memcpy_op.src, op->len);
+		if (ret < 0)
+			return ret;
+		*nr_pages += ret;
+		break;
+	case CPU_ADD_OP:
+		ret = cpu_op_count_pages(op->u.arithmetic_op.p, op->len);
+		if (ret < 0)
+			return ret;
+		*nr_pages += ret;
+		break;
+	case CPU_OR_OP:
+	case CPU_AND_OP:
+	case CPU_XOR_OP:
+		ret = cpu_op_count_pages(op->u.bitwise_op.p, op->len);
+		if (ret < 0)
+			return ret;
+		*nr_pages += ret;
+		break;
+	case CPU_LSHIFT_OP:
+	case CPU_RSHIFT_OP:
+		ret = cpu_op_count_pages(op->u.shift_op.p, op->len);
+		if (ret < 0)
+			return ret;
+		*nr_pages += ret;
+		break;
+	case CPU_MB_OP:
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Check operation types and length parameters. Count number of pages.
+ */
+static int cpu_opv_check(struct cpu_op *cpuopv, int cpuopcnt, int *nr_pages)
+{
+	uint32_t sum = 0;
+	int i, ret;
+
+	for (i = 0; i < cpuopcnt; i++) {
+		ret = cpu_opv_check_op(&cpuopv[i], nr_pages, &sum);
+		if (ret)
+			return ret;
+	}
+	if (sum > CPU_OP_VEC_DATA_LEN_MAX)
+		return -EINVAL;
+	return 0;
+}
+
+/**
+ * fault_in_user_writeable() - Fault in user address and verify RW access
+ * @uaddr:	pointer to faulting user space address
+ */
+static int fault_in_user_writeable(unsigned long uaddr)
+{
+	struct mm_struct *mm = current->mm;
+	int ret;
+
+	down_read(&mm->mmap_sem);
+	ret = fixup_user_fault(current, mm, uaddr,
+			       FAULT_FLAG_WRITE, NULL);
+	up_read(&mm->mmap_sem);
+
+	return ret < 0 ? ret : 0;
+}
+
+/*
+ * Refuse device pages, the zero page, pages in the gate area, and
+ * special mappings. Handle page swapping through retry. Fault in the page if
+ * needed.
+ */
+static int cpu_op_check_page(struct page *page, unsigned long addr)
+{
+	struct address_space *mapping;
+
+	if (is_zone_device_page(page))
+		return -EFAULT;
+
+	/*
+	 * The page lock protects many things but in this context the page
+	 * lock stabilizes mapping, prevents inode freeing in the shared
+	 * file-backed region case and guards against movement to swap
+	 * cache.
+	 *
+	 * Strictly speaking the page lock is not needed in all cases being
+	 * considered here and the page lock forces unnecessary serialization.
+	 * From this point on, mapping will be re-verified if necessary and
+	 * the page lock will be acquired only if it is unavoidable.
+	 *
+	 * Mapping checks require the head page for any compound page so the
+	 * head page and mapping is looked up now.
+	 */
+	page = compound_head(page);
+	mapping = READ_ONCE(page->mapping);
+
+	/*
+	 * If page->mapping is NULL, then it cannot be a PageAnon
+	 * page; but it might be the ZERO_PAGE or in the gate area or
+	 * in a special mapping (all cases which we are happy to fail);
+	 * or it may have been a good file page when get_user_pages_fast
+	 * found it, but truncated or holepunched or subjected to
+	 * invalidate_complete_page2 before we got the page lock (also
+	 * cases which we are happy to fail).  And we hold a reference,
+	 * so refcount care in invalidate_complete_page's remove_mapping
+	 * prevents drop_caches from setting mapping to NULL beneath us.
+	 *
+	 * The case we do have to guard against is when memory pressure made
+	 * shmem_writepage move it from filecache to swapcache beneath us:
+	 * an unlikely race, but we do need to retry for page->mapping.
+	 */
+	if (!mapping) {
+		int shmem_swizzled, ret;
+
+		/*
+		 * Check again with page lock held to guard against
+		 * memory pressure making shmem_writepage move the page
+		 * from filecache to swapcache.
+		 */
+		lock_page(page);
+		shmem_swizzled = PageSwapCache(page) || page->mapping;
+		unlock_page(page);
+		if (shmem_swizzled)
+			return -EAGAIN;
+		/*
+		 * Page needs to be faulted-in. If it succeeds, return
+		 * -EAGAIN to retry.
+		 */
+		ret = fault_in_user_writeable(addr);
+		if (!ret)
+			return -EAGAIN;
+		return ret;
+	}
+	return 0;
+}
+
+static int cpu_op_check_pages(struct page **pages,
+			      unsigned long nr_pages,
+			      unsigned long addr)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; i++) {
+		int ret;
+
+		ret = cpu_op_check_page(pages[i], addr);
+		if (ret)
+			return ret;
+		addr += PAGE_SIZE;
+	}
+	return 0;
+}
+
+static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
+			    struct cpu_opv_page_ptrs *page_ptrs,
+			    int write)
+{
+	struct page *pages[2];
+	int ret, nr_pages, nr_put_pages, n;
+
+	nr_pages = cpu_op_count_pages(addr, len);
+	if (!nr_pages)
+		return 0;
+again:
+	ret = get_user_pages_fast(addr, nr_pages, write, pages);
+	if (ret < nr_pages) {
+		if (ret >= 0) {
+			nr_put_pages = ret;
+			ret = -EFAULT;
+		} else {
+			nr_put_pages = 0;
+		}
+		goto error;
+	}
+	ret = cpu_op_check_pages(pages, nr_pages, addr);
+	if (ret) {
+		nr_put_pages = nr_pages;
+		goto error;
+	}
+	for (n = 0; n < nr_pages; n++)
+		page_ptrs->pages[page_ptrs->nr++] = pages[n];
+	return 0;
+
+error:
+	for (n = 0; n < nr_put_pages; n++)
+		put_page(pages[n]);
+	/*
+	 * Retry if a page has been faulted in, or is being swapped in.
+	 */
+	if (ret == -EAGAIN)
+		goto again;
+	return ret;
+}
+
+static int cpu_opv_pin_pages_op(struct cpu_op *op,
+				struct cpu_opv_page_ptrs *page_ptrs,
+				bool *expect_fault)
+{
+	int ret;
+
+	switch (op->op) {
+	case CPU_COMPARE_EQ_OP:
+	case CPU_COMPARE_NE_OP:
+		ret = -EFAULT;
+		*expect_fault = op->u.compare_op.expect_fault_a;
+		if (!access_ok(VERIFY_READ,
+			       (void __user *)op->u.compare_op.a,
+			       op->len))
+			return ret;
+		ret = cpu_op_pin_pages(op->u.compare_op.a, op->len,
+				       page_ptrs, 0);
+		if (ret)
+			return ret;
+		ret = -EFAULT;
+		*expect_fault = op->u.compare_op.expect_fault_b;
+		if (!access_ok(VERIFY_READ,
+			       (void __user *)op->u.compare_op.b,
+			       op->len))
+			return ret;
+		ret = cpu_op_pin_pages(op->u.compare_op.b, op->len,
+				       page_ptrs, 0);
+		if (ret)
+			return ret;
+		break;
+	case CPU_MEMCPY_OP:
+		ret = -EFAULT;
+		*expect_fault = op->u.memcpy_op.expect_fault_dst;
+		if (!access_ok(VERIFY_WRITE,
+			       (void __user *)op->u.memcpy_op.dst,
+			       op->len))
+			return ret;
+		ret = cpu_op_pin_pages(op->u.memcpy_op.dst, op->len,
+				       page_ptrs, 1);
+		if (ret)
+			return ret;
+		ret = -EFAULT;
+		*expect_fault = op->u.memcpy_op.expect_fault_src;
+		if (!access_ok(VERIFY_READ,
+			       (void __user *)op->u.memcpy_op.src,
+			       op->len))
+			return ret;
+		ret = cpu_op_pin_pages(op->u.memcpy_op.src, op->len,
+				       page_ptrs, 0);
+		if (ret)
+			return ret;
+		break;
+	case CPU_ADD_OP:
+		ret = -EFAULT;
+		*expect_fault = op->u.arithmetic_op.expect_fault_p;
+		if (!access_ok(VERIFY_WRITE,
+			       (void __user *)op->u.arithmetic_op.p,
+			       op->len))
+			return ret;
+		ret = cpu_op_pin_pages(op->u.arithmetic_op.p, op->len,
+				       page_ptrs, 1);
+		if (ret)
+			return ret;
+		break;
+	case CPU_OR_OP:
+	case CPU_AND_OP:
+	case CPU_XOR_OP:
+		ret = -EFAULT;
+		*expect_fault = op->u.bitwise_op.expect_fault_p;
+		if (!access_ok(VERIFY_WRITE,
+			       (void __user *)op->u.bitwise_op.p,
+			       op->len))
+			return ret;
+		ret = cpu_op_pin_pages(op->u.bitwise_op.p, op->len,
+				       page_ptrs, 1);
+		if (ret)
+			return ret;
+		break;
+	case CPU_LSHIFT_OP:
+	case CPU_RSHIFT_OP:
+		ret = -EFAULT;
+		*expect_fault = op->u.shift_op.expect_fault_p;
+		if (!access_ok(VERIFY_WRITE,
+			       (void __user *)op->u.shift_op.p,
+			       op->len))
+			return ret;
+		ret = cpu_op_pin_pages(op->u.shift_op.p, op->len,
+				       page_ptrs, 1);
+		if (ret)
+			return ret;
+		break;
+	case CPU_MB_OP:
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
+			     struct cpu_opv_page_ptrs *page_ptrs)
+{
+	int ret, i;
+	bool expect_fault = false;
+
+	/* Check access, pin pages. */
+	for (i = 0; i < cpuopcnt; i++) {
+		ret = cpu_opv_pin_pages_op(&cpuop[i], page_ptrs,
+				&expect_fault);
+		if (ret)
+			goto error;
+	}
+	return 0;
+
+error:
+	/*
+	 * If faulting access is expected, return EAGAIN to user-space.
+	 * It allows user-space to distinguish a fault caused by an
+	 * access which is expected to fault (e.g. due to concurrent
+	 * unmapping of underlying memory) from an unexpected fault from
+	 * which a retry would not recover.
+	 */
+	if (ret == -EFAULT && expect_fault)
+		return -EAGAIN;
+	return ret;
+}
+
+static int __op_get_user(union op_fn_data *data, void __user *p, size_t len)
+{
+	switch (len) {
+	case 1:	return __get_user(data->_u8, (uint8_t __user *)p);
+	case 2:	return __get_user(data->_u16, (uint16_t __user *)p);
+	case 4:	return __get_user(data->_u32, (uint32_t __user *)p);
+	case 8:
+#if (BITS_PER_LONG == 64)
+		return __get_user(data->_u64, (uint64_t __user *)p);
+#else
+	{
+		int ret;
+
+		ret = __get_user(data->_u64_split[0],
+				 (uint32_t __user *)p);
+		if (ret)
+			return ret;
+		return __get_user(data->_u64_split[1],
+				  (uint32_t __user *)p + 1);
+	}
+#endif
+	default:
+		return -EINVAL;
+	}
+}
+
+static int __op_put_user(union op_fn_data *data, void __user *p, size_t len)
+{
+	switch (len) {
+	case 1:	return __put_user(data->_u8, (uint8_t __user *)p);
+	case 2:	return __put_user(data->_u16, (uint16_t __user *)p);
+	case 4:	return __put_user(data->_u32, (uint32_t __user *)p);
+	case 8:
+#if (BITS_PER_LONG == 64)
+		return __put_user(data->_u64, (uint64_t __user *)p);
+#else
+	{
+		int ret;
+
+		ret = __put_user(data->_u64_split[0],
+				 (uint32_t __user *)p);
+		if (ret)
+			return ret;
+		return __put_user(data->_u64_split[1],
+				  (uint32_t __user *)p + 1);
+	}
+#endif
+	default:
+		return -EINVAL;
+	}
+}
+
+/* Return 0 if same, > 0 if different, < 0 on error. */
+static int do_cpu_op_compare_iter(void __user *a, void __user *b, uint32_t len)
+{
+	char bufa[TMP_BUFLEN], bufb[TMP_BUFLEN];
+	uint32_t compared = 0;
+
+	while (compared != len) {
+		unsigned long to_compare;
+
+		to_compare = min_t(uint32_t, TMP_BUFLEN, len - compared);
+		if (__copy_from_user_inatomic(bufa, a + compared, to_compare))
+			return -EFAULT;
+		if (__copy_from_user_inatomic(bufb, b + compared, to_compare))
+			return -EFAULT;
+		if (memcmp(bufa, bufb, to_compare))
+			return 1;
+		compared += to_compare;
+	}
+	return 0;
+}
+
+/* Return 0 if same, > 0 if different, < 0 on error. */
+static int do_cpu_op_compare(unsigned long _a, unsigned long _b, uint32_t len)
+{
+	void __user *a = (void __user *)_a;
+	void __user *b = (void __user *)_b;
+	int ret = -EFAULT;
+	union op_fn_data tmp[2];
+
+	switch (len) {
+	case 1:
+	case 2:
+	case 4:
+	case 8:
+		break;
+	default:
+		return do_cpu_op_compare_iter(a, b, len);
+	}
+
+	pagefault_disable();
+
+	if (__op_get_user(&tmp[0], a, len))
+		goto end;
+	if (__op_get_user(&tmp[1], b, len))
+		goto end;
+
+	switch (len) {
+	case 1:
+		ret = !!(tmp[0]._u8 != tmp[1]._u8);
+		break;
+	case 2:
+		ret = !!(tmp[0]._u16 != tmp[1]._u16);
+		break;
+	case 4:
+		ret = !!(tmp[0]._u32 != tmp[1]._u32);
+		break;
+	case 8:
+		ret = !!(tmp[0]._u64 != tmp[1]._u64);
+		break;
+	default:
+		break;
+	}
+end:
+	pagefault_enable();
+	return ret;
+}
+
+/* Return 0 on success, < 0 on error. */
+static int do_cpu_op_memcpy_iter(void __user *dst, void __user *src,
+				 uint32_t len)
+{
+	char buf[TMP_BUFLEN];
+	uint32_t copied = 0;
+
+	while (copied != len) {
+		unsigned long to_copy;
+
+		to_copy = min_t(uint32_t, TMP_BUFLEN, len - copied);
+		if (__copy_from_user_inatomic(buf, src + copied, to_copy))
+			return -EFAULT;
+		if (__copy_to_user_inatomic(dst + copied, buf, to_copy))
+			return -EFAULT;
+		copied += to_copy;
+	}
+	return 0;
+}
+
+/* Return 0 on success, < 0 on error. */
+static int do_cpu_op_memcpy(unsigned long _dst, unsigned long _src,
+			    uint32_t len)
+{
+	void __user *dst = (void __user *)_dst;
+	void __user *src = (void __user *)_src;
+	int ret = -EFAULT;
+	union op_fn_data tmp;
+
+	switch (len) {
+	case 1:
+	case 2:
+	case 4:
+	case 8:
+		break;
+	default:
+		return do_cpu_op_memcpy_iter(dst, src, len);
+	}
+
+	pagefault_disable();
+
+	if (__op_get_user(&tmp, src, len))
+		goto end;
+	if (__op_put_user(&tmp, dst, len))
+		goto end;
+	ret = 0;
+end:
+	pagefault_enable();
+	return ret;
+}
+
+static int op_add_fn(union op_fn_data *data, uint64_t count, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 += (uint8_t)count;
+		break;
+	case 2:
+		data->_u16 += (uint16_t)count;
+		break;
+	case 4:
+		data->_u32 += (uint32_t)count;
+		break;
+	case 8:
+		data->_u64 += (uint64_t)count;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_or_fn(union op_fn_data *data, uint64_t mask, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 |= (uint8_t)mask;
+		break;
+	case 2:
+		data->_u16 |= (uint16_t)mask;
+		break;
+	case 4:
+		data->_u32 |= (uint32_t)mask;
+		break;
+	case 8:
+		data->_u64 |= (uint64_t)mask;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_and_fn(union op_fn_data *data, uint64_t mask, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 &= (uint8_t)mask;
+		break;
+	case 2:
+		data->_u16 &= (uint16_t)mask;
+		break;
+	case 4:
+		data->_u32 &= (uint32_t)mask;
+		break;
+	case 8:
+		data->_u64 &= (uint64_t)mask;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_xor_fn(union op_fn_data *data, uint64_t mask, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 ^= (uint8_t)mask;
+		break;
+	case 2:
+		data->_u16 ^= (uint16_t)mask;
+		break;
+	case 4:
+		data->_u32 ^= (uint32_t)mask;
+		break;
+	case 8:
+		data->_u64 ^= (uint64_t)mask;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_lshift_fn(union op_fn_data *data, uint64_t bits, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 <<= (uint8_t)bits;
+		break;
+	case 2:
+		data->_u16 <<= (uint16_t)bits;
+		break;
+	case 4:
+		data->_u32 <<= (uint32_t)bits;
+		break;
+	case 8:
+		data->_u64 <<= (uint64_t)bits;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_rshift_fn(union op_fn_data *data, uint64_t bits, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 >>= (uint8_t)bits;
+		break;
+	case 2:
+		data->_u16 >>= (uint16_t)bits;
+		break;
+	case 4:
+		data->_u32 >>= (uint32_t)bits;
+		break;
+	case 8:
+		data->_u64 >>= (uint64_t)bits;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+/* Return 0 on success, < 0 on error. */
+static int do_cpu_op_fn(op_fn_t op_fn, unsigned long _p, uint64_t v,
+			uint32_t len)
+{
+	union op_fn_data tmp;
+	void __user *p = (void __user *)_p;
+	int ret = -EFAULT;
+
+	pagefault_disable();
+	if (__op_get_user(&tmp, p, len))
+		goto end;
+	if (op_fn(&tmp, v, len))
+		goto end;
+	if (__op_put_user(&tmp, p, len))
+		goto end;
+	ret = 0;
+end:
+	pagefault_enable();
+	return ret;
+}
+
+/*
+ * Return negative value on error, positive value if comparison
+ * fails, 0 on success.
+ */
+static int __do_cpu_opv_op(struct cpu_op *op)
+{
+	int ret;
+
+	/* Guarantee a compiler barrier between each operation. */
+	barrier();
+
+	switch (op->op) {
+	case CPU_COMPARE_EQ_OP:
+		ret = do_cpu_op_compare(op->u.compare_op.a,
+					op->u.compare_op.b,
+					op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_COMPARE_NE_OP:
+		ret = do_cpu_op_compare(op->u.compare_op.a,
+					op->u.compare_op.b,
+					op->len);
+		if (ret < 0)
+			return ret;
+		/*
+		 * Stop execution, return positive value if comparison
+		 * is identical.
+		 */
+		if (ret == 0)
+			return 1;
+		break;
+	case CPU_MEMCPY_OP:
+		ret = do_cpu_op_memcpy(op->u.memcpy_op.dst,
+				       op->u.memcpy_op.src,
+				       op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_ADD_OP:
+		ret = do_cpu_op_fn(op_add_fn, op->u.arithmetic_op.p,
+				   op->u.arithmetic_op.count, op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_OR_OP:
+		ret = do_cpu_op_fn(op_or_fn, op->u.bitwise_op.p,
+				   op->u.bitwise_op.mask, op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_AND_OP:
+		ret = do_cpu_op_fn(op_and_fn, op->u.bitwise_op.p,
+				   op->u.bitwise_op.mask, op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_XOR_OP:
+		ret = do_cpu_op_fn(op_xor_fn, op->u.bitwise_op.p,
+				   op->u.bitwise_op.mask, op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_LSHIFT_OP:
+		ret = do_cpu_op_fn(op_lshift_fn, op->u.shift_op.p,
+				   op->u.shift_op.bits, op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_RSHIFT_OP:
+		ret = do_cpu_op_fn(op_rshift_fn, op->u.shift_op.p,
+				   op->u.shift_op.bits, op->len);
+		if (ret)
+			return ret;
+		break;
+	case CPU_MB_OP:
+		/* Memory barrier provided by this operation. */
+		smp_mb();
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int __do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt)
+{
+	int i, ret;
+
+	for (i = 0; i < cpuopcnt; i++) {
+		ret = __do_cpu_opv_op(&cpuop[i]);
+		/* If comparison fails, stop execution and return index + 1. */
+		if (ret > 0)
+			return i + 1;
+		/* On error, stop execution. */
+		if (ret < 0)
+			return ret;
+	}
+	return 0;
+}
+
+static int do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt, int cpu)
+{
+	int ret;
+
+retry:
+	if (cpu != raw_smp_processor_id()) {
+		ret = push_task_to_cpu(current, cpu);
+		if (ret)
+			goto check_online;
+	}
+	preempt_disable();
+	if (cpu != smp_processor_id()) {
+		preempt_enable();
+		goto retry;
+	}
+	ret = __do_cpu_opv(cpuop, cpuopcnt);
+	preempt_enable();
+	return ret;
+
+check_online:
+	if (!cpu_possible(cpu))
+		return -EINVAL;
+	get_online_cpus();
+	if (cpu_online(cpu)) {
+		put_online_cpus();
+		goto retry;
+	}
+	/*
+	 * CPU is offline. Perform operation from the current CPU with
+	 * cpu_online read lock held, preventing that CPU from coming online,
+	 * and with mutex held, providing mutual exclusion against other
+	 * CPUs also finding out about an offline CPU.
+	 */
+	mutex_lock(&cpu_opv_offline_lock);
+	ret = __do_cpu_opv(cpuop, cpuopcnt);
+	mutex_unlock(&cpu_opv_offline_lock);
+	put_online_cpus();
+	return ret;
+}
+
+/*
+ * cpu_opv - execute operation vector on a given CPU with preempt off.
+ *
+ * Userspace should pass the current CPU number as parameter.
+ */
+SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
+		int, cpu, int, flags)
+{
+	struct cpu_op cpuopv[CPU_OP_VEC_LEN_MAX];
+	struct page *page_ptrs_on_stack[NR_PAGE_PTRS_ON_STACK];
+	struct cpu_opv_page_ptrs page_ptrs = {
+		.pages = page_ptrs_on_stack,
+		.nr = 0,
+		.is_kmalloc = false,
+	};
+	int ret, i, nr_pages = 0;
+
+	if (unlikely(flags))
+		return -EINVAL;
+	if (unlikely(cpu < 0))
+		return -EINVAL;
+	if (cpuopcnt < 0 || cpuopcnt > CPU_OP_VEC_LEN_MAX)
+		return -EINVAL;
+	if (copy_from_user(cpuopv, ucpuopv, cpuopcnt * sizeof(struct cpu_op)))
+		return -EFAULT;
+	ret = cpu_opv_check(cpuopv, cpuopcnt, &nr_pages);
+	if (ret)
+		return ret;
+	if (nr_pages > NR_PAGE_PTRS_ON_STACK) {
+		page_ptrs.pages = cpu_op_alloc_pages_vector(nr_pages);
+		if (!page_ptrs.pages)
+			return -ENOMEM;
+		page_ptrs.is_kmalloc = true;
+	}
+	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt, &page_ptrs);
+	if (ret)
+		goto end;
+	ret = do_cpu_opv(cpuopv, cpuopcnt, cpu);
+end:
+	for (i = 0; i < page_ptrs.nr; i++)
+		put_page(page_ptrs.pages[i]);
+	if (page_ptrs.is_kmalloc)
+		kfree(page_ptrs.pages);
+	return ret;
+}
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index bfa1ee1bf669..59e622296dc3 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -262,3 +262,4 @@ cond_syscall(sys_pkey_free);
 
 /* restartable sequence */
 cond_syscall(sys_rseq);
+cond_syscall(sys_cpu_opv);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 11/22] x86: Wire up cpu_opv system call
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (10 preceding siblings ...)
  (?)
@ 2017-11-21 14:18 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ba43ee75e425..afc6988fb2c8 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -392,3 +392,4 @@
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
 385	i386	rseq			sys_rseq
+386	i386	cpu_opv			sys_cpu_opv
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 3ad03495bbb9..ab5d1f9f9396 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -340,6 +340,7 @@
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
 333	common	rseq			sys_rseq
+334	common	cpu_opv			sys_cpu_opv
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 12/22] powerpc: Wire up cpu_opv system call
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h      | 1 +
 arch/powerpc/include/asm/unistd.h      | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 964321a5799c..f9cdb896fbaa 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -390,3 +390,4 @@ COMPAT_SYS_SPU(pwritev2)
 SYSCALL(kexec_file_load)
 SYSCALL(statx)
 SYSCALL(rseq)
+SYSCALL(cpu_opv)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index e76bd5601ea4..48f80f452e31 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include <uapi/asm/unistd.h>
 
 
-#define NR_syscalls		385
+#define NR_syscalls		386
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index b1980fcd56d5..972a7d68c143 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -396,5 +396,6 @@
 #define __NR_kexec_file_load	382
 #define __NR_statx		383
 #define __NR_rseq		384
+#define __NR_cpu_opv		385
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 12/22] powerpc: Wire up cpu_opv system call
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Benjamin Herrenschmidt

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h      | 1 +
 arch/powerpc/include/asm/unistd.h      | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 964321a5799c..f9cdb896fbaa 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -390,3 +390,4 @@ COMPAT_SYS_SPU(pwritev2)
 SYSCALL(kexec_file_load)
 SYSCALL(statx)
 SYSCALL(rseq)
+SYSCALL(cpu_opv)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index e76bd5601ea4..48f80f452e31 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include <uapi/asm/unistd.h>
 
 
-#define NR_syscalls		385
+#define NR_syscalls		386
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index b1980fcd56d5..972a7d68c143 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -396,5 +396,6 @@
 #define __NR_kexec_file_load	382
 #define __NR_statx		383
 #define __NR_rseq		384
+#define __NR_cpu_opv		385
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 13/22] arm: Wire up cpu_opv system call
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (12 preceding siblings ...)
  (?)
@ 2017-11-21 14:18 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/arm/tools/syscall.tbl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index fbc74b5fa3ed..213ccfc2c437 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -413,3 +413,4 @@
 396	common	pkey_free		sys_pkey_free
 397	common	statx			sys_statx
 398	common	rseq			sys_rseq
+399	common	cpu_opv			sys_cpu_opv
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
  2017-11-21 14:18 ` Mathieu Desnoyers
  (?)
@ 2017-11-21 14:18   ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

Implement the cpu_opv selftests. The tests need to express dependencies on
header files and on a shared object, which requires overriding the selftests
lib.mk targets. Introduce a new OVERRIDE_TARGETS define for this purpose.
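
As an orientation for the tests below, the call pattern they all share is sketched here: build a small operation vector, aim it at the current CPU, and retry on EAGAIN as the helpers in basic_cpu_opv_test.c do. The helper name is illustrative; cpu_opv(), cpu_op_get_current_cpu(), CPU_ADD_OP and the field initializer macro come from the cpu-op library and uapi headers added in this series.

#include <errno.h>
#include "cpu-op.h"

#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))

/* Illustrative helper: add @count to *v on the current CPU. */
static int add_on_current_cpu(int *v, int64_t count)
{
	struct cpu_op opvec[] = {
		[0] = {
			.op = CPU_ADD_OP,
			.len = sizeof(*v),
			LINUX_FIELD_u32_u64_INIT_ONSTACK(
				.u.arithmetic_op.p, v),
			.u.arithmetic_op.count = count,
			.u.arithmetic_op.expect_fault_p = 0,
		},
	};
	int ret, cpu;

	do {
		cpu = cpu_op_get_current_cpu();
		/* Retry on EAGAIN, as the selftest helpers below do. */
		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
	} while (ret == -1 && errno == EAGAIN);

	return ret;
}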

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:

- Expose a library API similar to rseq: the library API closely matches
  the rseq APIs, following removal of the event counter from the rseq
  kernel API.
- Update the Makefile to fix the "make run_tests" dependency on "all".
- Introduce an OVERRIDE_TARGETS define.

Changes since v2:

- Test page faults.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 tools/testing/selftests/lib.mk                     |    4 +
 8 files changed, 1629 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b4e504f5003..c6c2436d15f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3734,6 +3734,7 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico@linaro.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eaf599dc2137..fc1eba0e0130 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..21e63545d521
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+# Own dependencies because we only want to build against 1st prerequisite, but
+# still track changes to header files and depend on shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_cpu_opv_test
+
+TEST_GEN_PROGS_EXTENDED = libcpu-op.so
+
+include ../lib.mk
+
+$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..a31a10bbd8aa
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1189 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+#define TESTBUFLEN_PAGE_MAX	65536
+
+#define NR_PF_ARRAY	16384
+#define PF_ARRAY_LEN	4096
+
+/* 64 MB arrays for page fault testing. */
+char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
+char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret > 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	printf("Testing %s\n", test_name);
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
+				test_name, (char)i, buf2[i], i);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v1 != v2) {
+		printf("%s failed. Expecting '%d', found '%d'\n",
+			test_name, v1, v2);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v3 != v1) {
+		printf("%s failed. Expecting '%d', found '%d'\n",
+			test_name, v1, v3);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v + increment);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v + increments[0] + increments[1]);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v | mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v & mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v << bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v >> bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	if (v == n) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+/* Use 64kB len, largest page size known on Linux. */
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+/*
+ * Iterate over large uninitialized arrays to trigger page faults.
+ */
+int test_page_fault(void)
+{
+	int ret = 0;
+	uint64_t i;
+	const char *test_name = "test_page_fault";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < NR_PF_ARRAY; i++) {
+		ret = test_memcpy_op(pf_array_dst[i],
+				     pf_array_src[i],
+				     PF_ARRAY_LEN);
+		if (ret) {
+			printf("%s returned with %d, errno: %s\n",
+				test_name, ret, strerror(errno));
+			return ret;
+		}
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int ret = 0;
+
+	ret |= test_compare_eq_same();
+	ret |= test_compare_eq_diff();
+	ret |= test_compare_ne_same();
+	ret |= test_compare_ne_diff();
+	ret |= test_2compare_eq_index();
+	ret |= test_2compare_ne_index();
+	ret |= test_memcpy();
+	ret |= test_memcpy_u32();
+	ret |= test_memcpy_mb_memcpy();
+	ret |= test_add();
+	ret |= test_two_add();
+	ret |= test_or();
+	ret |= test_and();
+	ret |= test_xor();
+	ret |= test_lshift();
+	ret |= test_rshift();
+	ret |= test_cmpxchg_success();
+	ret |= test_cmpxchg_fail();
+	ret |= test_memcpy_fault();
+	ret |= test_unknown_op();
+	ret |= test_max_ops();
+	ret |= test_too_many_ops();
+	ret |= test_memcpy_single_too_large();
+	ret |= test_memcpy_single_ok_sum_too_large();
+	ret |= test_page_fault();
+
+	return ret;
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 5bef05d6ba39..441d7bc63bb7 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
 LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
 endif
 
+# Selftest makefiles can override those targets by setting
+# OVERRIDE_TARGETS = 1.
+ifeq ($(OVERRIDE_TARGETS),)
 $(OUTPUT)/%:%.c
 	$(LINK.c) $^ $(LDLIBS) -o $@
 
@@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
 
 $(OUTPUT)/%:%.S
 	$(LINK.S) $^ $(LDLIBS) -o $@
+endif
 
 .PHONY: run_tests all clean install emit_tests
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread
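
To show how the cpu-op.h wrappers above are meant to be consumed outside the test binary, a hypothetical per-CPU compare-and-exchange helper could look like the sketch below. The return-value convention (0 on success, a positive value on compare mismatch, -1 with errno on error) and the EAGAIN retry follow the selftest code in this patch; the function name and counter layout are illustrative only.

#include <errno.h>
#include <stdint.h>
#include "cpu-op.h"

/*
 * Illustrative only: compare-and-exchange a 64-bit counter on the CPU the
 * caller currently runs on.  Returns 0 on success, a positive value if
 * *counter no longer held @oldval, or -1 with errno set on error.
 */
static int percpu_counter_cmpxchg(uint64_t *counter, uint64_t oldval,
		uint64_t newval)
{
	uint64_t expect = oldval, old, n = newval;
	int ret, cpu;

	do {
		cpu = cpu_op_get_current_cpu();
		ret = cpu_op_cmpxchg(counter, &expect, &old, &n,
				     sizeof(*counter), cpu);
	} while (ret == -1 && errno == EAGAIN);

	return ret;
}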

* [Linux-kselftest-mirror] [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
@ 2017-11-21 14:18   ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: mathieu.desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)


Implement the cpu_opv selftests. The tests need to express dependencies on
header files and on a shared object, which requires overriding the selftests
lib.mk targets. Introduce a new OVERRIDE_TARGETS define for this purpose.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
CC: Russell King <linux at arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas at arm.com>
CC: Will Deacon <will.deacon at arm.com>
CC: Thomas Gleixner <tglx at linutronix.de>
CC: Paul Turner <pjt at google.com>
CC: Andrew Hunter <ahh at google.com>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Andy Lutomirski <luto at amacapital.net>
CC: Andi Kleen <andi at firstfloor.org>
CC: Dave Watson <davejwatson at fb.com>
CC: Chris Lameter <cl at linux.com>
CC: Ingo Molnar <mingo at redhat.com>
CC: "H. Peter Anvin" <hpa at zytor.com>
CC: Ben Maurer <bmaurer at fb.com>
CC: Steven Rostedt <rostedt at goodmis.org>
CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
CC: Josh Triplett <josh at joshtriplett.org>
CC: Linus Torvalds <torvalds at linux-foundation.org>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Boqun Feng <boqun.feng at gmail.com>
CC: Shuah Khan <shuah at kernel.org>
CC: linux-kselftest at vger.kernel.org
CC: linux-api at vger.kernel.org
---
Changes since v1:

- Expose a library API similar to rseq: the library API closely matches
  the rseq APIs, following removal of the event counter from the rseq
  kernel API.
- Update the Makefile to fix the "make run_tests" dependency on "all".
- Introduce an OVERRIDE_TARGETS define.

Changes since v2:

- Test page faults.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 tools/testing/selftests/lib.mk                     |    4 +
 8 files changed, 1629 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b4e504f5003..c6c2436d15f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3734,6 +3734,7 @@ L:	linux-kernel at vger.kernel.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico at linaro.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eaf599dc2137..fc1eba0e0130 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..21e63545d521
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+# Own dependencies because we only want to build against 1st prerequisite, but
+# still track changes to header files and depend on shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_cpu_opv_test
+
+TEST_GEN_PROGS_EXTENDED = libcpu-op.so
+
+include ../lib.mk
+
+$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..a31a10bbd8aa
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1189 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+#define TESTBUFLEN_PAGE_MAX	65536
+
+#define NR_PF_ARRAY	16384
+#define PF_ARRAY_LEN	4096
+
+/* 64 MB arrays for page fault testing. */
+char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
+char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret > 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	printf("Testing %s\n", test_name);
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
+				test_name, (char)i, buf2[i], i);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v1 != v2) {
+		printf("%s failed. Expecting '%d', found '%d'\n",
+			test_name, v1, v2);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v3 != v1) {
+		printf("%s failed. Expecting '%d', found '%d'\n",
+			test_name, v1, v3);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v + increment);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v + increments[0] + increments[1]);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v | mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v & mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v << bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v >> bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	if (v == n) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+/* Use 64kB len, largest page size known on Linux. */
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+/*
+ * Iterate over large (64 MB) uninitialized arrays to trigger page
+ * faults: the memcpy operations touch cold pages, exercising the
+ * cpu_opv page fault-in handling that this test-case is meant to
+ * cover.
+ */
+static int test_page_fault(void)
+{
+	int ret = 0;
+	uint64_t i;
+	const char *test_name = "test_page_fault";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < NR_PF_ARRAY; i++) {
+		ret = test_memcpy_op(pf_array_dst[i],
+				     pf_array_src[i],
+				     PF_ARRAY_LEN);
+		if (ret) {
+			printf("%s returned with %d, errno: %s\n",
+				test_name, ret, strerror(errno));
+			return ret;
+		}
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int ret = 0;
+
+	ret |= test_compare_eq_same();
+	ret |= test_compare_eq_diff();
+	ret |= test_compare_ne_same();
+	ret |= test_compare_ne_diff();
+	ret |= test_2compare_eq_index();
+	ret |= test_2compare_ne_index();
+	ret |= test_memcpy();
+	ret |= test_memcpy_u32();
+	ret |= test_memcpy_mb_memcpy();
+	ret |= test_add();
+	ret |= test_two_add();
+	ret |= test_or();
+	ret |= test_and();
+	ret |= test_xor();
+	ret |= test_lshift();
+	ret |= test_rshift();
+	ret |= test_cmpxchg_success();
+	ret |= test_cmpxchg_fail();
+	ret |= test_memcpy_fault();
+	ret |= test_unknown_op();
+	ret |= test_max_ops();
+	ret |= test_too_many_ops();
+	ret |= test_memcpy_single_too_large();
+	ret |= test_memcpy_single_ok_sum_too_large();
+	ret |= test_page_fault();
+
+	return ret;
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
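+/*
+ * Snapshot *v into *old, compare *v against *expect, and store *n into
+ * *v if they match, as a single cpu_opv vector executed on @cpu.
+ * Returns 0 on success, a positive value when the compare fails (the
+ * current value is still stored into *old), or -1 with errno set.
+ */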
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
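+/*
+ * If *v differs from @expectnot, load the pointer stored at offset
+ * @voffp from the value read in *v, store it back into *v, and return
+ * the previous value of *v through @load. Returns 1 when *v equals
+ * @expectnot, 0 on success, or -1 with errno set (EAGAIN when *v
+ * changed concurrently or the load at @voffp faulted).
+ */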
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 5bef05d6ba39..441d7bc63bb7 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
 LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
 endif
 
+# Selftest makefiles can override those targets by setting
+# OVERRIDE_TARGETS = 1.
+ifeq ($(OVERRIDE_TARGETS),)
 $(OUTPUT)/%:%.c
 	$(LINK.c) $^ $(LDLIBS) -o $@
 
@@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
 
 $(OUTPUT)/%:%.S
 	$(LINK.S) $^ $(LDLIBS) -o $@
+endif
 
 .PHONY: run_tests all clean install emit_tests
-- 
2.11.0



--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
@ 2017-11-21 14:18   ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)


Implement the cpu_opv selftests. They need to express dependencies on
header files and a shared object (.so), which requires overriding the
selftests lib.mk targets. Introduce a new OVERRIDE_TARGETS define for
this purpose.
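
As a quick illustration of the helper library these selftests build,
here is a minimal sketch, not part of the tests themselves, of a
stand-alone user linking against the libcpu-op.so produced by the
Makefile below:

	#include <errno.h>
	#include <stdio.h>

	#include "cpu-op.h"

	int main(void)
	{
		intptr_t counter = 0;
		int cpu, ret;

		/* Retry on EAGAIN, as the selftests below do. */
		do {
			cpu = cpu_op_get_current_cpu();
			ret = cpu_op_addv(&counter, 1, cpu);
		} while (ret == -1 && errno == EAGAIN);
		if (ret) {
			perror("cpu_op_addv");
			return 1;
		}
		printf("counter: %ld\n", (long)counter);
		return 0;
	}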

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
CC: Russell King <linux at arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas at arm.com>
CC: Will Deacon <will.deacon at arm.com>
CC: Thomas Gleixner <tglx at linutronix.de>
CC: Paul Turner <pjt at google.com>
CC: Andrew Hunter <ahh at google.com>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Andy Lutomirski <luto at amacapital.net>
CC: Andi Kleen <andi at firstfloor.org>
CC: Dave Watson <davejwatson at fb.com>
CC: Chris Lameter <cl at linux.com>
CC: Ingo Molnar <mingo at redhat.com>
CC: "H. Peter Anvin" <hpa at zytor.com>
CC: Ben Maurer <bmaurer at fb.com>
CC: Steven Rostedt <rostedt at goodmis.org>
CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
CC: Josh Triplett <josh at joshtriplett.org>
CC: Linus Torvalds <torvalds at linux-foundation.org>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Boqun Feng <boqun.feng at gmail.com>
CC: Shuah Khan <shuah at kernel.org>
CC: linux-kselftest at vger.kernel.org
CC: linux-api at vger.kernel.org
---
Changes since v1:

- Expose a library API similar to rseq: the library API closely
  matches the rseq APIs, following removal of the event counter from
  the rseq kernel API.
- Update the Makefile to fix the "make run_tests" dependency on "all".
- Introduce an OVERRIDE_TARGETS define.

Changes since v2:

- Test page faults.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 tools/testing/selftests/lib.mk                     |    4 +
 8 files changed, 1629 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b4e504f5003..c6c2436d15f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3734,6 +3734,7 @@ L:	linux-kernel at vger.kernel.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico at linaro.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eaf599dc2137..fc1eba0e0130 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..21e63545d521
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+# Define our own dependencies because we only want to build against the
+# 1st prerequisite, but still track changes to header files and depend
+# on the shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_cpu_opv_test
+
+TEST_GEN_PROGS_EXTENDED = libcpu-op.so
+
+include ../lib.mk
+
+$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..a31a10bbd8aa
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1189 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+#define TESTBUFLEN_PAGE_MAX	65536
+
+#define NR_PF_ARRAY	16384
+#define PF_ARRAY_LEN	4096
+
+/* 64 MB arrays for page fault testing. */
+char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
+char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret > 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	printf("Testing %s\n", test_name);
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
+				test_name, (char)i, buf2[i], i);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v1 != v2) {
+		printf("%s failed. Expecting '%d', found '%d'\n",
+			test_name, v1, v2);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v3 != v1) {
+		printf("%s failed. Expecting '%d', found '%d'\n",
+			test_name, v1, v3);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v + increment);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v + increments[0] + increments[1]);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v | mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v & mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v << bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v >> bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	if (v == n) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+/* Use 64kB len, largest page size known on Linux. */
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+/*
+ * Iterate over large uninitialized arrays to trigger page faults.
+ */
+int test_page_fault(void)
+{
+	int ret = 0;
+	uint64_t i;
+	const char *test_name = "test_page_fault";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < NR_PF_ARRAY; i++) {
+		ret = test_memcpy_op(pf_array_dst[i],
+				     pf_array_src[i],
+				     PF_ARRAY_LEN);
+		if (ret) {
+			printf("%s returned with %d, errno: %s\n",
+				test_name, ret, strerror(errno));
+			return ret;
+		}
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int ret = 0;
+
+	ret |= test_compare_eq_same();
+	ret |= test_compare_eq_diff();
+	ret |= test_compare_ne_same();
+	ret |= test_compare_ne_diff();
+	ret |= test_2compare_eq_index();
+	ret |= test_2compare_ne_index();
+	ret |= test_memcpy();
+	ret |= test_memcpy_u32();
+	ret |= test_memcpy_mb_memcpy();
+	ret |= test_add();
+	ret |= test_two_add();
+	ret |= test_or();
+	ret |= test_and();
+	ret |= test_xor();
+	ret |= test_lshift();
+	ret |= test_rshift();
+	ret |= test_cmpxchg_success();
+	ret |= test_cmpxchg_fail();
+	ret |= test_memcpy_fault();
+	ret |= test_unknown_op();
+	ret |= test_max_ops();
+	ret |= test_too_many_ops();
+	ret |= test_memcpy_single_too_large();
+	ret |= test_memcpy_single_ok_sum_too_large();
+	ret |= test_page_fault();
+
+	return ret;
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 5bef05d6ba39..441d7bc63bb7 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
 LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
 endif
 
+# Selftest makefiles can override those targets by setting
+# OVERRIDE_TARGETS = 1.
+ifeq ($(OVERRIDE_TARGETS),)
 $(OUTPUT)/%:%.c
 	$(LINK.c) $^ $(LDLIBS) -o $@
 
@@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
 
 $(OUTPUT)/%:%.S
 	$(LINK.S) $^ $(LDLIBS) -o $@
+endif
 
 .PHONY: run_tests all clean install emit_tests
-- 
2.11.0




* [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (14 preceding siblings ...)
  (?)
@ 2017-11-21 14:18 ` Mathieu Desnoyers
  2017-11-21 15:34     ` Shuah Khan
                     ` (3 more replies)
  -1 siblings, 4 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

Implements two basic tests of RSEQ functionality, and one more
exhaustive parameterizable test.

The first, "basic_test", only asserts that RSEQ works moderately
correctly, e.g. that the CPU ID pointer works.

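For illustration, here is a minimal sketch (not part of the patch) of
the kind of check basic_test performs, using the rseq.h helpers from
this series; check_cpu_id() is just an illustrative wrapper name:

    #define _GNU_SOURCE
    #include <assert.h>
    #include <sched.h>
    #include "rseq.h"

    /* Register the thread, then verify the rseq CPU id is coherent. */
    int check_cpu_id(void)
    {
        if (rseq_register_current_thread())
            return -1;
        assert(rseq_current_cpu() == sched_getcpu());
        if (rseq_unregister_current_thread())
            return -1;
        return 0;
    }
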
"basic_percpu_ops_test" is a slightly more "realistic" variant,
implementing a few simple per-cpu operations and testing their
correctness.

"param_test" is a parametrizable restartable sequences test. See
the "--help" output for usage.

A run_param_test.sh script runs many variants of the parametrizable
tests.

As part of those tests, a helper library "rseq" implements a user-space
API around restartable sequences. It uses the cpu_opv system call as
fallback when single-stepped by a debugger. It exposes the instruction
pointer addresses where the rseq assembly blocks begin and end, as well
as the associated abort instruction pointer, in the __rseq_table
section. This section lets debuggers know where to place breakpoints
when single-stepping through assembly blocks which may be aborted at
any point by the kernel.

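As an illustration of how a debugger-side tool could consume that
section, here is a hedged sketch (not part of the patch). It only
relies on the __start___rseq_table/__stop___rseq_table symbols, which
GNU ld generates for sections whose name is a valid C identifier; the
entry layout is struct rseq_cs as defined by this series' uapi header,
and the helper name is purely illustrative:

    #include <stdio.h>
    #include <linux/rseq.h>

    /* Emitted by the linker for the __rseq_table section. */
    extern struct rseq_cs __start___rseq_table[];
    extern struct rseq_cs __stop___rseq_table[];

    /* Sketch: print the address of each critical section descriptor. */
    static void list_rseq_critical_sections(void)
    {
        struct rseq_cs *cs;

        for (cs = __start___rseq_table; cs < __stop___rseq_table; cs++)
            printf("struct rseq_cs descriptor at %p\n", (void *)cs);
    }
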
The rseq library exposes APIs that present the fast-path operations.
The new usage from userspace is, e.g. for a counter increment:

    cpu = rseq_cpu_start();
    ret = rseq_addv(&data->c[cpu].count, 1, cpu);
    if (likely(!ret))
        return 0;        /* Success. */
    do {
        cpu = rseq_current_cpu();
        ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
        if (likely(!ret))
            return 0;    /* Success. */
    } while (ret > 0 || errno == EAGAIN);
    perror("cpu_op_addv");
    return -1;           /* Unexpected error. */

PowerPC tests have been implemented by Boqun Feng.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Provide abort-ip signature: The abort-ip signature is located just
  before the abort-ip target. It is currently hardcoded, but a
  user-space application could use the __rseq_table to iterate on all
  abort-ip targets and use a random value as signature if needed in the
  future.
- Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
  sections need to issue rseq_prepare_unload() on each thread at least
  once before reclaim of struct rseq_cs.
- Use initial-exec TLS model, non-weak symbol: The initial-exec model is
  signal-safe, whereas the global-dynamic model is not.  Remove the
  "weak" symbol attribute from the __rseq_abi in rseq.c. The librseq.so
  library will have ownership of that symbol, and there is no reason for
  an application or user library to try to define that symbol.
  The expected use is to link against librseq.so, which owns and provides
  that symbol (see the declaration sketch after this list).
- Set cpu_id to -2 on register error
- Add rseq_len syscall parameter, rseq_cs version
- Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
  hard time decoding the instruction stream after a bad instruction. Use
  a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
- Exercise parametrized test variants in a shell script.
- Restartable sequences selftests: Remove use of event counter.
- Use cpu_id_start field:  With the cpu_id_start field, the C
  preparation phase of the fast-path does not need to compare cpu_id < 0
  anymore.
- Signal-safe registration and refcounting: Allow libraries using
  librseq.so to register it from signal handlers.
- Use OVERRIDE_TARGETS in makefile.
- Use "m" constraints for rseq_cs field.
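
As a reference for the initial-exec TLS item above, a hedged
declaration sketch (the authoritative declaration lives in this
series' rseq.h; the "volatile" qualifier and exact type should be
taken from there):

    /*
     * Sketch only: pin the __rseq_abi TLS symbol to the initial-exec
     * model so that accesses never go through lazy dynamic TLS
     * allocation, which is not signal-safe.  librseq.so owns the
     * actual definition of this symbol.
     */
    extern __thread volatile struct rseq __rseq_abi
            __attribute__((tls_model("initial-exec")));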

Changes since v2:
- Update based on Thomas Gleixner's comments.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/rseq/.gitignore            |    4 +
 tools/testing/selftests/rseq/Makefile              |   23 +
 .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
 tools/testing/selftests/rseq/basic_test.c          |   55 +
 tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
 tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
 tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
 tools/testing/selftests/rseq/rseq.c                |  116 ++
 tools/testing/selftests/rseq/rseq.h                |  154 +++
 tools/testing/selftests/rseq/run_param_test.sh     |  124 ++
 13 files changed, 4096 insertions(+)
 create mode 100644 tools/testing/selftests/rseq/.gitignore
 create mode 100644 tools/testing/selftests/rseq/Makefile
 create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
 create mode 100644 tools/testing/selftests/rseq/basic_test.c
 create mode 100644 tools/testing/selftests/rseq/param_test.c
 create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
 create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
 create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
 create mode 100644 tools/testing/selftests/rseq/rseq.c
 create mode 100644 tools/testing/selftests/rseq/rseq.h
 create mode 100755 tools/testing/selftests/rseq/run_param_test.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index c6c2436d15f8..ba9137c1f295 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11634,6 +11634,7 @@ S:	Supported
 F:	kernel/rseq.c
 F:	include/uapi/linux/rseq.h
 F:	include/trace/events/rseq.h
+F:	tools/testing/selftests/rseq/
 
 RFKILL
 M:	Johannes Berg <johannes@sipsolutions.net>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index fc1eba0e0130..fc314334628a 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -26,6 +26,7 @@ TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += pstore
 TARGETS += ptrace
+TARGETS += rseq
 TARGETS += seccomp
 TARGETS += sigaltstack
 TARGETS += size
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
new file mode 100644
index 000000000000..9409c3db99b2
--- /dev/null
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -0,0 +1,4 @@
+basic_percpu_ops_test
+basic_test
+basic_rseq_op_test
+param_test
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
new file mode 100644
index 000000000000..e4f638e5752c
--- /dev/null
+++ b/tools/testing/selftests/rseq/Makefile
@@ -0,0 +1,23 @@
+CFLAGS += -O2 -Wall -g -I./ -I../cpu-opv/ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+LDLIBS += -lpthread
+
+# Own dependencies because we only want to build against 1st prerequisite, but
+# still track changes to header files and depend on shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_test basic_percpu_ops_test param_test
+
+TEST_GEN_PROGS_EXTENDED = librseq.so libcpu-op.so
+
+TEST_PROGS = run_param_test.sh
+
+include ../lib.mk
+
+$(OUTPUT)/librseq.so: rseq.c rseq.h rseq-*.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+$(OUTPUT)/libcpu-op.so: ../cpu-opv/cpu-op.c ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) rseq.h rseq-*.h ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -lcpu-op -o $@
diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
new file mode 100644
index 000000000000..e5f7fed06a03
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -0,0 +1,333 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "rseq.h"
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+	int reps;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
+int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_test_data *data = arg;
+	int i, cpu;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	for (i = 0; i < data->reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+	}
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = 200;
+	int i;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+
+	memset(&data, 0, sizeof(data));
+	data.reps = 5000;
+
+	for (i = 0; i < num_threads; i++)
+		pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &data);
+
+	for (i = 0; i < num_threads; i++)
+		pthread_join(test_threads[i], NULL);
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)data.reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike a traditional lock-less linked list, the availability of a
+ * rseq primitive allows us to implement pop without concerns over
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	int i;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	for (i = 0; i < 100000; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	int i, j;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[200];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < 200; i++)
+		assert(pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list) == 0);
+
+	for (i = 0; i < 200; i++)
+		pthread_join(test_threads[i], NULL);
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	printf("spinlock\n");
+	test_percpu_spinlock();
+	printf("percpu_list\n");
+	test_percpu_list();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	return 0;
+
+error:
+	return -1;
+}
+
diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
new file mode 100644
index 000000000000..e2086b3885d7
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_test.c
@@ -0,0 +1,55 @@
+/*
+ * Basic test coverage for critical regions and rseq_current_cpu().
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+
+#include "rseq.h"
+
+void test_cpu_pointer(void)
+{
+	cpu_set_t affinity, test_affinity;
+	int i;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	CPU_ZERO(&test_affinity);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (CPU_ISSET(i, &affinity)) {
+			CPU_SET(i, &test_affinity);
+			sched_setaffinity(0, sizeof(test_affinity),
+					&test_affinity);
+			assert(sched_getcpu() == i);
+			assert(rseq_current_cpu() == i);
+			assert(rseq_current_cpu_raw() == i);
+			assert(rseq_cpu_start() == i);
+			CPU_CLR(i, &test_affinity);
+		}
+	}
+	sched_setaffinity(0, sizeof(affinity), &affinity);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	printf("testing current cpu\n");
+	test_cpu_pointer();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	return 0;
+
+init_thread_error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
new file mode 100644
index 000000000000..c7a16b656a36
--- /dev/null
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -0,0 +1,1285 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <poll.h>
+#include <sys/types.h>
+#include <signal.h>
+#include <errno.h>
+#include <stddef.h>
+
+#include "cpu-op.h"
+
+static inline pid_t gettid(void)
+{
+	return syscall(__NR_gettid);
+}
+
+#define NR_INJECT	9
+static int loop_cnt[NR_INJECT + 1];
+
+static int opt_modulo, verbose;
+
+static int opt_yield, opt_signal, opt_sleep,
+		opt_disable_rseq, opt_threads = 200,
+		opt_disable_mod = 0, opt_test = 's', opt_mb = 0;
+
+static long long opt_reps = 5000;
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int signals_delivered;
+
+#ifndef BENCHMARK
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int yield_mod_cnt, nr_abort;
+
+#define printf_verbose(fmt, ...)			\
+	do {						\
+		if (verbose)				\
+			printf(fmt, ## __VA_ARGS__);	\
+	} while (0)
+
+#define RSEQ_INJECT_INPUT \
+	, [loop_cnt_1]"m"(loop_cnt[1]) \
+	, [loop_cnt_2]"m"(loop_cnt[2]) \
+	, [loop_cnt_3]"m"(loop_cnt[3]) \
+	, [loop_cnt_4]"m"(loop_cnt[4]) \
+	, [loop_cnt_5]"m"(loop_cnt[5]) \
+	, [loop_cnt_6]"m"(loop_cnt[6])
+
+#if defined(__x86_64__) || defined(__i386__)
+
+#define INJECT_ASM_REG	"eax"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"mov %[loop_cnt_" #n "], %%" INJECT_ASM_REG "\n\t" \
+	"test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \
+	"jz 333f\n\t" \
+	"222:\n\t" \
+	"dec %%" INJECT_ASM_REG "\n\t" \
+	"jnz 222b\n\t" \
+	"333:\n\t"
+
+#elif defined(__ARMEL__)
+
+#define INJECT_ASM_REG	"r4"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmp " INJECT_ASM_REG ", #0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subs " INJECT_ASM_REG ", #1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+
+#elif __PPC__
+#define INJECT_ASM_REG	"r18"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmpwi %%" INJECT_ASM_REG ", 0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+#else
+#error unsupported target
+#endif
+
+#define RSEQ_INJECT_FAILED \
+	nr_abort++;
+
+#define RSEQ_INJECT_C(n) \
+{ \
+	int loc_i, loc_nr_loops = loop_cnt[n]; \
+	\
+	for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \
+		barrier(); \
+	} \
+	if (loc_nr_loops == -1 && opt_modulo) { \
+		if (yield_mod_cnt == opt_modulo - 1) { \
+			if (opt_sleep > 0) \
+				poll(NULL, 0, opt_sleep); \
+			if (opt_yield) \
+				sched_yield(); \
+			if (opt_signal) \
+				raise(SIGUSR1); \
+			yield_mod_cnt = 0; \
+		} else { \
+			yield_mod_cnt++; \
+		} \
+	} \
+}
+
+#else
+
+#define printf_verbose(fmt, ...)
+
+#endif /* BENCHMARK */
+
+#include "rseq.h"
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct spinlock_thread_test_data {
+	struct spinlock_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct inc_test_data {
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct inc_thread_test_data {
+	struct inc_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+#define BUFFER_ITEM_PER_CPU	100
+
+struct percpu_buffer_node {
+	intptr_t data;
+};
+
+struct percpu_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_buffer_node **array;
+} __attribute__((aligned(128)));
+
+struct percpu_buffer {
+	struct percpu_buffer_entry c[CPU_SETSIZE];
+};
+
+#define MEMCPY_BUFFER_ITEM_PER_CPU	100
+
+struct percpu_memcpy_buffer_node {
+	intptr_t data1;
+	uint64_t data2;
+};
+
+struct percpu_memcpy_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_memcpy_buffer_node *array;
+} __attribute__((aligned(128)));
+
+struct percpu_memcpy_buffer {
+	struct percpu_memcpy_buffer_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
+static int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_thread_test_data *thread_data = arg;
+	struct spinlock_test_data *data = thread_data->data;
+	int cpu;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+#ifndef BENCHMARK
+		if (i != 0 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+	struct spinlock_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+void *test_percpu_inc_thread(void *arg)
+{
+	struct inc_thread_test_data *thread_data = arg;
+	struct inc_test_data *data = thread_data->data;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
+		if (likely(!ret))
+			goto next;
+#endif
+	slowpath:
+		__attribute__((unused));
+		for (;;) {
+			/* Fallback on cpu_opv system call. */
+			cpu = rseq_current_cpu();
+			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
+			if (likely(!ret))
+				break;
+			assert(ret >= 0 || errno == EAGAIN);
+		}
+	next:
+		__attribute__((unused));
+#ifndef BENCHMARK
+		if (i != 0 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+void test_percpu_inc(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct inc_test_data data;
+	struct inc_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_inc_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+slowpath:
+	__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike a traditional lock-less linked list, the availability of a
+ * rseq primitive allows us to implement pop without concerns over
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_buffer_push(struct percpu_buffer *buffer,
+		struct percpu_buffer_node *node)
+{
+	intptr_t *targetptr_spec, newval_spec;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	newval_spec = (intptr_t)node;
+	targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
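+	/*
+	 * With -M (opt_mb), use the release variant so the speculative
+	 * store of the node pointer into the array is ordered before
+	 * the final store publishing the new offset.
+	 */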
+	if (opt_mb)
+		ret = rseq_cmpeqv_trystorev_storev_release(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trystorev_storev(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		newval_spec = (intptr_t)node;
+		targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_storev_mb_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_storev_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+struct percpu_buffer_node *percpu_buffer_pop(struct percpu_buffer *buffer)
+{
+	struct percpu_buffer_node *head;
+	intptr_t *targetptr, newval;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return NULL;
+	}
+	head = buffer->c[cpu].array[offset - 1];
+	newval = offset - 1;
+	targetptr = (intptr_t *)&buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset,
+		(intptr_t *)&buffer->c[cpu].array[offset - 1], (intptr_t)head,
+		newval, cpu);
+	if (likely(!ret))
+		return head;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return NULL;
+		head = buffer->c[cpu].array[offset - 1];
+		newval = offset - 1;
+		targetptr = (intptr_t *)&buffer->c[cpu].offset;
+		ret = cpu_op_cmpeqv_cmpeqv_storev(targetptr, offset,
+			(intptr_t *)&buffer->c[cpu].array[offset - 1],
+			(intptr_t)head, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_buffer *buffer = (struct percpu_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_buffer_node *node = percpu_buffer_pop(buffer);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node) {
+			if (!percpu_buffer_push(buffer, node)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq aborts: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst case is every item in the same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) {
+			struct percpu_buffer_node *node;
+
+			expected_sum += j;
+
+			/*
+			 * We could theoretically put the word-sized
+			 * "data" directly in the buffer. However, we
+			 * want to model objects that would not fit
+			 * within a single word, so allocate an object
+			 * for each node.
+			 */
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			buffer.c[i].array[j - 1] = node;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_buffer_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_buffer_pop(&buffer))) {
+			sum += node->data;
+			free(node);
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)&buffer->c[cpu].array[offset];
+	srcptr = (char *)&item;
+	copylen = sizeof(item);
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	if (opt_mb)
+		ret = rseq_cmpeqv_trymemcpy_storev_release(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		destptr = (char *)&buffer->c[cpu].array[offset];
+		srcptr = (char *)&item;
+		copylen = sizeof(item);
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_memcpy_mb_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+bool percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node *item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)item;
+	srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+	copylen = sizeof(*item);
+	newval_final = offset - 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+		offset, destptr, srcptr, copylen,
+		newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return false;
+		destptr = (char *)item;
+		srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+		copylen = sizeof(*item);
+		newval_final = offset - 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+void *test_percpu_memcpy_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_memcpy_buffer_node item;
+		bool result;
+
+		result = percpu_memcpy_buffer_pop(buffer, &item);
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (result) {
+			if (!percpu_memcpy_buffer_push(buffer, item)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq aborts: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_memcpy_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_memcpy_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst case is every item in the same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* MEMCPY_BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) {
+			expected_sum += 2 * j + 1;
+
+			/*
+			 * We want to model objects that would not fit
+			 * within a single word, so each item carries
+			 * two words (data1, data2) and is copied into
+			 * the buffer by value rather than allocated
+			 * per node.
+			 */
+			buffer.c[i].array[j - 1].data1 = j;
+			buffer.c[i].array[j - 1].data2 = j + 1;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_memcpy_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_memcpy_buffer_node item;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while (percpu_memcpy_buffer_pop(&buffer, &item)) {
+			sum += item.data1;
+			sum += item.data2;
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+static void test_signal_interrupt_handler(int signo)
+{
+	signals_delivered++;
+}
+
+static int set_signal_handler(void)
+{
+	int ret = 0;
+	struct sigaction sa;
+	sigset_t sigset;
+
+	ret = sigemptyset(&sigset);
+	if (ret < 0) {
+		perror("sigemptyset");
+		return ret;
+	}
+
+	sa.sa_handler = test_signal_interrupt_handler;
+	sa.sa_mask = sigset;
+	sa.sa_flags = 0;
+	ret = sigaction(SIGUSR1, &sa, NULL);
+	if (ret < 0) {
+		perror("sigaction");
+		return ret;
+	}
+
+	printf_verbose("Signal handler set for SIGUSR1\n");
+
+	return ret;
+}
+
+static void show_usage(int argc, char **argv)
+{
+	printf("Usage : %s <OPTIONS>\n",
+		argv[0]);
+	printf("OPTIONS:\n");
+	printf("	[-1 loops] Number of loops for delay injection 1\n");
+	printf("	[-2 loops] Number of loops for delay injection 2\n");
+	printf("	[-3 loops] Number of loops for delay injection 3\n");
+	printf("	[-4 loops] Number of loops for delay injection 4\n");
+	printf("	[-5 loops] Number of loops for delay injection 5\n");
+	printf("	[-6 loops] Number of loops for delay injection 6\n");
+	printf("	[-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n");
+	printf("	[-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n");
+	printf("	[-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n");
+	printf("	[-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n");
+	printf("	[-y] Yield\n");
+	printf("	[-k] Kill thread with signal\n");
+	printf("	[-s S] S: =0: disabled (default), >0: sleep time (ms)\n");
+	printf("	[-t N] Number of threads (default 200)\n");
+	printf("	[-r N] Number of repetitions per thread (default 5000)\n");
+	printf("	[-d] Disable rseq system call (no initialization)\n");
+	printf("	[-D M] Disable rseq for each M threads\n");
+	printf("	[-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n");
+	printf("	[-M] Push into buffer and memcpy buffer with memory barriers.\n");
+	printf("	[-v] Verbose output.\n");
+	printf("	[-h] Show this help.\n");
+	printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+
+	for (i = 1; i < argc; i++) {
+		if (argv[i][0] != '-')
+			continue;
+		switch (argv[i][1]) {
+		case '1':
+		case '2':
+		case '3':
+		case '4':
+		case '5':
+		case '6':
+		case '7':
+		case '8':
+		case '9':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]);
+			i++;
+			break;
+		case 'm':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_modulo = atol(argv[i + 1]);
+			if (opt_modulo < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 's':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_sleep = atol(argv[i + 1]);
+			if (opt_sleep < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'y':
+			opt_yield = 1;
+			break;
+		case 'k':
+			opt_signal = 1;
+			break;
+		case 'd':
+			opt_disable_rseq = 1;
+			break;
+		case 'D':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_disable_mod = atol(argv[i + 1]);
+			if (opt_disable_mod < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 't':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_threads = atol(argv[i + 1]);
+			if (opt_threads < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'r':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_reps = atoll(argv[i + 1]);
+			if (opt_reps < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'h':
+			show_usage(argc, argv);
+			goto end;
+		case 'T':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_test = *argv[i + 1];
+			switch (opt_test) {
+			case 's':
+			case 'l':
+			case 'i':
+			case 'b':
+			case 'm':
+				break;
+			default:
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'v':
+			verbose = 1;
+			break;
+		case 'M':
+			opt_mb = 1;
+			break;
+		default:
+			show_usage(argc, argv);
+			goto error;
+		}
+	}
+
+	if (set_signal_handler())
+		goto error;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		goto error;
+	switch (opt_test) {
+	case 's':
+		printf_verbose("spinlock\n");
+		test_percpu_spinlock();
+		break;
+	case 'l':
+		printf_verbose("linked list\n");
+		test_percpu_list();
+		break;
+	case 'b':
+		printf_verbose("buffer\n");
+		test_percpu_buffer();
+		break;
+	case 'm':
+		printf_verbose("memcpy buffer\n");
+		test_percpu_memcpy_buffer();
+		break;
+	case 'i':
+		printf_verbose("counter increment\n");
+		test_percpu_inc();
+		break;
+	}
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+end:
+	return 0;
+
+error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
new file mode 100644
index 000000000000..47953c0cef4f
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -0,0 +1,535 @@
+/*
+ * rseq-arm.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
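+/*
+ * Signature word emitted immediately before each abort handler by
+ * RSEQ_ASM_DEFINE_ABORT below.
+ */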
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_wmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	WRITE_ONCE(*p, v);						\
+} while (0)
+
+#define RSEQ_ASM_DEFINE_TABLE(section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		RSEQ_INJECT_ASM(1)					\
+		"adr r0, " __rseq_str(cs_label) "\n\t"			\
+		"str r0, %[" __rseq_str(rseq_cs) "]\n\t"		\
+		__rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t"	\
+		"cmp %[" __rseq_str(cpu_id) "], r0\n\t"		\
+		"bne " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(table_label, label, section, sig,		\
+			teardown, abort_label, version, flags, start_ip,\
+			post_commit_offset, abort_ip)			\
+		__rseq_str(table_label) ":\n\t"				\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(abort_label) "]\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expectnot], r0\n\t"
+		"beq 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"str r0, %[load]\n\t"
+		"add r0, %[voffp]\n\t"
+		"ldr r0, [r0]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"Ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"add r0, %[count]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [count]"Ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"ldr r0, %[v2]\n\t"
+		"cmp %[expect2], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t" \
+		"beq 333f\n\t" \
+		"222:\n\t" \
+		"ldrb %%r0, [%[src]]\n\t" \
+		"strb %%r0, [%[dst]]\n\t" \
+		"adds %[src], #1\n\t" \
+		"adds %[dst], #1\n\t" \
+		"subs %[len], #1\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t" \
+		"beq 333f\n\t" \
+		"222:\n\t" \
+		"ldrb %%r0, [%[src]]\n\t" \
+		"strb %%r0, [%[dst]]\n\t" \
+		"adds %[src], #1\n\t" \
+		"adds %[dst], #1\n\t" \
+		"subs %[len], #1\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h
new file mode 100644
index 000000000000..3db6be5ceffb
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-ppc.h
@@ -0,0 +1,567 @@
+/*
+ * rseq-ppc.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ * (C) Copyright 2016 - Boqun Feng <boqun.feng@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()		__asm__ __volatile__ ("sync" : : : "memory")
+#define rseq_smp_lwsync()	__asm__ __volatile__ ("lwsync" : : : "memory")
+#define rseq_smp_rmb()		rseq_smp_lwsync()
+#define rseq_smp_wmb()		rseq_smp_lwsync()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_lwsync();						\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_lwsync()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_lwsync();						\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * The __rseq_table section can be used by debuggers to better handle
+ * single-stepping through the restartable critical sections.
+ */
+
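+/*
+ * Each RSEQ_ASM_DEFINE_TABLE entry below describes one critical section:
+ * a 32-bit version and 32-bit flags, followed by the start instruction
+ * address, the offset of the post-commit instruction from that start,
+ * and the abort handler address (64-bit fields on ppc64, zero-extended
+ * big-endian 32-bit pairs on ppc32).
+ */
+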
+#ifdef __PPC64__
+
+#define STORE_WORD	"std "
+#define LOAD_WORD	"ld "
+#define LOADX_WORD	"ldx "
+#define CMP_WORD	"cmpd "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t"		\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t"	\
+		"rldicr %%r17, %%r17, 32, 31\n\t"				\
+		"oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t"	\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"std %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
+		__rseq_str(label) ":\n\t"
+
+#else /* #ifdef __PPC64__ */
+
+#define STORE_WORD	"stw "
+#define LOAD_WORD	"lwz "
+#define LOADX_WORD	"lwzx "
+#define CMP_WORD	"cmpw "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		/* 32-bit only supported on BE */				\
+		".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t"			\
+		"addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
+		__rseq_str(label) ":\n\t"
+
+#endif /* #ifdef __PPC64__ */
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)			\
+		RSEQ_INJECT_ASM(2)						\
+		"lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t"		\
+		"cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		".long " __rseq_str(sig) "\n\t"					\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(abort_label) "]\n\t"			\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"			\
+		".popsection\n\t"
+
+
+/*
+ * RSEQ_ASM_OPs: asm operations for rseq
+ * 	RSEQ_ASM_OP_R_*: has hard-coded registers in it
+ * 	RSEQ_ASM_OP_* (else): has no hard-coded registers (except cr7)
+ */
+#define RSEQ_ASM_OP_CMPEQ(var, expect, label)					\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_CMPNE(var, expectnot, label)				\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t"	\
+		"beq- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_STORE(value, var)						\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"
+
+/* Load @var to r17 */
+#define RSEQ_ASM_OP_R_LOAD(var)							\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Store r17 to @var */
+#define RSEQ_ASM_OP_R_STORE(var)						\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Add @count to r17 */
+#define RSEQ_ASM_OP_R_ADD(count)						\
+		"add %%r17, %[" __rseq_str(count) "], %%r17\n\t"
+
+/* Load (r17 + voffp) to r17 */
+#define RSEQ_ASM_OP_R_LOADX(voffp)						\
+		LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t"
+
+/* TODO: implement a faster memcpy. */
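+/*
+ * Byte-at-a-time copy of r19 bytes from r20 (src) to r21 (dst), using
+ * the pre-incrementing lbzu/stbu forms; this is why both pointers are
+ * first biased by -1.  Clobbers r18-r21.
+ */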
+#define RSEQ_ASM_OP_R_MEMCPY() \
+		"cmpdi %%r19, 0\n\t" \
+		"beq 333f\n\t" \
+		"addi %%r20, %%r20, -1\n\t" \
+		"addi %%r21, %%r21, -1\n\t" \
+		"222:\n\t" \
+		"lbzu %%r18, 1(%%r20)\n\t" \
+		"stbu %%r18, 1(%%r21)\n\t" \
+		"addi %%r19, %%r19, -1\n\t" \
+		"cmpdi %%r19, 0\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+
+#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label)			\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		__rseq_str(post_commit_label) ":\n\t"
+
+#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label)			\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"	\
+		__rseq_str(post_commit_label) ":\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v not equal to @expectnot */
+		RSEQ_ASM_OP_CMPNE(v, expectnot, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* store it in @load */
+		RSEQ_ASM_OP_R_STORE(load)
+		/* dereference voffp(v) */
+		RSEQ_ASM_OP_R_LOADX(voffp)
+		/* final store the value at voffp(v) */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"b"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* add @count to it */
+		RSEQ_ASM_OP_R_ADD(count)
+		/* final store */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"r"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* cmp @v2 equal to @expect2 */
+		RSEQ_ASM_OP_CMPEQ(v2, expect2, 5f)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#undef STORE_WORD
+#undef LOAD_WORD
+#undef LOADX_WORD
+#undef CMP_WORD
diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
new file mode 100644
index 000000000000..63e81d6c61fa
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-x86.h
@@ -0,0 +1,898 @@
+/*
+ * rseq-x86.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdint.h>
+
+#define RSEQ_SIG	0x53053053
+
+#ifdef __x86_64__
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
+#define rseq_smp_rmb()	barrier()
+#define rseq_smp_wmb()	barrier()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	barrier();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	barrier();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
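+
+/*
+ * x86-64 is TSO: loads are not reordered with other loads and stores are
+ * not reordered with other stores, so rmb/wmb and the acquire/release
+ * accessors above only need a compiler barrier; a full mfence is only
+ * required for rseq_smp_mb().
+ */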
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		RSEQ_INJECT_ASM(1)					\
+		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
+		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
+		__rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movq %[v], %%rax\n\t"
+		"movq %%rax, %[load]\n\t"
+		"addq %[voffp], %%rax\n\t"
+		"movq (%%rax), %%rax\n\t"
+		/* final store */
+		"movq %%rax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"er"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addq %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"er"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movq %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2,
+			newv, cpu);
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpq %[v2], %[expect2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint64_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movq %[src], %[rseq_scratch0]\n\t"
+		"movq %[dst], %[rseq_scratch1]\n\t"
+		"movq %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movq %[rseq_scratch2], %[len]\n\t"
+		"movq %[rseq_scratch1], %[dst]\n\t"
+		"movq %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src,
+			len, newv, cpu);
+}
+
+#elif __i386__
+
+/*
+ * Support older 32-bit architectures that do not implement fence
+ * instructions.
+ */
+#define rseq_smp_mb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_rmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_wmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * Use eax as scratch register and take memory operands as input to
+ * lessen register pressure. Especially needed when compiling in O0.
+ */
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		RSEQ_INJECT_ASM(1)					\
+		"movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t"	\
+		__rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>. */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movl %[v], %%eax\n\t"
+		"movl %%eax, %[load]\n\t"
+		"addl %[voffp], %%eax\n\t"
+		"movl (%%eax), %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addl %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %%eax\n\t"
+		"movl %%eax, %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"m"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %[v], %%eax\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpl %[expect2], %[v2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"m"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#endif
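
For context on how the operations above are meant to be driven, here is a
minimal, illustrative retry loop built on rseq_cpu_start() and
rseq_cmpeqv_storev(). It is only a sketch, not part of the patch: the
struct and function names are hypothetical, it assumes the rseq.h header
added later in this patch, and the calling thread is assumed to have
called rseq_register_current_thread().

struct percpu_list_node {
	struct percpu_list_node *next;
	intptr_t data;
};

struct percpu_list {
	/* CPU_SETSIZE comes from <sched.h>, pulled in by rseq.h. */
	struct percpu_list_node *head[CPU_SETSIZE];
};

/* Push @node on the list of the current CPU, return that CPU number. */
static int percpu_list_push(struct percpu_list *list,
		struct percpu_list_node *node)
{
	for (;;) {
		intptr_t *targetptr, expect, newval;
		int cpu;

		cpu = rseq_cpu_start();
		/* Snapshot the head of the list for that CPU. */
		expect = (intptr_t)list->head[cpu];
		node->next = (struct percpu_list_node *)expect;
		newval = (intptr_t)node;
		targetptr = (intptr_t *)&list->head[cpu];
		/*
		 * The store is committed only if we are still running on
		 * @cpu and the head is still @expect; on migration, signal
		 * delivery or a concurrent update, simply retry.
		 */
		if (!rseq_cmpeqv_storev(targetptr, expect, newval, cpu))
			return cpu;
	}
}
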
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
new file mode 100644
index 000000000000..b83d3196c33e
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -0,0 +1,116 @@
+/*
+ * rseq.c
+ *
+ * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "rseq.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+__attribute__((tls_model("initial-exec"))) __thread
+volatile struct rseq __rseq_abi = {
+	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+static __attribute__((tls_model("initial-exec"))) __thread
+volatile int refcount;
+
+static void signal_off_save(sigset_t *oldset)
+{
+	sigset_t set;
+	int ret;
+
+	sigfillset(&set);
+	ret = pthread_sigmask(SIG_BLOCK, &set, oldset);
+	if (ret)
+		abort();
+}
+
+static void signal_restore(sigset_t oldset)
+{
+	int ret;
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+	if (ret)
+		abort();
+}
+
+static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
+		int flags, uint32_t sig)
+{
+	return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+int rseq_register_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (refcount++)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
+	if (!rc) {
+		assert(rseq_current_cpu_raw() >= 0);
+		goto end;
+	}
+	if (errno != EBUSY)
+		__rseq_abi.cpu_id = -2;
+	ret = -1;
+	refcount--;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int rseq_unregister_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (--refcount)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
+			RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+	if (!rc)
+		goto end;
+	ret = -1;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int32_t rseq_fallback_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
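
The registration API above is meant to be driven once per thread. A
minimal usage sketch (not part of the patch; the thread function name is
hypothetical and error handling is reduced to abort()):

static void *test_thread(void *arg)
{
	if (rseq_register_current_thread())
		abort();	/* rseq unavailable or registration failed */

	/* ... execute rseq critical sections ... */

	if (rseq_unregister_current_thread())
		abort();
	return NULL;
}
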
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
new file mode 100644
index 000000000000..26c8ea01e940
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -0,0 +1,154 @@
+/*
+ * rseq.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RSEQ_H
+#define RSEQ_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sched.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <linux/rseq.h>
+
+/*
+ * Empty code injection macros, override when testing.
+ * It is important to consider that the ASM injection macros need to be
+ * fully reentrant (e.g. do not modify the stack).
+ */
+#ifndef RSEQ_INJECT_ASM
+#define RSEQ_INJECT_ASM(n)
+#endif
+
+#ifndef RSEQ_INJECT_C
+#define RSEQ_INJECT_C(n)
+#endif
+
+#ifndef RSEQ_INJECT_INPUT
+#define RSEQ_INJECT_INPUT
+#endif
+
+#ifndef RSEQ_INJECT_CLOBBER
+#define RSEQ_INJECT_CLOBBER
+#endif
+
+#ifndef RSEQ_INJECT_FAILED
+#define RSEQ_INJECT_FAILED
+#endif
+
+extern __thread volatile struct rseq __rseq_abi;
+
+#define rseq_likely(x)		__builtin_expect(!!(x), 1)
+#define rseq_unlikely(x)	__builtin_expect(!!(x), 0)
+#define rseq_barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define RSEQ_ACCESS_ONCE(x)	(*(__volatile__  __typeof__(x) *)&(x))
+#define RSEQ_WRITE_ONCE(x, v)	__extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); })
+#define RSEQ_READ_ONCE(x)	RSEQ_ACCESS_ONCE(x)
+
+#define __rseq_str_1(x)	#x
+#define __rseq_str(x)		__rseq_str_1(x)
+
+#if defined(__x86_64__) || defined(__i386__)
+#include <rseq-x86.h>
+#elif defined(__ARMEL__)
+#include <rseq-arm.h>
+#elif defined(__PPC__)
+#include <rseq-ppc.h>
+#else
+#error unsupported target
+#endif
+
+/*
+ * Register rseq for the current thread. Each thread which uses
+ * restartable sequences must call this once before executing its
+ * first rseq critical section, so that its critical sections can
+ * succeed. A restartable sequence executed from a non-registered
+ * thread will always fail.
+ */
+int rseq_register_current_thread(void);
+
+/*
+ * Unregister rseq for current thread.
+ */
+int rseq_unregister_current_thread(void);
+
+/*
+ * Restartable sequence fallback for reading the current CPU number.
+ */
+int32_t rseq_fallback_current_cpu(void);
+
+/*
+ * Values returned can be either the current CPU number, -1 (rseq is
+ * uninitialized), or -2 (rseq initialization has failed).
+ */
+static inline int32_t rseq_current_cpu_raw(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id);
+}
+
+/*
+ * Returns a possible CPU number, which is typically the current CPU.
+ * The returned CPU number can be used to prepare for an rseq critical
+ * section, which will confirm whether the cpu number is indeed the
+ * current one, and whether rseq is initialized.
+ *
+ * The CPU number returned by rseq_cpu_start should always be validated
+ * by passing it to a rseq asm sequence, or by comparing it to the
+ * return value of rseq_current_cpu_raw() if the rseq asm sequence
+ * does not need to be invoked.
+ */
+static inline uint32_t rseq_cpu_start(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start);
+}
+
+static inline uint32_t rseq_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = rseq_current_cpu_raw();
+	if (rseq_unlikely(cpu < 0))
+		cpu = rseq_fallback_current_cpu();
+	return cpu;
+}
+
+/*
+ * rseq_prepare_unload() should be invoked by each thread using rseq_finish*()
+ * at least once between its last rseq_finish*() and the unload of the library
+ * defining the rseq critical section (struct rseq_cs). This also applies to
+ * rseq use in JIT-generated code: rseq_prepare_unload() should be invoked at
+ * least once by each thread using rseq_finish*() before the memory holding
+ * the struct rseq_cs is reclaimed.
+ */
+static inline void rseq_prepare_unload(void)
+{
+	__rseq_abi.rseq_cs = 0;
+}
+
+#endif  /* RSEQ_H */
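
To illustrate the validation rule documented above for rseq_cpu_start(),
here is a hypothetical per-CPU counter increment (the counters array is
not part of the patch, and the thread is assumed to be registered): the
CPU number obtained from rseq_cpu_start() is only trusted because
rseq_addv() re-validates it inside the critical section.

static intptr_t counters[CPU_SETSIZE];

static void percpu_counter_inc(void)
{
	int cpu;

	do {
		cpu = rseq_cpu_start();
		/*
		 * rseq_addv() compares @cpu against the current CPU inside
		 * the critical section and returns non-zero if we migrated
		 * or were aborted, in which case we retry with the new CPU.
		 */
	} while (rseq_addv(&counters[cpu], 1, cpu));
}
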
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
new file mode 100755
index 000000000000..c7475a2bef11
--- /dev/null
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -0,0 +1,124 @@
+#!/bin/bash
+
+EXTRA_ARGS=${@}
+
+OLDIFS="$IFS"
+IFS=$'\n'
+TEST_LIST=(
+	"-T s"
+	"-T l"
+	"-T b"
+	"-T b -M"
+	"-T m"
+	"-T m -M"
+	"-T i"
+)
+
+TEST_NAME=(
+	"spinlock"
+	"list"
+	"buffer"
+	"buffer with barrier"
+	"memcpy"
+	"memcpy with barrier"
+	"increment"
+)
+IFS="$OLDIFS"
+
+function do_tests()
+{
+	local i=0
+	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
+		echo "Running test ${TEST_NAME[$i]}"
+		./param_test ${TEST_LIST[$i]} ${@} ${EXTRA_ARGS} || exit 1
+		let "i++"
+	done
+}
+
+echo "Default parameters"
+do_tests
+
+echo "Loop injection: 10000 loops"
+
+OLDIFS="$IFS"
+IFS=$'\n'
+INJECT_LIST=(
+	"1"
+	"2"
+	"3"
+	"4"
+	"5"
+	"6"
+	"7"
+	"8"
+	"9"
+)
+IFS="$OLDIFS"
+
+NR_LOOPS=10000
+
+i=0
+while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+	echo "Injecting at <${INJECT_LIST[$i]}>"
+	do_tests -${INJECT_LIST[i]} ${NR_LOOPS}
+	let "i++"
+done
+NR_LOOPS=
+
+function inject_blocking()
+{
+	OLDIFS="$IFS"
+	IFS=$'\n'
+	INJECT_LIST=(
+		"7"
+		"8"
+		"9"
+	)
+	IFS="$OLDIFS"
+
+	NR_LOOPS=-1
+
+	i=0
+	while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+		echo "Injecting at <${INJECT_LIST[$i]}>"
+		do_tests -${INJECT_LIST[i]} -1 ${@}
+		let "i++"
+	done
+	NR_LOOPS=
+}
+
+echo "Yield injection (25%)"
+inject_blocking -m 4 -y -r 100
+
+echo "Yield injection (50%)"
+inject_blocking -m 2 -y -r 100
+
+echo "Yield injection (100%)"
+inject_blocking -m 1 -y -r 100
+
+echo "Kill injection (25%)"
+inject_blocking -m 4 -k -r 100
+
+echo "Kill injection (50%)"
+inject_blocking -m 2 -k -r 100
+
+echo "Kill injection (100%)"
+inject_blocking -m 1 -k -r 100
+
+echo "Sleep injection (1ms, 25%)"
+inject_blocking -m 4 -s 1 -r 100
+
+echo "Sleep injection (1ms, 50%)"
+inject_blocking -m 2 -s 1 -r 100
+
+echo "Sleep injection (1ms, 100%)"
+inject_blocking -m 1 -s 1 -r 100
+
+echo "Disable rseq for 25% threads"
+do_tests -D 4
+
+echo "Disable rseq for 50% threads"
+do_tests -D 2
+
+echo "Disable rseq"
+do_tests -d
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 16/22] rseq: selftests: arm: workaround gcc asm size guess
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Florian Weimer,
	Shuah Khan, linux-kselftest

Fixes assembler errors:
/tmp/cceKwI9a.s: Assembler messages:
/tmp/cceKwI9a.s:849: Error: co-processor offset out of range

with gcc prior to gcc-7. This can trigger when multiple rseq inline asm
statements are used within the same function.

My best guess at the cause of this issue is that gcc has a hard time
figuring out the actual size of the inline asm, and therefore does not
accurately compute the offsets at which literal values can be placed
relative to the program counter.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 tools/testing/selftests/rseq/rseq-arm.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
index 47953c0cef4f..6d3fda276f4d 100644
--- a/tools/testing/selftests/rseq/rseq-arm.h
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -79,12 +79,15 @@ do {									\
 		teardown						\
 		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
 
+#define rseq_workaround_gcc_asm_size_guess()	__asm__ __volatile__("")
+
 static inline __attribute__((always_inline))
 int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
 		int cpu)
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -115,11 +118,14 @@ int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -129,6 +135,7 @@ int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -164,11 +171,14 @@ int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -177,6 +187,7 @@ int rseq_addv(intptr_t *v, intptr_t count, int cpu)
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -203,8 +214,10 @@ int rseq_addv(intptr_t *v, intptr_t count, int cpu)
 		  RSEQ_INJECT_CLOBBER
 		: abort
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 }
@@ -216,6 +229,7 @@ int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -253,11 +267,14 @@ int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -268,6 +285,7 @@ int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -306,11 +324,14 @@ int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -321,6 +342,7 @@ int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -359,11 +381,14 @@ int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -376,6 +401,7 @@ int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
 
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		"str %[src], %[rseq_scratch0]\n\t"
@@ -442,11 +468,14 @@ int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -459,6 +488,7 @@ int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
 
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		"str %[src], %[rseq_scratch0]\n\t"
@@ -526,10 +556,13 @@ int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 17/22] Fix: membarrier: add missing preempt off around smp_call_function_many
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Andrea Parri, stable, #v4.14

smp_call_function_many() requires that preemption be disabled around the call.
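
For illustration, the calling pattern the fix below establishes, shown as a
contiguous sketch rather than a diff (tmpmask and ipi_mb are the objects
already used by membarrier_private_expedited() in the hunk below):

	/*
	 * smp_call_function_many() relies on the current CPU id and on
	 * per-CPU call data, so preemption must stay disabled across it.
	 */
	preempt_disable();
	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
	preempt_enable();
	free_cpumask_var(tmpmask);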

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: stable@vger.kernel.org #v4.14
---
 kernel/sched/membarrier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index dd7908743dab..9bcbacba82a8 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -89,7 +89,9 @@ static int membarrier_private_expedited(void)
 		rcu_read_unlock();
 	}
 	if (!fallback) {
+		preempt_disable();
 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+		preempt_enable();
 		free_cpumask_var(tmpmask);
 	}
 	cpus_read_unlock();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 18/22] membarrier: selftest: Test private expedited cmd
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Alan Stern, Andy Lutomirski, Alice Ferrazzi,
	Paul Elder, linux-kselftest, linux-arch

Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED commands.

Add checks that verify the specific error values returned by system
calls which are expected to fail.
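
For reference, a minimal user-space sketch of the command ordering this
test exercises (the wrapper mirrors the selftest's sys_membarrier(); error
handling is trimmed, and the snippet assumes a uapi linux/membarrier.h
that already defines the new commands):

	#include <errno.h>
	#include <linux/membarrier.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int sys_membarrier(int cmd, int flags)
	{
		return syscall(__NR_membarrier, cmd, flags);
	}

	static int use_private_expedited(void)
	{
		/* Before registration, the command is denied with EPERM. */
		if (sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) != -1
				|| errno != EPERM)
			return -1;
		/* Register once per process; the command then succeeds. */
		if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0))
			return -1;
		return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
	}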

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- return result of ksft_exit_pass from main(), silencing compiler
  warning about missing return value.
---
 .../testing/selftests/membarrier/membarrier_test.c | 111 ++++++++++++++++++---
 1 file changed, 95 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index 9e674d9514d1..e6ee73d01fa1 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -16,49 +16,119 @@ static int sys_membarrier(int cmd, int flags)
 static int test_membarrier_cmd_fail(void)
 {
 	int cmd = -1, flags = 0;
+	const char *test_name = "sys membarrier invalid command";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier invalid command test: command = %d, flags = %d. Should fail, but passed\n",
-			cmd, flags);
+			"%s test: command = %d, flags = %d. Should fail, but passed\n",
+			test_name, cmd, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier invalid command test: command = %d, flags = %d. Failed as expected\n",
-		cmd, flags);
+		"%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
+		test_name, cmd, flags, errno);
 	return 0;
 }
 
 static int test_membarrier_flags_fail(void)
 {
 	int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Should fail, but passed\n",
-			flags);
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Failed as expected\n",
-		flags);
+		"%s test: flags = %d, errno = %d. Failed as expected\n",
+		test_name, flags, errno);
 	return 0;
 }
 
-static int test_membarrier_success(void)
+static int test_membarrier_shared_success(void)
 {
 	int cmd = MEMBARRIER_CMD_SHARED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED\n";
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n", test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
 
 	if (sys_membarrier(cmd, flags) != 0) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-			flags);
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-		flags);
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
 	return 0;
 }
 
@@ -72,7 +142,16 @@ static int test_membarrier(void)
 	status = test_membarrier_flags_fail();
 	if (status)
 		return status;
-	status = test_membarrier_success();
+	status = test_membarrier_shared_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_fail();
+	if (status)
+		return status;
+	status = test_membarrier_register_private_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
 	return 0;
@@ -108,5 +187,5 @@ int main(int argc, char **argv)
 	test_membarrier_query();
 	test_membarrier();
 
-	ksft_exit_pass();
+	return ksft_exit_pass();
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 v7 19/22] powerpc: membarrier: Skip memory barrier in switch_mm()
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Alan Stern, Andy Lutomirski, Alexander Viro,
	Nicholas Piggin, linuxppc-dev, linux-arch

Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use private expedited membarrier
commands.

The case of threads targeting the same VM but belonging to different
thread groups is tricky. It has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates over the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
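
Schematically, the registration ordering described above (a condensed
paraphrase of the kernel/sched/membarrier.c hunk below; the local
variable setup is abbreviated):

	static void membarrier_register_private_expedited(void)
	{
		struct task_struct *p = current;
		struct mm_struct *mm = p->mm;

		if (atomic_read(&mm->membarrier_state)
				& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
			return;
		/* 1) Make the scheduler start issuing the barrier for this mm. */
		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
		/*
		 * 2) Wait out scheduler code already running with preemption
		 *    disabled, so later context switches observe the new state.
		 *    Skipped when the mm has a single user with a single thread.
		 */
		if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1))
			synchronize_sched();
		/* 3) Only now allow MEMBARRIER_CMD_PRIVATE_EXPEDITED to succeed. */
		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY, &mm->membarrier_state);
	}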

Changes since v1:
- Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
  powerpc membarrier_arch_sched_in(), given that we want to specifically
  check the next thread state.
- Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
- Use task_thread_info() to pass thread_info from task to
  *_ti_thread_flag().

Changes since v2:
- Move membarrier_arch_sched_in() call to finish_task_switch().
- Check for NULL t->mm in membarrier_arch_fork().
- Use membarrier_sched_in() in generic code, which invokes the
  arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
  build on PowerPC.
- Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
  allnoconfig build on PowerPC.
- Build and runtime tested on PowerPC.

Changes since v3:
- Simply rely on copy_mm() to copy the membarrier_private_expedited mm
  field on fork.
- powerpc: test thread flag instead of reading
  membarrier_private_expedited in membarrier_arch_fork().
- powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
  from kernel thread, since mmdrop() implies a full barrier.
- Set membarrier_private_expedited to 1 only after arch registration
  code, thus eliminating a race where concurrent commands could succeed
  when they should fail if issued concurrently with process
  registration.
- Use READ_ONCE() for membarrier_private_expedited field access in
  membarrier_private_expedited. Matches WRITE_ONCE() performed in
  process registration.

Changes since v4:
- Move powerpc hook from sched_in() to switch_mm(), based on feedback
  from Nicholas Piggin.

Changes since v5:
- Rebase on v4.14-rc6.
- Fold "Fix: membarrier: Handle CLONE_VM + !CLONE_THREAD correctly on
  powerpc (v2)"

Changes since v6:
- Rename MEMBARRIER_STATE_SWITCH_MM to MEMBARRIER_STATE_PRIVATE_EXPEDITED.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Nicholas Piggin <npiggin@gmail.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-arch@vger.kernel.org
---
 MAINTAINERS                           |  1 +
 arch/powerpc/Kconfig                  |  1 +
 arch/powerpc/include/asm/membarrier.h | 25 +++++++++++++++++++++++++
 arch/powerpc/mm/mmu_context.c         |  7 +++++++
 include/linux/sched/mm.h              | 12 +++++++++++-
 init/Kconfig                          |  3 +++
 kernel/sched/core.c                   | 10 ----------
 kernel/sched/membarrier.c             |  9 +++++++++
 8 files changed, 57 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ba9137c1f295..92f460afeaaf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8938,6 +8938,7 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/sched/membarrier.c
 F:	include/uapi/linux/membarrier.h
+F:	arch/powerpc/include/asm/membarrier.h
 
 MEMORY MANAGEMENT
 L:	linux-mm@kvack.org
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9992f80819c..d41c6ede0709 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -140,6 +140,7 @@ config PPC
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_PMEM_API                if PPC64
+	select ARCH_HAS_MEMBARRIER_HOOKS
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
new file mode 100644
index 000000000000..046f96768ab5
--- /dev/null
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -0,0 +1,25 @@
+#ifndef _ASM_POWERPC_MEMBARRIER_H
+#define _ASM_POWERPC_MEMBARRIER_H
+
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+		struct mm_struct *next, struct task_struct *tsk)
+{
+	/*
+	 * Only need the full barrier when switching between processes.
+	 * Barrier when switching from kernel to userspace is not
+	 * required here, given that it is implied by mmdrop(). Barrier
+	 * when switching from userspace to kernel is not needed after
+	 * store to rq->curr.
+	 */
+	if (likely(!(atomic_read(&next->membarrier_state)
+			& MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
+		return;
+
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * after storing to rq->curr, before going back to user-space.
+	 */
+	smp_mb();
+}
+
+#endif /* _ASM_POWERPC_MEMBARRIER_H */
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index d60a62bf4fc7..0ab297c4cfad 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -12,6 +12,7 @@
 
 #include <linux/mm.h>
 #include <linux/cpu.h>
+#include <linux/sched/mm.h>
 
 #include <asm/mmu_context.h>
 
@@ -58,6 +59,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 *
 		 * On the read side the barrier is in pte_xchg(), which orders
 		 * the store to the PTE vs the load of mm_cpumask.
+		 *
+		 * This full barrier is needed by membarrier when switching
+		 * between processes after store to rq->curr, before user-space
+		 * memory accesses.
 		 */
 		smp_mb();
 
@@ -80,6 +85,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 
 	if (new_on_cpu)
 		radix_kvm_prefetch_workaround(next);
+	else
+		membarrier_arch_switch_mm(prev, next, tsk);
 
 	/*
 	 * The actual HW switching method differs between the various
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 3d49b91b674d..7077253d0df4 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -215,14 +215,24 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 #ifdef CONFIG_MEMBARRIER
 enum {
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
-	MEMBARRIER_STATE_SWITCH_MM			= (1U << 1),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED		= (1U << 1),
 };
 
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+#include <asm/membarrier.h>
+#endif
+
 static inline void membarrier_execve(struct task_struct *t)
 {
 	atomic_set(&t->mm->membarrier_state, 0);
 }
 #else
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+		struct mm_struct *next, struct task_struct *tsk)
+{
+}
+#endif
 static inline void membarrier_execve(struct task_struct *t)
 {
 }
diff --git a/init/Kconfig b/init/Kconfig
index acf678e2363c..7300640235dc 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1400,6 +1400,9 @@ config USERFAULTFD
 	  Enable the userfaultfd() system call that allows to intercept and
 	  handle page faults in userland.
 
+config ARCH_HAS_MEMBARRIER_HOOKS
+	bool
+
 config RSEQ
 	bool "Enable rseq() system call" if EXPERT
 	default y
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4bbe297574b5..55cc426ff46e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2693,16 +2693,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
-	/*
-	 * The membarrier system call requires a full memory barrier
-	 * after storing to rq->curr, before going back to user-space.
-	 *
-	 * TODO: This smp_mb__after_unlock_lock can go away if PPC end
-	 * up adding a full barrier to switch_mm(), or we should figure
-	 * out if a smp_mb__after_unlock_lock is really the proper API
-	 * to use.
-	 */
-	smp_mb__after_unlock_lock();
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 9bcbacba82a8..7d2ec7202ba8 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -118,6 +118,15 @@ static void membarrier_register_private_expedited(void)
 	if (atomic_read(&mm->membarrier_state)
 			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
 		return;
+	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED,
+			&mm->membarrier_state);
+	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
+		/*
+		 * Ensure all future scheduler executions will observe the
+		 * new thread flag state for this process.
+		 */
+		synchronize_sched();
+	}
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
 			&mm->membarrier_state);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 v5 20/22] membarrier: Document scheduler barrier requirements
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Andrea Parri, x86

Document the membarrier requirement of having a full memory barrier in
__schedule() after coming from user-space, before storing to rq->curr.
This barrier is provided by smp_mb__after_spinlock() in __schedule().

Document that membarrier requires a full barrier on transition from
kernel thread to userspace thread. We currently have an implicit barrier
from atomic_dec_and_test() in mmdrop() that ensures this.

The x86 switch_mm_irqs_off() full barrier is currently provided by many
cpumask update operations as well as write_cr3(). Document that
write_cr3() provides this barrier.
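
To make the documented pairing concrete, below is a small user-space
analogue written with C11 atomics (build with -pthread). This is a
sketch only, not kernel code: the names rq_curr and data, the harness
and the round count are purely illustrative. The two seq_cst fences
stand in for the full barrier after the rq->curr store documented here
and for the smp_mb() issued by the membarrier system call before it
reads rq->curr; with both fences present, the outcome where membarrier
misses the rq->curr update *and* the freshly switched-in task misses
the caller's prior store can never be observed.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ROUNDS	1000000

static atomic_int rq_curr;	/* models rq->curr: 0 = old task, 1 = new task */
static atomic_int data;		/* models data written by the membarrier caller */
static int r_sched, r_mb;
static pthread_barrier_t start_b, end_b;

/* Models the scheduler: store to rq->curr, full barrier, user-space access. */
static void *sched_side(void *arg)
{
	for (int i = 0; i < ROUNDS; i++) {
		pthread_barrier_wait(&start_b);
		atomic_store_explicit(&rq_curr, 1, memory_order_relaxed);
		atomic_thread_fence(memory_order_seq_cst);	/* documented barrier */
		r_sched = atomic_load_explicit(&data, memory_order_relaxed);
		pthread_barrier_wait(&end_b);
	}
	return NULL;
}

/* Models membarrier(): store to data, smp_mb() on entry, read rq->curr. */
static void *membarrier_side(void *arg)
{
	for (int i = 0; i < ROUNDS; i++) {
		pthread_barrier_wait(&start_b);
		atomic_store_explicit(&data, 1, memory_order_relaxed);
		atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() in syscall */
		r_mb = atomic_load_explicit(&rq_curr, memory_order_relaxed);
		pthread_barrier_wait(&end_b);
	}
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;
	long forbidden = 0;

	pthread_barrier_init(&start_b, NULL, 3);
	pthread_barrier_init(&end_b, NULL, 3);
	pthread_create(&t0, NULL, sched_side, NULL);
	pthread_create(&t1, NULL, membarrier_side, NULL);
	for (int i = 0; i < ROUNDS; i++) {
		atomic_store(&rq_curr, 0);
		atomic_store(&data, 0);
		pthread_barrier_wait(&start_b);
		pthread_barrier_wait(&end_b);
		/*
		 * Forbidden with both fences in place: membarrier misses the
		 * rq->curr update (so it would skip the IPI) while the
		 * switched-in task also misses the caller's prior store.
		 */
		if (r_sched == 0 && r_mb == 0)
			forbidden++;
	}
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	printf("forbidden outcome seen %ld times (expect 0)\n", forbidden);
	return 0;
}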

Changes since v1:
- Update comments to match reality for code paths which are after
  storing to rq->curr, before returning to user-space, based on feedback
  from Andrea Parri.
Changes since v2:
- Update changelog (smp_mb__before_spinlock -> smp_mb__after_spinlock).
  Based on feedback from Andrea Parri.
Changes since v3:
- Clarify comments following feedback from Peter Zijlstra.
Changes since v4:
- Update comment regarding powerpc barrier.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: x86@kernel.org
---
 arch/x86/mm/tlb.c        |  5 +++++
 include/linux/sched/mm.h |  5 +++++
 kernel/sched/core.c      | 37 ++++++++++++++++++++++++++-----------
 3 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3118392cdf75..5abf9bfcca1f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -146,6 +146,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 #endif
 	this_cpu_write(cpu_tlbstate.is_lazy, false);
 
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * before returning to user-space, after storing to rq->curr.
+	 * Writing to CR3 provides that full memory barrier.
+	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 7077253d0df4..0f9e1a96b890 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -39,6 +39,11 @@ static inline void mmgrab(struct mm_struct *mm)
 extern void __mmdrop(struct mm_struct *);
 static inline void mmdrop(struct mm_struct *mm)
 {
+	/*
+	 * The implicit full barrier implied by atomic_dec_and_test is
+	 * required by the membarrier system call before returning to
+	 * user-space, after storing to rq->curr.
+	 */
 	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
 		__mmdrop(mm);
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 55cc426ff46e..0151eb1fcd5b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2697,6 +2697,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	finish_arch_post_lock_switch();
 
 	fire_sched_in_preempt_notifiers(current);
+	/*
+	 * When transitioning from a kernel thread to a userspace
+	 * thread, mmdrop()'s implicit full barrier is required by the
+	 * membarrier system call, because the current active_mm can
+	 * become the current mm without going through switch_mm().
+	 */
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
@@ -2802,6 +2808,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 */
 	arch_start_context_switch(prev);
 
+	/*
+	 * If mm is non-NULL, we pass through switch_mm(). If mm is
+	 * NULL, we will pass through mmdrop() in finish_task_switch().
+	 * Both of these contain the full memory barrier required by
+	 * membarrier after storing to rq->curr, before returning to
+	 * user-space.
+	 */
 	if (!mm) {
 		next->active_mm = oldmm;
 		mmgrab(oldmm);
@@ -3338,6 +3351,9 @@ static void __sched notrace __schedule(bool preempt)
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up().
+	 *
+	 * The membarrier system call requires a full memory barrier
+	 * after coming from user-space, before storing to rq->curr.
 	 */
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
@@ -3386,17 +3402,16 @@ static void __sched notrace __schedule(bool preempt)
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating
-		 * rq->curr, before returning to user-space. For TSO
-		 * (e.g. x86), the architecture must provide its own
-		 * barrier in switch_mm(). For weakly ordered machines
-		 * for which spin_unlock() acts as a full memory
-		 * barrier, finish_lock_switch() in common code takes
-		 * care of this barrier. For weakly ordered machines for
-		 * which spin_unlock() acts as a RELEASE barrier (only
-		 * arm64 and PowerPC), arm64 has a full barrier in
-		 * switch_to(), and PowerPC has
-		 * smp_mb__after_unlock_lock() before
-		 * finish_lock_switch().
+		 * rq->curr, before returning to user-space.
+		 *
+		 * Here are the schemes providing that barrier on the
+		 * various architectures:
+		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
+		 *   switch_mm() relies on membarrier_arch_switch_mm() on PowerPC.
+		 * - finish_lock_switch() for weakly-ordered
+		 *   architectures where spin_unlock is a full barrier,
+		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
+		 *   is a RELEASE barrier),
 		 */
 		++*switch_count;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 v2 21/22] membarrier: provide SHARED_EXPEDITED command
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:18   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Andrea Parri, x86

Allow expedited membarrier to be used for data shared between processes
(shared memory).

Processes wishing to receive such membarriers register with
MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED. Those which want to issue a
membarrier invoke MEMBARRIER_CMD_SHARED_EXPEDITED.

This allows an extremely simple kernel-level implementation: we already
have almost everything we need from the PRIVATE_EXPEDITED barrier code.
All we need to do is add a flag to the mm_struct which is checked to
decide whether the IPI must be sent to the current thread of each CPU.

There is a slight downside of this approach compared to targeting
specific shared memory users: when performing a membarrier operation,
all registered "shared" receivers will get the barrier, even if they
don't share a memory mapping with the "sender" issuing
MEMBARRIER_CMD_SHARED_EXPEDITED.

This registration approach seems to fit the requirement of not
disturbing processes that care deeply about real-time latency: they
simply should not register with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.
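
As a usage illustration, here is a minimal user-space sketch of the
register-then-issue protocol described above. It is only a sketch: it
assumes a kernel carrying this series, and the two *_SHARED_EXPEDITED
command values are copied from the uapi change below (on a kernel
without this series the calls simply fail with EINVAL).

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

/* Values taken from the include/uapi/linux/membarrier.h change below. */
#define MEMBARRIER_CMD_SHARED_EXPEDITED			(1 << 1)
#define MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED	(1 << 2)

static int sys_membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
	/* A receiver opts in once, e.g. at startup, before publishing data. */
	if (sys_membarrier(MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED, 0))
		printf("register failed: %s\n", strerror(errno));

	/*
	 * A sender (possibly a different, non-registered process) issues
	 * the barrier; on return, every running thread of every registered
	 * process has executed a full memory barrier.
	 */
	if (sys_membarrier(MEMBARRIER_CMD_SHARED_EXPEDITED, 0))
		printf("shared expedited failed: %s\n", strerror(errno));

	return 0;
}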

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: x86@kernel.org
---
Changes since v1:
- Add missing preempt disable around smp_call_function_many().
---
 arch/powerpc/include/asm/membarrier.h |   3 +-
 include/linux/sched/mm.h              |   6 +-
 include/uapi/linux/membarrier.h       |  34 ++++++++--
 kernel/sched/membarrier.c             | 114 ++++++++++++++++++++++++++++++++--
 4 files changed, 143 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
index 046f96768ab5..ddf4baedd132 100644
--- a/arch/powerpc/include/asm/membarrier.h
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -12,7 +12,8 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 	 * store to rq->curr.
 	 */
 	if (likely(!(atomic_read(&next->membarrier_state)
-			& MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
+			& (MEMBARRIER_STATE_PRIVATE_EXPEDITED
+			| MEMBARRIER_STATE_SHARED_EXPEDITED)) || !prev))
 		return;
 
 	/*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 0f9e1a96b890..c7b0f5970d7c 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -219,8 +219,10 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 
 #ifdef CONFIG_MEMBARRIER
 enum {
-	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
-	MEMBARRIER_STATE_PRIVATE_EXPEDITED		= (1U << 1),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY		= (1U << 0),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED			= (1U << 1),
+	MEMBARRIER_STATE_SHARED_EXPEDITED_READY			= (1U << 2),
+	MEMBARRIER_STATE_SHARED_EXPEDITED			= (1U << 3),
 };
 
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 4e01ad7ffe98..2de01e595d3b 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -40,6 +40,28 @@
  *                          (non-running threads are de facto in such a
  *                          state). This covers threads from all processes
  *                          running on the system. This command returns 0.
+ * @MEMBARRIER_CMD_SHARED_EXPEDITED:
+ *                          Execute a memory barrier on all running threads
+ *                          part of a process which previously registered
+ *                          with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.
+ *                          Upon return from system call, the caller thread
+ *                          is ensured that all running threads have passed
+ *                          through a state where all memory accesses to
+ *                          user-space addresses match program order between
+ *                          entry to and return from the system call
+ *                          (non-running threads are de facto in such a
+ *                          state). This only covers threads from processes
+ *                          which registered with
+ *                          MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.
+ *                          This command returns 0. Given that
+ *                          registration is about the intent to receive
+ *                          the barriers, it is valid to invoke
+ *                          MEMBARRIER_CMD_SHARED_EXPEDITED from a
+ *                          non-registered process.
+ * @MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED:
+ *                          Register the process intent to receive
+ *                          MEMBARRIER_CMD_SHARED_EXPEDITED memory
+ *                          barriers. Always returns 0.
  * @MEMBARRIER_CMD_PRIVATE_EXPEDITED:
  *                          Execute a memory barrier on each running
  *                          thread belonging to the same process as the current
@@ -70,12 +92,12 @@
  * the value 0.
  */
 enum membarrier_cmd {
-	MEMBARRIER_CMD_QUERY				= 0,
-	MEMBARRIER_CMD_SHARED				= (1 << 0),
-	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
-	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
-	MEMBARRIER_CMD_PRIVATE_EXPEDITED		= (1 << 3),
-	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED	= (1 << 4),
+	MEMBARRIER_CMD_QUERY					= 0,
+	MEMBARRIER_CMD_SHARED					= (1 << 0),
+	MEMBARRIER_CMD_SHARED_EXPEDITED				= (1 << 1),
+	MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED		= (1 << 2),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED			= (1 << 3),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED		= (1 << 4),
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 7d2ec7202ba8..b1312eb9d292 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -27,7 +27,9 @@
  * except MEMBARRIER_CMD_QUERY.
  */
 #define MEMBARRIER_CMD_BITMASK	\
-	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
+	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_SHARED_EXPEDITED \
+	| MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED \
+	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
 	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
 
 static void ipi_mb(void *info)
@@ -35,6 +37,73 @@ static void ipi_mb(void *info)
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
+static int membarrier_shared_expedited(void)
+{
+	int cpu;
+	bool fallback = false;
+	cpumask_var_t tmpmask;
+
+	if (num_online_cpus() == 1)
+		return 0;
+
+	/*
+	 * Matches memory barriers around rq->curr modification in
+	 * scheduler.
+	 */
+	smp_mb();	/* system call entry is not a mb. */
+
+	/*
+	 * Expedited membarrier commands guarantee that they won't
+	 * block, hence the GFP_NOWAIT allocation flag and fallback
+	 * implementation.
+	 */
+	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
+		/* Fallback for OOM. */
+		fallback = true;
+	}
+
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct task_struct *p;
+
+		/*
+		 * Skipping the current CPU is OK even though we can be
+		 * migrated at any point. The current CPU, at the point
+		 * where we read raw_smp_processor_id(), is ensured to
+		 * be in program order with respect to the caller
+		 * thread. Therefore, we can skip this CPU from the
+		 * iteration.
+		 */
+		if (cpu == raw_smp_processor_id())
+			continue;
+		rcu_read_lock();
+		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
+		if (p && p->mm && (atomic_read(&p->mm->membarrier_state)
+				& MEMBARRIER_STATE_SHARED_EXPEDITED)) {
+			if (!fallback)
+				__cpumask_set_cpu(cpu, tmpmask);
+			else
+				smp_call_function_single(cpu, ipi_mb, NULL, 1);
+		}
+		rcu_read_unlock();
+	}
+	if (!fallback) {
+		preempt_disable();
+		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+		preempt_enable();
+		free_cpumask_var(tmpmask);
+	}
+	cpus_read_unlock();
+
+	/*
+	 * Memory barrier on the caller thread _after_ we finished
+	 * waiting for the last IPI. Matches memory barriers around
+	 * rq->curr modification in scheduler.
+	 */
+	smp_mb();	/* exit from system call is not a mb */
+	return 0;
+}
+
 static int membarrier_private_expedited(void)
 {
 	int cpu;
@@ -105,7 +174,38 @@ static int membarrier_private_expedited(void)
 	return 0;
 }
 
-static void membarrier_register_private_expedited(void)
+static int membarrier_register_shared_expedited(void)
+{
+	struct task_struct *p = current;
+	struct mm_struct *mm = p->mm;
+
+	if (atomic_read(&mm->membarrier_state)
+			& MEMBARRIER_STATE_SHARED_EXPEDITED_READY)
+		return 0;
+	atomic_or(MEMBARRIER_STATE_SHARED_EXPEDITED, &mm->membarrier_state);
+	if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) {
+		/*
+		 * For single mm user, single threaded process, we can
+		 * simply issue a memory barrier after setting
+		 * MEMBARRIER_STATE_SHARED_EXPEDITED to guarantee that
+		 * no memory access following registration is reordered
+		 * before registration.
+		 */
+		smp_mb();
+	} else {
+		/*
+		 * For multi-mm user threads, we need to ensure all
+		 * future scheduler executions will observe the new
+		 * thread flag state for this mm.
+		 */
+		synchronize_sched();
+	}
+	atomic_or(MEMBARRIER_STATE_SHARED_EXPEDITED_READY,
+			&mm->membarrier_state);
+	return 0;
+}
+
+static int membarrier_register_private_expedited(void)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
@@ -117,7 +217,7 @@ static void membarrier_register_private_expedited(void)
 	 */
 	if (atomic_read(&mm->membarrier_state)
 			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
-		return;
+		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED,
 			&mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
@@ -129,6 +229,7 @@ static void membarrier_register_private_expedited(void)
 	}
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
 			&mm->membarrier_state);
+	return 0;
 }
 
 /**
@@ -178,11 +279,14 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 		if (num_online_cpus() > 1)
 			synchronize_sched();
 		return 0;
+	case MEMBARRIER_CMD_SHARED_EXPEDITED:
+		return membarrier_shared_expedited();
+	case MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED:
+		return membarrier_register_shared_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
 		return membarrier_private_expedited();
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
-		membarrier_register_private_expedited();
-		return 0;
+		return membarrier_register_private_expedited();
 	default:
 		return -EINVAL;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [RFC PATCH for 4.15 22/22] membarrier: selftest: Test shared expedited cmd
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 14:19   ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 14:19 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Greg Kroah-Hartman, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Alan Stern, Andy Lutomirski, Alice Ferrazzi, Paul Elder,
	linux-kselftest, linux-arch

Test the new MEMBARRIER_CMD_SHARED_EXPEDITED and
MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED commands.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
---
 .../testing/selftests/membarrier/membarrier_test.c | 51 +++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index e6ee73d01fa1..bb9c58072c5c 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -132,6 +132,40 @@ static int test_membarrier_private_expedited_success(void)
 	return 0;
 }
 
+static int test_membarrier_register_shared_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_shared_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_SHARED_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
 static int test_membarrier(void)
 {
 	int status;
@@ -154,6 +188,19 @@ static int test_membarrier(void)
 	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
+	/*
+	 * It is valid to send a shared membarrier from a non-registered
+	 * process.
+	 */
+	status = test_membarrier_shared_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_register_shared_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_shared_expedited_success();
+	if (status)
+		return status;
 	return 0;
 }
 
@@ -173,8 +220,10 @@ static int test_membarrier_query(void)
 		}
 		ksft_exit_fail_msg("sys_membarrier() failed\n");
 	}
-	if (!(ret & MEMBARRIER_CMD_SHARED))
+	if (!(ret & MEMBARRIER_CMD_SHARED)) {
+		ksft_test_result_fail("sys_membarrier() CMD_SHARED query failed\n");
 		ksft_exit_fail_msg("sys_membarrier is not supported.\n");
+	}
 
 	ksft_test_result_pass("sys_membarrier available\n");
 	return 0;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
  2017-11-21 14:18   ` mathieu.desnoyers
@ 2017-11-21 15:17     ` shuah
  -1 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 15:17 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, linux-kselftest, Shuah Khan,
	Shuah Khan

On 11/21/2017 07:18 AM, Mathieu Desnoyers wrote:
> Implement cpu_opv selftests. They need to express dependencies on
> header files and a .so, which requires overriding the selftests
> lib.mk targets. Introduce a new OVERRIDE_TARGETS define for this.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul Turner <pjt@google.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
> Changes since v1:
> 
> - Expose similar library API as rseq:  Expose library API closely
>   matching the rseq APIs, following removal of the event counter from
>   the rseq kernel API.
> - Update makefile to fix make run_tests dependency on "all".
> - Introduce a OVERRIDE_TARGETS.
> 
> Changes since v2:
> 
> - Test page faults.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/cpu-opv/.gitignore         |    1 +
>  tools/testing/selftests/cpu-opv/Makefile           |   17 +
>  .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++++
>  tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
>  tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
>  tools/testing/selftests/lib.mk                     |    4 +

Please make the change that adds OVERRIDE_TARGETS to lib.mk a separate
patch. It is most likely going to be the first patch in this series.

>  8 files changed, 1629 insertions(+)
>  create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
>  create mode 100644 tools/testing/selftests/cpu-opv/Makefile
>  create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0b4e504f5003..c6c2436d15f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3734,6 +3734,7 @@ L:	linux-kernel@vger.kernel.org
>  S:	Supported
>  F:	kernel/cpu_opv.c
>  F:	include/uapi/linux/cpu_opv.h
> +F:	tools/testing/selftests/cpu-opv/
>  
>  CRAMFS FILESYSTEM
>  M:	Nicolas Pitre <nico@linaro.org>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index eaf599dc2137..fc1eba0e0130 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -5,6 +5,7 @@ TARGETS += breakpoints
>  TARGETS += capabilities
>  TARGETS += cpufreq
>  TARGETS += cpu-hotplug
> +TARGETS += cpu-opv
>  TARGETS += efivarfs
>  TARGETS += exec
>  TARGETS += firmware
> diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
> new file mode 100644
> index 000000000000..c7186eb95cf5
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/.gitignore
> @@ -0,0 +1 @@
> +basic_cpu_opv_test
> diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
> new file mode 100644
> index 000000000000..21e63545d521
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/Makefile
> @@ -0,0 +1,17 @@
> +CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
> +
> +# Own dependencies because we only want to build against 1st prerequisite, but
> +# still track changes to header files and depend on shared object.
> +OVERRIDE_TARGETS = 1
> +
> +TEST_GEN_PROGS = basic_cpu_opv_test
> +
> +TEST_GEN_PROGS_EXTENDED = libcpu-op.so


> +
> +include ../lib.mk
> +
> +$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
> +
> +$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
> +	$(CC) $(CFLAGS) $< -lcpu-op -o $@
> diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> new file mode 100644
> index 000000000000..a31a10bbd8aa
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> @@ -0,0 +1,1189 @@
> +/*
> + * Basic test coverage for cpu_opv system call.
> + */
> +
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/time.h>
> +#include <errno.h>
> +#include <stdlib.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +#define TESTBUFLEN	4096
> +#define TESTBUFLEN_CMP	16
> +
> +#define TESTBUFLEN_PAGE_MAX	65536
> +
> +#define NR_PF_ARRAY	16384
> +#define PF_ARRAY_LEN	4096
> +
> +/* 64 MB arrays for page fault testing. */
> +char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
> +char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
> +
> +static int test_compare_eq_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_eq_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_eq */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret > 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_eq_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_ne_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_ne */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_eq_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_eq index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare failure is op[0], expect 1. */
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compares succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf2[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_ne_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_ne index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	memset(buf1, 0, TESTBUFLEN_CMP);
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare ne failure is op[0], expect 1. */
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compare ne succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf4[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_memcpy_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	for (i = 0; i < TESTBUFLEN; i++) {
> +		if (buf2[i] != (char)i) {
> +			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
> +				test_name, (char)i, buf2[i], i);
> +			return -1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_u32(void)
> +{
> +	int ret;
> +	uint32_t v1, v2;
> +	const char *test_name = "test_memcpy_u32";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy_u32 */
> +	v1 = 42;
> +	v2 = 0;
> +	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v1 != v2) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v2);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
> +		void *dst2, void *src2, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_mb_memcpy(void)
> +{
> +	int ret;
> +	int v1, v2, v3;
> +	const char *test_name = "test_memcpy_mb_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	v1 = 42;
> +	v2 = v3 = 0;
> +	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v3 != v1) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v3);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_add_op(int *v, int64_t increment)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int increment = 1;
> +	const char *test_name = "test_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_add_op(&v, increment);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increment) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v + increment);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_two_add_op(int *v, int64_t *increments)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[0],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +		[1] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[1],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_two_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int64_t increments[2] = { 99, 123 };
> +	const char *test_name = "test_two_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_two_add_op(&v, increments);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increments[0] + increments[1]) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v + increments[0] + increments[1]);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_or_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_OR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_or(void)
> +{
> +	int orig_v = 0xFF00000, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_or";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_or_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v | mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v | mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_and_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_AND_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_and(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_and";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_and_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v & mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v & mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_xor_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_XOR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_xor(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_xor";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_xor_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v ^ mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v ^ mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_lshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_LSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_lshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_lshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_lshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v << bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v << bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_rshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_RSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_rshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_rshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_rshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v >> bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v >> bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
> +		size_t len)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +
> +static int test_cmpxchg_success(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg success";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	if (v != n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)n);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_fail(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg fail";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	if (v == n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)orig_v);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_memcpy_fault(void)
> +{
> +	int ret;
> +	char buf1[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EFAULT)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	/* Test memcpy expect fault */
> +	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EAGAIN)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_unknown_op(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = -1,	/* Unknown */
> +			.len = 0,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_unknown_op(void)
> +{
> +	int ret;
> +	const char *test_name = "test_unknown_op";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_unknown_op();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_max_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_max_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_max_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_max_ops();
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_too_many_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +		[16] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_too_many_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_too_many_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_too_many_ops();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/* Use a 64kB + 1 len, exceeding the largest page size known on Linux. */
> +static int test_memcpy_single_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_PAGE_MAX + 1];
> +	char buf2[TESTBUFLEN_PAGE_MAX + 1];
> +	const char *test_name = "test_memcpy_single_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_single_ok_sum_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Iterate over large uninitialized arrays to trigger page faults.
> + */
> +int test_page_fault(void)
> +{
> +	int ret = 0;
> +	uint64_t i;
> +	const char *test_name = "test_page_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < NR_PF_ARRAY; i++) {
> +		ret = test_memcpy_op(pf_array_dst[i],
> +				     pf_array_src[i],
> +				     PF_ARRAY_LEN);
> +		if (ret) {
> +			printf("%s returned with %d, errno: %s\n",
> +				test_name, ret, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int ret = 0;
> +
> +	ret |= test_compare_eq_same();
> +	ret |= test_compare_eq_diff();
> +	ret |= test_compare_ne_same();
> +	ret |= test_compare_ne_diff();
> +	ret |= test_2compare_eq_index();
> +	ret |= test_2compare_ne_index();
> +	ret |= test_memcpy();
> +	ret |= test_memcpy_u32();
> +	ret |= test_memcpy_mb_memcpy();
> +	ret |= test_add();
> +	ret |= test_two_add();
> +	ret |= test_or();
> +	ret |= test_and();
> +	ret |= test_xor();
> +	ret |= test_lshift();
> +	ret |= test_rshift();
> +	ret |= test_cmpxchg_success();
> +	ret |= test_cmpxchg_fail();
> +	ret |= test_memcpy_fault();
> +	ret |= test_unknown_op();
> +	ret |= test_max_ops();
> +	ret |= test_too_many_ops();
> +	ret |= test_memcpy_single_too_large();
> +	ret |= test_memcpy_single_ok_sum_too_large();
> +	ret |= test_page_fault();
> +

Where do pass counts get printed? I am seeing error messages when tests fail,
but no pass messages. It would be nice to use the ksft framework to count
pass/fail for this series of tests.
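
For instance, a minimal (untested) sketch of a driver using the ksft_*
helpers from tools/testing/selftests/kselftest.h; the run_test() wrapper
below is only illustrative, and the test functions are the ones already
defined above in basic_cpu_opv_test.c:

#include "../kselftest.h"

/* Report one test function's result through the kselftest counters. */
static int run_test(const char *name, int (*fn)(void))
{
	int ret = fn();

	if (!ret)
		ksft_test_result_pass("%s\n", name);
	else
		ksft_test_result_fail("%s\n", name);
	return ret;
}

int main(int argc, char **argv)
{
	int failed = 0;

	ksft_print_header();

	failed |= run_test("test_compare_eq_same", test_compare_eq_same);
	failed |= run_test("test_compare_eq_diff", test_compare_eq_diff);
	/* ... remaining test functions ... */
	failed |= run_test("test_page_fault", test_page_fault);

	return failed ? ksft_exit_fail() : ksft_exit_pass();
}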

> +	return ret;
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
> new file mode 100644
> index 000000000000..d7ba481cca04
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.c
> @@ -0,0 +1,348 @@
> +/*
> + * cpu-op.c
> + *
> + * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; only
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <syscall.h>
> +#include <assert.h>
> +#include <signal.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
> +{
> +	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
> +}
> +
> +int cpu_op_get_current_cpu(void)
> +{
> +	int cpu;
> +
> +	cpu = sched_getcpu();
> +	if (cpu < 0) {
> +		perror("sched_getcpu()");
> +		abort();
> +	}
> +	return cpu;
> +}
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
> +		size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)old,
> +			.u.memcpy_op.src = (unsigned long)v,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)n,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = len,
> +			.u.arithmetic_op.p = (unsigned long)v,
> +			.u.arithmetic_op.count = count,
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
> +		intptr_t *newp, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)newp,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on src fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	intptr_t oldv = READ_ONCE(*v);
> +	intptr_t *newp = (intptr_t *)(oldv + voffp);
> +	int ret;
> +
> +	if (oldv == expectnot)
> +		return 1;
> +	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
> +	if (!ret) {
> +		*load = oldv;
> +		return 0;
> +	}
> +	if (ret > 0) {
> +		errno = EAGAIN;
> +		return -1;
> +	}
> +	return -1;
> +}
> +
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v2,
> +			.u.compare_op.b = (unsigned long)&expect2,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
> +{
> +	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
> new file mode 100644
> index 000000000000..ba2ec578ec50
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.h
> @@ -0,0 +1,68 @@
> +/*
> + * cpu-op.h
> + *
> + * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef CPU_OPV_H
> +#define CPU_OPV_H
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <linux/cpu_opv.h>
> +
> +#define likely(x)		__builtin_expect(!!(x), 1)
> +#define unlikely(x)		__builtin_expect(!!(x), 0)
> +#define barrier()		__asm__ __volatile__("" : : : "memory")
> +
> +#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
> +#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
> +#define READ_ONCE(x)		ACCESS_ONCE(x)
> +
> +int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
> +int cpu_op_get_current_cpu(void);
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
> +		size_t len, int cpu);
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu);
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
> +
> +#endif  /* CPU_OPV_H */
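
As a usage sketch (not part of the patch), a caller could bump a counter on
its current CPU with the helpers declared above, retrying on EAGAIN and
linking against libcpu-op.so as set up in the Makefile earlier in this patch:

#include <errno.h>
#include <stdio.h>

#include "cpu-op.h"

int main(void)
{
	intptr_t counter = 0;
	int cpu, ret;

	do {
		/* Re-read the current CPU on each retry, as the selftests do. */
		cpu = cpu_op_get_current_cpu();
		ret = cpu_op_addv(&counter, 1, cpu);
	} while (ret == -1 && errno == EAGAIN);

	if (ret) {
		perror("cpu_op_addv");
		return 1;
	}
	printf("counter is now %ld\n", (long)counter);
	return 0;
}
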
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
>  LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
>  endif
>  
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
>  $(OUTPUT)/%:%.c
>  	$(LINK.c) $^ $(LDLIBS) -o $@
>  
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
>  
>  $(OUTPUT)/%:%.S
>  	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
>  
>  .PHONY: run_tests all clean install emit_tests
> 

As I said before, please make this change in a separate patch.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
@ 2017-11-21 15:17     ` shuah
  0 siblings, 0 replies; 175+ messages in thread
From: shuah @ 2017-11-21 15:17 UTC (permalink / raw)


On 11/21/2017 07:18 AM, Mathieu Desnoyers wrote:
> Implement cpu_opv selftests. It needs to express dependencies on
> header files and .so, which require to override the selftests
> lib.mk targets. Introduce a new OVERRIDE_TARGETS define for this.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul Turner <pjt@google.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
> Changes since v1:
> 
> - Expose a library API similar to rseq: the library API closely matches
>   the rseq APIs, following removal of the event counter from the rseq
>   kernel API.
> - Update the Makefile to fix the "make run_tests" dependency on "all".
> - Introduce an OVERRIDE_TARGETS define.
> 
> Changes since v2:
> 
> - Test page faults.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/cpu-opv/.gitignore         |    1 +
>  tools/testing/selftests/cpu-opv/Makefile           |   17 +
>  .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++++
>  tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
>  tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
>  tools/testing/selftests/lib.mk                     |    4 +

Please make the lib.mk change that adds OVERRIDE_TARGETS a separate patch.
It will most likely be the first patch in this series.

>  8 files changed, 1629 insertions(+)
>  create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
>  create mode 100644 tools/testing/selftests/cpu-opv/Makefile
>  create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0b4e504f5003..c6c2436d15f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3734,6 +3734,7 @@ L:	linux-kernel@vger.kernel.org
>  S:	Supported
>  F:	kernel/cpu_opv.c
>  F:	include/uapi/linux/cpu_opv.h
> +F:	tools/testing/selftests/cpu-opv/
>  
>  CRAMFS FILESYSTEM
>  M:	Nicolas Pitre <nico at linaro.org>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index eaf599dc2137..fc1eba0e0130 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -5,6 +5,7 @@ TARGETS += breakpoints
>  TARGETS += capabilities
>  TARGETS += cpufreq
>  TARGETS += cpu-hotplug
> +TARGETS += cpu-opv
>  TARGETS += efivarfs
>  TARGETS += exec
>  TARGETS += firmware
> diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
> new file mode 100644
> index 000000000000..c7186eb95cf5
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/.gitignore
> @@ -0,0 +1 @@
> +basic_cpu_opv_test
> diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
> new file mode 100644
> index 000000000000..21e63545d521
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/Makefile
> @@ -0,0 +1,17 @@
> +CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
> +
> +# Own dependencies because we only want to build against 1st prerequisite, but
> +# still track changes to header files and depend on shared object.
> +OVERRIDE_TARGETS = 1
> +
> +TEST_GEN_PROGS = basic_cpu_opv_test
> +
> +TEST_GEN_PROGS_EXTENDED = libcpu-op.so


> +
> +include ../lib.mk
> +
> +$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
> +
> +$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
> +	$(CC) $(CFLAGS) $< -lcpu-op -o $@
> diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> new file mode 100644
> index 000000000000..a31a10bbd8aa
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> @@ -0,0 +1,1189 @@
> +/*
> + * Basic test coverage for cpu_opv system call.
> + */
> +
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/time.h>
> +#include <errno.h>
> +#include <stdlib.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +#define TESTBUFLEN	4096
> +#define TESTBUFLEN_CMP	16
> +
> +#define TESTBUFLEN_PAGE_MAX	65536
> +
> +#define NR_PF_ARRAY	16384
> +#define PF_ARRAY_LEN	4096
> +
> +/* 64 MB arrays for page fault testing. */
> +char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
> +char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
> +
> +static int test_compare_eq_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_eq_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_eq */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret > 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_eq_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_ne_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_ne */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_eq_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_eq index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare failure is op[0], expect 1. */
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compares succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf2[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_ne_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_ne index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	memset(buf1, 0, TESTBUFLEN_CMP);
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare ne failure is op[0], expect 1. */
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compare ne succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf4[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_memcpy_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	for (i = 0; i < TESTBUFLEN; i++) {
> +		if (buf2[i] != (char)i) {
> +			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
> +				test_name, (char)i, buf2[i], i);
> +			return -1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_u32(void)
> +{
> +	int ret;
> +	uint32_t v1, v2;
> +	const char *test_name = "test_memcpy_u32";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy_u32 */
> +	v1 = 42;
> +	v2 = 0;
> +	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v1 != v2) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v2);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
> +		void *dst2, void *src2, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_mb_memcpy(void)
> +{
> +	int ret;
> +	int v1, v2, v3;
> +	const char *test_name = "test_memcpy_mb_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	v1 = 42;
> +	v2 = v3 = 0;
> +	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v3 != v1) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v3);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_add_op(int *v, int64_t increment)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int increment = 1;
> +	const char *test_name = "test_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_add_op(&v, increment);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increment) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v + increment);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_two_add_op(int *v, int64_t *increments)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[0],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +		[1] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[1],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_two_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int64_t increments[2] = { 99, 123 };
> +	const char *test_name = "test_two_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_two_add_op(&v, increments);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increments[0] + increments[1]) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v + increments[0] + increments[1]);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_or_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_OR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_or(void)
> +{
> +	int orig_v = 0xFF00000, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_or";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_or_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v | mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v | mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_and_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_AND_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_and(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_and";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_and_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v & mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v & mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_xor_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_XOR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_xor(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_xor";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_xor_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v ^ mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v ^ mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_lshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_LSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_lshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_lshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_lshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v << bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v << bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_rshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_RSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_rshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_rshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_rshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v >> bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v >> bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
> +		size_t len)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_cmpxchg_success(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg success";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	if (v != n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)n);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_fail(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg fail";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	if (v == n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)orig_v);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_memcpy_fault(void)
> +{
> +	int ret;
> +	char buf1[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EFAULT)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	/* Test memcpy expect fault */
> +	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EAGAIN)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_unknown_op(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = -1,	/* Unknown */
> +			.len = 0,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_unknown_op(void)
> +{
> +	int ret;
> +	const char *test_name = "test_unknown_op";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_unknown_op();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_max_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_max_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_max_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_max_ops();
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_too_many_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +		[16] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_too_many_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_too_many_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_too_many_ops();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/* Use 64kB len, largest page size known on Linux. */
> +static int test_memcpy_single_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_PAGE_MAX + 1];
> +	char buf2[TESTBUFLEN_PAGE_MAX + 1];
> +	const char *test_name = "test_memcpy_single_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_single_ok_sum_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Iterate over large uninitialized arrays to trigger page faults.
> + */
> +int test_page_fault(void)
> +{
> +	int ret = 0;
> +	uint64_t i;
> +	const char *test_name = "test_page_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < NR_PF_ARRAY; i++) {
> +		ret = test_memcpy_op(pf_array_dst[i],
> +				     pf_array_src[i],
> +				     PF_ARRAY_LEN);
> +		if (ret) {
> +			printf("%s returned with %d, errno: %s\n",
> +				test_name, ret, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int ret = 0;
> +
> +	ret |= test_compare_eq_same();
> +	ret |= test_compare_eq_diff();
> +	ret |= test_compare_ne_same();
> +	ret |= test_compare_ne_diff();
> +	ret |= test_2compare_eq_index();
> +	ret |= test_2compare_ne_index();
> +	ret |= test_memcpy();
> +	ret |= test_memcpy_u32();
> +	ret |= test_memcpy_mb_memcpy();
> +	ret |= test_add();
> +	ret |= test_two_add();
> +	ret |= test_or();
> +	ret |= test_and();
> +	ret |= test_xor();
> +	ret |= test_lshift();
> +	ret |= test_rshift();
> +	ret |= test_cmpxchg_success();
> +	ret |= test_cmpxchg_fail();
> +	ret |= test_memcpy_fault();
> +	ret |= test_unknown_op();
> +	ret |= test_max_ops();
> +	ret |= test_too_many_ops();
> +	ret |= test_memcpy_single_too_large();
> +	ret |= test_memcpy_single_ok_sum_too_large();
> +	ret |= test_page_fault();
> +

Where do the pass counts get printed? I see error messages when tests fail,
but no pass messages when they succeed. It would be nice to use the kselftest
(ksft) framework to count pass/fail results for this series of tests.
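
For example, here is a minimal sketch of what that could look like, assuming
the ksft_print_header(), ksft_test_result_pass()/ksft_test_result_fail() and
ksft_exit_pass()/ksft_exit_fail() helpers from
tools/testing/selftests/kselftest.h. The run_test() wrapper below is
hypothetical; it just reuses the existing test functions such as
test_memcpy():

#include "../kselftest.h"

/* Run one test function and record its result in the ksft counters. */
static int run_test(const char *name, int (*fn)(void))
{
	int ret = fn();

	if (!ret)
		ksft_test_result_pass("%s\n", name);
	else
		ksft_test_result_fail("%s\n", name);
	return ret;
}

int main(int argc, char **argv)
{
	int ret = 0;

	ksft_print_header();
	ret |= run_test("test_compare_eq_same", test_compare_eq_same);
	ret |= run_test("test_memcpy", test_memcpy);
	/* ... and so on for the remaining tests ... */
	if (ret)
		ksft_exit_fail();
	return ksft_exit_pass();
}

With something like that, the summary printed by ksft_exit_pass() or
ksft_exit_fail() would report how many tests passed and failed, instead of
only printing messages on failure.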

> +	return ret;
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
> new file mode 100644
> index 000000000000..d7ba481cca04
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.c
> @@ -0,0 +1,348 @@
> +/*
> + * cpu-op.c
> + *
> + * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; only
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <syscall.h>
> +#include <assert.h>
> +#include <signal.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
> +{
> +	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
> +}
> +
> +int cpu_op_get_current_cpu(void)
> +{
> +	int cpu;
> +
> +	cpu = sched_getcpu();
> +	if (cpu < 0) {
> +		perror("sched_getcpu()");
> +		abort();
> +	}
> +	return cpu;
> +}
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
> +		size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)old,
> +			.u.memcpy_op.src = (unsigned long)v,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)n,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = len,
> +			.u.arithmetic_op.p = (unsigned long)v,
> +			.u.arithmetic_op.count = count,
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
> +		intptr_t *newp, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)newp,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on src fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	intptr_t oldv = READ_ONCE(*v);
> +	intptr_t *newp = (intptr_t *)(oldv + voffp);
> +	int ret;
> +
> +	if (oldv == expectnot)
> +		return 1;
> +	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
> +	if (!ret) {
> +		*load = oldv;
> +		return 0;
> +	}
> +	if (ret > 0) {
> +		errno = EAGAIN;
> +		return -1;
> +	}
> +	return -1;
> +}
> +
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v2,
> +			.u.compare_op.b = (unsigned long)&expect2,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
> +{
> +	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
> new file mode 100644
> index 000000000000..ba2ec578ec50
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.h
> @@ -0,0 +1,68 @@
> +/*
> + * cpu-op.h
> + *
> + * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef CPU_OPV_H
> +#define CPU_OPV_H
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <linux/cpu_opv.h>
> +
> +#define likely(x)		__builtin_expect(!!(x), 1)
> +#define unlikely(x)		__builtin_expect(!!(x), 0)
> +#define barrier()		__asm__ __volatile__("" : : : "memory")
> +
> +#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
> +#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
> +#define READ_ONCE(x)		ACCESS_ONCE(x)
> +
> +int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
> +int cpu_op_get_current_cpu(void);
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
> +		size_t len, int cpu);
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu);
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
> +
> +#endif  /* CPU_OPV_H */
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
>  LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
>  endif
>  
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
>  $(OUTPUT)/%:%.c
>  	$(LINK.c) $^ $(LDLIBS) -o $@
>  
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
>  
>  $(OUTPUT)/%:%.S
>  	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
>  
>  .PHONY: run_tests all clean install emit_tests
> 

As I said before, please make this change in a separate patch.

thanks,
-- Shuah


^ permalink raw reply	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
@ 2017-11-21 15:17     ` shuah
  0 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 15:17 UTC (permalink / raw)


On 11/21/2017 07:18 AM, Mathieu Desnoyers wrote:
> Implement cpu_opv selftests. It needs to express dependencies on
> header files and .so, which require to override the selftests
> lib.mk targets. Introduce a new OVERRIDE_TARGETS define for this.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> CC: Russell King <linux at arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas at arm.com>
> CC: Will Deacon <will.deacon at arm.com>
> CC: Thomas Gleixner <tglx at linutronix.de>
> CC: Paul Turner <pjt at google.com>
> CC: Andrew Hunter <ahh at google.com>
> CC: Peter Zijlstra <peterz at infradead.org>
> CC: Andy Lutomirski <luto at amacapital.net>
> CC: Andi Kleen <andi at firstfloor.org>
> CC: Dave Watson <davejwatson at fb.com>
> CC: Chris Lameter <cl at linux.com>
> CC: Ingo Molnar <mingo at redhat.com>
> CC: "H. Peter Anvin" <hpa at zytor.com>
> CC: Ben Maurer <bmaurer at fb.com>
> CC: Steven Rostedt <rostedt at goodmis.org>
> CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> CC: Josh Triplett <josh at joshtriplett.org>
> CC: Linus Torvalds <torvalds at linux-foundation.org>
> CC: Andrew Morton <akpm at linux-foundation.org>
> CC: Boqun Feng <boqun.feng at gmail.com>
> CC: Shuah Khan <shuah at kernel.org>
> CC: linux-kselftest at vger.kernel.org
> CC: linux-api at vger.kernel.org
> ---
> Changes since v1:
> 
> - Expose similar library API as rseq:  Expose library API closely
>   matching the rseq APIs, following removal of the event counter from
>   the rseq kernel API.
> - Update makefile to fix make run_tests dependency on "all".
> - Introduce a OVERRIDE_TARGETS.
> 
> Changes since v2:
> 
> - Test page faults.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/cpu-opv/.gitignore         |    1 +
>  tools/testing/selftests/cpu-opv/Makefile           |   17 +
>  .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++++
>  tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
>  tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
>  tools/testing/selftests/lib.mk                     |    4 +

Please make the change that adds OVERRIDE_TARGETS to lib.mk a separate patch.
This is most likely going to be the first patch in this series.

>  8 files changed, 1629 insertions(+)
>  create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
>  create mode 100644 tools/testing/selftests/cpu-opv/Makefile
>  create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0b4e504f5003..c6c2436d15f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3734,6 +3734,7 @@ L:	linux-kernel at vger.kernel.org
>  S:	Supported
>  F:	kernel/cpu_opv.c
>  F:	include/uapi/linux/cpu_opv.h
> +F:	tools/testing/selftests/cpu-opv/
>  
>  CRAMFS FILESYSTEM
>  M:	Nicolas Pitre <nico at linaro.org>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index eaf599dc2137..fc1eba0e0130 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -5,6 +5,7 @@ TARGETS += breakpoints
>  TARGETS += capabilities
>  TARGETS += cpufreq
>  TARGETS += cpu-hotplug
> +TARGETS += cpu-opv
>  TARGETS += efivarfs
>  TARGETS += exec
>  TARGETS += firmware
> diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
> new file mode 100644
> index 000000000000..c7186eb95cf5
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/.gitignore
> @@ -0,0 +1 @@
> +basic_cpu_opv_test
> diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
> new file mode 100644
> index 000000000000..21e63545d521
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/Makefile
> @@ -0,0 +1,17 @@
> +CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
> +
> +# Own dependencies because we only want to build against 1st prerequisite, but
> +# still track changes to header files and depend on shared object.
> +OVERRIDE_TARGETS = 1
> +
> +TEST_GEN_PROGS = basic_cpu_opv_test
> +
> +TEST_GEN_PROGS_EXTENDED = libcpu-op.so


> +
> +include ../lib.mk
> +
> +$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
> +
> +$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
> +	$(CC) $(CFLAGS) $< -lcpu-op -o $@
> diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> new file mode 100644
> index 000000000000..a31a10bbd8aa
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> @@ -0,0 +1,1189 @@
> +/*
> + * Basic test coverage for cpu_opv system call.
> + */
> +
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/time.h>
> +#include <errno.h>
> +#include <stdlib.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +#define TESTBUFLEN	4096
> +#define TESTBUFLEN_CMP	16
> +
> +#define TESTBUFLEN_PAGE_MAX	65536
> +
> +#define NR_PF_ARRAY	16384
> +#define PF_ARRAY_LEN	4096
> +
> +/* 64 MB arrays for page fault testing. */
> +char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
> +char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
> +
> +static int test_compare_eq_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_eq_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_eq */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret > 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_eq_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_ne_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_ne */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_eq_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_eq index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare failure is op[0], expect 1. */
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compares succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf2[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_ne_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_ne index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	memset(buf1, 0, TESTBUFLEN_CMP);
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare ne failure is op[0], expect 1. */
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compare ne succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf4[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_memcpy_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	for (i = 0; i < TESTBUFLEN; i++) {
> +		if (buf2[i] != (char)i) {
> +			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
> +				test_name, (char)i, buf2[i], i);
> +			return -1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_u32(void)
> +{
> +	int ret;
> +	uint32_t v1, v2;
> +	const char *test_name = "test_memcpy_u32";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy_u32 */
> +	v1 = 42;
> +	v2 = 0;
> +	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v1 != v2) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v2);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
> +		void *dst2, void *src2, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_mb_memcpy(void)
> +{
> +	int ret;
> +	int v1, v2, v3;
> +	const char *test_name = "test_memcpy_mb_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	v1 = 42;
> +	v2 = v3 = 0;
> +	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v3 != v1) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v3);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_add_op(int *v, int64_t increment)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int increment = 1;
> +	const char *test_name = "test_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_add_op(&v, increment);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increment) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v + increment);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_two_add_op(int *v, int64_t *increments)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[0],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +		[1] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[1],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_two_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int64_t increments[2] = { 99, 123 };
> +	const char *test_name = "test_two_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_two_add_op(&v, increments);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increments[0] + increments[1]) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, (int)(orig_v + increments[0] + increments[1]));
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_or_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_OR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_or(void)
> +{
> +	int orig_v = 0xFF00000, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_or";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_or_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v | mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v | mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_and_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_AND_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_and(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_and";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_and_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v & mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v & mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_xor_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_XOR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_xor(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_xor";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_xor_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v ^ mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v ^ mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_lshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_LSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_lshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_lshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_lshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v << bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v << bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_rshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_RSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_rshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_rshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_rshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v >> bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v >> bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
> +		size_t len)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_cmpxchg_success(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg success";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	if (v != n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)n);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_fail(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg fail";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	if (v == n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)orig_v);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_memcpy_fault(void)
> +{
> +	int ret;
> +	char buf1[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EFAULT)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	/* Test memcpy expect fault */
> +	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EAGAIN)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_unknown_op(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = -1,	/* Unknown */
> +			.len = 0,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_unknown_op(void)
> +{
> +	int ret;
> +	const char *test_name = "test_unknown_op";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_unknown_op();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_max_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_max_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_max_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_max_ops();
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_too_many_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +		[16] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_too_many_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_too_many_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_too_many_ops();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/* Use 64kB len, largest page size known on Linux. */
> +static int test_memcpy_single_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_PAGE_MAX + 1];
> +	char buf2[TESTBUFLEN_PAGE_MAX + 1];
> +	const char *test_name = "test_memcpy_single_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_single_ok_sum_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Iterate over large uninitialized arrays to trigger page faults.
> + */
> +static int test_page_fault(void)
> +{
> +	int ret = 0;
> +	uint64_t i;
> +	const char *test_name = "test_page_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < NR_PF_ARRAY; i++) {
> +		ret = test_memcpy_op(pf_array_dst[i],
> +				     pf_array_src[i],
> +				     PF_ARRAY_LEN);
> +		if (ret) {
> +			printf("%s returned with %d, errno: %s\n",
> +				test_name, ret, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int ret = 0;
> +
> +	ret |= test_compare_eq_same();
> +	ret |= test_compare_eq_diff();
> +	ret |= test_compare_ne_same();
> +	ret |= test_compare_ne_diff();
> +	ret |= test_2compare_eq_index();
> +	ret |= test_2compare_ne_index();
> +	ret |= test_memcpy();
> +	ret |= test_memcpy_u32();
> +	ret |= test_memcpy_mb_memcpy();
> +	ret |= test_add();
> +	ret |= test_two_add();
> +	ret |= test_or();
> +	ret |= test_and();
> +	ret |= test_xor();
> +	ret |= test_lshift();
> +	ret |= test_rshift();
> +	ret |= test_cmpxchg_success();
> +	ret |= test_cmpxchg_fail();
> +	ret |= test_memcpy_fault();
> +	ret |= test_unknown_op();
> +	ret |= test_max_ops();
> +	ret |= test_too_many_ops();
> +	ret |= test_memcpy_single_too_large();
> +	ret |= test_memcpy_single_ok_sum_too_large();
> +	ret |= test_page_fault();
> +

Where do the pass counts get printed? I see error messages when tests fail,
but no pass messages. It would be nice to use the ksft framework to count
pass/fail results for this series of tests.
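
For illustration, a minimal sketch of how main() could report per-test
results, assuming the ksft_print_header(), ksft_test_result_pass()/
ksft_test_result_fail() and ksft_exit_pass()/ksft_exit_fail() helpers from
tools/testing/selftests/kselftest.h; the run_test() wrapper below is
hypothetical, not part of the patch:

	#include "../kselftest.h"

	/* Run one test function and record its result in the ksft counters. */
	static int run_test(const char *name, int (*fn)(void))
	{
		int ret = fn();

		if (ret)
			ksft_test_result_fail("%s\n", name);
		else
			ksft_test_result_pass("%s\n", name);
		return ret;
	}

	int main(int argc, char **argv)
	{
		int ret = 0;

		ksft_print_header();
		ret |= run_test("test_compare_eq_same", test_compare_eq_same);
		ret |= run_test("test_memcpy", test_memcpy);
		/* ... same pattern for the remaining tests ... */
		return ret ? ksft_exit_fail() : ksft_exit_pass();
	}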

> +	return ret;
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
> new file mode 100644
> index 000000000000..d7ba481cca04
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.c
> @@ -0,0 +1,348 @@
> +/*
> + * cpu-op.c
> + *
> + * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; only
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <syscall.h>
> +#include <assert.h>
> +#include <signal.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
> +{
> +	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
> +}
> +
> +int cpu_op_get_current_cpu(void)
> +{
> +	int cpu;
> +
> +	cpu = sched_getcpu();
> +	if (cpu < 0) {
> +		perror("sched_getcpu()");
> +		abort();
> +	}
> +	return cpu;
> +}
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
> +		size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)old,
> +			.u.memcpy_op.src = (unsigned long)v,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)n,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = len,
> +			.u.arithmetic_op.p = (unsigned long)v,
> +			.u.arithmetic_op.count = count,
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
> +		intptr_t *newp, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)newp,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on src fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	intptr_t oldv = READ_ONCE(*v);
> +	intptr_t *newp = (intptr_t *)(oldv + voffp);
> +	int ret;
> +
> +	if (oldv == expectnot)
> +		return 1;
> +	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
> +	if (!ret) {
> +		*load = oldv;
> +		return 0;
> +	}
> +	if (ret > 0) {
> +		errno = EAGAIN;
> +		return -1;
> +	}
> +	return -1;
> +}
> +
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v2,
> +			.u.compare_op.b = (unsigned long)&expect2,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
> +{
> +	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
> new file mode 100644
> index 000000000000..ba2ec578ec50
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.h
> @@ -0,0 +1,68 @@
> +/*
> + * cpu-op.h
> + *
> + * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef CPU_OPV_H
> +#define CPU_OPV_H
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <linux/cpu_opv.h>
> +
> +#define likely(x)		__builtin_expect(!!(x), 1)
> +#define unlikely(x)		__builtin_expect(!!(x), 0)
> +#define barrier()		__asm__ __volatile__("" : : : "memory")
> +
> +#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
> +#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
> +#define READ_ONCE(x)		ACCESS_ONCE(x)
> +
> +int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
> +int cpu_op_get_current_cpu(void);
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
> +		size_t len, int cpu);
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu);
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
> +
> +#endif  /* CPU_OPV_H */
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
>  LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
>  endif
>  
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
>  $(OUTPUT)/%:%.c
>  	$(LINK.c) $^ $(LDLIBS) -o $@
>  
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
>  
>  $(OUTPUT)/%:%.S
>  	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
>  
>  .PHONY: run_tests all clean install emit_tests
> 

As I said before, please make this change in a separate patch.

thanks,
-- Shuah


* Re: [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
@ 2017-11-21 15:17     ` shuah
  0 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 15:17 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk,
	linux-kselftest-u79uwXL29TY76Z2rM5mHXA, Shuah Khan, Shuah Khan

On 11/21/2017 07:18 AM, Mathieu Desnoyers wrote:
> Implement cpu_opv selftests. They need to express dependencies on
> header files and a .so, which requires overriding the selftests
> lib.mk targets. Introduce a new OVERRIDE_TARGETS define for this.
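
For context, the mechanism is: a selftest Makefile sets OVERRIDE_TARGETS = 1
before including lib.mk, which makes lib.mk skip its generic $(OUTPUT)/%
rules so the selftest can supply its own. The cpu-opv Makefile later in this
patch is the in-tree user; the fragment below is only an illustrative sketch
with made-up names (my_test, foo.h):

	OVERRIDE_TARGETS = 1
	TEST_GEN_PROGS = my_test
	include ../lib.mk

	# Build against the first prerequisite only, but still rebuild the
	# test when the foo.h header changes.
	$(OUTPUT)/%: %.c foo.h
		$(CC) $(CFLAGS) $< -o $@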
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> CC: Russell King <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
> CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
> CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> CC: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> CC: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
> CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
> CC: Chris Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
> CC: Ben Maurer <bmaurer-b10kYP2dOMg@public.gmane.org>
> CC: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
> CC: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> CC: Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>
> CC: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> CC: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> CC: Shuah Khan <shuah-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
> Changes since v1:
> 
> - Expose a library API similar to rseq's: the library API closely
>   matches the rseq APIs, following removal of the event counter from
>   the rseq kernel API.
> - Update the Makefile to fix the "make run_tests" dependency on "all".
> - Introduce an OVERRIDE_TARGETS define.
> 
> Changes since v2:
> 
> - Test page faults.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/cpu-opv/.gitignore         |    1 +
>  tools/testing/selftests/cpu-opv/Makefile           |   17 +
>  .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1189 ++++++++++++++++++++
>  tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
>  tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
>  tools/testing/selftests/lib.mk                     |    4 +

Please make the change adding OVERRIDE_TARGETS to lib.mk a separate patch.
It will most likely be the first patch in this series.

>  8 files changed, 1629 insertions(+)
>  create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
>  create mode 100644 tools/testing/selftests/cpu-opv/Makefile
>  create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0b4e504f5003..c6c2436d15f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3734,6 +3734,7 @@ L:	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>  S:	Supported
>  F:	kernel/cpu_opv.c
>  F:	include/uapi/linux/cpu_opv.h
> +F:	tools/testing/selftests/cpu-opv/
>  
>  CRAMFS FILESYSTEM
>  M:	Nicolas Pitre <nico-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index eaf599dc2137..fc1eba0e0130 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -5,6 +5,7 @@ TARGETS += breakpoints
>  TARGETS += capabilities
>  TARGETS += cpufreq
>  TARGETS += cpu-hotplug
> +TARGETS += cpu-opv
>  TARGETS += efivarfs
>  TARGETS += exec
>  TARGETS += firmware
> diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
> new file mode 100644
> index 000000000000..c7186eb95cf5
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/.gitignore
> @@ -0,0 +1 @@
> +basic_cpu_opv_test
> diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
> new file mode 100644
> index 000000000000..21e63545d521
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/Makefile
> @@ -0,0 +1,17 @@
> +CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
> +
> +# Own dependencies because we only want to build against 1st prerequisite, but
> +# still track changes to header files and depend on shared object.
> +OVERRIDE_TARGETS = 1
> +
> +TEST_GEN_PROGS = basic_cpu_opv_test
> +
> +TEST_GEN_PROGS_EXTENDED = libcpu-op.so


> +
> +include ../lib.mk
> +
> +$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
> +
> +$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
> +	$(CC) $(CFLAGS) $< -lcpu-op -o $@
> diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> new file mode 100644
> index 000000000000..a31a10bbd8aa
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
> @@ -0,0 +1,1189 @@
> +/*
> + * Basic test coverage for cpu_opv system call.
> + */
> +
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/time.h>
> +#include <errno.h>
> +#include <stdlib.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +#define TESTBUFLEN	4096
> +#define TESTBUFLEN_CMP	16
> +
> +#define TESTBUFLEN_PAGE_MAX	65536
> +
> +#define NR_PF_ARRAY	16384
> +#define PF_ARRAY_LEN	4096
> +
> +/* 64 MB arrays for page fault testing. */
> +char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
> +char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
> +
> +static int test_compare_eq_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_eq_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_eq */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret > 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_eq_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_eq different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_op(char *a, char *b, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_compare_ne_same(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne same";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test compare_ne */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf2[i] = (char)i;
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_compare_ne_diff(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_compare_ne different";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_eq_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_eq index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare failure is op[0], expect 1. */
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compares succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf2[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
> +		size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_NE_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_2compare_ne_index(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_CMP];
> +	char buf2[TESTBUFLEN_CMP];
> +	char buf3[TESTBUFLEN_CMP];
> +	char buf4[TESTBUFLEN_CMP];
> +	const char *test_name = "test_2compare_ne index";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	memset(buf1, 0, TESTBUFLEN_CMP);
> +	memset(buf2, 0, TESTBUFLEN_CMP);
> +	memset(buf3, 0, TESTBUFLEN_CMP);
> +	memset(buf4, 0, TESTBUFLEN_CMP);
> +
> +	/* First compare ne failure is op[0], expect 1. */
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 1) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +
> +	/* All compare ne succeed. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf1[i] = (char)i;
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf3[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +
> +	/* First compare failure is op[1], expect 2. */
> +	for (i = 0; i < TESTBUFLEN_CMP; i++)
> +		buf4[i] = (char)i;
> +	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret != 2) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 2);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int test_memcpy_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	for (i = 0; i < TESTBUFLEN; i++) {
> +		if (buf2[i] != (char)i) {
> +			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
> +				test_name, (char)i, buf2[i], i);
> +			return -1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_u32(void)
> +{
> +	int ret;
> +	uint32_t v1, v2;
> +	const char *test_name = "test_memcpy_u32";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy_u32 */
> +	v1 = 42;
> +	v2 = 0;
> +	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v1 != v2) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v2);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
> +		void *dst2, void *src2, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_mb_memcpy(void)
> +{
> +	int ret;
> +	int v1, v2, v3;
> +	const char *test_name = "test_memcpy_mb_memcpy";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	v1 = 42;
> +	v2 = v3 = 0;
> +	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (v3 != v1) {
> +		printf("%s failed. Expecting '%d', found '%d'\n",
> +			test_name, v1, v3);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_add_op(int *v, int64_t increment)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int increment = 1;
> +	const char *test_name = "test_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_add_op(&v, increment);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increment) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v + increment);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_two_add_op(int *v, int64_t *increments)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[0],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +		[1] = {
> +			.op = CPU_ADD_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.arithmetic_op.p, v),
> +			.u.arithmetic_op.count = increments[1],
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_two_add(void)
> +{
> +	int orig_v = 42, v, ret;
> +	int64_t increments[2] = { 99, 123 };
> +	const char *test_name = "test_two_add";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_two_add_op(&v, increments);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != orig_v + increments[0] + increments[1]) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, (int)(orig_v + increments[0] + increments[1]));
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_or_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_OR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_or(void)
> +{
> +	int orig_v = 0xFF00000, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_or";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_or_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v | mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v | mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_and_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_AND_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_and(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_and";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_and_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v & mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v & mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_xor_op(int *v, uint64_t mask)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_XOR_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.bitwise_op.p, v),
> +			.u.bitwise_op.mask = mask,
> +			.u.bitwise_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_xor(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t mask = 0xFFF;
> +	const char *test_name = "test_xor";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_xor_op(&v, mask);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v ^ mask)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v ^ mask);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_lshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_LSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_lshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_lshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_lshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v << bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v << bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_rshift_op(int *v, uint32_t bits)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_RSHIFT_OP,
> +			.len = sizeof(*v),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(
> +				.u.shift_op.p, v),
> +			.u.shift_op.bits = bits,
> +			.u.shift_op.expect_fault_p = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_rshift(void)
> +{
> +	int orig_v = 0xF00, v, ret;
> +	uint32_t bits = 5;
> +	const char *test_name = "test_rshift";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_rshift_op(&v, bits);
> +	if (ret) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		return -1;
> +	}
> +	if (v != (orig_v >> bits)) {
> +		printf("%s unexpected value: %d. Should be %d.\n",
> +			test_name, v, orig_v >> bits);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
> +		size_t len)
> +{
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_cmpxchg_success(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg success";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 0);
> +		return -1;
> +	}
> +	if (v != n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)n);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_cmpxchg_fail(void)
> +{
> +	int ret;
> +	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
> +	const char *test_name = "test_cmpxchg fail";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	v = orig_v;
> +	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	if (ret == 0) {
> +		printf("%s returned %d, expecting %d\n",
> +			test_name, ret, 1);
> +		return -1;
> +	}
> +	if (v == n) {
> +		printf("%s v is %lld, expecting %lld\n",
> +			test_name, (long long)v, (long long)orig_v);
> +		return -1;
> +	}
> +	if (old != orig_v) {
> +		printf("%s old is %lld, expecting %lld\n",
> +			test_name, (long long)old, (long long)orig_v);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_memcpy_fault(void)
> +{
> +	int ret;
> +	char buf1[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EFAULT)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	/* Test memcpy expect fault */
> +	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EAGAIN)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_unknown_op(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = -1,	/* Unknown */
> +			.len = 0,
> +		},
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_unknown_op(void)
> +{
> +	int ret;
> +	const char *test_name = "test_unknown_op";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_unknown_op();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_max_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_max_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_max_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_max_ops();
> +	if (ret < 0) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int do_test_too_many_ops(void)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = { .op = CPU_MB_OP, },
> +		[1] = { .op = CPU_MB_OP, },
> +		[2] = { .op = CPU_MB_OP, },
> +		[3] = { .op = CPU_MB_OP, },
> +		[4] = { .op = CPU_MB_OP, },
> +		[5] = { .op = CPU_MB_OP, },
> +		[6] = { .op = CPU_MB_OP, },
> +		[7] = { .op = CPU_MB_OP, },
> +		[8] = { .op = CPU_MB_OP, },
> +		[9] = { .op = CPU_MB_OP, },
> +		[10] = { .op = CPU_MB_OP, },
> +		[11] = { .op = CPU_MB_OP, },
> +		[12] = { .op = CPU_MB_OP, },
> +		[13] = { .op = CPU_MB_OP, },
> +		[14] = { .op = CPU_MB_OP, },
> +		[15] = { .op = CPU_MB_OP, },
> +		[16] = { .op = CPU_MB_OP, },
> +	};
> +	int cpu;
> +
> +	cpu = cpu_op_get_current_cpu();
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int test_too_many_ops(void)
> +{
> +	int ret;
> +	const char *test_name = "test_too_many_ops";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	ret = do_test_too_many_ops();
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/* Use 64kB len, largest page size known on Linux. */
> +static int test_memcpy_single_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN_PAGE_MAX + 1];
> +	char buf2[TESTBUFLEN_PAGE_MAX + 1];
> +	const char *test_name = "test_memcpy_single_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
> +	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
> +			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +	int ret, cpu;
> +
> +	do {
> +		cpu = cpu_op_get_current_cpu();
> +		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +	} while (ret == -1 && errno == EAGAIN);
> +
> +	return ret;
> +}
> +
> +static int test_memcpy_single_ok_sum_too_large(void)
> +{
> +	int i, ret;
> +	char buf1[TESTBUFLEN];
> +	char buf2[TESTBUFLEN];
> +	const char *test_name = "test_memcpy_single_ok_sum_too_large";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	/* Test memcpy */
> +	for (i = 0; i < TESTBUFLEN; i++)
> +		buf1[i] = (char)i;
> +	memset(buf2, 0, TESTBUFLEN);
> +	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
> +	if (!ret || (ret < 0 && errno != EINVAL)) {
> +		printf("%s returned with %d, errno: %s\n",
> +			test_name, ret, strerror(errno));
> +		exit(-1);
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Iterate over large uninitialized arrays to trigger page faults.
> + */
> +int test_page_fault(void)
> +{
> +	int ret = 0;
> +	uint64_t i;
> +	const char *test_name = "test_page_fault";
> +
> +	printf("Testing %s\n", test_name);
> +
> +	for (i = 0; i < NR_PF_ARRAY; i++) {
> +		ret = test_memcpy_op(pf_array_dst[i],
> +				     pf_array_src[i],
> +				     PF_ARRAY_LEN);
> +		if (ret) {
> +			printf("%s returned with %d, errno: %s\n",
> +				test_name, ret, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int ret = 0;
> +
> +	ret |= test_compare_eq_same();
> +	ret |= test_compare_eq_diff();
> +	ret |= test_compare_ne_same();
> +	ret |= test_compare_ne_diff();
> +	ret |= test_2compare_eq_index();
> +	ret |= test_2compare_ne_index();
> +	ret |= test_memcpy();
> +	ret |= test_memcpy_u32();
> +	ret |= test_memcpy_mb_memcpy();
> +	ret |= test_add();
> +	ret |= test_two_add();
> +	ret |= test_or();
> +	ret |= test_and();
> +	ret |= test_xor();
> +	ret |= test_lshift();
> +	ret |= test_rshift();
> +	ret |= test_cmpxchg_success();
> +	ret |= test_cmpxchg_fail();
> +	ret |= test_memcpy_fault();
> +	ret |= test_unknown_op();
> +	ret |= test_max_ops();
> +	ret |= test_too_many_ops();
> +	ret |= test_memcpy_single_too_large();
> +	ret |= test_memcpy_single_ok_sum_too_large();
> +	ret |= test_page_fault();
> +

Where do pass counts get printed? I am seeing error messages when tests fail,
but no pass messages. It would be nice to use the kselftest (ksft) framework
to count pass/fail for this series of tests.
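
Something like this untested sketch would report each sub-test in TAP form; it
assumes the ksft_* helpers from tools/testing/selftests/kselftest.h, and
run_test() is a made-up wrapper:

	#include "../kselftest.h"

	static void run_test(const char *name, int (*fn)(void))
	{
		if (!fn())
			ksft_test_result_pass("%s\n", name);
		else
			ksft_test_result_fail("%s\n", name);
	}

	int main(int argc, char **argv)
	{
		ksft_print_header();
		run_test("test_add", test_add);
		run_test("test_cmpxchg_success", test_cmpxchg_success);
		/* ... same pattern for the remaining tests ... */
		ksft_print_cnts();
		return 0;
	}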

> +	return ret;
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
> new file mode 100644
> index 000000000000..d7ba481cca04
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.c
> @@ -0,0 +1,348 @@
> +/*
> + * cpu-op.c
> + *
> + * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; only
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <syscall.h>
> +#include <assert.h>
> +#include <signal.h>
> +
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
> +{
> +	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
> +}
> +
> +int cpu_op_get_current_cpu(void)
> +{
> +	int cpu;
> +
> +	cpu = sched_getcpu();
> +	if (cpu < 0) {
> +		perror("sched_getcpu()");
> +		abort();
> +	}
> +	return cpu;
> +}
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
> +		size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)old,
> +			.u.memcpy_op.src = (unsigned long)v,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = len,
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)n,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_ADD_OP,
> +			.len = len,
> +			.u.arithmetic_op.p = (unsigned long)v,
> +			.u.arithmetic_op.count = count,
> +			.u.arithmetic_op.expect_fault_p = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
> +		intptr_t *newp, int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)newp,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			/* Return EAGAIN on src fault. */
> +			.u.memcpy_op.expect_fault_src = 1,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	intptr_t oldv = READ_ONCE(*v);
> +	intptr_t *newp = (intptr_t *)(oldv + voffp);
> +	int ret;
> +
> +	if (oldv == expectnot)
> +		return 1;
> +	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
> +	if (!ret) {
> +		*load = oldv;
> +		return 0;
> +	}
> +	if (ret > 0) {
> +		errno = EAGAIN;
> +		return -1;
> +	}
> +	return -1;
> +}
> +
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v2,
> +			.u.memcpy_op.src = (unsigned long)&newv2,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v2,
> +			.u.compare_op.b = (unsigned long)&expect2,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	struct cpu_op opvec[] = {
> +		[0] = {
> +			.op = CPU_COMPARE_EQ_OP,
> +			.len = sizeof(intptr_t),
> +			.u.compare_op.a = (unsigned long)v,
> +			.u.compare_op.b = (unsigned long)&expect,
> +			.u.compare_op.expect_fault_a = 0,
> +			.u.compare_op.expect_fault_b = 0,
> +		},
> +		[1] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = len,
> +			.u.memcpy_op.dst = (unsigned long)dst,
> +			.u.memcpy_op.src = (unsigned long)src,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +		[2] = {
> +			.op = CPU_MB_OP,
> +		},
> +		[3] = {
> +			.op = CPU_MEMCPY_OP,
> +			.len = sizeof(intptr_t),
> +			.u.memcpy_op.dst = (unsigned long)v,
> +			.u.memcpy_op.src = (unsigned long)&newv,
> +			.u.memcpy_op.expect_fault_dst = 0,
> +			.u.memcpy_op.expect_fault_src = 0,
> +		},
> +	};
> +
> +	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
> +}
> +
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
> +{
> +	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
> +}
> diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
> new file mode 100644
> index 000000000000..ba2ec578ec50
> --- /dev/null
> +++ b/tools/testing/selftests/cpu-opv/cpu-op.h
> @@ -0,0 +1,68 @@
> +/*
> + * cpu-op.h
> + *
> + * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef CPU_OPV_H
> +#define CPU_OPV_H
> +
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <linux/cpu_opv.h>
> +
> +#define likely(x)		__builtin_expect(!!(x), 1)
> +#define unlikely(x)		__builtin_expect(!!(x), 0)
> +#define barrier()		__asm__ __volatile__("" : : : "memory")
> +
> +#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
> +#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
> +#define READ_ONCE(x)		ACCESS_ONCE(x)
> +
> +int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
> +int cpu_op_get_current_cpu(void);
> +
> +int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
> +		size_t len, int cpu);
> +int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
> +
> +int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu);
> +int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu);
> +int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
> +
> +#endif  /* CPU_OPV_H */
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
>  LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
>  endif
>  
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
>  $(OUTPUT)/%:%.c
>  	$(LINK.c) $^ $(LDLIBS) -o $@
>  
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
>  
>  $(OUTPUT)/%:%.S
>  	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
>  
>  .PHONY: run_tests all clean install emit_tests
> 

As I said before, please do this change in a separate patch.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
@ 2017-11-21 15:34     ` Shuah Khan
  0 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 15:34 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, linux-kselftest, Shuah Khan,
	Shuah Khan

On 11/21/2017 07:18 AM, Mathieu Desnoyers wrote:
> Implements two basic tests of RSEQ functionality, and one more
> exhaustive parameterizable test.
> 
> The first, "basic_test" only asserts that RSEQ works moderately
> correctly. E.g. that the CPUID pointer works.
> 
> "basic_percpu_ops_test" is a slightly more "realistic" variant,
> implementing a few simple per-cpu operations and testing their
> correctness.
> 
> "param_test" is a parametrizable restartable sequences test. See
> the "--help" output for usage.
> 
> A run_param_test.sh script runs many variants of the parametrizable
> tests.
> 
> As part of those tests, a helper library "rseq" implements a user-space
> API around restartable sequences. It uses the cpu_opv system call as
> fallback when single-stepped by a debugger. It exposes the instruction
> pointer addresses where the rseq assembly blocks begin and end, as well
> as the associated abort instruction pointer, in the __rseq_table
> section. This section allows debuggers to know where to place
> breakpoints when single-stepping through assembly blocks which may be
> aborted at any point by the kernel.
> 
> The rseq library exposes APIs that present the fast-path operations.
> Their use from userspace is, e.g. for a counter increment:
> 
>     cpu = rseq_cpu_start();
>     ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>     if (likely(!ret))
>         return 0;        /* Success. */
>     do {
>         cpu = rseq_current_cpu();
>         ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>         if (likely(!ret))
>             return 0;    /* Success. */
>     } while (ret > 0 || errno == EAGAIN);
>     perror("cpu_op_addv");
>     return -1;           /* Unexpected error. */
> 
> PowerPC tests have been implemented by Boqun Feng.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul Turner <pjt@google.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
> Changes since v1:
> - Provide abort-ip signature: The abort-ip signature is located just
>   before the abort-ip target. It is currently hardcoded, but a
>   user-space application could use the __rseq_table to iterate on all
>   abort-ip targets and use a random value as signature if needed in the
>   future.
> - Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
>   sections need to issue rseq_prepare_unload() on each thread at least
>   once before reclaim of struct rseq_cs.
> - Use initial-exec TLS model, non-weak symbol: The initial-exec model is
>   signal-safe, whereas the global-dynamic model is not.  Remove the
>   "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
>   library will have ownership of that symbol, and there is no reason for
>   an application or user library to try to define that symbol.
>   The expected use is to link against librseq.so, which owns and provides
>   that symbol.
> - Set cpu_id to -2 on register error
> - Add rseq_len syscall parameter, rseq_cs version
> - Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
>   hard time decoding the instruction stream after a bad instruction. Use
>   a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
> - Exercise parametrized test variants in a shell script.
> - Restartable sequences selftests: Remove use of event counter.
> - Use cpu_id_start field:  With the cpu_id_start field, the C
>   preparation phase of the fast-path does not need to compare cpu_id < 0
>   anymore.
> - Signal-safe registration and refcounting: Allow libraries using
>   librseq.so to register it from signal handlers.
> - Use OVERRIDE_TARGETS in makefile.
> - Use "m" constraints for rseq_cs field.
> 
> Changes since v2:
> - Update based on Thomas Gleixner's comments.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/rseq/.gitignore            |    4 +

Thanks for the .gitignore files. It is a commonly missed change; I end up
adding one to clean things up after tests get in.

>  tools/testing/selftests/rseq/Makefile              |   23 +
>  .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
>  tools/testing/selftests/rseq/basic_test.c          |   55 +
>  tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
>  tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
>  tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
>  tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
>  tools/testing/selftests/rseq/rseq.c                |  116 ++
>  tools/testing/selftests/rseq/rseq.h                |  154 +++
>  tools/testing/selftests/rseq/run_param_test.sh     |  124 ++
>  13 files changed, 4096 insertions(+)
>  create mode 100644 tools/testing/selftests/rseq/.gitignore
>  create mode 100644 tools/testing/selftests/rseq/Makefile
>  create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
>  create mode 100644 tools/testing/selftests/rseq/basic_test.c
>  create mode 100644 tools/testing/selftests/rseq/param_test.c
>  create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
>  create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
>  create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
>  create mode 100644 tools/testing/selftests/rseq/rseq.c
>  create mode 100644 tools/testing/selftests/rseq/rseq.h
>  create mode 100755 tools/testing/selftests/rseq/run_param_test.sh
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c6c2436d15f8..ba9137c1f295 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11634,6 +11634,7 @@ S:	Supported
>  F:	kernel/rseq.c
>  F:	include/uapi/linux/rseq.h
>  F:	include/trace/events/rseq.h
> +F:	tools/testing/selftests/rseq/
>  
>  RFKILL
>  M:	Johannes Berg <johannes@sipsolutions.net>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index fc1eba0e0130..fc314334628a 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -26,6 +26,7 @@ TARGETS += nsfs
>  TARGETS += powerpc
>  TARGETS += pstore
>  TARGETS += ptrace
> +TARGETS += rseq
>  TARGETS += seccomp
>  TARGETS += sigaltstack
>  TARGETS += size
> diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
> new file mode 100644
> index 000000000000..9409c3db99b2
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/.gitignore
> @@ -0,0 +1,4 @@
> +basic_percpu_ops_test
> +basic_test
> +basic_rseq_op_test
> +param_test
> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> new file mode 100644
> index 000000000000..e4f638e5752c
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/Makefile
> @@ -0,0 +1,23 @@
> +CFLAGS += -O2 -Wall -g -I./ -I../cpu-opv/ -I../../../../usr/include/ -L./ -Wl,-rpath=./
> +LDLIBS += -lpthread
> +
> +# Own dependencies because we only want to build against 1st prerequisite, but
> +# still track changes to header files and depend on shared object.
> +OVERRIDE_TARGETS = 1
> +
> +TEST_GEN_PROGS = basic_test basic_percpu_ops_test param_test
> +
> +TEST_GEN_PROGS_EXTENDED = librseq.so libcpu-op.so
> +
> +TEST_PROGS = run_param_test.sh
> +
> +include ../lib.mk
> +
> +$(OUTPUT)/librseq.so: rseq.c rseq.h rseq-*.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
> +
> +$(OUTPUT)/libcpu-op.so: ../cpu-opv/cpu-op.c ../cpu-opv/cpu-op.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
> +
> +$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) rseq.h rseq-*.h ../cpu-opv/cpu-op.h
> +	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -lcpu-op -o $@
> diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> new file mode 100644
> index 000000000000..e5f7fed06a03
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> @@ -0,0 +1,333 @@
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stddef.h>
> +
> +#include "rseq.h"
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +struct percpu_lock_entry {
> +	intptr_t v;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_lock {
> +	struct percpu_lock_entry c[CPU_SETSIZE];
> +};
> +
> +struct test_data_entry {
> +	intptr_t count;
> +} __attribute__((aligned(128)));
> +
> +struct spinlock_test_data {
> +	struct percpu_lock lock;
> +	struct test_data_entry c[CPU_SETSIZE];
> +	int reps;
> +};
> +
> +struct percpu_list_node {
> +	intptr_t data;
> +	struct percpu_list_node *next;
> +};
> +
> +struct percpu_list_entry {
> +	struct percpu_list_node *head;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_list {
> +	struct percpu_list_entry c[CPU_SETSIZE];
> +};
> +
> +/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
> +int rseq_percpu_lock(struct percpu_lock *lock)
> +{
> +	int cpu;
> +
> +	for (;;) {
> +		int ret;
> +
> +#ifndef SKIP_FASTPATH
> +		/* Try fast path. */
> +		cpu = rseq_cpu_start();
> +		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
> +				0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			continue;	/* Retry. */
> +#endif
> +	slowpath:
> +		__attribute__((unused));
> +		/* Fallback on cpu_opv system call. */
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	/*
> +	 * Acquire semantic when taking lock after control dependency.
> +	 * Matches rseq_smp_store_release().
> +	 */
> +	rseq_smp_acquire__after_ctrl_dep();
> +	return cpu;
> +}
> +
> +void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
> +{
> +	assert(lock->c[cpu].v == 1);
> +	/*
> +	 * Release lock, with release semantic. Matches
> +	 * rseq_smp_acquire__after_ctrl_dep().
> +	 */
> +	rseq_smp_store_release(&lock->c[cpu].v, 0);
> +}
> +
> +void *test_percpu_spinlock_thread(void *arg)
> +{
> +	struct spinlock_test_data *data = arg;
> +	int i, cpu;
> +
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +	for (i = 0; i < data->reps; i++) {
> +		cpu = rseq_percpu_lock(&data->lock);
> +		data->c[cpu].count++;
> +		rseq_percpu_unlock(&data->lock, cpu);
> +	}
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * A simple test which implements a sharded counter using a per-cpu
> + * lock.  Obviously real applications might prefer to simply use a
> + * per-cpu increment; however, this is reasonable for a test and the
> + * lock can be extended to synchronize more complicated operations.
> + */
> +void test_percpu_spinlock(void)
> +{
> +	const int num_threads = 200;
> +	int i;
> +	uint64_t sum;
> +	pthread_t test_threads[num_threads];
> +	struct spinlock_test_data data;
> +
> +	memset(&data, 0, sizeof(data));
> +	data.reps = 5000;
> +
> +	for (i = 0; i < num_threads; i++)
> +		pthread_create(&test_threads[i], NULL,
> +			test_percpu_spinlock_thread, &data);
> +
> +	for (i = 0; i < num_threads; i++)
> +		pthread_join(test_threads[i], NULL);
> +
> +	sum = 0;
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		sum += data.c[i].count;
> +
> +	assert(sum == (uint64_t)data.reps * num_threads);
> +}
> +
> +int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
> +{
> +	intptr_t *targetptr, newval, expect;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load list->c[cpu].head with single-copy atomicity. */
> +	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +	newval = (intptr_t)node;
> +	targetptr = (intptr_t *)&list->c[cpu].head;
> +	node->next = (struct percpu_list_node *)expect;
> +	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
> +	if (likely(!ret))
> +		return cpu;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +	slowpath:
> +		__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load list->c[cpu].head with single-copy atomicity. */
> +		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +		newval = (intptr_t)node;
> +		targetptr = (intptr_t *)&list->c[cpu].head;
> +		node->next = (struct percpu_list_node *)expect;
> +		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return cpu;
> +}
> +
> +/*
> + * Unlike a traditional lock-less linked list, the availability of an
> + * rseq primitive allows us to implement pop without concerns over
> + * ABA-type races.
> + */
> +struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
> +{
> +	struct percpu_list_node *head;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
> +		(intptr_t)NULL,
> +		offsetof(struct percpu_list_node, next),
> +		(intptr_t *)&head, cpu);
> +	if (likely(!ret))
> +		return head;
> +	if (ret > 0)
> +		return NULL;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +	slowpath:
> +		__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpnev_storeoffp_load(
> +			(intptr_t *)&list->c[cpu].head,
> +			(intptr_t)NULL,
> +			offsetof(struct percpu_list_node, next),
> +			(intptr_t *)&head, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			return NULL;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return head;
> +}
> +
> +void *test_percpu_list_thread(void *arg)
> +{
> +	int i;
> +	struct percpu_list *list = (struct percpu_list *)arg;
> +
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	for (i = 0; i < 100000; i++) {
> +		struct percpu_list_node *node = percpu_list_pop(list);
> +
> +		sched_yield();  /* encourage shuffling */
> +		if (node)
> +			percpu_list_push(list, node);
> +	}
> +
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu linked list from many threads.  */
> +void test_percpu_list(void)
> +{
> +	int i, j;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_list list;
> +	pthread_t test_threads[200];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&list, 0, sizeof(list));
> +
> +	/* Generate list entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		for (j = 1; j <= 100; j++) {
> +			struct percpu_list_node *node;
> +
> +			expected_sum += j;
> +
> +			node = malloc(sizeof(*node));
> +			assert(node);
> +			node->data = j;
> +			node->next = list.c[i].head;
> +			list.c[i].head = node;
> +		}
> +	}
> +
> +	for (i = 0; i < 200; i++)
> +		assert(pthread_create(&test_threads[i], NULL,
> +			test_percpu_list_thread, &list) == 0);
> +
> +	for (i = 0; i < 200; i++)
> +		pthread_join(test_threads[i], NULL);
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_list_node *node;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while ((node = percpu_list_pop(&list))) {
> +			sum += node->data;
> +			free(node);
> +		}
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto error;
> +	}
> +	printf("spinlock\n");
> +	test_percpu_spinlock();
> +	printf("percpu_list\n");
> +	test_percpu_list();
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto error;
> +	}
> +	return 0;
> +
> +error:
> +	return -1;
> +}
> +
> diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
> new file mode 100644
> index 000000000000..e2086b3885d7
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/basic_test.c
> @@ -0,0 +1,55 @@
> +/*
> + * Basic test coverage for critical regions and rseq_current_cpu().
> + */
> +
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/time.h>
> +
> +#include "rseq.h"
> +
> +void test_cpu_pointer(void)
> +{
> +	cpu_set_t affinity, test_affinity;
> +	int i;
> +
> +	sched_getaffinity(0, sizeof(affinity), &affinity);
> +	CPU_ZERO(&test_affinity);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (CPU_ISSET(i, &affinity)) {
> +			CPU_SET(i, &test_affinity);
> +			sched_setaffinity(0, sizeof(test_affinity),
> +					&test_affinity);
> +			assert(sched_getcpu() == i);
> +			assert(rseq_current_cpu() == i);
> +			assert(rseq_current_cpu_raw() == i);
> +			assert(rseq_cpu_start() == i);
> +			CPU_CLR(i, &test_affinity);
> +		}
> +	}
> +	sched_setaffinity(0, sizeof(affinity), &affinity);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto init_thread_error;
> +	}
> +	printf("testing current cpu\n");
> +	test_cpu_pointer();
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto init_thread_error;
> +	}
> +	return 0;
> +
> +init_thread_error:
> +	return -1;
> +}
> diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
> new file mode 100644
> index 000000000000..c7a16b656a36
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/param_test.c
> @@ -0,0 +1,1285 @@
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <poll.h>
> +#include <sys/types.h>
> +#include <signal.h>
> +#include <errno.h>
> +#include <stddef.h>
> +
> +#include "cpu-op.h"
> +
> +static inline pid_t gettid(void)
> +{
> +	return syscall(__NR_gettid);
> +}
> +
> +#define NR_INJECT	9
> +static int loop_cnt[NR_INJECT + 1];
> +
> +static int opt_modulo, verbose;
> +
> +static int opt_yield, opt_signal, opt_sleep,
> +		opt_disable_rseq, opt_threads = 200,
> +		opt_disable_mod = 0, opt_test = 's', opt_mb = 0;
> +
> +static long long opt_reps = 5000;
> +
> +static __thread __attribute__((tls_model("initial-exec"))) unsigned int signals_delivered;
> +
> +#ifndef BENCHMARK
> +
> +static __thread __attribute__((tls_model("initial-exec"))) unsigned int yield_mod_cnt, nr_abort;
> +
> +#define printf_verbose(fmt, ...)			\
> +	do {						\
> +		if (verbose)				\
> +			printf(fmt, ## __VA_ARGS__);	\
> +	} while (0)
> +
> +#define RSEQ_INJECT_INPUT \
> +	, [loop_cnt_1]"m"(loop_cnt[1]) \
> +	, [loop_cnt_2]"m"(loop_cnt[2]) \
> +	, [loop_cnt_3]"m"(loop_cnt[3]) \
> +	, [loop_cnt_4]"m"(loop_cnt[4]) \
> +	, [loop_cnt_5]"m"(loop_cnt[5]) \
> +	, [loop_cnt_6]"m"(loop_cnt[6])
> +
> +#if defined(__x86_64__) || defined(__i386__)
> +
> +#define INJECT_ASM_REG	"eax"
> +
> +#define RSEQ_INJECT_CLOBBER \
> +	, INJECT_ASM_REG
> +
> +#define RSEQ_INJECT_ASM(n) \
> +	"mov %[loop_cnt_" #n "], %%" INJECT_ASM_REG "\n\t" \
> +	"test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \
> +	"jz 333f\n\t" \
> +	"222:\n\t" \
> +	"dec %%" INJECT_ASM_REG "\n\t" \
> +	"jnz 222b\n\t" \
> +	"333:\n\t"
> +
> +#elif defined(__ARMEL__)
> +
> +#define INJECT_ASM_REG	"r4"
> +
> +#define RSEQ_INJECT_CLOBBER \
> +	, INJECT_ASM_REG
> +
> +#define RSEQ_INJECT_ASM(n) \
> +	"ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
> +	"cmp " INJECT_ASM_REG ", #0\n\t" \
> +	"beq 333f\n\t" \
> +	"222:\n\t" \
> +	"subs " INJECT_ASM_REG ", #1\n\t" \
> +	"bne 222b\n\t" \
> +	"333:\n\t"
> +
> +#elif __PPC__
> +#define INJECT_ASM_REG	"r18"
> +
> +#define RSEQ_INJECT_CLOBBER \
> +	, INJECT_ASM_REG
> +
> +#define RSEQ_INJECT_ASM(n) \
> +	"lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
> +	"cmpwi %%" INJECT_ASM_REG ", 0\n\t" \
> +	"beq 333f\n\t" \
> +	"222:\n\t" \
> +	"subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \
> +	"bne 222b\n\t" \
> +	"333:\n\t"
> +#else
> +#error unsupported target
> +#endif
> +
> +#define RSEQ_INJECT_FAILED \
> +	nr_abort++;
> +
> +#define RSEQ_INJECT_C(n) \
> +{ \
> +	int loc_i, loc_nr_loops = loop_cnt[n]; \
> +	\
> +	for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \
> +		barrier(); \
> +	} \
> +	if (loc_nr_loops == -1 && opt_modulo) { \
> +		if (yield_mod_cnt == opt_modulo - 1) { \
> +			if (opt_sleep > 0) \
> +				poll(NULL, 0, opt_sleep); \
> +			if (opt_yield) \
> +				sched_yield(); \
> +			if (opt_signal) \
> +				raise(SIGUSR1); \
> +			yield_mod_cnt = 0; \
> +		} else { \
> +			yield_mod_cnt++; \
> +		} \
> +	} \
> +}
> +
> +#else
> +
> +#define printf_verbose(fmt, ...)
> +
> +#endif /* BENCHMARK */
> +
> +#include "rseq.h"
> +
> +struct percpu_lock_entry {
> +	intptr_t v;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_lock {
> +	struct percpu_lock_entry c[CPU_SETSIZE];
> +};
> +
> +struct test_data_entry {
> +	intptr_t count;
> +} __attribute__((aligned(128)));
> +
> +struct spinlock_test_data {
> +	struct percpu_lock lock;
> +	struct test_data_entry c[CPU_SETSIZE];
> +};
> +
> +struct spinlock_thread_test_data {
> +	struct spinlock_test_data *data;
> +	long long reps;
> +	int reg;
> +};
> +
> +struct inc_test_data {
> +	struct test_data_entry c[CPU_SETSIZE];
> +};
> +
> +struct inc_thread_test_data {
> +	struct inc_test_data *data;
> +	long long reps;
> +	int reg;
> +};
> +
> +struct percpu_list_node {
> +	intptr_t data;
> +	struct percpu_list_node *next;
> +};
> +
> +struct percpu_list_entry {
> +	struct percpu_list_node *head;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_list {
> +	struct percpu_list_entry c[CPU_SETSIZE];
> +};
> +
> +#define BUFFER_ITEM_PER_CPU	100
> +
> +struct percpu_buffer_node {
> +	intptr_t data;
> +};
> +
> +struct percpu_buffer_entry {
> +	intptr_t offset;
> +	intptr_t buflen;
> +	struct percpu_buffer_node **array;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_buffer {
> +	struct percpu_buffer_entry c[CPU_SETSIZE];
> +};
> +
> +#define MEMCPY_BUFFER_ITEM_PER_CPU	100
> +
> +struct percpu_memcpy_buffer_node {
> +	intptr_t data1;
> +	uint64_t data2;
> +};
> +
> +struct percpu_memcpy_buffer_entry {
> +	intptr_t offset;
> +	intptr_t buflen;
> +	struct percpu_memcpy_buffer_node *array;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_memcpy_buffer {
> +	struct percpu_memcpy_buffer_entry c[CPU_SETSIZE];
> +};
> +
> +/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
> +static int rseq_percpu_lock(struct percpu_lock *lock)
> +{
> +	int cpu;
> +
> +	for (;;) {
> +		int ret;
> +
> +#ifndef SKIP_FASTPATH
> +		/* Try fast path. */
> +		cpu = rseq_cpu_start();
> +		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
> +				0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			continue;	/* Retry. */
> +#endif
> +	slowpath:
> +		__attribute__((unused));
> +		/* Fallback on cpu_opv system call. */
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	/*
> +	 * Acquire semantic when taking lock after control dependency.
> +	 * Matches rseq_smp_store_release().
> +	 */
> +	rseq_smp_acquire__after_ctrl_dep();
> +	return cpu;
> +}
> +
> +static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
> +{
> +	assert(lock->c[cpu].v == 1);
> +	/*
> +	 * Release lock, with release semantic. Matches
> +	 * rseq_smp_acquire__after_ctrl_dep().
> +	 */
> +	rseq_smp_store_release(&lock->c[cpu].v, 0);
> +}
> +
> +void *test_percpu_spinlock_thread(void *arg)
> +{
> +	struct spinlock_thread_test_data *thread_data = arg;
> +	struct spinlock_test_data *data = thread_data->data;
> +	int cpu;
> +	long long i, reps;
> +
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_register_current_thread())
> +		abort();
> +	reps = thread_data->reps;
> +	for (i = 0; i < reps; i++) {
> +		cpu = rseq_percpu_lock(&data->lock);
> +		data->c[cpu].count++;
> +		rseq_percpu_unlock(&data->lock, cpu);
> +#ifndef BENCHMARK
> +		if (i != 0 && !(i % (reps / 10)))
> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
> +#endif
> +	}
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_unregister_current_thread())
> +		abort();
> +	return NULL;
> +}
> +
> +/*
> + * A simple test which implements a sharded counter using a per-cpu
> + * lock.  Obviously real applications might prefer to simply use a
> + * per-cpu increment; however, this is reasonable for a test and the
> + * lock can be extended to synchronize more complicated operations.
> + */
> +void test_percpu_spinlock(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, ret;
> +	uint64_t sum;
> +	pthread_t test_threads[num_threads];
> +	struct spinlock_test_data data;
> +	struct spinlock_thread_test_data thread_data[num_threads];
> +
> +	memset(&data, 0, sizeof(data));
> +	for (i = 0; i < num_threads; i++) {
> +		thread_data[i].reps = opt_reps;
> +		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
> +			thread_data[i].reg = 1;
> +		else
> +			thread_data[i].reg = 0;
> +		thread_data[i].data = &data;
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_spinlock_thread, &thread_data[i]);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	sum = 0;
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		sum += data.c[i].count;
> +
> +	assert(sum == (uint64_t)opt_reps * num_threads);
> +}
> +
> +void *test_percpu_inc_thread(void *arg)
> +{
> +	struct inc_thread_test_data *thread_data = arg;
> +	struct inc_test_data *data = thread_data->data;
> +	long long i, reps;
> +
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_register_current_thread())
> +		abort();
> +	reps = thread_data->reps;
> +	for (i = 0; i < reps; i++) {
> +		int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +		/* Try fast path. */
> +		cpu = rseq_cpu_start();
> +		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
> +		if (likely(!ret))
> +			goto next;
> +#endif

So the test needs to be compiled with this enabled? I think it would be better
to make this an argument that can be selected at test start time, as opposed to
a compile-time option. Remember that these tests get run in automated test
rings. Making this a compile-time option pretty much ensures that this path
will not be tested.

So I would recommend adding a parameter.
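
Something along the lines of the untested sketch below would let both paths be
exercised from the same binary; opt_skip_fastpath, its option parsing, and the
do_percpu_inc() wrapper are made up for illustration, the rseq_*/cpu_op_*
helpers are the ones from this series:

	/* Hypothetical runtime flag, e.g. set by a new command-line option. */
	static int opt_skip_fastpath;

	static int do_percpu_inc(struct inc_test_data *data)
	{
		int cpu, ret;

		if (!opt_skip_fastpath) {
			/* Try the rseq fast path first. */
			cpu = rseq_cpu_start();
			ret = rseq_addv(&data->c[cpu].count, 1, cpu);
			if (likely(!ret))
				return 0;
		}
		/* Fallback on the cpu_opv system call. */
		for (;;) {
			cpu = rseq_current_cpu();
			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
			if (likely(!ret))
				return 0;
			assert(ret >= 0 || errno == EAGAIN);
		}
	}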

> +	slowpath:
> +		__attribute__((unused));
> +		for (;;) {
> +			/* Fallback on cpu_opv system call. */
> +			cpu = rseq_current_cpu();
> +			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
> +			if (likely(!ret))
> +				break;
> +			assert(ret >= 0 || errno == EAGAIN);
> +		}
> +	next:
> +		__attribute__((unused));
> +#ifndef BENCHMARK
> +		if (i != 0 && !(i % (reps / 10)))
> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
> +#endif

Same comment as before. Avoid compile-time options.

> +	}
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_unregister_current_thread())
> +		abort();
> +	return NULL;
> +}
> +
> +void test_percpu_inc(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, ret;
> +	uint64_t sum;
> +	pthread_t test_threads[num_threads];
> +	struct inc_test_data data;
> +	struct inc_thread_test_data thread_data[num_threads];
> +
> +	memset(&data, 0, sizeof(data));
> +	for (i = 0; i < num_threads; i++) {
> +		thread_data[i].reps = opt_reps;
> +		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
> +			thread_data[i].reg = 1;
> +		else
> +			thread_data[i].reg = 0;
> +		thread_data[i].data = &data;
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_inc_thread, &thread_data[i]);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	sum = 0;
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		sum += data.c[i].count;
> +
> +	assert(sum == (uint64_t)opt_reps * num_threads);
> +}
> +
> +int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
> +{
> +	intptr_t *targetptr, newval, expect;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load list->c[cpu].head with single-copy atomicity. */
> +	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +	newval = (intptr_t)node;
> +	targetptr = (intptr_t *)&list->c[cpu].head;
> +	node->next = (struct percpu_list_node *)expect;
> +	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
> +	if (likely(!ret))
> +		return cpu;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +slowpath:
> +	__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load list->c[cpu].head with single-copy atomicity. */
> +		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +		newval = (intptr_t)node;
> +		targetptr = (intptr_t *)&list->c[cpu].head;
> +		node->next = (struct percpu_list_node *)expect;
> +		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return cpu;
> +}
> +
> +/*
> + * Unlike a traditional lock-less linked list, the availability of an
> + * rseq primitive allows us to implement pop without concerns over
> + * ABA-type races.
> + */
> +struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
> +{
> +	struct percpu_list_node *head;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
> +		(intptr_t)NULL,
> +		offsetof(struct percpu_list_node, next),
> +		(intptr_t *)&head, cpu);
> +	if (likely(!ret))
> +		return head;
> +	if (ret > 0)
> +		return NULL;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +	slowpath:
> +		__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpnev_storeoffp_load(
> +			(intptr_t *)&list->c[cpu].head,
> +			(intptr_t)NULL,
> +			offsetof(struct percpu_list_node, next),
> +			(intptr_t *)&head, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			return NULL;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return head;
> +}
> +
> +void *test_percpu_list_thread(void *arg)
> +{
> +	long long i, reps;
> +	struct percpu_list *list = (struct percpu_list *)arg;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		abort();
> +
> +	reps = opt_reps;
> +	for (i = 0; i < reps; i++) {
> +		struct percpu_list_node *node = percpu_list_pop(list);
> +
> +		if (opt_yield)
> +			sched_yield();  /* encourage shuffling */
> +		if (node)
> +			percpu_list_push(list, node);
> +	}
> +
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu linked list from many threads.  */
> +void test_percpu_list(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, j, ret;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_list list;
> +	pthread_t test_threads[num_threads];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&list, 0, sizeof(list));
> +
> +	/* Generate list entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		for (j = 1; j <= 100; j++) {
> +			struct percpu_list_node *node;
> +
> +			expected_sum += j;
> +
> +			node = malloc(sizeof(*node));
> +			assert(node);
> +			node->data = j;
> +			node->next = list.c[i].head;
> +			list.c[i].head = node;
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_list_thread, &list);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_list_node *node;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while ((node = percpu_list_pop(&list))) {
> +			sum += node->data;
> +			free(node);
> +		}
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +bool percpu_buffer_push(struct percpu_buffer *buffer,
> +		struct percpu_buffer_node *node)
> +{
> +	intptr_t *targetptr_spec, newval_spec;
> +	intptr_t *targetptr_final, newval_final;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
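> +	/*
> +	 * Only report a full buffer if we are still on the cpu whose
> +	 * offset we just read; after a migration the value may belong
> +	 * to another cpu, so take the slow path instead.
> +	 */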
> +	if (offset == buffer->c[cpu].buflen) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return false;
> +	}
> +	newval_spec = (intptr_t)node;
> +	targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
> +	newval_final = offset + 1;
> +	targetptr_final = &buffer->c[cpu].offset;
> +	if (opt_mb)
> +		ret = rseq_cmpeqv_trystorev_storev_release(targetptr_final,
> +			offset, targetptr_spec, newval_spec,
> +			newval_final, cpu);
> +	else
> +		ret = rseq_cmpeqv_trystorev_storev(targetptr_final,
> +			offset, targetptr_spec, newval_spec,
> +			newval_final, cpu);
> +	if (likely(!ret))
> +		return true;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == buffer->c[cpu].buflen)
> +			return false;
> +		newval_spec = (intptr_t)node;
> +		targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
> +		newval_final = offset + 1;
> +		targetptr_final = &buffer->c[cpu].offset;
> +		if (opt_mb)
> +			ret = cpu_op_cmpeqv_storev_mb_storev(targetptr_final,
> +				offset, targetptr_spec, newval_spec,
> +				newval_final, cpu);
> +		else
> +			ret = cpu_op_cmpeqv_storev_storev(targetptr_final,
> +				offset, targetptr_spec, newval_spec,
> +				newval_final, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return true;
> +}
> +
> +struct percpu_buffer_node *percpu_buffer_pop(struct percpu_buffer *buffer)
> +{
> +	struct percpu_buffer_node *head;
> +	intptr_t *targetptr, newval;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
> +	if (offset == 0) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return NULL;
> +	}
> +	head = buffer->c[cpu].array[offset - 1];
> +	newval = offset - 1;
> +	targetptr = (intptr_t *)&buffer->c[cpu].offset;
> +	ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset,
> +		(intptr_t *)&buffer->c[cpu].array[offset - 1], (intptr_t)head,
> +		newval, cpu);
> +	if (likely(!ret))
> +		return head;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == 0)
> +			return NULL;
> +		head = buffer->c[cpu].array[offset - 1];
> +		newval = offset - 1;
> +		targetptr = (intptr_t *)&buffer->c[cpu].offset;
> +		ret = cpu_op_cmpeqv_cmpeqv_storev(targetptr, offset,
> +			(intptr_t *)&buffer->c[cpu].array[offset - 1],
> +			(intptr_t)head, newval, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return head;
> +}
> +
> +void *test_percpu_buffer_thread(void *arg)
> +{
> +	long long i, reps;
> +	struct percpu_buffer *buffer = (struct percpu_buffer *)arg;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		abort();
> +
> +	reps = opt_reps;
> +	for (i = 0; i < reps; i++) {
> +		struct percpu_buffer_node *node = percpu_buffer_pop(buffer);
> +
> +		if (opt_yield)
> +			sched_yield();  /* encourage shuffling */
> +		if (node) {
> +			if (!percpu_buffer_push(buffer, node)) {
> +				/* Should increase buffer size. */
> +				abort();
> +			}
> +		}
> +	}
> +
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu buffer from many threads.  */
> +void test_percpu_buffer(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, j, ret;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_buffer buffer;
> +	pthread_t test_threads[num_threads];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&buffer, 0, sizeof(buffer));
> +
> +	/* Generate buffer entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		/* Worst case is every item in the same CPU. */
> +		buffer.c[i].array =
> +			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
> +				* BUFFER_ITEM_PER_CPU);
> +		assert(buffer.c[i].array);
> +		buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU;
> +		for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) {
> +			struct percpu_buffer_node *node;
> +
> +			expected_sum += j;
> +
> +			/*
> +			 * We could theoretically put the word-sized
> +			 * "data" directly in the buffer. However, we
> +			 * want to model objects that would not fit
> +			 * within a single word, so allocate an object
> +			 * for each node.
> +			 */
> +			node = malloc(sizeof(*node));
> +			assert(node);
> +			node->data = j;
> +			buffer.c[i].array[j - 1] = node;
> +			buffer.c[i].offset++;
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_buffer_thread, &buffer);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_buffer_node *node;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while ((node = percpu_buffer_pop(&buffer))) {
> +			sum += node->data;
> +			free(node);
> +		}
> +		free(buffer.c[i].array);
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +bool percpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer,
> +		struct percpu_memcpy_buffer_node item)
> +{
> +	char *destptr, *srcptr;
> +	size_t copylen;
> +	intptr_t *targetptr_final, newval_final;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
> +	if (offset == buffer->c[cpu].buflen) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return false;
> +	}
> +	destptr = (char *)&buffer->c[cpu].array[offset];
> +	srcptr = (char *)&item;
> +	copylen = sizeof(item);
> +	newval_final = offset + 1;
> +	targetptr_final = &buffer->c[cpu].offset;
> +	if (opt_mb)
> +		ret = rseq_cmpeqv_trymemcpy_storev_release(targetptr_final,
> +			offset, destptr, srcptr, copylen,
> +			newval_final, cpu);
> +	else
> +		ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
> +			offset, destptr, srcptr, copylen,
> +			newval_final, cpu);
> +	if (likely(!ret))
> +		return true;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == buffer->c[cpu].buflen)
> +			return false;
> +		destptr = (char *)&buffer->c[cpu].array[offset];
> +		srcptr = (char *)&item;
> +		copylen = sizeof(item);
> +		newval_final = offset + 1;
> +		targetptr_final = &buffer->c[cpu].offset;
> +		/* copylen must be <= PAGE_SIZE. */
> +		if (opt_mb)
> +			ret = cpu_op_cmpeqv_memcpy_mb_storev(targetptr_final,
> +				offset, destptr, srcptr, copylen,
> +				newval_final, cpu);
> +		else
> +			ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
> +				offset, destptr, srcptr, copylen,
> +				newval_final, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return true;
> +}
> +
> +bool percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
> +		struct percpu_memcpy_buffer_node *item)
> +{
> +	char *destptr, *srcptr;
> +	size_t copylen;
> +	intptr_t *targetptr_final, newval_final;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
> +	if (offset == 0) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return false;
> +	}
> +	destptr = (char *)item;
> +	srcptr = (char *)&buffer->c[cpu].array[offset - 1];
> +	copylen = sizeof(*item);
> +	newval_final = offset - 1;
> +	targetptr_final = &buffer->c[cpu].offset;
> +	ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
> +		offset, destptr, srcptr, copylen,
> +		newval_final, cpu);
> +	if (likely(!ret))
> +		return true;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == 0)
> +			return false;
> +		destptr = (char *)item;
> +		srcptr = (char *)&buffer->c[cpu].array[offset - 1];
> +		copylen = sizeof(*item);
> +		newval_final = offset - 1;
> +		targetptr_final = &buffer->c[cpu].offset;
> +		/* copylen must be <= PAGE_SIZE. */
> +		ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
> +			offset, destptr, srcptr, copylen,
> +			newval_final, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return true;
> +}
> +
> +void *test_percpu_memcpy_buffer_thread(void *arg)
> +{
> +	long long i, reps;
> +	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		abort();
> +
> +	reps = opt_reps;
> +	for (i = 0; i < reps; i++) {
> +		struct percpu_memcpy_buffer_node item;
> +		bool result;
> +
> +		result = percpu_memcpy_buffer_pop(buffer, &item);
> +		if (opt_yield)
> +			sched_yield();  /* encourage shuffling */
> +		if (result) {
> +			if (!percpu_memcpy_buffer_push(buffer, item)) {
> +				/* Should increase buffer size. */
> +				abort();
> +			}
> +		}
> +	}
> +
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu buffer from many threads.  */
> +void test_percpu_memcpy_buffer(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, j, ret;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_memcpy_buffer buffer;
> +	pthread_t test_threads[num_threads];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&buffer, 0, sizeof(buffer));
> +
> +	/* Generate buffer entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		/* Worst case is every item in the same CPU. */
> +		buffer.c[i].array =
> +			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
> +				* MEMCPY_BUFFER_ITEM_PER_CPU);
> +		assert(buffer.c[i].array);
> +		buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU;
> +		for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) {
> +			expected_sum += 2 * j + 1;
> +
> +			/*
> +			 * We could theoretically put the word-sized
> +			 * "data" directly in the buffer. However, we
> +			 * want to model objects that would not fit
> +			 * within a single word, so each item is a
> +			 * two-field node copied into the buffer.
> +			 */
> +			buffer.c[i].array[j - 1].data1 = j;
> +			buffer.c[i].array[j - 1].data2 = j + 1;
> +			buffer.c[i].offset++;
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_memcpy_buffer_thread, &buffer);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_memcpy_buffer_node item;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while (percpu_memcpy_buffer_pop(&buffer, &item)) {
> +			sum += item.data1;
> +			sum += item.data2;
> +		}
> +		free(buffer.c[i].array);
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +static void test_signal_interrupt_handler(int signo)
> +{
> +	signals_delivered++;
> +}
> +
> +static int set_signal_handler(void)
> +{
> +	int ret = 0;
> +	struct sigaction sa;
> +	sigset_t sigset;
> +
> +	ret = sigemptyset(&sigset);
> +	if (ret < 0) {
> +		perror("sigemptyset");
> +		return ret;
> +	}
> +
> +	sa.sa_handler = test_signal_interrupt_handler;
> +	sa.sa_mask = sigset;
> +	sa.sa_flags = 0;
> +	ret = sigaction(SIGUSR1, &sa, NULL);
> +	if (ret < 0) {
> +		perror("sigaction");
> +		return ret;
> +	}
> +
> +	printf_verbose("Signal handler set for SIGUSR1\n");
> +
> +	return ret;
> +}
> +
> +static void show_usage(int argc, char **argv)
> +{
> +	printf("Usage: %s <OPTIONS>\n", argv[0]);
> +	printf("OPTIONS:\n");
> +	printf("	[-1 loops] Number of loops for delay injection 1\n");
> +	printf("	[-2 loops] Number of loops for delay injection 2\n");
> +	printf("	[-3 loops] Number of loops for delay injection 3\n");
> +	printf("	[-4 loops] Number of loops for delay injection 4\n");
> +	printf("	[-5 loops] Number of loops for delay injection 5\n");
> +	printf("	[-6 loops] Number of loops for delay injection 6\n");
> +	printf("	[-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n");
> +	printf("	[-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n");
> +	printf("	[-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n");
> +	printf("	[-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n");
> +	printf("	[-y] Yield\n");
> +	printf("	[-k] Kill thread with signal\n");
> +	printf("	[-s S] S: =0: disabled (default), >0: sleep time (ms)\n");
> +	printf("	[-t N] Number of threads (default 200)\n");
> +	printf("	[-r N] Number of repetitions per thread (default 5000)\n");
> +	printf("	[-d] Disable rseq system call (no initialization)\n");
> +	printf("	[-D M] Disable rseq for each M threads\n");
> +	printf("	[-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n");
> +	printf("	[-M] Push into buffer and memcpy buffer with memory barriers.\n");
> +	printf("	[-v] Verbose output.\n");
> +	printf("	[-h] Show this help.\n");
> +	printf("\n");
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int i;
> +
> +	for (i = 1; i < argc; i++) {
> +		if (argv[i][0] != '-')
> +			continue;
> +		switch (argv[i][1]) {
> +		case '1':
> +		case '2':
> +		case '3':
> +		case '4':
> +		case '5':
> +		case '6':
> +		case '7':
> +		case '8':
> +		case '9':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]);
> +			i++;
> +			break;
> +		case 'm':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_modulo = atol(argv[i + 1]);
> +			if (opt_modulo < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 's':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_sleep = atol(argv[i + 1]);
> +			if (opt_sleep < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'y':
> +			opt_yield = 1;
> +			break;
> +		case 'k':
> +			opt_signal = 1;
> +			break;
> +		case 'd':
> +			opt_disable_rseq = 1;
> +			break;
> +		case 'D':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_disable_mod = atol(argv[i + 1]);
> +			if (opt_disable_mod < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 't':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_threads = atol(argv[i + 1]);
> +			if (opt_threads < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'r':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_reps = atoll(argv[i + 1]);
> +			if (opt_reps < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'h':
> +			show_usage(argc, argv);
> +			goto end;
> +		case 'T':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_test = *argv[i + 1];
> +			switch (opt_test) {
> +			case 's':
> +			case 'l':
> +			case 'i':
> +			case 'b':
> +			case 'm':
> +				break;
> +			default:
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'v':
> +			verbose = 1;
> +			break;
> +		case 'M':
> +			opt_mb = 1;
> +			break;
> +		default:
> +			show_usage(argc, argv);
> +			goto error;
> +		}
> +	}
> +
> +	if (set_signal_handler())
> +		goto error;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		goto error;
> +	switch (opt_test) {
> +	case 's':
> +		printf_verbose("spinlock\n");
> +		test_percpu_spinlock();
> +		break;
> +	case 'l':
> +		printf_verbose("linked list\n");
> +		test_percpu_list();
> +		break;
> +	case 'b':
> +		printf_verbose("buffer\n");
> +		test_percpu_buffer();
> +		break;
> +	case 'm':
> +		printf_verbose("memcpy buffer\n");
> +		test_percpu_memcpy_buffer();
> +		break;
> +	case 'i':
> +		printf_verbose("counter increment\n");
> +		test_percpu_inc();
> +		break;
> +	}
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +end:
> +	return 0;
> +
> +error:
> +	return -1;
> +}
> diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
> new file mode 100644
> index 000000000000..47953c0cef4f
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-arm.h
> @@ -0,0 +1,535 @@
> +/*
> + * rseq-arm.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#define rseq_smp_mb()	__asm__ __volatile__ ("dmb" : : : "memory")
> +#define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" : : : "memory")
> +#define rseq_smp_wmb()	__asm__ __volatile__ ("dmb" : : : "memory")
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	rseq_smp_mb();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	rseq_smp_mb();							\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define RSEQ_ASM_DEFINE_TABLE(section, version, flags,			\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"adr r0, " __rseq_str(cs_label) "\n\t"			\
> +		"str r0, %[" __rseq_str(rseq_cs) "]\n\t"		\
> +		__rseq_str(label) ":\n\t"
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t"	\
> +		"cmp %[" __rseq_str(cpu_id) "], r0\n\t"		\
> +		"bne " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(table_label, label, section, sig,		\
> +			teardown, abort_label, version, flags, start_ip,\
> +			post_commit_offset, abort_ip)			\
> +		__rseq_str(table_label) ":\n\t"				\
> +		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
> +		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"b %l[" __rseq_str(abort_label) "]\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
> +
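> +/*
> + * The helpers below share a return convention: 0 when the sequence
> + * committed, -1 when it was aborted (the caller may retry or fall
> + * back on cpu_opv), and 1 when the comparison failed.
> + */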
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expectnot], r0\n\t"
> +		"beq 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"str r0, %[load]\n\t"
> +		"add r0, %[voffp]\n\t"
> +		"ldr r0, [r0]\n\t"
> +		/* final store */
> +		"str r0, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"Ir"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"add r0, %[count]\n\t"
> +		/* final store */
> +		"str r0, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [count]"Ir"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"str %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"str %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"dmb\n\t"	/* full mb provides store-release */
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"ldr r0, %[v2]\n\t"
> +		"cmp %[expect2], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
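> +		/*
> +		 * The copy loop below modifies the src, dst and len
> +		 * register operands, so save them to scratch memory
> +		 * here and restore them on every exit path.
> +		 */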
> +		"str %[src], %[rseq_scratch0]\n\t"
> +		"str %[dst], %[rseq_scratch1]\n\t"
> +		"str %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"cmp %[len], #0\n\t" \
> +		"beq 333f\n\t" \
> +		"222:\n\t" \
> +		"ldrb %%r0, [%[src]]\n\t" \
> +		"strb %%r0, [%[dst]]\n\t" \
> +		"adds %[src], #1\n\t" \
> +		"adds %[dst], #1\n\t" \
> +		"subs %[len], #1\n\t" \
> +		"bne 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"ldr %[len], %[rseq_scratch2]\n\t"
> +		"ldr %[dst], %[rseq_scratch1]\n\t"
> +		"ldr %[src], %[rseq_scratch0]\n\t"
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"str %[src], %[rseq_scratch0]\n\t"
> +		"str %[dst], %[rseq_scratch1]\n\t"
> +		"str %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"cmp %[len], #0\n\t" \
> +		"beq 333f\n\t" \
> +		"222:\n\t" \
> +		"ldrb %%r0, [%[src]]\n\t" \
> +		"strb %%r0, [%[dst]]\n\t" \
> +		"adds %[src], #1\n\t" \
> +		"adds %[dst], #1\n\t" \
> +		"subs %[len], #1\n\t" \
> +		"bne 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		"dmb\n\t"	/* full mb provides store-release */
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"ldr %[len], %[rseq_scratch2]\n\t"
> +		"ldr %[dst], %[rseq_scratch1]\n\t"
> +		"ldr %[src], %[rseq_scratch0]\n\t"
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h
> new file mode 100644
> index 000000000000..3db6be5ceffb
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-ppc.h
> @@ -0,0 +1,567 @@
> +/*
> + * rseq-ppc.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + * (C) Copyright 2016 - Boqun Feng <boqun.feng@gmail.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#define rseq_smp_mb()		__asm__ __volatile__ ("sync" : : : "memory")
> +#define rseq_smp_lwsync()	__asm__ __volatile__ ("lwsync" : : : "memory")
> +#define rseq_smp_rmb()		rseq_smp_lwsync()
> +#define rseq_smp_wmb()		rseq_smp_lwsync()
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	rseq_smp_lwsync();						\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_lwsync()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	rseq_smp_lwsync();						\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +/*
> + * The __rseq_table section can be used by debuggers to better handle
> + * single-stepping through the restartable critical sections.
> + */
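> +/*
> + * Each table entry has the layout of struct rseq_cs: 32-bit version
> + * and flags followed by 64-bit start_ip, post_commit_offset and
> + * abort_ip, which lets a debugger locate the critical section and
> + * its abort handler.
> + */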
> +
> +#ifdef __PPC64__
> +
> +#define STORE_WORD	"std "
> +#define LOAD_WORD	"ld "
> +#define LOADX_WORD	"ldx "
> +#define CMP_WORD	"cmpd "
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
> +			start_ip, post_commit_offset, abort_ip)			\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
> +		".balign 32\n\t"						\
> +		__rseq_str(label) ":\n\t"					\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
> +		RSEQ_INJECT_ASM(1)						\
> +		"lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t"		\
> +		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t"	\
> +		"rldicr %%r17, %%r17, 32, 31\n\t"				\
> +		"oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t"	\
> +		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
> +		"std %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
> +		__rseq_str(label) ":\n\t"
> +
> +#else /* #ifdef __PPC64__ */
> +
> +#define STORE_WORD	"stw "
> +#define LOAD_WORD	"lwz "
> +#define LOADX_WORD	"lwzx "
> +#define CMP_WORD	"cmpw "
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
> +			start_ip, post_commit_offset, abort_ip)			\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
> +		".balign 32\n\t"						\
> +		__rseq_str(label) ":\n\t"					\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
> +		/* 32-bit only supported on BE */				\
> +		".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
> +		RSEQ_INJECT_ASM(1)						\
> +		"lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t"			\
> +		"addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
> +		"stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
> +		__rseq_str(label) ":\n\t"
> +
> +#endif /* #ifdef __PPC64__ */
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)			\
> +		RSEQ_INJECT_ASM(2)						\
> +		"lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t"		\
> +		"cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t"		\
> +		"bne- cr7, " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label)	\
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
> +		".long " __rseq_str(sig) "\n\t"					\
> +		__rseq_str(label) ":\n\t"					\
> +		teardown							\
> +		"b %l[" __rseq_str(abort_label) "]\n\t"			\
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label)	\
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
> +		__rseq_str(label) ":\n\t"					\
> +		teardown							\
> +		"b %l[" __rseq_str(cmpfail_label) "]\n\t"			\
> +		".popsection\n\t"
> +
> +
> +/*
> + * RSEQ_ASM_OPs: asm operations for rseq
> + * 	RSEQ_ASM_OP_R_*: uses hard-coded registers
> + * 	RSEQ_ASM_OP_* (else): does not use hard-coded registers (except cr7)
> + */
> +#define RSEQ_ASM_OP_CMPEQ(var, expect, label)					\
> +		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
> +		CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"		\
> +		"bne- cr7, " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_OP_CMPNE(var, expectnot, label)				\
> +		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
> +		CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t"	\
> +		"beq- cr7, " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_OP_STORE(value, var)						\
> +		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"
> +
> +/* Load @var to r17 */
> +#define RSEQ_ASM_OP_R_LOAD(var)							\
> +		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
> +
> +/* Store r17 to @var */
> +#define RSEQ_ASM_OP_R_STORE(var)						\
> +		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
> +
> +/* Add @count to r17 */
> +#define RSEQ_ASM_OP_R_ADD(count)						\
> +		"add %%r17, %[" __rseq_str(count) "], %%r17\n\t"
> +
> +/* Load (r17 + voffp) to r17 */
> +#define RSEQ_ASM_OP_R_LOADX(voffp)						\
> +		LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t"
> +
> +/* TODO: implement a faster memcpy. */
> +#define RSEQ_ASM_OP_R_MEMCPY() \
> +		"cmpdi %%r19, 0\n\t" \
> +		"beq 333f\n\t" \
> +		"addi %%r20, %%r20, -1\n\t" \
> +		"addi %%r21, %%r21, -1\n\t" \
> +		"222:\n\t" \
> +		"lbzu %%r18, 1(%%r20)\n\t" \
> +		"stbu %%r18, 1(%%r21)\n\t" \
> +		"addi %%r19, %%r19, -1\n\t" \
> +		"cmpdi %%r19, 0\n\t" \
> +		"bne 222b\n\t" \
> +		"333:\n\t" \
> +
> +#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label)			\
> +		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
> +		__rseq_str(post_commit_label) ":\n\t"
> +
> +#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label)			\
> +		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"	\
> +		__rseq_str(post_commit_label) ":\n\t"
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v not equal to @expectnot */
> +		RSEQ_ASM_OP_CMPNE(v, expectnot, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* load the value of @v */
> +		RSEQ_ASM_OP_R_LOAD(v)
> +		/* store it in @load */
> +		RSEQ_ASM_OP_R_STORE(load)
> +		/* dereference voffp(v) */
> +		RSEQ_ASM_OP_R_LOADX(voffp)
> +		/* finally, store the value loaded from voffp(v) into @v */
> +		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"b"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* load the value of @v */
> +		RSEQ_ASM_OP_R_LOAD(v)
> +		/* add @count to it */
> +		RSEQ_ASM_OP_R_ADD(count)
> +		/* final store */
> +		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
> +		RSEQ_INJECT_ASM(4)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [count]"r"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		RSEQ_ASM_OP_STORE(newv2, v2)
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		RSEQ_ASM_OP_STORE(newv2, v2)
> +		RSEQ_INJECT_ASM(5)
> +		/* for 'release' */
> +		"lwsync\n\t"
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* cmp @v2 equal to @expect2 */
> +		RSEQ_ASM_OP_CMPEQ(v2, expect2, 5f)
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		/* setup for memcpy */
> +		"mr %%r19, %[len]\n\t" \
> +		"mr %%r20, %[src]\n\t" \
> +		"mr %%r21, %[dst]\n\t" \
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		RSEQ_ASM_OP_R_MEMCPY()
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		/* setup for memcpy */
> +		"mr %%r19, %[len]\n\t" \
> +		"mr %%r20, %[src]\n\t" \
> +		"mr %%r21, %[dst]\n\t" \
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		RSEQ_ASM_OP_R_MEMCPY()
> +		RSEQ_INJECT_ASM(5)
> +		/* for 'release' */
> +		"lwsync\n\t"
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +#undef STORE_WORD
> +#undef LOAD_WORD
> +#undef LOADX_WORD
> +#undef CMP_WORD
> diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
> new file mode 100644
> index 000000000000..63e81d6c61fa
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-x86.h
> @@ -0,0 +1,898 @@
> +/*
> + * rseq-x86.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <stdint.h>
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#ifdef __x86_64__
> +
> +#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
> +#define rseq_smp_rmb()	rseq_barrier()
> +#define rseq_smp_wmb()	rseq_barrier()
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	rseq_barrier();						\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	rseq_barrier();						\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		__rseq_str(label) ":\n\t"				\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
> +		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
> +		__rseq_str(label) ":\n\t"
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
> +		"jnz " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
> +		".long " __rseq_str(sig) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expectnot]\n\t"
> +		"jz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"movq %[v], %%rax\n\t"
> +		"movq %%rax, %[load]\n\t"
> +		"addq %[voffp], %%rax\n\t"
> +		"movq (%%rax), %%rax\n\t"
> +		/* final store */
> +		"movq %%rax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"er"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* final store */
> +		"addq %[count], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [count]"er"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"movq %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* x86-64 is TSO. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2,
> +			newv, cpu);
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"cmpq %[v2], %[expect2]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint64_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"movq %[src], %[rseq_scratch0]\n\t"
> +		"movq %[dst], %[rseq_scratch1]\n\t"
> +		"movq %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"test %[len], %[len]\n\t" \
> +		"jz 333f\n\t" \
> +		"222:\n\t" \
> +		"movb (%[src]), %%al\n\t" \
> +		"movb %%al, (%[dst])\n\t" \
> +		"inc %[src]\n\t" \
> +		"inc %[dst]\n\t" \
> +		"dec %[len]\n\t" \
> +		"jnz 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"movq %[rseq_scratch2], %[len]\n\t"
> +		"movq %[rseq_scratch1], %[dst]\n\t"
> +		"movq %[rseq_scratch0], %[src]\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
> +			"movq %[rseq_scratch2], %[len]\n\t"
> +			"movq %[rseq_scratch1], %[dst]\n\t"
> +			"movq %[rseq_scratch0], %[src]\n\t",
> +			abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			"movq %[rseq_scratch2], %[len]\n\t"
> +			"movq %[rseq_scratch1], %[dst]\n\t"
> +			"movq %[rseq_scratch0], %[src]\n\t",
> +			cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* x86-64 is TSO. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src,
> +			len, newv, cpu);
> +}
> +
> +#elif __i386__
> +
> +/*
> + * Support older 32-bit architectures that do not implement fence
> + * instructions.
> + */
> +#define rseq_smp_mb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_rmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_wmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	rseq_smp_mb();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	rseq_smp_mb();							\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +/*
> + * Use eax as scratch register and take memory operands as input to
> + * lessen register pressure. This is especially needed when compiling with -O0.
> + */
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		__rseq_str(label) ":\n\t"				\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t"	\
> +		__rseq_str(label) ":\n\t"
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
> +		"jnz " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		/* Disassembler-friendly signature: nopl <sig>. */\
> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
> +		".long " __rseq_str(sig) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		"movl %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expectnot]\n\t"
> +		"jz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"movl %[v], %%eax\n\t"
> +		"movl %%eax, %[load]\n\t"
> +		"addl %[voffp], %%eax\n\t"
> +		"movl (%%eax), %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"ir"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* final store */
> +		"addl %[count], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [count]"ir"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"movl %[newv2], %%eax\n\t"
> +		"movl %%eax, %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movl %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"m"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"movl %[expect], %%eax\n\t"
> +		"cmpl %[v], %%eax\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"movl %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
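> +		/* for 'release': the lock-prefixed op below is a full barrier */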
> +		"lock; addl $0,0(%%esp)\n\t"
> +		/* final store */
> +		"movl %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"m"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"cmpl %[expect2], %[v2]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"movl %[newv], %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"m"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* TODO: implement a faster memcpy. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"movl %[src], %[rseq_scratch0]\n\t"
> +		"movl %[dst], %[rseq_scratch1]\n\t"
> +		"movl %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"movl %[expect], %%eax\n\t"
> +		"cmpl %%eax, %[v]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"test %[len], %[len]\n\t" \
> +		"jz 333f\n\t" \
> +		"222:\n\t" \
> +		"movb (%[src]), %%al\n\t" \
> +		"movb %%al, (%[dst])\n\t" \
> +		"inc %[src]\n\t" \
> +		"inc %[dst]\n\t" \
> +		"dec %[len]\n\t" \
> +		"jnz 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		"movl %[newv], %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"movl %[rseq_scratch2], %[len]\n\t"
> +		"movl %[rseq_scratch1], %[dst]\n\t"
> +		"movl %[rseq_scratch0], %[src]\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"m"(expect),
> +		  [newv]"m"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* TODO: implement a faster memcpy. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"movl %[src], %[rseq_scratch0]\n\t"
> +		"movl %[dst], %[rseq_scratch1]\n\t"
> +		"movl %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"movl %[expect], %%eax\n\t"
> +		"cmpl %%eax, %[v]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"test %[len], %[len]\n\t" \
> +		"jz 333f\n\t" \
> +		"222:\n\t" \
> +		"movb (%[src]), %%al\n\t" \
> +		"movb %%al, (%[dst])\n\t" \
> +		"inc %[src]\n\t" \
> +		"inc %[dst]\n\t" \
> +		"dec %[len]\n\t" \
> +		"jnz 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
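> +		/* for 'release': the lock-prefixed op below is a full barrier */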
> +		"lock; addl $0,0(%%esp)\n\t"
> +		"movl %[newv], %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"movl %[rseq_scratch2], %[len]\n\t"
> +		"movl %[rseq_scratch1], %[dst]\n\t"
> +		"movl %[rseq_scratch0], %[src]\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"m"(expect),
> +		  [newv]"m"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +#endif
> diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
> new file mode 100644
> index 000000000000..b83d3196c33e
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq.c
> @@ -0,0 +1,116 @@
> +/*
> + * rseq.c
> + *
> + * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; only
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <syscall.h>
> +#include <assert.h>
> +#include <signal.h>
> +
> +#include "rseq.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +__attribute__((tls_model("initial-exec"))) __thread
> +volatile struct rseq __rseq_abi = {
> +	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
> +};
> +
> +static __attribute__((tls_model("initial-exec"))) __thread
> +volatile int refcount;
> +
> +static void signal_off_save(sigset_t *oldset)
> +{
> +	sigset_t set;
> +	int ret;
> +
> +	sigfillset(&set);
> +	ret = pthread_sigmask(SIG_BLOCK, &set, oldset);
> +	if (ret)
> +		abort();
> +}
> +
> +static void signal_restore(sigset_t oldset)
> +{
> +	int ret;
> +
> +	ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL);
> +	if (ret)
> +		abort();
> +}
> +
> +static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
> +		int flags, uint32_t sig)
> +{
> +	return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
> +}
> +
> +int rseq_register_current_thread(void)
> +{
> +	int rc, ret = 0;
> +	sigset_t oldset;
> +
> +	signal_off_save(&oldset);
> +	if (refcount++)
> +		goto end;
> +	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
> +	if (!rc) {
> +		assert(rseq_current_cpu_raw() >= 0);
> +		goto end;
> +	}
> +	if (errno != EBUSY)
> +		__rseq_abi.cpu_id = -2;
> +	ret = -1;
> +	refcount--;
> +end:
> +	signal_restore(oldset);
> +	return ret;
> +}
> +
> +int rseq_unregister_current_thread(void)
> +{
> +	int rc, ret = 0;
> +	sigset_t oldset;
> +
> +	signal_off_save(&oldset);
> +	if (--refcount)
> +		goto end;
> +	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
> +			RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
> +	if (!rc)
> +		goto end;
> +	ret = -1;
> +end:
> +	signal_restore(oldset);
> +	return ret;
> +}
> +
> +int32_t rseq_fallback_current_cpu(void)
> +{
> +	int32_t cpu;
> +
> +	cpu = sched_getcpu();
> +	if (cpu < 0) {
> +		perror("sched_getcpu()");
> +		abort();
> +	}
> +	return cpu;
> +}
> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> new file mode 100644
> index 000000000000..26c8ea01e940
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq.h
> @@ -0,0 +1,154 @@
> +/*
> + * rseq.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef RSEQ_H
> +#define RSEQ_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <sched.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <linux/rseq.h>
> +
> +/*
> + * Empty code injection macros, override when testing.
> + * It is important to consider that the ASM injection macros need to be
> + * fully reentrant (e.g. do not modify the stack).
> + */
> +#ifndef RSEQ_INJECT_ASM
> +#define RSEQ_INJECT_ASM(n)
> +#endif
> +
> +#ifndef RSEQ_INJECT_C
> +#define RSEQ_INJECT_C(n)
> +#endif
> +
> +#ifndef RSEQ_INJECT_INPUT
> +#define RSEQ_INJECT_INPUT
> +#endif
> +
> +#ifndef RSEQ_INJECT_CLOBBER
> +#define RSEQ_INJECT_CLOBBER
> +#endif
> +
> +#ifndef RSEQ_INJECT_FAILED
> +#define RSEQ_INJECT_FAILED
> +#endif
> +
> +extern __thread volatile struct rseq __rseq_abi;
> +
> +#define rseq_likely(x)		__builtin_expect(!!(x), 1)
> +#define rseq_unlikely(x)	__builtin_expect(!!(x), 0)
> +#define rseq_barrier()		__asm__ __volatile__("" : : : "memory")
> +
> +#define RSEQ_ACCESS_ONCE(x)	(*(__volatile__  __typeof__(x) *)&(x))
> +#define RSEQ_WRITE_ONCE(x, v)	__extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); })
> +#define RSEQ_READ_ONCE(x)	RSEQ_ACCESS_ONCE(x)
> +
> +#define __rseq_str_1(x)	#x
> +#define __rseq_str(x)		__rseq_str_1(x)
> +
> +#if defined(__x86_64__) || defined(__i386__)
> +#include <rseq-x86.h>
> +#elif defined(__ARMEL__)
> +#include <rseq-arm.h>
> +#elif defined(__PPC__)
> +#include <rseq-ppc.h>
> +#else
> +#error unsupported target
> +#endif
> +
> +/*
> + * Register rseq for the current thread. This needs to be called once
> + * by any thread which uses restartable sequences, before they start
> + * using restartable sequences, to ensure restartable sequences
> + * succeed. A restartable sequence executed from a non-registered
> + * thread will always fail.
> + */
> +int rseq_register_current_thread(void);
> +
> +/*
> + * Unregister rseq for current thread.
> + */
> +int rseq_unregister_current_thread(void);
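A minimal sketch of the intended pairing, register on thread start and
unregister on thread exit (the worker() name is made up here; basic_test.c
later in this patch does the same thing in main()):

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include "rseq.h"

	static void *worker(void *arg)
	{
		(void) arg;
		if (rseq_register_current_thread()) {
			fprintf(stderr, "rseq registration failed (%d): %s\n",
				errno, strerror(errno));
			return NULL;
		}
		/* ... rseq fast paths can be used from here on ... */
		if (rseq_unregister_current_thread())
			fprintf(stderr, "rseq unregistration failed\n");
		return NULL;
	}
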
> +
> +/*
> + * Restartable sequence fallback for reading the current CPU number.
> + */
> +int32_t rseq_fallback_current_cpu(void);
> +
> +/*
> + * Values returned can be either the current CPU number, -1 (rseq is
> + * uninitialized), or -2 (rseq initialization has failed).
> + */
> +static inline int32_t rseq_current_cpu_raw(void)
> +{
> +	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id);
> +}
> +
> +/*
> + * Returns a possible CPU number, which is typically the current CPU.
> + * The returned CPU number can be used to prepare for an rseq critical
> + * section, which will confirm whether the cpu number is indeed the
> + * current one, and whether rseq is initialized.
> + *
> + * The CPU number returned by rseq_cpu_start should always be validated
> + * by passing it to a rseq asm sequence, or by comparing it to the
> + * return value of rseq_current_cpu_raw() if the rseq asm sequence
> + * does not need to be invoked.
> + */
> +static inline uint32_t rseq_cpu_start(void)
> +{
> +	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start);
> +}
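To illustrate the second validation option mentioned above (no rseq asm
sequence invoked), a rough sketch; read_validated_cpu() is a made-up
helper, not part of the patch:

	#include <stdint.h>
	#include "rseq.h"

	/* Returns a validated current cpu number, or -1 if this thread is
	 * not registered or migrated between the two reads. */
	static int read_validated_cpu(void)
	{
		uint32_t cpu = rseq_cpu_start();

		if (rseq_current_cpu_raw() != (int32_t)cpu)
			return -1;	/* caller falls back, e.g. to rseq_current_cpu() */
		return (int)cpu;
	}
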
> +
> +static inline uint32_t rseq_current_cpu(void)
> +{
> +	int32_t cpu;
> +
> +	cpu = rseq_current_cpu_raw();
> +	if (rseq_unlikely(cpu < 0))
> +		cpu = rseq_fallback_current_cpu();
> +	return cpu;
> +}
> +
> +/*
> + * rseq_prepare_unload() should be invoked by each thread executing rseq
> + * critical sections, at least once between its last critical section and
> + * unload of the library defining the rseq critical section descriptors
> + * (struct rseq_cs). This also applies to rseq use in code generated by a
> + * JIT: rseq_prepare_unload() should be invoked at least once by each such
> + * thread before reclaim of the memory holding the struct rseq_cs.
> + */
> +static inline void rseq_prepare_unload(void)
> +{
> +	__rseq_abi.rseq_cs = 0;
> +}
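To make the unload ordering concrete, a sketch of the dlopen() case
(function names are made up; only the ordering matters, this is not part
of the patch):

	#include <dlfcn.h>
	#include "rseq.h"

	/* Called by every thread that executed rseq critical sections
	 * defined in the plugin, before the plugin is unloaded. */
	static void plugin_thread_quiesce(void)
	{
		rseq_prepare_unload();
	}

	/* Called once, after all such threads have quiesced. */
	static void plugin_unload(void *dl_handle)
	{
		dlclose(dl_handle);
	}
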
> +
> +#endif  /* RSEQ_H */
> diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
> new file mode 100755
> index 000000000000..c7475a2bef11
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/run_param_test.sh
> @@ -0,0 +1,124 @@
> +#!/bin/bash
> +
> +EXTRA_ARGS=${@}
> +
> +OLDIFS="$IFS"
> +IFS=$'\n'
> +TEST_LIST=(
> +	"-T s"
> +	"-T l"
> +	"-T b"
> +	"-T b -M"
> +	"-T m"
> +	"-T m -M"
> +	"-T i"
> +)
> +
> +TEST_NAME=(
> +	"spinlock"
> +	"list"
> +	"buffer"
> +	"buffer with barrier"
> +	"memcpy"
> +	"memcpy with barrier"
> +	"increment"
> +)
> +IFS="$OLDIFS"
> +
> +function do_tests()
> +{
> +	local i=0
> +	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
> +		echo "Running test ${TEST_NAME[$i]}"
> +		./param_test ${TEST_LIST[$i]} ${@} ${EXTRA_ARGS} || exit 1
> +		let "i++"
> +	done
> +}
> +
> +echo "Default parameters"
> +do_tests
> +
> +echo "Loop injection: 10000 loops"
> +
> +OLDIFS="$IFS"
> +IFS=$'\n'
> +INJECT_LIST=(
> +	"1"
> +	"2"
> +	"3"
> +	"4"
> +	"5"
> +	"6"
> +	"7"
> +	"8"
> +	"9"
> +)
> +IFS="$OLDIFS"
> +
> +NR_LOOPS=10000
> +
> +i=0
> +while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
> +	echo "Injecting at <${INJECT_LIST[$i]}>"
> +	do_tests -${INJECT_LIST[i]} ${NR_LOOPS}
> +	let "i++"
> +done
> +NR_LOOPS=
> +
> +function inject_blocking()
> +{
> +	OLDIFS="$IFS"
> +	IFS=$'\n'
> +	INJECT_LIST=(
> +		"7"
> +		"8"
> +		"9"
> +	)
> +	IFS="$OLDIFS"
> +
> +	NR_LOOPS=-1
> +
> +	i=0
> +	while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
> +		echo "Injecting at <${INJECT_LIST[$i]}>"
> +		do_tests -${INJECT_LIST[i]} -1 ${@}
> +		let "i++"
> +	done
> +	NR_LOOPS=
> +}
> +
> +echo "Yield injection (25%)"
> +inject_blocking -m 4 -y -r 100
> +
> +echo "Yield injection (50%)"
> +inject_blocking -m 2 -y -r 100
> +
> +echo "Yield injection (100%)"
> +inject_blocking -m 1 -y -r 100
> +
> +echo "Kill injection (25%)"
> +inject_blocking -m 4 -k -r 100
> +
> +echo "Kill injection (50%)"
> +inject_blocking -m 2 -k -r 100
> +
> +echo "Kill injection (100%)"
> +inject_blocking -m 1 -k -r 100
> +
> +echo "Sleep injection (1ms, 25%)"
> +inject_blocking -m 4 -s 1 -r 100
> +
> +echo "Sleep injection (1ms, 50%)"
> +inject_blocking -m 2 -s 1 -r 100
> +
> +echo "Sleep injection (1ms, 100%)"
> +inject_blocking -m 1 -s 1 -r 100
> +
> +echo "Disable rseq for 25% threads"
> +do_tests -D 4
> +
> +echo "Disable rseq for 50% threads"
> +do_tests -D 2
> +
> +echo "Disable rseq"
> +do_tests -d
> 

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
@ 2017-11-21 15:34     ` Shuah Khan
  0 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 15:34 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk,
	linux-kselftest-u79uwXL29TY76Z2rM5mHXA, Shuah Khan, Shuah Khan

On 11/21/2017 07:18 AM, Mathieu Desnoyers wrote:
> Implements two basic tests of RSEQ functionality, and one more
> exhaustive parameterizable test.
> 
> The first, "basic_test", only asserts that RSEQ works moderately
> correctly, e.g. that the CPUID pointer works.
> 
> "basic_percpu_ops_test" is a slightly more "realistic" variant,
> implementing a few simple per-cpu operations and testing their
> correctness.
> 
> "param_test" is a parametrizable restartable sequences test. See
> the "--help" output for usage.
> 
> A run_param_test.sh script runs many variants of the parametrizable
> tests.
> 
> As part of those tests, a helper library "rseq" implements a user-space
> API around restartable sequences. It uses the cpu_opv system call as
> fallback when single-stepped by a debugger. It exposes the instruction
> pointer addresses where the rseq assembly blocks begin and end, as well
> as the associated abort instruction pointer, in the __rseq_table
> section. This section allows debuggers to know where to place
> breakpoints when single-stepping through assembly blocks which may be
> aborted at any point by the kernel.
> 
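Nice that the table is exposed. For instance, a tool could enumerate the
critical section descriptors like this; a rough sketch only, assuming the
struct rseq_cs layout from linux/rseq.h and the __start_/__stop_ symbols
the linker emits for sections whose name is a valid C identifier (not part
of the patch):

	#include <stdio.h>
	#include <linux/rseq.h>

	extern const struct rseq_cs __start___rseq_table[];
	extern const struct rseq_cs __stop___rseq_table[];

	static void dump_rseq_table(void)
	{
		const struct rseq_cs *cs;

		for (cs = __start___rseq_table; cs < __stop___rseq_table; cs++)
			printf("start=0x%llx post_commit=0x%llx abort=0x%llx\n",
			       (unsigned long long)cs->start_ip,
			       (unsigned long long)(cs->start_ip +
						    cs->post_commit_offset),
			       (unsigned long long)cs->abort_ip);
	}
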
> The rseq library exposes APIs that present the fast-path operations.
> Their use from user space is, e.g. for a counter increment:
> 
>     cpu = rseq_cpu_start();
>     ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>     if (likely(!ret))
>         return 0;        /* Success. */
>     do {
>         cpu = rseq_current_cpu();
>         ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>         if (likely(!ret))
>             return 0;    /* Success. */
>     } while (ret > 0 || errno == EAGAIN);
>     perror("cpu_op_addv");
>     return -1;           /* Unexpected error. */
> 
> PowerPC tests have been implemented by Boqun Feng.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> CC: Russell King <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
> CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
> CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> CC: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> CC: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
> CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
> CC: Chris Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
> CC: Ben Maurer <bmaurer-b10kYP2dOMg@public.gmane.org>
> CC: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
> CC: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> CC: Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>
> CC: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> CC: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> CC: Shuah Khan <shuah-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
> Changes since v1:
> - Provide abort-ip signature: The abort-ip signature is located just
>   before the abort-ip target. It is currently hardcoded, but a
>   user-space application could use the __rseq_table to iterate on all
>   abort-ip targets and use a random value as signature if needed in the
>   future.
> - Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
>   sections need to issue rseq_prepare_unload() on each thread at least
>   once before reclaim of struct rseq_cs.
> - Use initial-exec TLS model, non-weak symbol: The initial-exec model is
>   signal-safe, whereas the global-dynamic model is not.  Remove the
>   "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
>   library will have ownership of that symbol, and there is no reason for
>   an application or user library to try to define that symbol.
>   The expected use is to link against librseq.so, which owns and provides
>   that symbol.
> - Set cpu_id to -2 on register error
> - Add rseq_len syscall parameter, rseq_cs version
> - Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
>   hard time decoding the instruction stream after a bad instruction. Use
>   a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
> - Exercise parametrized test variants in a shell script.
> - Restartable sequences selftests: Remove use of event counter.
> - Use cpu_id_start field:  With the cpu_id_start field, the C
>   preparation phase of the fast-path does not need to compare cpu_id < 0
>   anymore.
> - Signal-safe registration and refcounting: Allow libraries using
>   librseq.so to register it from signal handlers.
> - Use OVERRIDE_TARGETS in makefile.
> - Use "m" constraints for rseq_cs field.
> 
> Changes since v2:
> - Update based on Thomas Gleixner's comments.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/rseq/.gitignore            |    4 +

Thanks for the .gitignore files. It is a commonly missed change; I end
up adding one to clean things up after tests get in.

>  tools/testing/selftests/rseq/Makefile              |   23 +
>  .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
>  tools/testing/selftests/rseq/basic_test.c          |   55 +
>  tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
>  tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
>  tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
>  tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
>  tools/testing/selftests/rseq/rseq.c                |  116 ++
>  tools/testing/selftests/rseq/rseq.h                |  154 +++
>  tools/testing/selftests/rseq/run_param_test.sh     |  124 ++
>  13 files changed, 4096 insertions(+)
>  create mode 100644 tools/testing/selftests/rseq/.gitignore
>  create mode 100644 tools/testing/selftests/rseq/Makefile
>  create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
>  create mode 100644 tools/testing/selftests/rseq/basic_test.c
>  create mode 100644 tools/testing/selftests/rseq/param_test.c
>  create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
>  create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
>  create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
>  create mode 100644 tools/testing/selftests/rseq/rseq.c
>  create mode 100644 tools/testing/selftests/rseq/rseq.h
>  create mode 100755 tools/testing/selftests/rseq/run_param_test.sh
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c6c2436d15f8..ba9137c1f295 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11634,6 +11634,7 @@ S:	Supported
>  F:	kernel/rseq.c
>  F:	include/uapi/linux/rseq.h
>  F:	include/trace/events/rseq.h
> +F:	tools/testing/selftests/rseq/
>  
>  RFKILL
>  M:	Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index fc1eba0e0130..fc314334628a 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -26,6 +26,7 @@ TARGETS += nsfs
>  TARGETS += powerpc
>  TARGETS += pstore
>  TARGETS += ptrace
> +TARGETS += rseq
>  TARGETS += seccomp
>  TARGETS += sigaltstack
>  TARGETS += size
> diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
> new file mode 100644
> index 000000000000..9409c3db99b2
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/.gitignore
> @@ -0,0 +1,4 @@
> +basic_percpu_ops_test
> +basic_test
> +basic_rseq_op_test
> +param_test
> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> new file mode 100644
> index 000000000000..e4f638e5752c
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/Makefile
> @@ -0,0 +1,23 @@
> +CFLAGS += -O2 -Wall -g -I./ -I../cpu-opv/ -I../../../../usr/include/ -L./ -Wl,-rpath=./
> +LDLIBS += -lpthread
> +
> +# Own dependencies because we only want to build against 1st prerequisite, but
> +# still track changes to header files and depend on shared object.
> +OVERRIDE_TARGETS = 1
> +
> +TEST_GEN_PROGS = basic_test basic_percpu_ops_test param_test
> +
> +TEST_GEN_PROGS_EXTENDED = librseq.so libcpu-op.so
> +
> +TEST_PROGS = run_param_test.sh
> +
> +include ../lib.mk
> +
> +$(OUTPUT)/librseq.so: rseq.c rseq.h rseq-*.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
> +
> +$(OUTPUT)/libcpu-op.so: ../cpu-opv/cpu-op.c ../cpu-opv/cpu-op.h
> +	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
> +
> +$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) rseq.h rseq-*.h ../cpu-opv/cpu-op.h
> +	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -lcpu-op -o $@
> diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> new file mode 100644
> index 000000000000..e5f7fed06a03
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> @@ -0,0 +1,333 @@
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stddef.h>
> +
> +#include "rseq.h"
> +#include "cpu-op.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +struct percpu_lock_entry {
> +	intptr_t v;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_lock {
> +	struct percpu_lock_entry c[CPU_SETSIZE];
> +};
> +
> +struct test_data_entry {
> +	intptr_t count;
> +} __attribute__((aligned(128)));
> +
> +struct spinlock_test_data {
> +	struct percpu_lock lock;
> +	struct test_data_entry c[CPU_SETSIZE];
> +	int reps;
> +};
> +
> +struct percpu_list_node {
> +	intptr_t data;
> +	struct percpu_list_node *next;
> +};
> +
> +struct percpu_list_entry {
> +	struct percpu_list_node *head;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_list {
> +	struct percpu_list_entry c[CPU_SETSIZE];
> +};
> +
> +/* A simple percpu spinlock.  Returns the cpu the lock was acquired on. */
> +int rseq_percpu_lock(struct percpu_lock *lock)
> +{
> +	int cpu;
> +
> +	for (;;) {
> +		int ret;
> +
> +#ifndef SKIP_FASTPATH
> +		/* Try fast path. */
> +		cpu = rseq_cpu_start();
> +		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
> +				0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			continue;	/* Retry. */
> +#endif
> +	slowpath:
> +		__attribute__((unused));
> +		/* Fallback on cpu_opv system call. */
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	/*
> +	 * Acquire semantic when taking lock after control dependency.
> +	 * Matches rseq_smp_store_release().
> +	 */
> +	rseq_smp_acquire__after_ctrl_dep();
> +	return cpu;
> +}
> +
> +void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
> +{
> +	assert(lock->c[cpu].v == 1);
> +	/*
> +	 * Release lock, with release semantic. Matches
> +	 * rseq_smp_acquire__after_ctrl_dep().
> +	 */
> +	rseq_smp_store_release(&lock->c[cpu].v, 0);
> +}
> +
> +void *test_percpu_spinlock_thread(void *arg)
> +{
> +	struct spinlock_test_data *data = arg;
> +	int i, cpu;
> +
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +	for (i = 0; i < data->reps; i++) {
> +		cpu = rseq_percpu_lock(&data->lock);
> +		data->c[cpu].count++;
> +		rseq_percpu_unlock(&data->lock, cpu);
> +	}
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * A simple test which implements a sharded counter using a per-cpu
> + * lock.  Obviously real applications might prefer to simply use a
> + * per-cpu increment; however, this is reasonable for a test and the
> + * lock can be extended to synchronize more complicated operations.
> + */
> +void test_percpu_spinlock(void)
> +{
> +	const int num_threads = 200;
> +	int i;
> +	uint64_t sum;
> +	pthread_t test_threads[num_threads];
> +	struct spinlock_test_data data;
> +
> +	memset(&data, 0, sizeof(data));
> +	data.reps = 5000;
> +
> +	for (i = 0; i < num_threads; i++)
> +		pthread_create(&test_threads[i], NULL,
> +			test_percpu_spinlock_thread, &data);
> +
> +	for (i = 0; i < num_threads; i++)
> +		pthread_join(test_threads[i], NULL);
> +
> +	sum = 0;
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		sum += data.c[i].count;
> +
> +	assert(sum == (uint64_t)data.reps * num_threads);
> +}
> +
> +int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
> +{
> +	intptr_t *targetptr, newval, expect;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load list->c[cpu].head with single-copy atomicity. */
> +	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +	newval = (intptr_t)node;
> +	targetptr = (intptr_t *)&list->c[cpu].head;
> +	node->next = (struct percpu_list_node *)expect;
> +	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
> +	if (likely(!ret))
> +		return cpu;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +	slowpath:
> +		__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load list->c[cpu].head with single-copy atomicity. */
> +		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +		newval = (intptr_t)node;
> +		targetptr = (intptr_t *)&list->c[cpu].head;
> +		node->next = (struct percpu_list_node *)expect;
> +		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return cpu;
> +}
> +
> +/*
> + * Unlike a traditional lock-less linked list, the availability of an
> + * rseq primitive allows us to implement pop without concerns over
> + * ABA-type races.
> + */
> +struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
> +{
> +	struct percpu_list_node *head;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
> +		(intptr_t)NULL,
> +		offsetof(struct percpu_list_node, next),
> +		(intptr_t *)&head, cpu);
> +	if (likely(!ret))
> +		return head;
> +	if (ret > 0)
> +		return NULL;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +	slowpath:
> +		__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpnev_storeoffp_load(
> +			(intptr_t *)&list->c[cpu].head,
> +			(intptr_t)NULL,
> +			offsetof(struct percpu_list_node, next),
> +			(intptr_t *)&head, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			return NULL;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return head;
> +}
> +
> +void *test_percpu_list_thread(void *arg)
> +{
> +	int i;
> +	struct percpu_list *list = (struct percpu_list *)arg;
> +
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	for (i = 0; i < 100000; i++) {
> +		struct percpu_list_node *node = percpu_list_pop(list);
> +
> +		sched_yield();  /* encourage shuffling */
> +		if (node)
> +			percpu_list_push(list, node);
> +	}
> +
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu linked list from many threads.  */
> +void test_percpu_list(void)
> +{
> +	int i, j;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_list list;
> +	pthread_t test_threads[200];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&list, 0, sizeof(list));
> +
> +	/* Generate list entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		for (j = 1; j <= 100; j++) {
> +			struct percpu_list_node *node;
> +
> +			expected_sum += j;
> +
> +			node = malloc(sizeof(*node));
> +			assert(node);
> +			node->data = j;
> +			node->next = list.c[i].head;
> +			list.c[i].head = node;
> +		}
> +	}
> +
> +	for (i = 0; i < 200; i++)
> +		assert(pthread_create(&test_threads[i], NULL,
> +			test_percpu_list_thread, &list) == 0);
> +
> +	for (i = 0; i < 200; i++)
> +		pthread_join(test_threads[i], NULL);
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_list_node *node;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while ((node = percpu_list_pop(&list))) {
> +			sum += node->data;
> +			free(node);
> +		}
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto error;
> +	}
> +	printf("spinlock\n");
> +	test_percpu_spinlock();
> +	printf("percpu_list\n");
> +	test_percpu_list();
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto error;
> +	}
> +	return 0;
> +
> +error:
> +	return -1;
> +}
> +
> diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
> new file mode 100644
> index 000000000000..e2086b3885d7
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/basic_test.c
> @@ -0,0 +1,55 @@
> +/*
> + * Basic test coverage for critical regions and rseq_current_cpu().
> + */
> +
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/time.h>
> +
> +#include "rseq.h"
> +
> +void test_cpu_pointer(void)
> +{
> +	cpu_set_t affinity, test_affinity;
> +	int i;
> +
> +	sched_getaffinity(0, sizeof(affinity), &affinity);
> +	CPU_ZERO(&test_affinity);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (CPU_ISSET(i, &affinity)) {
> +			CPU_SET(i, &test_affinity);
> +			sched_setaffinity(0, sizeof(test_affinity),
> +					&test_affinity);
> +			assert(sched_getcpu() == i);
> +			assert(rseq_current_cpu() == i);
> +			assert(rseq_current_cpu_raw() == i);
> +			assert(rseq_cpu_start() == i);
> +			CPU_CLR(i, &test_affinity);
> +		}
> +	}
> +	sched_setaffinity(0, sizeof(affinity), &affinity);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto init_thread_error;
> +	}
> +	printf("testing current cpu\n");
> +	test_cpu_pointer();
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		goto init_thread_error;
> +	}
> +	return 0;
> +
> +init_thread_error:
> +	return -1;
> +}
> diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
> new file mode 100644
> index 000000000000..c7a16b656a36
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/param_test.c
> @@ -0,0 +1,1285 @@
> +#define _GNU_SOURCE
> +#include <assert.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <poll.h>
> +#include <sys/types.h>
> +#include <signal.h>
> +#include <errno.h>
> +#include <stddef.h>
> +
> +#include "cpu-op.h"
> +
> +static inline pid_t gettid(void)
> +{
> +	return syscall(__NR_gettid);
> +}
> +
> +#define NR_INJECT	9
> +static int loop_cnt[NR_INJECT + 1];
> +
> +static int opt_modulo, verbose;
> +
> +static int opt_yield, opt_signal, opt_sleep,
> +		opt_disable_rseq, opt_threads = 200,
> +		opt_disable_mod = 0, opt_test = 's', opt_mb = 0;
> +
> +static long long opt_reps = 5000;
> +
> +static __thread __attribute__((tls_model("initial-exec"))) unsigned int signals_delivered;
> +
> +#ifndef BENCHMARK
> +
> +static __thread __attribute__((tls_model("initial-exec"))) unsigned int yield_mod_cnt, nr_abort;
> +
> +#define printf_verbose(fmt, ...)			\
> +	do {						\
> +		if (verbose)				\
> +			printf(fmt, ## __VA_ARGS__);	\
> +	} while (0)
> +
> +#define RSEQ_INJECT_INPUT \
> +	, [loop_cnt_1]"m"(loop_cnt[1]) \
> +	, [loop_cnt_2]"m"(loop_cnt[2]) \
> +	, [loop_cnt_3]"m"(loop_cnt[3]) \
> +	, [loop_cnt_4]"m"(loop_cnt[4]) \
> +	, [loop_cnt_5]"m"(loop_cnt[5]) \
> +	, [loop_cnt_6]"m"(loop_cnt[6])
> +
> +#if defined(__x86_64__) || defined(__i386__)
> +
> +#define INJECT_ASM_REG	"eax"
> +
> +#define RSEQ_INJECT_CLOBBER \
> +	, INJECT_ASM_REG
> +
> +#define RSEQ_INJECT_ASM(n) \
> +	"mov %[loop_cnt_" #n "], %%" INJECT_ASM_REG "\n\t" \
> +	"test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \
> +	"jz 333f\n\t" \
> +	"222:\n\t" \
> +	"dec %%" INJECT_ASM_REG "\n\t" \
> +	"jnz 222b\n\t" \
> +	"333:\n\t"
> +
> +#elif defined(__ARMEL__)
> +
> +#define INJECT_ASM_REG	"r4"
> +
> +#define RSEQ_INJECT_CLOBBER \
> +	, INJECT_ASM_REG
> +
> +#define RSEQ_INJECT_ASM(n) \
> +	"ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
> +	"cmp " INJECT_ASM_REG ", #0\n\t" \
> +	"beq 333f\n\t" \
> +	"222:\n\t" \
> +	"subs " INJECT_ASM_REG ", #1\n\t" \
> +	"bne 222b\n\t" \
> +	"333:\n\t"
> +
> +#elif __PPC__
> +#define INJECT_ASM_REG	"r18"
> +
> +#define RSEQ_INJECT_CLOBBER \
> +	, INJECT_ASM_REG
> +
> +#define RSEQ_INJECT_ASM(n) \
> +	"lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
> +	"cmpwi %%" INJECT_ASM_REG ", 0\n\t" \
> +	"beq 333f\n\t" \
> +	"222:\n\t" \
> +	"subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \
> +	"bne 222b\n\t" \
> +	"333:\n\t"
> +#else
> +#error unsupported target
> +#endif
> +
> +#define RSEQ_INJECT_FAILED \
> +	nr_abort++;
> +
> +#define RSEQ_INJECT_C(n) \
> +{ \
> +	int loc_i, loc_nr_loops = loop_cnt[n]; \
> +	\
> +	for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \
> +		barrier(); \
> +	} \
> +	if (loc_nr_loops == -1 && opt_modulo) { \
> +		if (yield_mod_cnt == opt_modulo - 1) { \
> +			if (opt_sleep > 0) \
> +				poll(NULL, 0, opt_sleep); \
> +			if (opt_yield) \
> +				sched_yield(); \
> +			if (opt_signal) \
> +				raise(SIGUSR1); \
> +			yield_mod_cnt = 0; \
> +		} else { \
> +			yield_mod_cnt++; \
> +		} \
> +	} \
> +}
> +
> +#else
> +
> +#define printf_verbose(fmt, ...)
> +
> +#endif /* BENCHMARK */
> +
> +#include "rseq.h"
> +
> +struct percpu_lock_entry {
> +	intptr_t v;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_lock {
> +	struct percpu_lock_entry c[CPU_SETSIZE];
> +};
> +
> +struct test_data_entry {
> +	intptr_t count;
> +} __attribute__((aligned(128)));
> +
> +struct spinlock_test_data {
> +	struct percpu_lock lock;
> +	struct test_data_entry c[CPU_SETSIZE];
> +};
> +
> +struct spinlock_thread_test_data {
> +	struct spinlock_test_data *data;
> +	long long reps;
> +	int reg;
> +};
> +
> +struct inc_test_data {
> +	struct test_data_entry c[CPU_SETSIZE];
> +};
> +
> +struct inc_thread_test_data {
> +	struct inc_test_data *data;
> +	long long reps;
> +	int reg;
> +};
> +
> +struct percpu_list_node {
> +	intptr_t data;
> +	struct percpu_list_node *next;
> +};
> +
> +struct percpu_list_entry {
> +	struct percpu_list_node *head;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_list {
> +	struct percpu_list_entry c[CPU_SETSIZE];
> +};
> +
> +#define BUFFER_ITEM_PER_CPU	100
> +
> +struct percpu_buffer_node {
> +	intptr_t data;
> +};
> +
> +struct percpu_buffer_entry {
> +	intptr_t offset;
> +	intptr_t buflen;
> +	struct percpu_buffer_node **array;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_buffer {
> +	struct percpu_buffer_entry c[CPU_SETSIZE];
> +};
> +
> +#define MEMCPY_BUFFER_ITEM_PER_CPU	100
> +
> +struct percpu_memcpy_buffer_node {
> +	intptr_t data1;
> +	uint64_t data2;
> +};
> +
> +struct percpu_memcpy_buffer_entry {
> +	intptr_t offset;
> +	intptr_t buflen;
> +	struct percpu_memcpy_buffer_node *array;
> +} __attribute__((aligned(128)));
> +
> +struct percpu_memcpy_buffer {
> +	struct percpu_memcpy_buffer_entry c[CPU_SETSIZE];
> +};
> +
> +/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
> +static int rseq_percpu_lock(struct percpu_lock *lock)
> +{
> +	int cpu;
> +
> +	for (;;) {
> +		int ret;
> +
> +#ifndef SKIP_FASTPATH
> +		/* Try fast path. */
> +		cpu = rseq_cpu_start();
> +		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
> +				0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			continue;	/* Retry. */
> +#endif
> +	slowpath:
> +		__attribute__((unused));
> +		/* Fallback on cpu_opv system call. */
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	/*
> +	 * Acquire semantic when taking lock after control dependency.
> +	 * Matches rseq_smp_store_release().
> +	 */
> +	rseq_smp_acquire__after_ctrl_dep();
> +	return cpu;
> +}
> +
> +static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
> +{
> +	assert(lock->c[cpu].v == 1);
> +	/*
> +	 * Release lock, with release semantic. Matches
> +	 * rseq_smp_acquire__after_ctrl_dep().
> +	 */
> +	rseq_smp_store_release(&lock->c[cpu].v, 0);
> +}
> +
> +void *test_percpu_spinlock_thread(void *arg)
> +{
> +	struct spinlock_thread_test_data *thread_data = arg;
> +	struct spinlock_test_data *data = thread_data->data;
> +	int cpu;
> +	long long i, reps;
> +
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_register_current_thread())
> +		abort();
> +	reps = thread_data->reps;
> +	for (i = 0; i < reps; i++) {
> +		cpu = rseq_percpu_lock(&data->lock);
> +		data->c[cpu].count++;
> +		rseq_percpu_unlock(&data->lock, cpu);
> +#ifndef BENCHMARK
> +		if (i != 0 && !(i % (reps / 10)))
> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
> +#endif
> +	}
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_unregister_current_thread())
> +		abort();
> +	return NULL;
> +}
> +
> +/*
> + * A simple test which implements a sharded counter using a per-cpu
> + * lock.  Obviously real applications might prefer to simply use a
> + * per-cpu increment; however, this is reasonable for a test and the
> + * lock can be extended to synchronize more complicated operations.
> + */
> +void test_percpu_spinlock(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, ret;
> +	uint64_t sum;
> +	pthread_t test_threads[num_threads];
> +	struct spinlock_test_data data;
> +	struct spinlock_thread_test_data thread_data[num_threads];
> +
> +	memset(&data, 0, sizeof(data));
> +	for (i = 0; i < num_threads; i++) {
> +		thread_data[i].reps = opt_reps;
> +		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
> +			thread_data[i].reg = 1;
> +		else
> +			thread_data[i].reg = 0;
> +		thread_data[i].data = &data;
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_spinlock_thread, &thread_data[i]);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	sum = 0;
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		sum += data.c[i].count;
> +
> +	assert(sum == (uint64_t)opt_reps * num_threads);
> +}
> +
> +void *test_percpu_inc_thread(void *arg)
> +{
> +	struct inc_thread_test_data *thread_data = arg;
> +	struct inc_test_data *data = thread_data->data;
> +	long long i, reps;
> +
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_register_current_thread())
> +		abort();
> +	reps = thread_data->reps;
> +	for (i = 0; i < reps; i++) {
> +		int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +		/* Try fast path. */
> +		cpu = rseq_cpu_start();
> +		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
> +		if (likely(!ret))
> +			goto next;
> +#endif

So the test needs to be compiled with this enabled? I think it would be
better to make this an argument that can be selected at test start time,
as opposed to making it a compile-time option. Remember that these tests
get run in automated test rings. Making this a compile-time option pretty
much ensures that this path will not be tested.

So I would recommend adding a parameter.
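
Something along the lines of the (completely untested) sketch below is
what I had in mind -- note that "opt_skip_fastpath" and the "-f" switch
are names I am making up for illustration, they are not part of the
posted patch; the helpers it calls are the ones already defined in this
file:

/* Sketch only: runtime switch instead of the SKIP_FASTPATH #ifdef. */
static int opt_skip_fastpath;	/* 0: try rseq fast path, 1: force cpu_opv */

static int do_percpu_inc(struct inc_test_data *data)
{
	int cpu, ret;

	if (!opt_skip_fastpath) {
		/* Try rseq fast path. */
		cpu = rseq_cpu_start();
		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
		if (likely(!ret))
			return 0;
	}
	for (;;) {
		/* Fallback on cpu_opv system call. */
		cpu = rseq_current_cpu();
		ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
		if (likely(!ret))
			return 0;
		assert(ret >= 0 || errno == EAGAIN);
	}
}

and in the option parsing in main():

	case 'f':
		opt_skip_fastpath = 1;
		break;

That way the test ring can run the same binary twice, once with and once
without -f, and both the fast path and the cpu_opv fallback get exercised
from a single build.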

> +	slowpath:
> +		__attribute__((unused));
> +		for (;;) {
> +			/* Fallback on cpu_opv system call. */
> +			cpu = rseq_current_cpu();
> +			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
> +			if (likely(!ret))
> +				break;
> +			assert(ret >= 0 || errno == EAGAIN);
> +		}
> +	next:
> +		__attribute__((unused));
> +#ifndef BENCHMARK
> +		if (i != 0 && !(i % (reps / 10)))
> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
> +#endif

Same comment as before: avoid compile-time options.

> +	}
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && thread_data->reg
> +			&& rseq_unregister_current_thread())
> +		abort();
> +	return NULL;
> +}
> +
> +void test_percpu_inc(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, ret;
> +	uint64_t sum;
> +	pthread_t test_threads[num_threads];
> +	struct inc_test_data data;
> +	struct inc_thread_test_data thread_data[num_threads];
> +
> +	memset(&data, 0, sizeof(data));
> +	for (i = 0; i < num_threads; i++) {
> +		thread_data[i].reps = opt_reps;
> +		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
> +			thread_data[i].reg = 1;
> +		else
> +			thread_data[i].reg = 0;
> +		thread_data[i].data = &data;
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_inc_thread, &thread_data[i]);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	sum = 0;
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		sum += data.c[i].count;
> +
> +	assert(sum == (uint64_t)opt_reps * num_threads);
> +}
> +
> +int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
> +{
> +	intptr_t *targetptr, newval, expect;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load list->c[cpu].head with single-copy atomicity. */
> +	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +	newval = (intptr_t)node;
> +	targetptr = (intptr_t *)&list->c[cpu].head;
> +	node->next = (struct percpu_list_node *)expect;
> +	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
> +	if (likely(!ret))
> +		return cpu;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +slowpath:
> +	__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load list->c[cpu].head with single-copy atomicity. */
> +		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +		newval = (intptr_t)node;
> +		targetptr = (intptr_t *)&list->c[cpu].head;
> +		node->next = (struct percpu_list_node *)expect;
> +		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return cpu;
> +}
> +
> +/*
> + * Unlike a traditional lock-less linked list, the availability of a
> + * rseq primitive allows us to implement pop without concerns over
> + * ABA-type races.
> + */
> +struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
> +{
> +	struct percpu_list_node *head;
> +	int cpu, ret;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
> +		(intptr_t)NULL,
> +		offsetof(struct percpu_list_node, next),
> +		(intptr_t *)&head, cpu);
> +	if (likely(!ret))
> +		return head;
> +	if (ret > 0)
> +		return NULL;
> +#endif
> +	/* Fallback on cpu_opv system call. */
> +	slowpath:
> +		__attribute__((unused));
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		ret = cpu_op_cmpnev_storeoffp_load(
> +			(intptr_t *)&list->c[cpu].head,
> +			(intptr_t)NULL,
> +			offsetof(struct percpu_list_node, next),
> +			(intptr_t *)&head, cpu);
> +		if (likely(!ret))
> +			break;
> +		if (ret > 0)
> +			return NULL;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return head;
> +}
> +
> +void *test_percpu_list_thread(void *arg)
> +{
> +	long long i, reps;
> +	struct percpu_list *list = (struct percpu_list *)arg;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		abort();
> +
> +	reps = opt_reps;
> +	for (i = 0; i < reps; i++) {
> +		struct percpu_list_node *node = percpu_list_pop(list);
> +
> +		if (opt_yield)
> +			sched_yield();  /* encourage shuffling */
> +		if (node)
> +			percpu_list_push(list, node);
> +	}
> +
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu linked list from many threads.  */
> +void test_percpu_list(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, j, ret;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_list list;
> +	pthread_t test_threads[num_threads];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&list, 0, sizeof(list));
> +
> +	/* Generate list entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		for (j = 1; j <= 100; j++) {
> +			struct percpu_list_node *node;
> +
> +			expected_sum += j;
> +
> +			node = malloc(sizeof(*node));
> +			assert(node);
> +			node->data = j;
> +			node->next = list.c[i].head;
> +			list.c[i].head = node;
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_list_thread, &list);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_list_node *node;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while ((node = percpu_list_pop(&list))) {
> +			sum += node->data;
> +			free(node);
> +		}
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +bool percpu_buffer_push(struct percpu_buffer *buffer,
> +		struct percpu_buffer_node *node)
> +{
> +	intptr_t *targetptr_spec, newval_spec;
> +	intptr_t *targetptr_final, newval_final;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
> +	if (offset == buffer->c[cpu].buflen) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return false;
> +	}
> +	newval_spec = (intptr_t)node;
> +	targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
> +	newval_final = offset + 1;
> +	targetptr_final = &buffer->c[cpu].offset;
> +	if (opt_mb)
> +		ret = rseq_cmpeqv_trystorev_storev_release(targetptr_final,
> +			offset, targetptr_spec, newval_spec,
> +			newval_final, cpu);
> +	else
> +		ret = rseq_cmpeqv_trystorev_storev(targetptr_final,
> +			offset, targetptr_spec, newval_spec,
> +			newval_final, cpu);
> +	if (likely(!ret))
> +		return true;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == buffer->c[cpu].buflen)
> +			return false;
> +		newval_spec = (intptr_t)node;
> +		targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
> +		newval_final = offset + 1;
> +		targetptr_final = &buffer->c[cpu].offset;
> +		if (opt_mb)
> +			ret = cpu_op_cmpeqv_storev_mb_storev(targetptr_final,
> +				offset, targetptr_spec, newval_spec,
> +				newval_final, cpu);
> +		else
> +			ret = cpu_op_cmpeqv_storev_storev(targetptr_final,
> +				offset, targetptr_spec, newval_spec,
> +				newval_final, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return true;
> +}
> +
> +struct percpu_buffer_node *percpu_buffer_pop(struct percpu_buffer *buffer)
> +{
> +	struct percpu_buffer_node *head;
> +	intptr_t *targetptr, newval;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
> +	if (offset == 0) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return NULL;
> +	}
> +	head = buffer->c[cpu].array[offset - 1];
> +	newval = offset - 1;
> +	targetptr = (intptr_t *)&buffer->c[cpu].offset;
> +	ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset,
> +		(intptr_t *)&buffer->c[cpu].array[offset - 1], (intptr_t)head,
> +		newval, cpu);
> +	if (likely(!ret))
> +		return head;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == 0)
> +			return NULL;
> +		head = buffer->c[cpu].array[offset - 1];
> +		newval = offset - 1;
> +		targetptr = (intptr_t *)&buffer->c[cpu].offset;
> +		ret = cpu_op_cmpeqv_cmpeqv_storev(targetptr, offset,
> +			(intptr_t *)&buffer->c[cpu].array[offset - 1],
> +			(intptr_t)head, newval, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return head;
> +}
> +
> +void *test_percpu_buffer_thread(void *arg)
> +{
> +	long long i, reps;
> +	struct percpu_buffer *buffer = (struct percpu_buffer *)arg;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		abort();
> +
> +	reps = opt_reps;
> +	for (i = 0; i < reps; i++) {
> +		struct percpu_buffer_node *node = percpu_buffer_pop(buffer);
> +
> +		if (opt_yield)
> +			sched_yield();  /* encourage shuffling */
> +		if (node) {
> +			if (!percpu_buffer_push(buffer, node)) {
> +				/* Should increase buffer size. */
> +				abort();
> +			}
> +		}
> +	}
> +
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu buffer from many threads.  */
> +void test_percpu_buffer(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, j, ret;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_buffer buffer;
> +	pthread_t test_threads[num_threads];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&buffer, 0, sizeof(buffer));
> +
> +	/* Generate list entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		/* Worst-case is every item in the same CPU. */
> +		buffer.c[i].array =
> +			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
> +				* BUFFER_ITEM_PER_CPU);
> +		assert(buffer.c[i].array);
> +		buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU;
> +		for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) {
> +			struct percpu_buffer_node *node;
> +
> +			expected_sum += j;
> +
> +			/*
> +			 * We could theoretically put the word-sized
> +			 * "data" directly in the buffer. However, we
> +			 * want to model objects that would not fit
> +			 * within a single word, so allocate an object
> +			 * for each node.
> +			 */
> +			node = malloc(sizeof(*node));
> +			assert(node);
> +			node->data = j;
> +			buffer.c[i].array[j - 1] = node;
> +			buffer.c[i].offset++;
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_buffer_thread, &buffer);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_buffer_node *node;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while ((node = percpu_buffer_pop(&buffer))) {
> +			sum += node->data;
> +			free(node);
> +		}
> +		free(buffer.c[i].array);
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +bool percpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer,
> +		struct percpu_memcpy_buffer_node item)
> +{
> +	char *destptr, *srcptr;
> +	size_t copylen;
> +	intptr_t *targetptr_final, newval_final;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
> +	if (offset == buffer->c[cpu].buflen) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return false;
> +	}
> +	destptr = (char *)&buffer->c[cpu].array[offset];
> +	srcptr = (char *)&item;
> +	copylen = sizeof(item);
> +	newval_final = offset + 1;
> +	targetptr_final = &buffer->c[cpu].offset;
> +	if (opt_mb)
> +		ret = rseq_cmpeqv_trymemcpy_storev_release(targetptr_final,
> +			offset, destptr, srcptr, copylen,
> +			newval_final, cpu);
> +	else
> +		ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
> +			offset, destptr, srcptr, copylen,
> +			newval_final, cpu);
> +	if (likely(!ret))
> +		return true;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == buffer->c[cpu].buflen)
> +			return false;
> +		destptr = (char *)&buffer->c[cpu].array[offset];
> +		srcptr = (char *)&item;
> +		copylen = sizeof(item);
> +		newval_final = offset + 1;
> +		targetptr_final = &buffer->c[cpu].offset;
> +		/* copylen must be <= PAGE_SIZE. */
> +		if (opt_mb)
> +			ret = cpu_op_cmpeqv_memcpy_mb_storev(targetptr_final,
> +				offset, destptr, srcptr, copylen,
> +				newval_final, cpu);
> +		else
> +			ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
> +				offset, destptr, srcptr, copylen,
> +				newval_final, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return true;
> +}
> +
> +bool percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
> +		struct percpu_memcpy_buffer_node *item)
> +{
> +	char *destptr, *srcptr;
> +	size_t copylen;
> +	intptr_t *targetptr_final, newval_final;
> +	int cpu, ret;
> +	intptr_t offset;
> +
> +#ifndef SKIP_FASTPATH
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();
> +	/* Load offset with single-copy atomicity. */
> +	offset = READ_ONCE(buffer->c[cpu].offset);
> +	if (offset == 0) {
> +		if (unlikely(cpu != rseq_current_cpu_raw()))
> +			goto slowpath;
> +		return false;
> +	}
> +	destptr = (char *)item;
> +	srcptr = (char *)&buffer->c[cpu].array[offset - 1];
> +	copylen = sizeof(*item);
> +	newval_final = offset - 1;
> +	targetptr_final = &buffer->c[cpu].offset;
> +	ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
> +		offset, destptr, srcptr, copylen,
> +		newval_final, cpu);
> +	if (likely(!ret))
> +		return true;
> +#endif
> +slowpath:
> +	__attribute__((unused));
> +	/* Fallback on cpu_opv system call. */
> +	for (;;) {
> +		cpu = rseq_current_cpu();
> +		/* Load offset with single-copy atomicity. */
> +		offset = READ_ONCE(buffer->c[cpu].offset);
> +		if (offset == 0)
> +			return false;
> +		destptr = (char *)item;
> +		srcptr = (char *)&buffer->c[cpu].array[offset - 1];
> +		copylen = sizeof(*item);
> +		newval_final = offset - 1;
> +		targetptr_final = &buffer->c[cpu].offset;
> +		/* copylen must be <= PAGE_SIZE. */
> +		ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
> +			offset, destptr, srcptr, copylen,
> +			newval_final, cpu);
> +		if (likely(!ret))
> +			break;
> +		assert(ret >= 0 || errno == EAGAIN);
> +	}
> +	return true;
> +}
> +
> +void *test_percpu_memcpy_buffer_thread(void *arg)
> +{
> +	long long i, reps;
> +	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		abort();
> +
> +	reps = opt_reps;
> +	for (i = 0; i < reps; i++) {
> +		struct percpu_memcpy_buffer_node item;
> +		bool result;
> +
> +		result = percpu_memcpy_buffer_pop(buffer, &item);
> +		if (opt_yield)
> +			sched_yield();  /* encourage shuffling */
> +		if (result) {
> +			if (!percpu_memcpy_buffer_push(buffer, item)) {
> +				/* Should increase buffer size. */
> +				abort();
> +			}
> +		}
> +	}
> +
> +	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
> +		(int) gettid(), nr_abort, signals_delivered);
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +
> +	return NULL;
> +}
> +
> +/* Simultaneous modification to a per-cpu buffer from many threads.  */
> +void test_percpu_memcpy_buffer(void)
> +{
> +	const int num_threads = opt_threads;
> +	int i, j, ret;
> +	uint64_t sum = 0, expected_sum = 0;
> +	struct percpu_memcpy_buffer buffer;
> +	pthread_t test_threads[num_threads];
> +	cpu_set_t allowed_cpus;
> +
> +	memset(&buffer, 0, sizeof(buffer));
> +
> +	/* Generate list entries for every usable cpu. */
> +	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +		/* Worst-case is every item in the same CPU. */
> +		buffer.c[i].array =
> +			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
> +				* MEMCPY_BUFFER_ITEM_PER_CPU);
> +		assert(buffer.c[i].array);
> +		buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU;
> +		for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) {
> +			expected_sum += 2 * j + 1;
> +
> +			/*
> +			 * We could theoretically put the word-sized
> +			 * "data" directly in the buffer. However, we
> +			 * want to model objects that would not fit
> +			 * within a single word, so allocate an object
> +			 * for each node.
> +			 */
> +			buffer.c[i].array[j - 1].data1 = j;
> +			buffer.c[i].array[j - 1].data2 = j + 1;
> +			buffer.c[i].offset++;
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_create(&test_threads[i], NULL,
> +			test_percpu_memcpy_buffer_thread, &buffer);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_create");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < num_threads; i++) {
> +		ret = pthread_join(test_threads[i], NULL);
> +		if (ret) {
> +			errno = ret;
> +			perror("pthread_join");
> +			abort();
> +		}
> +	}
> +
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		cpu_set_t pin_mask;
> +		struct percpu_memcpy_buffer_node item;
> +
> +		if (!CPU_ISSET(i, &allowed_cpus))
> +			continue;
> +
> +		CPU_ZERO(&pin_mask);
> +		CPU_SET(i, &pin_mask);
> +		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
> +
> +		while (percpu_memcpy_buffer_pop(&buffer, &item)) {
> +			sum += item.data1;
> +			sum += item.data2;
> +		}
> +		free(buffer.c[i].array);
> +	}
> +
> +	/*
> +	 * All entries should now be accounted for (unless some external
> +	 * actor is interfering with our allowed affinity while this
> +	 * test is running).
> +	 */
> +	assert(sum == expected_sum);
> +}
> +
> +static void test_signal_interrupt_handler(int signo)
> +{
> +	signals_delivered++;
> +}
> +
> +static int set_signal_handler(void)
> +{
> +	int ret = 0;
> +	struct sigaction sa;
> +	sigset_t sigset;
> +
> +	ret = sigemptyset(&sigset);
> +	if (ret < 0) {
> +		perror("sigemptyset");
> +		return ret;
> +	}
> +
> +	sa.sa_handler = test_signal_interrupt_handler;
> +	sa.sa_mask = sigset;
> +	sa.sa_flags = 0;
> +	ret = sigaction(SIGUSR1, &sa, NULL);
> +	if (ret < 0) {
> +		perror("sigaction");
> +		return ret;
> +	}
> +
> +	printf_verbose("Signal handler set for SIGUSR1\n");
> +
> +	return ret;
> +}
> +
> +static void show_usage(int argc, char **argv)
> +{
> +	printf("Usage : %s <OPTIONS>\n",
> +		argv[0]);
> +	printf("OPTIONS:\n");
> +	printf("	[-1 loops] Number of loops for delay injection 1\n");
> +	printf("	[-2 loops] Number of loops for delay injection 2\n");
> +	printf("	[-3 loops] Number of loops for delay injection 3\n");
> +	printf("	[-4 loops] Number of loops for delay injection 4\n");
> +	printf("	[-5 loops] Number of loops for delay injection 5\n");
> +	printf("	[-6 loops] Number of loops for delay injection 6\n");
> +	printf("	[-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n");
> +	printf("	[-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n");
> +	printf("	[-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n");
> +	printf("	[-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n");
> +	printf("	[-y] Yield\n");
> +	printf("	[-k] Kill thread with signal\n");
> +	printf("	[-s S] S: =0: disabled (default), >0: sleep time (ms)\n");
> +	printf("	[-t N] Number of threads (default 200)\n");
> +	printf("	[-r N] Number of repetitions per thread (default 5000)\n");
> +	printf("	[-d] Disable rseq system call (no initialization)\n");
> +	printf("	[-D M] Disable rseq for each M threads\n");
> +	printf("	[-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n");
> +	printf("	[-M] Push into buffer and memcpy buffer with memory barriers.\n");
> +	printf("	[-v] Verbose output.\n");
> +	printf("	[-h] Show this help.\n");
> +	printf("\n");
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int i;
> +
> +	for (i = 1; i < argc; i++) {
> +		if (argv[i][0] != '-')
> +			continue;
> +		switch (argv[i][1]) {
> +		case '1':
> +		case '2':
> +		case '3':
> +		case '4':
> +		case '5':
> +		case '6':
> +		case '7':
> +		case '8':
> +		case '9':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]);
> +			i++;
> +			break;
> +		case 'm':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_modulo = atol(argv[i + 1]);
> +			if (opt_modulo < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 's':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_sleep = atol(argv[i + 1]);
> +			if (opt_sleep < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'y':
> +			opt_yield = 1;
> +			break;
> +		case 'k':
> +			opt_signal = 1;
> +			break;
> +		case 'd':
> +			opt_disable_rseq = 1;
> +			break;
> +		case 'D':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_disable_mod = atol(argv[i + 1]);
> +			if (opt_disable_mod < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 't':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_threads = atol(argv[i + 1]);
> +			if (opt_threads < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'r':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_reps = atoll(argv[i + 1]);
> +			if (opt_reps < 0) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'h':
> +			show_usage(argc, argv);
> +			goto end;
> +		case 'T':
> +			if (argc < i + 2) {
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			opt_test = *argv[i + 1];
> +			switch (opt_test) {
> +			case 's':
> +			case 'l':
> +			case 'i':
> +			case 'b':
> +			case 'm':
> +				break;
> +			default:
> +				show_usage(argc, argv);
> +				goto error;
> +			}
> +			i++;
> +			break;
> +		case 'v':
> +			verbose = 1;
> +			break;
> +		case 'M':
> +			opt_mb = 1;
> +			break;
> +		default:
> +			show_usage(argc, argv);
> +			goto error;
> +		}
> +	}
> +
> +	if (set_signal_handler())
> +		goto error;
> +
> +	if (!opt_disable_rseq && rseq_register_current_thread())
> +		goto error;
> +	switch (opt_test) {
> +	case 's':
> +		printf_verbose("spinlock\n");
> +		test_percpu_spinlock();
> +		break;
> +	case 'l':
> +		printf_verbose("linked list\n");
> +		test_percpu_list();
> +		break;
> +	case 'b':
> +		printf_verbose("buffer\n");
> +		test_percpu_buffer();
> +		break;
> +	case 'm':
> +		printf_verbose("memcpy buffer\n");
> +		test_percpu_memcpy_buffer();
> +		break;
> +	case 'i':
> +		printf_verbose("counter increment\n");
> +		test_percpu_inc();
> +		break;
> +	}
> +	if (!opt_disable_rseq && rseq_unregister_current_thread())
> +		abort();
> +end:
> +	return 0;
> +
> +error:
> +	return -1;
> +}
> diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
> new file mode 100644
> index 000000000000..47953c0cef4f
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-arm.h
> @@ -0,0 +1,535 @@
> +/*
> + * rseq-arm.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#define rseq_smp_mb()	__asm__ __volatile__ ("dmb" : : : "memory")
> +#define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" : : : "memory")
> +#define rseq_smp_wmb()	__asm__ __volatile__ ("dmb" : : : "memory")
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	rseq_smp_mb();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	rseq_smp_mb();							\
> +	WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define RSEQ_ASM_DEFINE_TABLE(section, version, flags,			\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"adr r0, " __rseq_str(cs_label) "\n\t"			\
> +		"str r0, %[" __rseq_str(rseq_cs) "]\n\t"		\
> +		__rseq_str(label) ":\n\t"
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t"	\
> +		"cmp %[" __rseq_str(cpu_id) "], r0\n\t"		\
> +		"bne " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(table_label, label, section, sig,		\
> +			teardown, abort_label, version, flags, start_ip,\
> +			post_commit_offset, abort_ip)			\
> +		__rseq_str(table_label) ":\n\t"				\
> +		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
> +		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"b %l[" __rseq_str(abort_label) "]\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expectnot], r0\n\t"
> +		"beq 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"str r0, %[load]\n\t"
> +		"add r0, %[voffp]\n\t"
> +		"ldr r0, [r0]\n\t"
> +		/* final store */
> +		"str r0, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"Ir"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"add r0, %[count]\n\t"
> +		/* final store */
> +		"str r0, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [count]"Ir"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"str %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"str %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"dmb\n\t"	/* full mb provides store-release */
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"ldr r0, %[v2]\n\t"
> +		"cmp %[expect2], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
> +			0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"str %[src], %[rseq_scratch0]\n\t"
> +		"str %[dst], %[rseq_scratch1]\n\t"
> +		"str %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"cmp %[len], #0\n\t" \
> +		"beq 333f\n\t" \
> +		"222:\n\t" \
> +		"ldrb %%r0, [%[src]]\n\t" \
> +		"strb %%r0, [%[dst]]\n\t" \
> +		"adds %[src], #1\n\t" \
> +		"adds %[dst], #1\n\t" \
> +		"subs %[len], #1\n\t" \
> +		"bne 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"ldr %[len], %[rseq_scratch2]\n\t"
> +		"ldr %[dst], %[rseq_scratch1]\n\t"
> +		"ldr %[src], %[rseq_scratch0]\n\t"
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"str %[src], %[rseq_scratch0]\n\t"
> +		"str %[dst], %[rseq_scratch1]\n\t"
> +		"str %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"ldr r0, %[v]\n\t"
> +		"cmp %[expect], r0\n\t"
> +		"bne 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"cmp %[len], #0\n\t" \
> +		"beq 333f\n\t" \
> +		"222:\n\t" \
> +		"ldrb %%r0, [%[src]]\n\t" \
> +		"strb %%r0, [%[dst]]\n\t" \
> +		"adds %[src], #1\n\t" \
> +		"adds %[dst], #1\n\t" \
> +		"subs %[len], #1\n\t" \
> +		"bne 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		"dmb\n\t"	/* full mb provides store-release */
> +		/* final store */
> +		"str %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"ldr %[len], %[rseq_scratch2]\n\t"
> +		"ldr %[dst], %[rseq_scratch1]\n\t"
> +		"ldr %[src], %[rseq_scratch0]\n\t"
> +		"b 6f\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			/* teardown */
> +			"ldr %[len], %[rseq_scratch2]\n\t"
> +			"ldr %[dst], %[rseq_scratch1]\n\t"
> +			"ldr %[src], %[rseq_scratch0]\n\t",
> +			cmpfail)
> +		"6:\n\t"
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "r0", "memory", "cc"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h
> new file mode 100644
> index 000000000000..3db6be5ceffb
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-ppc.h
> @@ -0,0 +1,567 @@
> +/*
> + * rseq-ppc.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> + * (C) Copyright 2016 - Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#define rseq_smp_mb()		__asm__ __volatile__ ("sync" : : : "memory")
> +#define rseq_smp_lwsync()	__asm__ __volatile__ ("lwsync" : : : "memory")
> +#define rseq_smp_rmb()		rseq_smp_lwsync()
> +#define rseq_smp_wmb()		rseq_smp_lwsync()
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	rseq_smp_lwsync();						\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_lwsync()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	rseq_smp_lwsync();						\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +/*
> + * The __rseq_table section can be used by debuggers to better handle
> + * single-stepping through the restartable critical sections.
> + */
> +
> +#ifdef __PPC64__
> +
> +#define STORE_WORD	"std "
> +#define LOAD_WORD	"ld "
> +#define LOADX_WORD	"ldx "
> +#define CMP_WORD	"cmpd "
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
> +			start_ip, post_commit_offset, abort_ip)			\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
> +		".balign 32\n\t"						\
> +		__rseq_str(label) ":\n\t"					\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
> +		RSEQ_INJECT_ASM(1)						\
> +		"lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t"		\
> +		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t"	\
> +		"rldicr %%r17, %%r17, 32, 31\n\t"				\
> +		"oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t"	\
> +		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
> +		"std %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
> +		__rseq_str(label) ":\n\t"
> +
> +#else /* #ifdef __PPC64__ */
> +
> +#define STORE_WORD	"stw "
> +#define LOAD_WORD	"lwz "
> +#define LOADX_WORD	"lwzx "
> +#define CMP_WORD	"cmpw "
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
> +			start_ip, post_commit_offset, abort_ip)			\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
> +		".balign 32\n\t"						\
> +		__rseq_str(label) ":\n\t"					\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
> +		/* 32-bit only supported on BE */				\
> +		".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
> +		RSEQ_INJECT_ASM(1)						\
> +		"lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t"			\
> +		"addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
> +		"stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
> +		__rseq_str(label) ":\n\t"
> +
> +#endif /* #ifdef __PPC64__ */
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)			\
> +		RSEQ_INJECT_ASM(2)						\
> +		"lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t"		\
> +		"cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t"		\
> +		"bne- cr7, " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label)	\
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
> +		".long " __rseq_str(sig) "\n\t"					\
> +		__rseq_str(label) ":\n\t"					\
> +		teardown							\
> +		"b %l[" __rseq_str(abort_label) "]\n\t"			\
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label)	\
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
> +		__rseq_str(label) ":\n\t"					\
> +		teardown							\
> +		"b %l[" __rseq_str(cmpfail_label) "]\n\t"			\
> +		".popsection\n\t"
> +
> +
> +/*
> + * RSEQ_ASM_OPs: asm operations for rseq
> + * 	RSEQ_ASM_OP_R_*: has hard-coded registers in it
> + * 	RSEQ_ASM_OP_* (else): doesn't use hard-coded registers (except cr7)
> + */
> +#define RSEQ_ASM_OP_CMPEQ(var, expect, label)					\
> +		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
> +		CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"		\
> +		"bne- cr7, " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_OP_CMPNE(var, expectnot, label)				\
> +		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
> +		CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t"	\
> +		"beq- cr7, " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_OP_STORE(value, var)						\
> +		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"
> +
> +/* Load @var to r17 */
> +#define RSEQ_ASM_OP_R_LOAD(var)							\
> +		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
> +
> +/* Store r17 to @var */
> +#define RSEQ_ASM_OP_R_STORE(var)						\
> +		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
> +
> +/* Add @count to r17 */
> +#define RSEQ_ASM_OP_R_ADD(count)						\
> +		"add %%r17, %[" __rseq_str(count) "], %%r17\n\t"
> +
> +/* Load (r17 + voffp) to r17 */
> +#define RSEQ_ASM_OP_R_LOADX(voffp)						\
> +		LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t"
> +
> +/* TODO: implement a faster memcpy. */
> +#define RSEQ_ASM_OP_R_MEMCPY() \
> +		"cmpdi %%r19, 0\n\t" \
> +		"beq 333f\n\t" \
> +		"addi %%r20, %%r20, -1\n\t" \
> +		"addi %%r21, %%r21, -1\n\t" \
> +		"222:\n\t" \
> +		"lbzu %%r18, 1(%%r20)\n\t" \
> +		"stbu %%r18, 1(%%r21)\n\t" \
> +		"addi %%r19, %%r19, -1\n\t" \
> +		"cmpdi %%r19, 0\n\t" \
> +		"bne 222b\n\t" \
> +		"333:\n\t" \
> +
> +#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label)			\
> +		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
> +		__rseq_str(post_commit_label) ":\n\t"
> +
> +#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label)			\
> +		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"	\
> +		__rseq_str(post_commit_label) ":\n\t"
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v not equal to @expectnot */
> +		RSEQ_ASM_OP_CMPNE(v, expectnot, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* load the value of @v */
> +		RSEQ_ASM_OP_R_LOAD(v)
> +		/* store it in @load */
> +		RSEQ_ASM_OP_R_STORE(load)
> +		/* dereference voffp(v) */
> +		RSEQ_ASM_OP_R_LOADX(voffp)
> +		/* final store of the value at voffp(v) into @v */
> +		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"b"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* load the value of @v */
> +		RSEQ_ASM_OP_R_LOAD(v)
> +		/* add @count to it */
> +		RSEQ_ASM_OP_R_ADD(count)
> +		/* final store */
> +		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
> +		RSEQ_INJECT_ASM(4)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [count]"r"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		RSEQ_ASM_OP_STORE(newv2, v2)
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		RSEQ_ASM_OP_STORE(newv2, v2)
> +		RSEQ_INJECT_ASM(5)
> +		/* for 'release' */
> +		"lwsync\n\t"
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* cmp @v2 equal to @expect2 */
> +		RSEQ_ASM_OP_CMPEQ(v2, expect2, 5f)
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		/* setup for memcpy */
> +		"mr %%r19, %[len]\n\t" \
> +		"mr %%r20, %[src]\n\t" \
> +		"mr %%r21, %[dst]\n\t" \
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		RSEQ_ASM_OP_R_MEMCPY()
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		/* setup for memcpy */
> +		"mr %%r19, %[len]\n\t" \
> +		"mr %%r20, %[src]\n\t" \
> +		"mr %%r21, %[dst]\n\t" \
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		/* cmp cpuid */
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* cmp @v equal to @expect */
> +		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		RSEQ_ASM_OP_R_MEMCPY()
> +		RSEQ_INJECT_ASM(5)
> +		/* for 'release' */
> +		"lwsync\n\t"
> +		/* final store */
> +		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +#undef STORE_WORD
> +#undef LOAD_WORD
> +#undef LOADX_WORD
> +#undef CMP_WORD
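A caller-side sketch of how these per-CPU helpers are intended to be combined (illustrative only: the 64-slot array, its bound and the function name are assumptions, and the calling thread is assumed to have successfully called rseq_register_current_thread() from rseq.h further below):

	#include <stdint.h>
	#include "rseq.h"

	/* One counter slot per possible CPU; 64 is an assumed upper bound. */
	struct percpu_counter {
		intptr_t count[64];
	};

	static int percpu_counter_inc(struct percpu_counter *c)
	{
		for (;;) {
			/* cpu is only a hint; the asm sequence re-checks it. */
			int cpu = rseq_cpu_start();

			if (!rseq_addv(&c->count[cpu], 1, cpu))
				return 0;	/* committed on that cpu */
			/* Aborted (migration, preemption, signal): retry. */
		}
	}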
> diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
> new file mode 100644
> index 000000000000..63e81d6c61fa
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-x86.h
> @@ -0,0 +1,898 @@
> +/*
> + * rseq-x86.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <stdint.h>
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#ifdef __x86_64__
> +
> +#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
> +#define rseq_smp_rmb()	barrier()
> +#define rseq_smp_wmb()	barrier()
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	barrier();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	barrier();							\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		__rseq_str(label) ":\n\t"				\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
> +		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
> +		__rseq_str(label) ":\n\t"
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
> +		"jnz " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
> +		".long " __rseq_str(sig) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expectnot]\n\t"
> +		"jz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"movq %[v], %%rax\n\t"
> +		"movq %%rax, %[load]\n\t"
> +		"addq %[voffp], %%rax\n\t"
> +		"movq (%%rax), %%rax\n\t"
> +		/* final store */
> +		"movq %%rax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"er"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* final store */
> +		"addq %[count], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [count]"er"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"movq %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* x86-64 is TSO. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2,
> +			newv, cpu);
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"cmpq %[v2], %[expect2]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint64_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"movq %[src], %[rseq_scratch0]\n\t"
> +		"movq %[dst], %[rseq_scratch1]\n\t"
> +		"movq %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"test %[len], %[len]\n\t" \
> +		"jz 333f\n\t" \
> +		"222:\n\t" \
> +		"movb (%[src]), %%al\n\t" \
> +		"movb %%al, (%[dst])\n\t" \
> +		"inc %[src]\n\t" \
> +		"inc %[dst]\n\t" \
> +		"dec %[len]\n\t" \
> +		"jnz 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"movq %[rseq_scratch2], %[len]\n\t"
> +		"movq %[rseq_scratch1], %[dst]\n\t"
> +		"movq %[rseq_scratch0], %[src]\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
> +			"movq %[rseq_scratch2], %[len]\n\t"
> +			"movq %[rseq_scratch1], %[dst]\n\t"
> +			"movq %[rseq_scratch0], %[src]\n\t",
> +			abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			"movq %[rseq_scratch2], %[len]\n\t"
> +			"movq %[rseq_scratch1], %[dst]\n\t"
> +			"movq %[rseq_scratch0], %[src]\n\t",
> +			cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "rax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* x86-64 is TSO. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src,
> +			len, newv, cpu);
> +}
> +
> +#elif __i386__
> +
> +/*
> + * Support older 32-bit architectures that do not implement fence
> + * instructions.
> + */
> +#define rseq_smp_mb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_rmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_wmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	rseq_smp_mb();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	rseq_smp_mb();							\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +/*
> + * Use eax as scratch register and take memory operands as input to
> + * lessen register pressure. Especially needed when compiling at -O0.
> + */
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		__rseq_str(label) ":\n\t"				\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t"	\
> +		__rseq_str(label) ":\n\t"
> +
> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
> +		"jnz " __rseq_str(label) "\n\t"
> +
> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		/* Disassembler-friendly signature: nopl <sig>. */\
> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
> +		".long " __rseq_str(sig) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
> +		".popsection\n\t"
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* final store */
> +		"movl %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expectnot]\n\t"
> +		"jz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"movl %[v], %%eax\n\t"
> +		"movl %%eax, %[load]\n\t"
> +		"addl %[voffp], %%eax\n\t"
> +		"movl (%%eax), %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"ir"(voffp),
> +		  [load]"m"(*load)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_addv(intptr_t *v, intptr_t count, int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		/* final store */
> +		"addl %[count], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [count]"ir"(count)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"movl %[newv2], %%eax\n\t"
> +		"movl %%eax, %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		/* final store */
> +		"movl %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"m"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t newv2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"movl %[expect], %%eax\n\t"
> +		"cmpl %[v], %%eax\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try store */
> +		"movl %[newv2], %[v2]\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"lock; addl $0,0(%%esp)\n\t"
> +		/* final store */
> +		"movl %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* try store input */
> +		  [v2]"m"(*v2),
> +		  [newv2]"r"(newv2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"m"(expect),
> +		  [newv]"r"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
> +		intptr_t *v2, intptr_t expect2, intptr_t newv,
> +		int cpu)
> +{
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"cmpl %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		"cmpl %[expect2], %[v2]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(5)
> +		"movl %[newv], %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* cmp2 input */
> +		  [v2]"m"(*v2),
> +		  [expect2]"r"(expect2),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"m"(newv)
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* TODO: implement a faster memcpy. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"movl %[src], %[rseq_scratch0]\n\t"
> +		"movl %[dst], %[rseq_scratch1]\n\t"
> +		"movl %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"movl %[expect], %%eax\n\t"
> +		"cmpl %%eax, %[v]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"test %[len], %[len]\n\t" \
> +		"jz 333f\n\t" \
> +		"222:\n\t" \
> +		"movb (%[src]), %%al\n\t" \
> +		"movb %%al, (%[dst])\n\t" \
> +		"inc %[src]\n\t" \
> +		"inc %[dst]\n\t" \
> +		"dec %[len]\n\t" \
> +		"jnz 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		"movl %[newv], %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"movl %[rseq_scratch2], %[len]\n\t"
> +		"movl %[rseq_scratch1], %[dst]\n\t"
> +		"movl %[rseq_scratch0], %[src]\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"m"(expect),
> +		  [newv]"m"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +/* TODO: implement a faster memcpy. */
> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
> +		void *dst, void *src, size_t len, intptr_t newv,
> +		int cpu)
> +{
> +	uint32_t rseq_scratch[3];
> +
> +	RSEQ_INJECT_C(9)
> +
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		"movl %[src], %[rseq_scratch0]\n\t"
> +		"movl %[dst], %[rseq_scratch1]\n\t"
> +		"movl %[len], %[rseq_scratch2]\n\t"
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		RSEQ_INJECT_ASM(3)
> +		"movl %[expect], %%eax\n\t"
> +		"cmpl %%eax, %[v]\n\t"
> +		"jnz 5f\n\t"
> +		RSEQ_INJECT_ASM(4)
> +		/* try memcpy */
> +		"test %[len], %[len]\n\t" \
> +		"jz 333f\n\t" \
> +		"222:\n\t" \
> +		"movb (%[src]), %%al\n\t" \
> +		"movb %%al, (%[dst])\n\t" \
> +		"inc %[src]\n\t" \
> +		"inc %[dst]\n\t" \
> +		"dec %[len]\n\t" \
> +		"jnz 222b\n\t" \
> +		"333:\n\t" \
> +		RSEQ_INJECT_ASM(5)
> +		"lock; addl $0,0(%%esp)\n\t"
> +		"movl %[newv], %%eax\n\t"
> +		/* final store */
> +		"movl %%eax, %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_INJECT_ASM(6)
> +		/* teardown */
> +		"movl %[rseq_scratch2], %[len]\n\t"
> +		"movl %[rseq_scratch1], %[dst]\n\t"
> +		"movl %[rseq_scratch0], %[src]\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
> +			"movl %[rseq_scratch2], %[len]\n\t"
> +			"movl %[rseq_scratch1], %[dst]\n\t"
> +			"movl %[rseq_scratch0], %[src]\n\t",
> +			cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expect]"m"(expect),
> +		  [newv]"m"(newv),
> +		  /* try memcpy input */
> +		  [dst]"r"(dst),
> +		  [src]"r"(src),
> +		  [len]"r"(len),
> +		  [rseq_scratch0]"m"(rseq_scratch[0]),
> +		  [rseq_scratch1]"m"(rseq_scratch[1]),
> +		  [rseq_scratch2]"m"(rseq_scratch[2])
> +		  RSEQ_INJECT_INPUT
> +		: "memory", "cc", "eax"
> +		  RSEQ_INJECT_CLOBBER
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	RSEQ_INJECT_FAILED
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +#endif
> diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
> new file mode 100644
> index 000000000000..b83d3196c33e
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq.c
> @@ -0,0 +1,116 @@
> +/*
> + * rseq.c
> + *
> + * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; only
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <syscall.h>
> +#include <assert.h>
> +#include <signal.h>
> +
> +#include "rseq.h"
> +
> +#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> +
> +__attribute__((tls_model("initial-exec"))) __thread
> +volatile struct rseq __rseq_abi = {
> +	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
> +};
> +
> +static __attribute__((tls_model("initial-exec"))) __thread
> +volatile int refcount;
> +
> +static void signal_off_save(sigset_t *oldset)
> +{
> +	sigset_t set;
> +	int ret;
> +
> +	sigfillset(&set);
> +	ret = pthread_sigmask(SIG_BLOCK, &set, oldset);
> +	if (ret)
> +		abort();
> +}
> +
> +static void signal_restore(sigset_t oldset)
> +{
> +	int ret;
> +
> +	ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL);
> +	if (ret)
> +		abort();
> +}
> +
> +static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
> +		int flags, uint32_t sig)
> +{
> +	return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
> +}
> +
> +int rseq_register_current_thread(void)
> +{
> +	int rc, ret = 0;
> +	sigset_t oldset;
> +
> +	signal_off_save(&oldset);
> +	if (refcount++)
> +		goto end;
> +	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
> +	if (!rc) {
> +		assert(rseq_current_cpu_raw() >= 0);
> +		goto end;
> +	}
> +	if (errno != EBUSY)
> +		__rseq_abi.cpu_id = -2;
> +	ret = -1;
> +	refcount--;
> +end:
> +	signal_restore(oldset);
> +	return ret;
> +}
> +
> +int rseq_unregister_current_thread(void)
> +{
> +	int rc, ret = 0;
> +	sigset_t oldset;
> +
> +	signal_off_save(&oldset);
> +	if (--refcount)
> +		goto end;
> +	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
> +			RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
> +	if (!rc)
> +		goto end;
> +	ret = -1;
> +end:
> +	signal_restore(oldset);
> +	return ret;
> +}
> +
> +int32_t rseq_fallback_current_cpu(void)
> +{
> +	int32_t cpu;
> +
> +	cpu = sched_getcpu();
> +	if (cpu < 0) {
> +		perror("sched_getcpu()");
> +		abort();
> +	}
> +	return cpu;
> +}
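A minimal usage sketch of the registration API above (illustrative; the worker function, its error handling and any surrounding thread setup are assumptions, not part of the patch):

	#include <pthread.h>
	#include <stdio.h>
	#include "rseq.h"

	static void *worker(void *arg)
	{
		if (rseq_register_current_thread()) {
			fprintf(stderr, "rseq registration failed\n");
			return NULL;
		}
		/* ... rseq critical sections may be used from here on ... */
		if (rseq_unregister_current_thread())
			fprintf(stderr, "rseq unregistration failed\n");
		return NULL;
	}

Each thread using restartable sequences performs its own registration; the per-thread refcount above only guards against nested register/unregister calls within the same thread.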
> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> new file mode 100644
> index 000000000000..26c8ea01e940
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq.h
> @@ -0,0 +1,154 @@
> +/*
> + * rseq.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef RSEQ_H
> +#define RSEQ_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <sched.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sched.h>
> +#include <linux/rseq.h>
> +
> +/*
> + * Empty code injection macros, override when testing.
> + * It is important to consider that the ASM injection macros need to be
> + * fully reentrant (e.g. do not modify the stack).
> + */
> +#ifndef RSEQ_INJECT_ASM
> +#define RSEQ_INJECT_ASM(n)
> +#endif
> +
> +#ifndef RSEQ_INJECT_C
> +#define RSEQ_INJECT_C(n)
> +#endif
> +
> +#ifndef RSEQ_INJECT_INPUT
> +#define RSEQ_INJECT_INPUT
> +#endif
> +
> +#ifndef RSEQ_INJECT_CLOBBER
> +#define RSEQ_INJECT_CLOBBER
> +#endif
> +
> +#ifndef RSEQ_INJECT_FAILED
> +#define RSEQ_INJECT_FAILED
> +#endif
> +
> +extern __thread volatile struct rseq __rseq_abi;
> +
> +#define rseq_likely(x)		__builtin_expect(!!(x), 1)
> +#define rseq_unlikely(x)	__builtin_expect(!!(x), 0)
> +#define rseq_barrier()		__asm__ __volatile__("" : : : "memory")
> +
> +#define RSEQ_ACCESS_ONCE(x)	(*(__volatile__  __typeof__(x) *)&(x))
> +#define RSEQ_WRITE_ONCE(x, v)	__extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); })
> +#define RSEQ_READ_ONCE(x)	RSEQ_ACCESS_ONCE(x)
> +
> +#define __rseq_str_1(x)	#x
> +#define __rseq_str(x)		__rseq_str_1(x)
> +
> +#if defined(__x86_64__) || defined(__i386__)
> +#include <rseq-x86.h>
> +#elif defined(__ARMEL__)
> +#include <rseq-arm.h>
> +#elif defined(__PPC__)
> +#include <rseq-ppc.h>
> +#else
> +#error unsupported target
> +#endif
> +
> +/*
> + * Register rseq for the current thread. This needs to be called once
> + * by any thread which uses restartable sequences, before they start
> + * using restartable sequences, to ensure restartable sequences
> + * succeed. A restartable sequence executed from a non-registered
> + * thread will always fail.
> + */
> +int rseq_register_current_thread(void);
> +
> +/*
> + * Unregister rseq for current thread.
> + */
> +int rseq_unregister_current_thread(void);
> +
> +/*
> + * Restartable sequence fallback for reading the current CPU number.
> + */
> +int32_t rseq_fallback_current_cpu(void);
> +
> +/*
> + * Values returned can be either the current CPU number, -1 (rseq is
> + * uninitialized), or -2 (rseq initialization has failed).
> + */
> +static inline int32_t rseq_current_cpu_raw(void)
> +{
> +	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id);
> +}
> +
> +/*
> + * Returns a possible CPU number, which is typically the current CPU.
> + * The returned CPU number can be used to prepare for an rseq critical
> + * section, which will confirm whether the cpu number is indeed the
> + * current one, and whether rseq is initialized.
> + *
> + * The CPU number returned by rseq_cpu_start should always be validated
> + * by passing it to a rseq asm sequence, or by comparing it to the
> + * return value of rseq_current_cpu_raw() if the rseq asm sequence
> + * does not need to be invoked.
> + */
> +static inline uint32_t rseq_cpu_start(void)
> +{
> +	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start);
> +}
> +
> +static inline uint32_t rseq_current_cpu(void)
> +{
> +	int32_t cpu;
> +
> +	cpu = rseq_current_cpu_raw();
> +	if (rseq_unlikely(cpu < 0))
> +		cpu = rseq_fallback_current_cpu();
> +	return cpu;
> +}
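The validation pattern described above, as a sketch (illustrative; slots is a hypothetical per-CPU array and the calling thread is assumed to be registered):

	static int percpu_cas(intptr_t *slots, intptr_t expect, intptr_t newv)
	{
		for (;;) {
			uint32_t cpu = rseq_cpu_start();
			int ret = rseq_cmpeqv_storev(&slots[cpu], expect, newv, cpu);

			if (!ret)
				return (int)cpu;	/* store committed on this cpu */
			if (ret > 0)
				return -1;		/* slots[cpu] != expect */
			/* ret < 0: aborted, the cpu hint was stale; retry. */
		}
	}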
> +
> +/*
> + * rseq_prepare_unload() should be invoked by each thread using rseq_finish*()
> + * at least once between their last rseq_finish*() and library unload of the
> + * library defining the rseq critical section (struct rseq_cs). This also
> + * applies to use of rseq in code generated by JIT: rseq_prepare_unload()
> + * should be invoked at least once by each thread using rseq_finish*() before
> + * reclaim of the memory holding the struct rseq_cs.
> + */
> +static inline void rseq_prepare_unload(void)
> +{
> +	__rseq_abi.rseq_cs = 0;
> +}
> +
> +#endif  /* RSEQ_H */
> diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
> new file mode 100755
> index 000000000000..c7475a2bef11
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/run_param_test.sh
> @@ -0,0 +1,124 @@
> +#!/bin/bash
> +
> +EXTRA_ARGS=${@}
> +
> +OLDIFS="$IFS"
> +IFS=$'\n'
> +TEST_LIST=(
> +	"-T s"
> +	"-T l"
> +	"-T b"
> +	"-T b -M"
> +	"-T m"
> +	"-T m -M"
> +	"-T i"
> +)
> +
> +TEST_NAME=(
> +	"spinlock"
> +	"list"
> +	"buffer"
> +	"buffer with barrier"
> +	"memcpy"
> +	"memcpy with barrier"
> +	"increment"
> +)
> +IFS="$OLDIFS"
> +
> +function do_tests()
> +{
> +	local i=0
> +	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
> +		echo "Running test ${TEST_NAME[$i]}"
> +		./param_test ${TEST_LIST[$i]} ${@} ${EXTRA_ARGS} || exit 1
> +		let "i++"
> +	done
> +}
> +
> +echo "Default parameters"
> +do_tests
> +
> +echo "Loop injection: 10000 loops"
> +
> +OLDIFS="$IFS"
> +IFS=$'\n'
> +INJECT_LIST=(
> +	"1"
> +	"2"
> +	"3"
> +	"4"
> +	"5"
> +	"6"
> +	"7"
> +	"8"
> +	"9"
> +)
> +IFS="$OLDIFS"
> +
> +NR_LOOPS=10000
> +
> +i=0
> +while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
> +	echo "Injecting at <${INJECT_LIST[$i]}>"
> +	do_tests -${INJECT_LIST[i]} ${NR_LOOPS}
> +	let "i++"
> +done
> +NR_LOOPS=
> +
> +function inject_blocking()
> +{
> +	OLDIFS="$IFS"
> +	IFS=$'\n'
> +	INJECT_LIST=(
> +		"7"
> +		"8"
> +		"9"
> +	)
> +	IFS="$OLDIFS"
> +
> +	NR_LOOPS=-1
> +
> +	i=0
> +	while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
> +		echo "Injecting at <${INJECT_LIST[$i]}>"
> +		do_tests -${INJECT_LIST[i]} -1 ${@}
> +		let "i++"
> +	done
> +	NR_LOOPS=
> +}
> +
> +echo "Yield injection (25%)"
> +inject_blocking -m 4 -y -r 100
> +
> +echo "Yield injection (50%)"
> +inject_blocking -m 2 -y -r 100
> +
> +echo "Yield injection (100%)"
> +inject_blocking -m 1 -y -r 100
> +
> +echo "Kill injection (25%)"
> +inject_blocking -m 4 -k -r 100
> +
> +echo "Kill injection (50%)"
> +inject_blocking -m 2 -k -r 100
> +
> +echo "Kill injection (100%)"
> +inject_blocking -m 1 -k -r 100
> +
> +echo "Sleep injection (1ms, 25%)"
> +inject_blocking -m 4 -s 1 -r 100
> +
> +echo "Sleep injection (1ms, 50%)"
> +inject_blocking -m 2 -s 1 -r 100
> +
> +echo "Sleep injection (1ms, 100%)"
> +inject_blocking -m 1 -s 1 -r 100
> +
> +echo "Disable rseq for 25% threads"
> +do_tests -D 4
> +
> +echo "Disable rseq for 50% threads"
> +do_tests -D 2
> +
> +echo "Disable rseq"
> +do_tests -d
> 

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 16/22] rseq: selftests: arm: workaround gcc asm size guess
  2017-11-21 14:18   ` mathieu.desnoyers
@ 2017-11-21 15:39     ` shuah
  -1 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 15:39 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Florian Weimer, linux-kselftest,
	Shuah Khan, Shuah Khan

On 11/21/2017 07:18 AM, Mathieu Desnoyers wrote:
> Fixes assembler errors:
> /tmp/cceKwI9a.s: Assembler messages:
> /tmp/cceKwI9a.s:849: Error: co-processor offset out of range
> 
> with gcc prior to gcc-7. This can trigger if multiple rseq inline asm
> are used within the same function.
> 
> My best guess on the cause of this issue is that gcc has a hard
> time figuring out the actual size of the inline asm, and therefore
> does not compute the offsets at which literal values can be
> placed from the program counter accurately.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Paul Turner <pjt@google.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Andrew Hunter <ahh@google.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Michael Kerrisk <mtk.manpages@gmail.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Florian Weimer <fweimer@redhat.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
>  tools/testing/selftests/rseq/rseq-arm.h | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
> index 47953c0cef4f..6d3fda276f4d 100644
> --- a/tools/testing/selftests/rseq/rseq-arm.h
> +++ b/tools/testing/selftests/rseq/rseq-arm.h
> @@ -79,12 +79,15 @@ do {									\
>  		teardown						\
>  		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
>  
> +#define rseq_workaround_gcc_asm_size_guess()	__asm__ __volatile__("")
> +
>  static inline __attribute__((always_inline))
>  int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
>  		int cpu)
>  {
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> @@ -115,11 +118,14 @@ int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
>  		  RSEQ_INJECT_CLOBBER
>  		: abort, cmpfail
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  cmpfail:
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 1;
>  }
>  
> @@ -129,6 +135,7 @@ int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
>  {
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> @@ -164,11 +171,14 @@ int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
>  		  RSEQ_INJECT_CLOBBER
>  		: abort, cmpfail
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  cmpfail:
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 1;
>  }
>  
> @@ -177,6 +187,7 @@ int rseq_addv(intptr_t *v, intptr_t count, int cpu)
>  {
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> @@ -203,8 +214,10 @@ int rseq_addv(intptr_t *v, intptr_t count, int cpu)
>  		  RSEQ_INJECT_CLOBBER
>  		: abort
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  }
> @@ -216,6 +229,7 @@ int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
>  {
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> @@ -253,11 +267,14 @@ int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
>  		  RSEQ_INJECT_CLOBBER
>  		: abort, cmpfail
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  cmpfail:
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 1;
>  }
>  
> @@ -268,6 +285,7 @@ int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
>  {
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> @@ -306,11 +324,14 @@ int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
>  		  RSEQ_INJECT_CLOBBER
>  		: abort, cmpfail
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  cmpfail:
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 1;
>  }
>  
> @@ -321,6 +342,7 @@ int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
>  {
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
> @@ -359,11 +381,14 @@ int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
>  		  RSEQ_INJECT_CLOBBER
>  		: abort, cmpfail
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  cmpfail:
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 1;
>  }
>  
> @@ -376,6 +401,7 @@ int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
>  
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		"str %[src], %[rseq_scratch0]\n\t"
> @@ -442,11 +468,14 @@ int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
>  		  RSEQ_INJECT_CLOBBER
>  		: abort, cmpfail
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  cmpfail:
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 1;
>  }
>  
> @@ -459,6 +488,7 @@ int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
>  
>  	RSEQ_INJECT_C(9)
>  
> +	rseq_workaround_gcc_asm_size_guess();
>  	__asm__ __volatile__ goto (
>  		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>  		"str %[src], %[rseq_scratch0]\n\t"
> @@ -526,10 +556,13 @@ int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
>  		  RSEQ_INJECT_CLOBBER
>  		: abort, cmpfail
>  	);
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 0;
>  abort:
> +	rseq_workaround_gcc_asm_size_guess();
>  	RSEQ_INJECT_FAILED
>  	return -1;
>  cmpfail:
> +	rseq_workaround_gcc_asm_size_guess();
>  	return 1;
>  }
> 

Looks fine to me, for this patch.

Acked-by: Shuah Khan <shuahkh@osg.samsung.com>

I have comments on other patches in this series.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests
  2017-11-21 15:17     ` shuah
  (?)
  (?)
@ 2017-11-21 16:46       ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 16:46 UTC (permalink / raw)
  To: shuah
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, linux-kselftest, Shuah Khan

----- On Nov 21, 2017, at 10:17 AM, shuah <shuah@kernel.org> wrote:

[...]

>> +int main(int argc, char **argv)
>> +{
>> +	int ret = 0;
>> +
>> +	ret |= test_compare_eq_same();
>> +	ret |= test_compare_eq_diff();
>> +	ret |= test_compare_ne_same();
>> +	ret |= test_compare_ne_diff();
>> +	ret |= test_2compare_eq_index();
>> +	ret |= test_2compare_ne_index();
>> +	ret |= test_memcpy();
>> +	ret |= test_memcpy_u32();
>> +	ret |= test_memcpy_mb_memcpy();
>> +	ret |= test_add();
>> +	ret |= test_two_add();
>> +	ret |= test_or();
>> +	ret |= test_and();
>> +	ret |= test_xor();
>> +	ret |= test_lshift();
>> +	ret |= test_rshift();
>> +	ret |= test_cmpxchg_success();
>> +	ret |= test_cmpxchg_fail();
>> +	ret |= test_memcpy_fault();
>> +	ret |= test_unknown_op();
>> +	ret |= test_max_ops();
>> +	ret |= test_too_many_ops();
>> +	ret |= test_memcpy_single_too_large();
>> +	ret |= test_memcpy_single_ok_sum_too_large();
>> +	ret |= test_page_fault();
>> +
> 
> Where do pass counts get printed? I am seeing error messages when tests fail,
> but not seeing any pass messages. It would be nice to use the ksft framework
> for counting pass/fail for this series of tests.

done. New output:

TAP version 13
(standard_in) 1: syntax error
selftests: basic_cpu_opv_test
========================================
TAP version 13
ok 1 test_compare_eq same test
ok 2 test_compare_eq different test
ok 3 test_compare_ne same test
ok 4 test_compare_ne different test
ok 5 test_2compare_eq index test
ok 6 test_2compare_ne index test
ok 7 test_memcpy test
ok 8 test_memcpy_u32 test
ok 9 test_memcpy_mb_memcpy test
ok 10 test_add test
ok 11 test_two_add test
ok 12 test_or test
ok 13 test_and test
ok 14 test_xor test
ok 15 test_lshift test
ok 16 test_rshift test
ok 17 test_cmpxchg success test
ok 18 test_cmpxchg fail test
ok 19 test_memcpy_fault test
ok 20 test_unknown_op test
ok 21 test_max_ops test
ok 22 test_too_many_ops test
ok 23 test_memcpy_single_too_large test
ok 24 test_memcpy_single_ok_sum_too_large test
ok 25 test_page_fault test
Pass 25 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0
1..25
ok 1.. selftests: basic_cpu_opv_test [PASS]

(note the "(standard_in) 1: syntax error" for which I provided a fix
in a separate thread still appears with my make version)
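
For reference, here is a minimal sketch of what the converted main() in the
basic cpu_opv test could look like, using the TAP helpers from
tools/testing/selftests/kselftest.h (ksft_print_header(),
ksft_test_result_pass()/ksft_test_result_fail(), ksft_exit_pass()/
ksft_exit_fail()). The ksft_run() wrapper and the exact labels below are
illustrative only, not the code queued for the next respin; the test_*()
functions are the ones listed in the quoted main() above:

#include "../kselftest.h"

/* Illustrative wrapper: run one test case and record a TAP result. */
static int ksft_run(int (*fn)(void), const char *name)
{
	int ret = fn();

	if (!ret)
		ksft_test_result_pass("%s test\n", name);
	else
		ksft_test_result_fail("%s test\n", name);
	return ret;
}

int main(int argc, char **argv)
{
	int ret = 0;

	ksft_print_header();	/* prints "TAP version 13" */

	ret |= ksft_run(test_compare_eq_same, "test_compare_eq same");
	ret |= ksft_run(test_compare_eq_diff, "test_compare_eq different");
	/* ... remaining test cases, as in the list quoted above ... */
	ret |= ksft_run(test_page_fault, "test_page_fault");

	if (ret)
		ksft_exit_fail();
	return ksft_exit_pass();
}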

[...]

>> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
>> index 5bef05d6ba39..441d7bc63bb7 100644
>> --- a/tools/testing/selftests/lib.mk
>> +++ b/tools/testing/selftests/lib.mk
>> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
>>  LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
>>  endif
>>  
>> +# Selftest makefiles can override those targets by setting
>> +# OVERRIDE_TARGETS = 1.
>> +ifeq ($(OVERRIDE_TARGETS),)
>>  $(OUTPUT)/%:%.c
>>  	$(LINK.c) $^ $(LDLIBS) -o $@
>>  
>> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
>>  
>>  $(OUTPUT)/%:%.S
>>  	$(LINK.S) $^ $(LDLIBS) -o $@
>> +endif
>>  
>>  .PHONY: run_tests all clean install emit_tests
>> 
> 
> As I said before, please do this change in a separate patch.

Sorry, it appears that I missed this comment last time. Will move
this change to a separate patch.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 15:34     ` Shuah Khan
  (?)
  (?)
@ 2017-11-21 17:05       ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 17:05 UTC (permalink / raw)
  To: shuah
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, linux-kselftest, Shuah Khan

----- On Nov 21, 2017, at 10:34 AM, shuah <shuah@kernel.org> wrote:

[...]
>> ---
>>  MAINTAINERS                                        |    1 +
>>  tools/testing/selftests/Makefile                   |    1 +
>>  tools/testing/selftests/rseq/.gitignore            |    4 +
> 
> Thanks for the .gitignore files. It is a commonly missed change; I end
> up adding one to clean things up after tests get in.

I'm used to receiving patches where contributors forget to add new files
to .gitignore within my own projects, which may contribute to my awareness
of this pain point. :)

[...]

>> +
>> +void *test_percpu_inc_thread(void *arg)
>> +{
>> +	struct inc_thread_test_data *thread_data = arg;
>> +	struct inc_test_data *data = thread_data->data;
>> +	long long i, reps;
>> +
>> +	if (!opt_disable_rseq && thread_data->reg
>> +			&& rseq_register_current_thread())
>> +		abort();
>> +	reps = thread_data->reps;
>> +	for (i = 0; i < reps; i++) {
>> +		int cpu, ret;
>> +
>> +#ifndef SKIP_FASTPATH
>> +		/* Try fast path. */
>> +		cpu = rseq_cpu_start();
>> +		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>> +		if (likely(!ret))
>> +			goto next;
>> +#endif
> 
> So the test needs to be compiled with this enabled? I think it would be better
> to make this an argument, so it can be selected at test start time, as opposed
> to making it a compile-time option. Remember that these tests get run in
> automated test rings. Making this a compile-time option pretty much ensures
> that this path will not be tested.
> 
> So I would recommend adding a parameter.
> 
>> +	slowpath:
>> +		__attribute__((unused));
>> +		for (;;) {
>> +			/* Fallback on cpu_opv system call. */
>> +			cpu = rseq_current_cpu();
>> +			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>> +			if (likely(!ret))
>> +				break;
>> +			assert(ret >= 0 || errno == EAGAIN);
>> +		}
>> +	next:
>> +		__attribute__((unused));
>> +#ifndef BENCHMARK
>> +		if (i != 0 && !(i % (reps / 10)))
>> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
>> +#endif
> 
> Same comment as before. Avoid compile time options.

The goal of those compiler defines is to generate the altered code without
adding branches into the fast paths.
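
For comparison, a run-time switch (say, a hypothetical opt_skip_fastpath
flag parsed from the command line) would have to be checked on every
iteration, adding a conditional branch on the hot path that the
-DSKIP_FASTPATH build avoids. This is only a sketch, reusing the helpers and
types from the quoted param_test.c code (struct inc_test_data,
rseq_cpu_start(), rseq_addv(), rseq_current_cpu(), cpu_op_addv());
percpu_inc_once() is an illustrative name:

/* Hypothetical run-time selection, shown only to illustrate the trade-off. */
static int opt_skip_fastpath;	/* would be set from a command-line option */

static int percpu_inc_once(struct inc_test_data *data)
{
	int cpu, ret;

	if (!opt_skip_fastpath) {	/* extra branch in the per-iteration hot path */
		/* Try fast path. */
		cpu = rseq_cpu_start();
		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
		if (likely(!ret))
			return 0;
	}
	for (;;) {
		/* Fallback on cpu_opv system call. */
		cpu = rseq_current_cpu();
		ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
		if (likely(!ret))
			return 0;
		assert(ret >= 0 || errno == EAGAIN);
	}
}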

Here is an alternative solution that should take care of your concern: I'll
build multiple targets for param_test.c:

param_test
param_test_skip_fastpath (built with -DSKIP_FASTPATH)
param_test_benchmark (built with -DBENCHMARK)

I'll update run_param_test.sh to run both param_test and param_test_skip_fastpath.

Note that "param_test_benchmark" is only useful for benchmarking,
so I don't plan to run it from run_param_test.sh which is meant
to track regressions.

Is that approach OK with you?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 17:21   ` Andi Kleen
  -1 siblings, 0 replies; 175+ messages in thread
From: Andi Kleen @ 2017-11-21 17:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk

On Tue, Nov 21, 2017 at 09:18:38AM -0500, Mathieu Desnoyers wrote:
> Hi,
> 
> Following changes based on a thorough coding style and patch changelog
> review from Thomas Gleixner and Peter Zijlstra, I'm respinning this
> series for another RFC.
> 
My suggestion would be that you also split out the opv system call.
That seems to be the main contention point currently, and the restartable
sequences should be useful without it.

-Andi

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 17:05       ` mathieu.desnoyers
  (?)
  (?)
@ 2017-11-21 17:40         ` shuah
  -1 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 17:40 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, linux-kselftest, Shuah Khan, Shuah Khan,
	Shuah Khan

On 11/21/2017 10:05 AM, Mathieu Desnoyers wrote:
> ----- On Nov 21, 2017, at 10:34 AM, shuah shuah@kernel.org wrote:
> 
> [...]
>>> ---
>>>  MAINTAINERS                                        |    1 +
>>>  tools/testing/selftests/Makefile                   |    1 +
>>>  tools/testing/selftests/rseq/.gitignore            |    4 +
>>
>> Thanks for the .gitignore files. It is commonly missed change, I end
>> up adding one to clean things up after tests get in.
> 
> I'm used to receive patches where contributors forget to add new files
> to gitignore within my own projects, which may contribute to my awareness
> of this pain point. :)
> 
> [...]
> 
>>> +
>>> +void *test_percpu_inc_thread(void *arg)
>>> +{
>>> +	struct inc_thread_test_data *thread_data = arg;
>>> +	struct inc_test_data *data = thread_data->data;
>>> +	long long i, reps;
>>> +
>>> +	if (!opt_disable_rseq && thread_data->reg
>>> +			&& rseq_register_current_thread())
>>> +		abort();
>>> +	reps = thread_data->reps;
>>> +	for (i = 0; i < reps; i++) {
>>> +		int cpu, ret;
>>> +
>>> +#ifndef SKIP_FASTPATH
>>> +		/* Try fast path. */
>>> +		cpu = rseq_cpu_start();
>>> +		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>>> +		if (likely(!ret))
>>> +			goto next;
>>> +#endif
>>
>> So the test needs to compiled with this enabled? I think it would be better
>> to make this an argument to be abel to select at test start time as opposed
>> to making this compile time option. Remember that these tests get run in
>> automated test rings. Making this a compile time otpion pertty much ensures
>> that this path will not be tested.
>>
>> So I would reccommend adding a paratemer.
>>
>>> +	slowpath:
>>> +		__attribute__((unused));
>>> +		for (;;) {
>>> +			/* Fallback on cpu_opv system call. */
>>> +			cpu = rseq_current_cpu();
>>> +			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>>> +			if (likely(!ret))
>>> +				break;
>>> +			assert(ret >= 0 || errno == EAGAIN);
>>> +		}
>>> +	next:
>>> +		__attribute__((unused));
>>> +#ifndef BENCHMARK
>>> +		if (i != 0 && !(i % (reps / 10)))
>>> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
>>> +#endif
>>
>> Same comment as before. Avoid compile time options.
> 
> The goal of those compiler define are to generate the altered code without
> adding branches into the fast-paths.

That makes sense. You are looking to avoid adding any overhead.

> 
> Here is an alternative solution that should take care of your concern: I'll
> build multiple targets for param_test.c:
> 
> param_test
> param_test_skip_fastpath (built with -DSKIP_FASTPATH)
> param_test_benchmark (build with -DBENCHMARK)
> 
> I'll update run_param_test.sh to run both param_test and param_test_skip_fastpath.
> 
> Note that "param_test_benchmark" is only useful for benchmarking,
> so I don't plan to run it from run_param_test.sh which is meant
> to track regressions.
> 
> Is that approach OK with you ?
> 

Yes. This approach addresses my concern about coverage for both paths.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 17:40         ` shuah
  (?)
  (?)
@ 2017-11-21 21:22           ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 21:22 UTC (permalink / raw)
  To: shuah
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, linux-kselftest, Shuah Khan

----- On Nov 21, 2017, at 12:40 PM, shuah shuah@kernel.org wrote:

> On 11/21/2017 10:05 AM, Mathieu Desnoyers wrote:
>> ----- On Nov 21, 2017, at 10:34 AM, shuah shuah@kernel.org wrote:
>> 
>> [...]
>>>> ---
>>>>  MAINTAINERS                                        |    1 +
>>>>  tools/testing/selftests/Makefile                   |    1 +
>>>>  tools/testing/selftests/rseq/.gitignore            |    4 +
>>>
>>> Thanks for the .gitignore files. It is commonly missed change, I end
>>> up adding one to clean things up after tests get in.
>> 
>> I'm used to receive patches where contributors forget to add new files
>> to gitignore within my own projects, which may contribute to my awareness
>> of this pain point. :)
>> 
>> [...]
>> 
>>>> +
>>>> +void *test_percpu_inc_thread(void *arg)
>>>> +{
>>>> +	struct inc_thread_test_data *thread_data = arg;
>>>> +	struct inc_test_data *data = thread_data->data;
>>>> +	long long i, reps;
>>>> +
>>>> +	if (!opt_disable_rseq && thread_data->reg
>>>> +			&& rseq_register_current_thread())
>>>> +		abort();
>>>> +	reps = thread_data->reps;
>>>> +	for (i = 0; i < reps; i++) {
>>>> +		int cpu, ret;
>>>> +
>>>> +#ifndef SKIP_FASTPATH
>>>> +		/* Try fast path. */
>>>> +		cpu = rseq_cpu_start();
>>>> +		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>>>> +		if (likely(!ret))
>>>> +			goto next;
>>>> +#endif
>>>
>>> So the test needs to compiled with this enabled? I think it would be better
>>> to make this an argument to be abel to select at test start time as opposed
>>> to making this compile time option. Remember that these tests get run in
>>> automated test rings. Making this a compile time otpion pertty much ensures
>>> that this path will not be tested.
>>>
>>> So I would reccommend adding a paratemer.
>>>
>>>> +	slowpath:
>>>> +		__attribute__((unused));
>>>> +		for (;;) {
>>>> +			/* Fallback on cpu_opv system call. */
>>>> +			cpu = rseq_current_cpu();
>>>> +			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>>>> +			if (likely(!ret))
>>>> +				break;
>>>> +			assert(ret >= 0 || errno == EAGAIN);
>>>> +		}
>>>> +	next:
>>>> +		__attribute__((unused));
>>>> +#ifndef BENCHMARK
>>>> +		if (i != 0 && !(i % (reps / 10)))
>>>> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
>>>> +#endif
>>>
>>> Same comment as before. Avoid compile time options.
>> 
>> The goal of those compiler define are to generate the altered code without
>> adding branches into the fast-paths.
> 
> That makes sense. You are looking to not add any overhead.
> 
>> 
>> Here is an alternative solution that should take care of your concern: I'll
>> build multiple targets for param_test.c:
>> 
>> param_test
>> param_test_skip_fastpath (built with -DSKIP_FASTPATH)
>> param_test_benchmark (build with -DBENCHMARK)
>> 
>> I'll update run_param_test.sh to run both param_test and
>> param_test_skip_fastpath.
>> 
>> Note that "param_test_benchmark" is only useful for benchmarking,
>> so I don't plan to run it from run_param_test.sh which is meant
>> to track regressions.
>> 
>> Is that approach OK with you ?
>> 
> 
> Yes. This approach addresses my concern about coverage for both paths.

fyi, the updated patches can be found here:

https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=a0b8eb0eb5d4d8a280969370aa1dcf51801139c6
  "selftests: lib.mk: Introduce OVERRIDE_TARGETS"

https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=4ef0214e19bb7415fe7aed6852859b8d66e09a45
  "cpu_opv: selftests: Implement selftests (v4)"

https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=7d7530b843c7ecb50bea5a136c776cf3e9155d43
  "rseq: selftests: Provide self-tests (v4)"

Thanks for the feedback!

Mathieu

> 
> thanks,
> -- Shuah

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 21:22           ` mathieu.desnoyers
  (?)
  (?)
@ 2017-11-21 21:24             ` shuahkh
  -1 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-21 21:24 UTC (permalink / raw)
  To: Mathieu Desnoyers, shuah
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, linux-kselftest, Shuah Khan, Shuah Khan

On 11/21/2017 02:22 PM, Mathieu Desnoyers wrote:
> ----- On Nov 21, 2017, at 12:40 PM, shuah shuah@kernel.org wrote:
> 
>> On 11/21/2017 10:05 AM, Mathieu Desnoyers wrote:
>>> ----- On Nov 21, 2017, at 10:34 AM, shuah shuah@kernel.org wrote:
>>>
>>> [...]
>>>>> ---
>>>>>  MAINTAINERS                                        |    1 +
>>>>>  tools/testing/selftests/Makefile                   |    1 +
>>>>>  tools/testing/selftests/rseq/.gitignore            |    4 +
>>>>
>>>> Thanks for the .gitignore files. It is commonly missed change, I end
>>>> up adding one to clean things up after tests get in.
>>>
>>> I'm used to receive patches where contributors forget to add new files
>>> to gitignore within my own projects, which may contribute to my awareness
>>> of this pain point. :)
>>>
>>> [...]
>>>
>>>>> +
>>>>> +void *test_percpu_inc_thread(void *arg)
>>>>> +{
>>>>> +	struct inc_thread_test_data *thread_data = arg;
>>>>> +	struct inc_test_data *data = thread_data->data;
>>>>> +	long long i, reps;
>>>>> +
>>>>> +	if (!opt_disable_rseq && thread_data->reg
>>>>> +			&& rseq_register_current_thread())
>>>>> +		abort();
>>>>> +	reps = thread_data->reps;
>>>>> +	for (i = 0; i < reps; i++) {
>>>>> +		int cpu, ret;
>>>>> +
>>>>> +#ifndef SKIP_FASTPATH
>>>>> +		/* Try fast path. */
>>>>> +		cpu = rseq_cpu_start();
>>>>> +		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>>>>> +		if (likely(!ret))
>>>>> +			goto next;
>>>>> +#endif
>>>>
>>>> So the test needs to compiled with this enabled? I think it would be better
>>>> to make this an argument to be abel to select at test start time as opposed
>>>> to making this compile time option. Remember that these tests get run in
>>>> automated test rings. Making this a compile time otpion pertty much ensures
>>>> that this path will not be tested.
>>>>
>>>> So I would reccommend adding a paratemer.
>>>>
>>>>> +	slowpath:
>>>>> +		__attribute__((unused));
>>>>> +		for (;;) {
>>>>> +			/* Fallback on cpu_opv system call. */
>>>>> +			cpu = rseq_current_cpu();
>>>>> +			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>>>>> +			if (likely(!ret))
>>>>> +				break;
>>>>> +			assert(ret >= 0 || errno == EAGAIN);
>>>>> +		}
>>>>> +	next:
>>>>> +		__attribute__((unused));
>>>>> +#ifndef BENCHMARK
>>>>> +		if (i != 0 && !(i % (reps / 10)))
>>>>> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
>>>>> +#endif
>>>>
>>>> Same comment as before. Avoid compile time options.
>>>
>>> The goal of those compiler define are to generate the altered code without
>>> adding branches into the fast-paths.
>>
>> That makes sense. You are looking to not add any overhead.
>>
>>>
>>> Here is an alternative solution that should take care of your concern: I'll
>>> build multiple targets for param_test.c:
>>>
>>> param_test
>>> param_test_skip_fastpath (built with -DSKIP_FASTPATH)
>>> param_test_benchmark (build with -DBENCHMARK)
>>>
>>> I'll update run_param_test.sh to run both param_test and
>>> param_test_skip_fastpath.
>>>
>>> Note that "param_test_benchmark" is only useful for benchmarking,
>>> so I don't plan to run it from run_param_test.sh which is meant
>>> to track regressions.
>>>
>>> Is that approach OK with you ?
>>>
>>
>> Yes. This approach addresses my concern about coverage for both paths.
> 
> fyi, the updated patches can be found here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=a0b8eb0eb5d4d8a280969370aa1dcf51801139c6
>   "selftests: lib.mk: Introduce OVERRIDE_TARGETS"
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=4ef0214e19bb7415fe7aed6852859b8d66e09a45
>   "cpu_opv: selftests: Implement selftests (v4)"
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=7d7530b843c7ecb50bea5a136c776cf3e9155d43
>   "rseq: selftests: Provide self-tests (v4)"
> 
> Thanks for the feedback!
> 

Are you going to send these to the mailing list? That way I can do a final
review and give my Ack if they look good.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 21:24             ` shuahkh
  (?)
  (?)
@ 2017-11-21 21:44               ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 21:44 UTC (permalink / raw)
  To: Shuah Khan
  Cc: shuah, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson, linux-kernel, linux-api,
	Paul Turner, Andrew Morton, Russell King, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen,
	Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
	linux-kselftest

----- On Nov 21, 2017, at 4:24 PM, Shuah Khan shuahkh@osg.samsung.com wrote:

> On 11/21/2017 02:22 PM, Mathieu Desnoyers wrote:
>> ----- On Nov 21, 2017, at 12:40 PM, shuah shuah@kernel.org wrote:
>> 
>>> On 11/21/2017 10:05 AM, Mathieu Desnoyers wrote:
>>>> ----- On Nov 21, 2017, at 10:34 AM, shuah shuah@kernel.org wrote:
>>>>
>>>> [...]
>>>>>> ---
>>>>>>  MAINTAINERS                                        |    1 +
>>>>>>  tools/testing/selftests/Makefile                   |    1 +
>>>>>>  tools/testing/selftests/rseq/.gitignore            |    4 +
>>>>>
>>>>> Thanks for the .gitignore files. It is commonly missed change, I end
>>>>> up adding one to clean things up after tests get in.
>>>>
>>>> I'm used to receive patches where contributors forget to add new files
>>>> to gitignore within my own projects, which may contribute to my awareness
>>>> of this pain point. :)
>>>>
>>>> [...]
>>>>
>>>>>> +
>>>>>> +void *test_percpu_inc_thread(void *arg)
>>>>>> +{
>>>>>> +	struct inc_thread_test_data *thread_data = arg;
>>>>>> +	struct inc_test_data *data = thread_data->data;
>>>>>> +	long long i, reps;
>>>>>> +
>>>>>> +	if (!opt_disable_rseq && thread_data->reg
>>>>>> +			&& rseq_register_current_thread())
>>>>>> +		abort();
>>>>>> +	reps = thread_data->reps;
>>>>>> +	for (i = 0; i < reps; i++) {
>>>>>> +		int cpu, ret;
>>>>>> +
>>>>>> +#ifndef SKIP_FASTPATH
>>>>>> +		/* Try fast path. */
>>>>>> +		cpu = rseq_cpu_start();
>>>>>> +		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>>>>>> +		if (likely(!ret))
>>>>>> +			goto next;
>>>>>> +#endif
>>>>>
>>>>> So the test needs to compiled with this enabled? I think it would be better
>>>>> to make this an argument to be abel to select at test start time as opposed
>>>>> to making this compile time option. Remember that these tests get run in
>>>>> automated test rings. Making this a compile time otpion pertty much ensures
>>>>> that this path will not be tested.
>>>>>
>>>>> So I would reccommend adding a paratemer.
>>>>>
>>>>>> +	slowpath:
>>>>>> +		__attribute__((unused));
>>>>>> +		for (;;) {
>>>>>> +			/* Fallback on cpu_opv system call. */
>>>>>> +			cpu = rseq_current_cpu();
>>>>>> +			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>>>>>> +			if (likely(!ret))
>>>>>> +				break;
>>>>>> +			assert(ret >= 0 || errno == EAGAIN);
>>>>>> +		}
>>>>>> +	next:
>>>>>> +		__attribute__((unused));
>>>>>> +#ifndef BENCHMARK
>>>>>> +		if (i != 0 && !(i % (reps / 10)))
>>>>>> +			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
>>>>>> +#endif
>>>>>
>>>>> Same comment as before. Avoid compile-time options.
>>>>
>>>> The goal of those compiler defines is to generate the altered code without
>>>> adding branches into the fast-paths.
>>>
>>> That makes sense. You are looking to avoid adding any overhead.
>>>
>>>>
>>>> Here is an alternative solution that should take care of your concern: I'll
>>>> build multiple targets for param_test.c:
>>>>
>>>> param_test
>>>> param_test_skip_fastpath (built with -DSKIP_FASTPATH)
>>>> param_test_benchmark (built with -DBENCHMARK)
>>>>
>>>> I'll update run_param_test.sh to run both param_test and
>>>> param_test_skip_fastpath.
>>>>
>>>> Note that "param_test_benchmark" is only useful for benchmarking,
>>>> so I don't plan to run it from run_param_test.sh which is meant
>>>> to track regressions.
>>>>
>>>> Is that approach OK with you?
>>>>
>>>
>>> Yes. This approach addresses my concern about coverage for both paths.
>> 
>> fyi, the updated patches can be found here:
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=a0b8eb0eb5d4d8a280969370aa1dcf51801139c6
>>   "selftests: lib.mk: Introduce OVERRIDE_TARGETS"
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=4ef0214e19bb7415fe7aed6852859b8d66e09a45
>>   "cpu_opv: selftests: Implement selftests (v4)"
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/commit/?h=rseq/dev&id=7d7530b843c7ecb50bea5a136c776cf3e9155d43
>>   "rseq: selftests: Provide self-tests (v4)"
>> 
>> Thanks for the feedback!
>> 
> 
> Are you going to send these to the mailing list? That way I can do a final
> review and give my Ack if they look good.

Sure, I can do one (hopefully last) round of RFC with those selftest updates.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread
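
A minimal sketch of the runtime-parameter alternative discussed above, for
comparison only. It is not part of the series and reuses the definitions of
the quoted test_percpu_inc_thread() (struct inc_thread_test_data,
opt_disable_rseq, rseq_addv(), cpu_op_addv(), ...); opt_skip_fastpath is a
hypothetical flag set from a command-line option. The extra if () is the
per-iteration fast-path branch that the compile-time SKIP_FASTPATH define
avoids:

/* Hypothetical runtime flag, set from a command-line option. */
static int opt_skip_fastpath;

void *test_percpu_inc_thread_param(void *arg)
{
	struct inc_thread_test_data *thread_data = arg;
	struct inc_test_data *data = thread_data->data;
	long long i, reps;

	if (!opt_disable_rseq && thread_data->reg
			&& rseq_register_current_thread())
		abort();
	reps = thread_data->reps;
	for (i = 0; i < reps; i++) {
		int cpu, ret;

		if (!opt_skip_fastpath) {
			/* Runtime check: one extra branch per iteration. */
			cpu = rseq_cpu_start();
			ret = rseq_addv(&data->c[cpu].count, 1, cpu);
			if (likely(!ret))
				continue;
		}
		for (;;) {
			/* Fallback on cpu_opv system call. */
			cpu = rseq_current_cpu();
			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
			if (likely(!ret))
				break;
			assert(ret >= 0 || errno == EAGAIN);
		}
	}
	return NULL;
}

Building param_test, param_test_skip_fastpath and param_test_benchmark as
separate binaries, as proposed above, keeps that branch out of the measured
fast path while still letting run_param_test.sh cover both paths.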

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-21 22:05     ` Mathieu Desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk

----- On Nov 21, 2017, at 12:21 PM, Andi Kleen andi@firstfloor.org wrote:

> On Tue, Nov 21, 2017 at 09:18:38AM -0500, Mathieu Desnoyers wrote:
>> Hi,
>> 
>> Following changes based on a thorough coding style and patch changelog
>> review from Thomas Gleixner and Peter Zijlstra, I'm respinning this
>> series for another RFC.
>> 
> My suggestion would be that you also split out the opv system call.
> That seems to be the main contention point currently, and the restartable
> sequences should be useful without it.

I consider rseq to be incomplete and a pain to use in various scenarios
without cpu_opv. 

About the contention point you refer to:

Using the vDSO as an example of how things should be done is just wrong:
the vDSO interaction with debugger instruction single-stepping is broken,
as I detailed in my previous email.

Thomas' proposal of handling single-stepping with a user-space locking
fallback, which is pretty much what I had in 2016, pushes a lot of
complexity to user-space and requires an extra branch in the fast path,
as well as additional store-release/load-acquire semantics for consistency.
I don't plan on going down that route.

Other than that, I have not received any concrete alternative proposal to
properly handle single-stepping.

The only opposition to cpu_opv is that there *should* be a hypothetical
simpler solution. The rseq idea is not new: it was presented by Paul Turner
at LPC in 2012. And so far, cpu_opv is the simplest and most efficient
way I have encountered to handle single-stepping, and it gives extra
benefits, as described in my changelog.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread
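
To make the single-stepping argument concrete, here is a minimal sketch
(not from the series). It assumes the selftest helpers used elsewhere in
this thread: rseq_cpu_start() and rseq_addv() from the rseq selftest
library, and cpu_op_get_current_cpu() and cpu_op_add() from cpu-op.h; the
"rseq.h" header name and the RSEQ_TRY_MAX bound are hypothetical. While a
debugger single-steps through the rseq critical section, every fast-path
attempt aborts, so a loop that only retries the fast path never completes;
bounding the retries and falling back on cpu_opv guarantees forward
progress.

#include <errno.h>
#include <stdint.h>
#include "rseq.h"	/* rseq selftest helpers (assumed header name) */
#include "cpu-op.h"	/* cpu_opv selftest helpers */

#define RSEQ_TRY_MAX	128	/* hypothetical retry bound */

/* Add @count to the per-CPU counter of the current CPU. */
static int percpu_addv_with_fallback(intptr_t *percpu_count, intptr_t count)
{
	int cpu, ret, attempt;

	/* Fast path: aborted on every attempt while being single-stepped. */
	for (attempt = 0; attempt < RSEQ_TRY_MAX; attempt++) {
		cpu = rseq_cpu_start();
		ret = rseq_addv(&percpu_count[cpu], count, cpu);
		if (!ret)
			return 0;
	}
	/* Slow path: let the kernel perform the operation via cpu_opv. */
	do {
		cpu = cpu_op_get_current_cpu();
		ret = cpu_op_add(&percpu_count[cpu], count,
				 sizeof(percpu_count[cpu]), cpu);
	} while (ret == -1 && errno == EAGAIN);
	return ret;
}

The bounded retry only matters under single-stepping or heavy preemption;
in the common case the first rseq attempt succeeds and the cpu_opv slow
path is never reached.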

* [PATCH update for 4.15 1/3] selftests: lib.mk: Introduce OVERRIDE_TARGETS
  2017-11-21 14:18 ` Mathieu Desnoyers
@ 2017-11-21 22:19   ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:19 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

Introduce OVERRIDE_TARGETS to allow tests to express dependencies on
header files and .so files, which requires overriding the selftests
lib.mk targets.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 tools/testing/selftests/lib.mk | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 5bef05d6ba39..441d7bc63bb7 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
 LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
 endif
 
+# Selftest makefiles can override those targets by setting
+# OVERRIDE_TARGETS = 1.
+ifeq ($(OVERRIDE_TARGETS),)
 $(OUTPUT)/%:%.c
 	$(LINK.c) $^ $(LDLIBS) -o $@
 
@@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
 
 $(OUTPUT)/%:%.S
 	$(LINK.S) $^ $(LDLIBS) -o $@
+endif
 
 .PHONY: run_tests all clean install emit_tests
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [PATCH update for 4.15 2/3] cpu_opv: selftests: Implement selftests (v4)
  2017-11-21 22:19   ` mathieu.desnoyers
@ 2017-11-21 22:19   ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:19 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

Implement cpu_opv selftests. They need to express dependencies on
header files and .so files, which requires overriding the selftests
lib.mk targets. Use the OVERRIDE_TARGETS define for this.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:

- Expose a library API similar to rseq's, closely matching the rseq
  APIs, following removal of the event counter from the rseq kernel
  API.
- Update the makefile to fix the "make run_tests" dependency on "all".
- Introduce an OVERRIDE_TARGETS define.

Changes since v2:

- Test page faults.

Changes since v3:

- Move lib.mk OVERRIDE_TARGETS change to its own patch.
- Print TAP output.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1167 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 7 files changed, 1603 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b4e504f5003..c6c2436d15f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3734,6 +3734,7 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico@linaro.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eaf599dc2137..fc1eba0e0130 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..21e63545d521
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+# Own dependencies because we only want to build against 1st prerequisite, but
+# still track changes to header files and depend on shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_cpu_opv_test
+
+TEST_GEN_PROGS_EXTENDED = libcpu-op.so
+
+include ../lib.mk
+
+$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..5aeb6ed0b361
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1167 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "../kselftest.h"
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+#define TESTBUFLEN_PAGE_MAX	65536
+
+#define NR_PF_ARRAY	16384
+#define PF_ARRAY_LEN	4096
+
+/* 64 MB arrays for page fault testing. */
+char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
+char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret > 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			ksft_exit_fail_msg("%s test: unexpected value at offset %d. Found %d. Should be %d.\n",
+				   test_name, i, buf2[i], (char)i);
+			return -1;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v1 != v2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v2, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v3 != v1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v3, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increment);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increments[0] + increments[1]);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v | mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v & mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v << bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v >> bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		ksft_exit_fail_msg("%s v is %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	if (v == n) {
+		ksft_exit_fail_msg("%s returned %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/* Use 64kB len, largest page size known on Linux. */
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/*
+ * Iterate over large uninitialized arrays to trigger page faults.
+ */
+int test_page_fault(void)
+{
+	int ret = 0;
+	uint64_t i;
+	const char *test_name = "test_page_fault";
+
+	for (i = 0; i < NR_PF_ARRAY; i++) {
+		ret = test_memcpy_op(pf_array_dst[i],
+				     pf_array_src[i],
+				     PF_ARRAY_LEN);
+		if (ret) {
+			ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+					   test_name, ret, strerror(errno));
+			return ret;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+
+	test_compare_eq_same();
+	test_compare_eq_diff();
+	test_compare_ne_same();
+	test_compare_ne_diff();
+	test_2compare_eq_index();
+	test_2compare_ne_index();
+	test_memcpy();
+	test_memcpy_u32();
+	test_memcpy_mb_memcpy();
+	test_add();
+	test_two_add();
+	test_or();
+	test_and();
+	test_xor();
+	test_lshift();
+	test_rshift();
+	test_cmpxchg_success();
+	test_cmpxchg_fail();
+	test_memcpy_fault();
+	test_unknown_op();
+	test_max_ops();
+	test_too_many_ops();
+	test_memcpy_single_too_large();
+	test_memcpy_single_ok_sum_too_large();
+	test_page_fault();
+
+	return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [PATCH update for 4.15 2/3] cpu_opv: selftests: Implement selftests (v4)
@ 2017-11-21 22:19   ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: mathieu.desnoyers @ 2017-11-21 22:19 UTC (permalink / raw)


Implement the cpu_opv selftests. They need to express dependencies on
header files and on the shared object, which requires overriding the
selftests lib.mk targets. Use the OVERRIDE_TARGETS define for this.
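
The test program links against libcpu-op.so (with -lcpu-op, as done in
the Makefile below) and calls the cpu-op.h helpers with the usual
retry-on-EAGAIN pattern when the thread migrates away from the requested
CPU. A minimal usage sketch, not part of this patch; the
percpu_cmpeq_store wrapper name is only illustrative:

	#include <errno.h>
	#include <stdint.h>
	#include <stdio.h>

	#include "cpu-op.h"

	/* Retry on the current CPU until cpu_opv executes there. */
	static int percpu_cmpeq_store(intptr_t *v, intptr_t expect,
			intptr_t newv)
	{
		int ret, cpu;

		do {
			cpu = cpu_op_get_current_cpu();
			ret = cpu_op_cmpeqv_storev(v, expect, newv, cpu);
		} while (ret == -1 && errno == EAGAIN);

		/* 0: stored, > 0: compare failed, -1: error (see errno). */
		return ret;
	}

	int main(void)
	{
		intptr_t v = 1;

		if (percpu_cmpeq_store(&v, 1, 2) == 0)
			printf("v is now %ld\n", (long)v);
		return 0;
	}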

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:

- Expose a library API similar to rseq: the library API closely
  matches the rseq APIs, following removal of the event counter from
  the rseq kernel API.
- Update the Makefile to fix the "make run_tests" dependency on "all".
- Introduce an OVERRIDE_TARGETS define.

Changes since v2:

- Test page faults.

Changes since v3:

- Move lib.mk OVERRIDE_TARGETS change to its own patch.
- Print out TAP output.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1167 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 7 files changed, 1603 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b4e504f5003..c6c2436d15f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
 L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico@linaro.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eaf599dc2137..fc1eba0e0130 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..21e63545d521
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+# Define our own rules: we only want to build against the 1st prerequisite,
+# but still track changes to header files and depend on the shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_cpu_opv_test
+
+TEST_GEN_PROGS_EXTENDED = libcpu-op.so
+
+include ../lib.mk
+
+$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..5aeb6ed0b361
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1167 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "../kselftest.h"
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+#define TESTBUFLEN_PAGE_MAX	65536
+
+#define NR_PF_ARRAY	16384
+#define PF_ARRAY_LEN	4096
+
+/* 64 MB arrays for page fault testing. */
+char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
+char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret > 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			ksft_exit_fail_msg("%s test: unexpected value at offset %d. Found %d. Should be %d.\n",
+				   test_name, i, buf2[i], (char)i);
+			return -1;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v1 != v2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v2, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v3 != v1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v3, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increment);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increments[0] + increments[1]);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v | mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v & mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v << bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v >> bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		ksft_exit_fail_msg("%s v is %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	if (v == n) {
+		ksft_exit_fail_msg("%s returned %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/* Use 64kB len, largest page size known on Linux. */
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/*
+ * Iterate over large uninitialized arrays to trigger page faults.
+ */
+int test_page_fault(void)
+{
+	int ret = 0;
+	uint64_t i;
+	const char *test_name = "test_page_fault";
+
+	for (i = 0; i < NR_PF_ARRAY; i++) {
+		ret = test_memcpy_op(pf_array_dst[i],
+				     pf_array_src[i],
+				     PF_ARRAY_LEN);
+		if (ret) {
+			ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+					   test_name, ret, strerror(errno));
+			return ret;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+
+	test_compare_eq_same();
+	test_compare_eq_diff();
+	test_compare_ne_same();
+	test_compare_ne_diff();
+	test_2compare_eq_index();
+	test_2compare_ne_index();
+	test_memcpy();
+	test_memcpy_u32();
+	test_memcpy_mb_memcpy();
+	test_add();
+	test_two_add();
+	test_or();
+	test_and();
+	test_xor();
+	test_lshift();
+	test_rshift();
+	test_cmpxchg_success();
+	test_cmpxchg_fail();
+	test_memcpy_fault();
+	test_unknown_op();
+	test_max_ops();
+	test_too_many_ops();
+	test_memcpy_single_too_large();
+	test_memcpy_single_ok_sum_too_large();
+	test_page_fault();
+
+	return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
-- 
2.11.0




^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [PATCH update for 4.15 2/3] cpu_opv: selftests: Implement selftests (v4)
@ 2017-11-21 22:19   ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:19 UTC (permalink / raw)


Implement the cpu_opv selftests. They need to express dependencies on
header files and a .so, which requires overriding the selftests
lib.mk targets. Use the OVERRIDE_TARGETS define for this.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
CC: Russell King <linux at arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas at arm.com>
CC: Will Deacon <will.deacon at arm.com>
CC: Thomas Gleixner <tglx at linutronix.de>
CC: Paul Turner <pjt at google.com>
CC: Andrew Hunter <ahh at google.com>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Andy Lutomirski <luto at amacapital.net>
CC: Andi Kleen <andi at firstfloor.org>
CC: Dave Watson <davejwatson at fb.com>
CC: Chris Lameter <cl at linux.com>
CC: Ingo Molnar <mingo at redhat.com>
CC: "H. Peter Anvin" <hpa at zytor.com>
CC: Ben Maurer <bmaurer at fb.com>
CC: Steven Rostedt <rostedt at goodmis.org>
CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
CC: Josh Triplett <josh at joshtriplett.org>
CC: Linus Torvalds <torvalds at linux-foundation.org>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Boqun Feng <boqun.feng at gmail.com>
CC: Shuah Khan <shuah at kernel.org>
CC: linux-kselftest at vger.kernel.org
CC: linux-api at vger.kernel.org
---
Changes since v1:

- Expose a library API similar to rseq: a library API closely matching
  the rseq APIs, following removal of the event counter from the rseq
  kernel API (a usage sketch follows this changelog).
- Update the Makefile to fix the "make run_tests" dependency on "all".
- Introduce an OVERRIDE_TARGETS define.

Changes since v2:

- Test page faults.

Changes since v3:

- Move lib.mk OVERRIDE_TARGETS change to its own patch.
- Print out TAP output.
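
As a usage sketch only (not part of the patch): the cpu-op.c helpers are
meant to be driven with the same retry-on-EAGAIN pattern used throughout
basic_cpu_opv_test.c. The example_percpu_add() wrapper below is
hypothetical and merely illustrates the cpu-op.h API added here:

#include <errno.h>
#include "cpu-op.h"

/* Hypothetical illustration: add @count to a per-CPU counter. */
static int example_percpu_add(intptr_t *counter, int64_t count)
{
	int ret, cpu;

	do {
		cpu = cpu_op_get_current_cpu();
		/* cpu_op_addv() issues a CPU_ADD_OP vector bound to @cpu. */
		ret = cpu_op_addv(counter, count, cpu);
	} while (ret == -1 && errno == EAGAIN);

	return ret;
}

This mirrors the do/while retry loops used by the tests below.
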
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1167 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 7 files changed, 1603 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b4e504f5003..c6c2436d15f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3734,6 +3734,7 @@ L:	linux-kernel at vger.kernel.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico at linaro.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eaf599dc2137..fc1eba0e0130 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..21e63545d521
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+# Define our own dependencies: build only against the 1st prerequisite, but
+# still track changes to header files and depend on the shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_cpu_opv_test
+
+TEST_GEN_PROGS_EXTENDED = libcpu-op.so
+
+include ../lib.mk
+
+$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..5aeb6ed0b361
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1167 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "../kselftest.h"
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+#define TESTBUFLEN_PAGE_MAX	65536
+
+#define NR_PF_ARRAY	16384
+#define PF_ARRAY_LEN	4096
+
+/* 64 MB arrays for page fault testing. */
+char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
+char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret > 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			ksft_exit_fail_msg("%s test: unexpected value at offset %d. Found %d. Should be %d.\n",
+				   test_name, i, buf2[i], (char)i);
+			return -1;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v1 != v2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v2, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v3 != v1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v3, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increment);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increments[0] + increments[1]);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v | mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v & mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v << bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v >> bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		ksft_exit_fail_msg("%s v is %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	if (v == n) {
+		ksft_exit_fail_msg("%s returned %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/* Use 64kB len, largest page size known on Linux. */
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/*
+ * Iterate over large uninitialized arrays to trigger page faults.
+ */
+int test_page_fault(void)
+{
+	int ret = 0;
+	uint64_t i;
+	const char *test_name = "test_page_fault";
+
+	for (i = 0; i < NR_PF_ARRAY; i++) {
+		ret = test_memcpy_op(pf_array_dst[i],
+				     pf_array_src[i],
+				     PF_ARRAY_LEN);
+		if (ret) {
+			ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+					   test_name, ret, strerror(errno));
+			return ret;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+
+	test_compare_eq_same();
+	test_compare_eq_diff();
+	test_compare_ne_same();
+	test_compare_ne_diff();
+	test_2compare_eq_index();
+	test_2compare_ne_index();
+	test_memcpy();
+	test_memcpy_u32();
+	test_memcpy_mb_memcpy();
+	test_add();
+	test_two_add();
+	test_or();
+	test_and();
+	test_xor();
+	test_lshift();
+	test_rshift();
+	test_cmpxchg_success();
+	test_cmpxchg_fail();
+	test_memcpy_fault();
+	test_unknown_op();
+	test_max_ops();
+	test_too_many_ops();
+	test_memcpy_single_too_large();
+	test_memcpy_single_ok_sum_too_large();
+	test_page_fault();
+
+	return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
-- 
2.11.0




^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [PATCH update for 4.15 2/3] cpu_opv: selftests: Implement selftests (v4)
@ 2017-11-21 22:19   ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:19 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest-u79uwXL29TY76Z2rM5mHXA

Implement the cpu_opv selftests. They need to express dependencies on
header files and a shared object, which requires overriding the
selftests lib.mk targets. Use the OVERRIDE_TARGETS define for this.
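
The tests all follow the same calling pattern: build a struct cpu_op
vector (or use one of the cpu-op library helpers), target the current
CPU, and retry while the cpu_opv() system call fails with EAGAIN. A
condensed sketch of that pattern, using the library helpers added by
this patch (the helper name percpu_increment is illustrative only):

    #include <errno.h>
    #include "cpu-op.h"

    /* Illustrative helper (not part of the patch): increment *v on the
     * CPU the caller currently runs on, retrying transient EAGAIN
     * failures the same way the selftests do. */
    static int percpu_increment(int *v)
    {
        int ret, cpu;

        do {
            cpu = cpu_op_get_current_cpu();
            ret = cpu_op_add(v, 1, sizeof(*v), cpu);
        } while (ret == -1 && errno == EAGAIN);
        return ret;
    }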

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
CC: Russell King <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
CC: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
CC: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
CC: Chris Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
CC: Ben Maurer <bmaurer-b10kYP2dOMg@public.gmane.org>
CC: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
CC: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
CC: Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>
CC: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
CC: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Shuah Khan <shuah-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
Changes since v1:

- Expose a library API similar to rseq's: provide a library API closely
  matching the rseq APIs, following removal of the event counter from
  the rseq kernel API.
- Update makefile to fix make run_tests dependency on "all".
- Introduce an OVERRIDE_TARGETS define.

Changes since v2:

- Test page faults.

Changes since v3:

- Move lib.mk OVERRIDE_TARGETS change to its own patch.
- Print out TAP output; a minimal sketch of the reporting structure
  follows this list.
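
The TAP reporting relies only on the kselftest helpers the test program
already uses; the overall structure is roughly as follows (individual
test bodies elided):

    #include "../kselftest.h"

    int main(void)
    {
        ksft_print_header();
        /* Each test reports ksft_test_result_pass() on success or
         * bails out with ksft_exit_fail_msg() on failure. */
        return ksft_exit_pass();
    }
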
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   17 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1167 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 7 files changed, 1603 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 0b4e504f5003..c6c2436d15f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3734,6 +3734,7 @@ L:	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eaf599dc2137..fc1eba0e0130 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -5,6 +5,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..21e63545d521
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+# Own dependencies because we only want to build against 1st prerequisite, but
+# still track changes to header files and depend on shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_cpu_opv_test
+
+TEST_GEN_PROGS_EXTENDED = libcpu-op.so
+
+include ../lib.mk
+
+$(OUTPUT)/libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..5aeb6ed0b361
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1167 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "../kselftest.h"
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+#define TESTBUFLEN_PAGE_MAX	65536
+
+#define NR_PF_ARRAY	16384
+#define PF_ARRAY_LEN	4096
+
+/* 64 MB arrays for page fault testing. */
+char pf_array_dst[NR_PF_ARRAY][PF_ARRAY_LEN];
+char pf_array_src[NR_PF_ARRAY][PF_ARRAY_LEN];
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret > 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 0) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret != 2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, ret, 2);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			ksft_exit_fail_msg("%s test: unexpected value at offset %d. Found %d. Should be %d.\n",
+				   test_name, i, buf2[i], (char)i);
+			return -1;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v1 != v2) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v2, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v3 != v1) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v3, v1);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increment);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v,
+				   orig_v + increments[0] + increments[1]);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v | mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v & mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v << bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		ksft_exit_fail_msg("%s test: returned with %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		ksft_exit_fail_msg("%s test: unexpected value %d. Should be %d.\n",
+				   test_name, v, orig_v >> bits);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		ksft_exit_fail_msg("%s v is %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (ret == 0) {
+		ksft_exit_fail_msg("%s returned %d, expecting %d\n",
+				   test_name, ret, 1);
+		return -1;
+	}
+	if (v == n) {
+		ksft_exit_fail_msg("%s returned %lld, expecting %lld\n",
+				   test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		ksft_exit_fail_msg("%s old is %lld, expecting %lld\n",
+				   test_name, (long long)old,
+				   (long long)orig_v);
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/* Use 64kB len, largest page size known on Linux. */
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			LINUX_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+				   test_name, ret, strerror(errno));
+		return -1;
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+/*
+ * Iterate over large uninitialized arrays to trigger page faults.
+ */
+int test_page_fault(void)
+{
+	int ret = 0;
+	uint64_t i;
+	const char *test_name = "test_page_fault";
+
+	for (i = 0; i < NR_PF_ARRAY; i++) {
+		ret = test_memcpy_op(pf_array_dst[i],
+				     pf_array_src[i],
+				     PF_ARRAY_LEN);
+		if (ret) {
+			ksft_exit_fail_msg("%s test: ret = %d, errno = %s\n",
+					   test_name, ret, strerror(errno));
+			return ret;
+		}
+	}
+	ksft_test_result_pass("%s test\n", test_name);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+
+	test_compare_eq_same();
+	test_compare_eq_diff();
+	test_compare_ne_same();
+	test_compare_ne_diff();
+	test_2compare_eq_index();
+	test_2compare_ne_index();
+	test_memcpy();
+	test_memcpy_u32();
+	test_memcpy_mb_memcpy();
+	test_add();
+	test_two_add();
+	test_or();
+	test_and();
+	test_xor();
+	test_lshift();
+	test_rshift();
+	test_cmpxchg_success();
+	test_cmpxchg_fail();
+	test_memcpy_fault();
+	test_unknown_op();
+	test_max_ops();
+	test_too_many_ops();
+	test_memcpy_single_too_large();
+	test_memcpy_single_ok_sum_too_large();
+	test_page_fault();
+
+	return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpu_opv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpu_opv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* [PATCH update for 4.15 3/3] rseq: selftests: Provide self-tests (v4)
  2017-11-21 14:18 ` Mathieu Desnoyers
                   ` (25 preceding siblings ...)
  (?)
@ 2017-11-21 22:19 ` Mathieu Desnoyers
  2017-11-22 15:23     ` shuah
  -1 siblings, 1 reply; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:19 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

Implements two basic tests of RSEQ functionality, and one more
exhaustive parameterizable test.

The first, "basic_test", only asserts that RSEQ works moderately
correctly, e.g. that the CPUID pointer works.

"basic_percpu_ops_test" is a slightly more "realistic" variant,
implementing a few simple per-cpu operations and testing their
correctness.

"param_test" is a parametrizable restartable sequences test. See
the "--help" output for usage.

A run_param_test.sh script runs many variants of the parametrizable
tests.

As part of those tests, a helper library "rseq" implements a user-space
API around restartable sequences. It uses the cpu_opv system call as
fallback when single-stepped by a debugger. It exposes the instruction
pointer addresses where the rseq assembly blocks begin and end, as well
as the associated abort instruction pointer, in the __rseq_table
section. This section allows debuggers to know where to place
breakpoints when single-stepping through assembly blocks which may be
aborted at any point by the kernel.

The rseq library exposes APIs that present the fast-path operations.
The usage from user-space is, e.g. for a counter increment:

    cpu = rseq_cpu_start();
    ret = rseq_addv(&data->c[cpu].count, 1, cpu);
    if (likely(!ret))
        return 0;        /* Success. */
    do {
        cpu = rseq_current_cpu();
        ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
        if (likely(!ret))
            return 0;    /* Success. */
    } while (ret > 0 || errno == EAGAIN);
    perror("cpu_op_addv");
    return -1;           /* Unexpected error. */
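
Before a thread can use the fast path, it registers with the kernel
through the library, and unregisters when done; a minimal sketch using
the helpers provided by this series (error handling mirrors the
selftests):

    if (rseq_register_current_thread()) {
        perror("rseq_register_current_thread");
        abort();
    }
    /* ... per-cpu fast-path operations, e.g. the increment above ... */
    if (rseq_unregister_current_thread()) {
        perror("rseq_unregister_current_thread");
        abort();
    }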

PowerPC tests have been implemented by Boqun Feng.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Provide abort-ip signature: The abort-ip signature is located just
  before the abort-ip target. It is currently hardcoded, but a
  user-space application could use the __rseq_table to iterate on all
  abort-ip targets and use a random value as signature if needed in the
  future.
- Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
  sections need to issue rseq_prepare_unload() on each thread at least
  once before reclaim of struct rseq_cs (see the sketch after this
  changelog).
- Use initial-exec TLS model, non-weak symbol: The initial-exec model is
  signal-safe, whereas the global-dynamic model is not.  Remove the
  "weak" symbol attribute from __rseq_abi in rseq.c. The librseq.so
  library will have ownership of that symbol, and there is no reason for
  an application or user library to try to define that symbol.
  The expected use is to link against librseq.so, which owns and provides
  that symbol.
- Set cpu_id to -2 on register error
- Add rseq_len syscall parameter, rseq_cs version
- Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
  hard time decoding the instruction stream after a bad instruction. Use
  a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
- Exercise parametrized test variants in a shell script.
- Restartable sequences selftests: Remove use of event counter.
- Use cpu_id_start field:  With the cpu_id_start field, the C
  preparation phase of the fast-path does not need to compare cpu_id < 0
  anymore.
- Signal-safe registration and refcounting: Allow libraries using
  librseq.so to register it from signal handlers.
- Use OVERRIDE_TARGETS in makefile.
- Use "m" constraints for rseq_cs field.

Changes since v2:
- Update based on Thomas Gleixner's comments.

Changes since v3:
- Generate param_test_skip_fastpath and param_test_benchmark with
  -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath
  to run_param_test.sh.
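
As a rough illustration of the rseq_prepare_unload() requirement noted
above (assuming the helper takes no arguments, and using a hypothetical
plugin-unload scenario as the reclaim point):

    /* Hypothetical per-thread teardown: clear this thread's published
     * rseq_cs pointer before the code containing its critical sections
     * (and the struct rseq_cs descriptors) is reclaimed, e.g. prior to
     * dlclose() of a plugin. */
    static void thread_rseq_teardown(void)
    {
        rseq_prepare_unload();
    }
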
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/rseq/.gitignore            |    4 +
 tools/testing/selftests/rseq/Makefile              |   33 +
 .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
 tools/testing/selftests/rseq/basic_test.c          |   55 +
 tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
 tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
 tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
 tools/testing/selftests/rseq/rseq.c                |  116 ++
 tools/testing/selftests/rseq/rseq.h                |  154 +++
 tools/testing/selftests/rseq/run_param_test.sh     |  126 ++
 13 files changed, 4108 insertions(+)
 create mode 100644 tools/testing/selftests/rseq/.gitignore
 create mode 100644 tools/testing/selftests/rseq/Makefile
 create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
 create mode 100644 tools/testing/selftests/rseq/basic_test.c
 create mode 100644 tools/testing/selftests/rseq/param_test.c
 create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
 create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
 create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
 create mode 100644 tools/testing/selftests/rseq/rseq.c
 create mode 100644 tools/testing/selftests/rseq/rseq.h
 create mode 100755 tools/testing/selftests/rseq/run_param_test.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index c6c2436d15f8..ba9137c1f295 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11634,6 +11634,7 @@ S:	Supported
 F:	kernel/rseq.c
 F:	include/uapi/linux/rseq.h
 F:	include/trace/events/rseq.h
+F:	tools/testing/selftests/rseq/
 
 RFKILL
 M:	Johannes Berg <johannes@sipsolutions.net>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index fc1eba0e0130..fc314334628a 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -26,6 +26,7 @@ TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += pstore
 TARGETS += ptrace
+TARGETS += rseq
 TARGETS += seccomp
 TARGETS += sigaltstack
 TARGETS += size
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
new file mode 100644
index 000000000000..9409c3db99b2
--- /dev/null
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -0,0 +1,4 @@
+basic_percpu_ops_test
+basic_test
+basic_rseq_op_test
+param_test
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
new file mode 100644
index 000000000000..3c946c517a1a
--- /dev/null
+++ b/tools/testing/selftests/rseq/Makefile
@@ -0,0 +1,33 @@
+CFLAGS += -O2 -Wall -g -I./ -I../cpu-opv/ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+LDLIBS += -lpthread
+
+# Own dependencies because we only want to build against 1st prerequisite, but
+# still track changes to header files and depend on shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_test basic_percpu_ops_test \
+		param_test param_test_skip_fastpath \
+		param_test_benchmark
+
+TEST_GEN_PROGS_EXTENDED = librseq.so libcpu-op.so
+
+TEST_PROGS = run_param_test.sh
+
+include ../lib.mk
+
+$(OUTPUT)/librseq.so: rseq.c rseq.h rseq-*.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+$(OUTPUT)/libcpu-op.so: ../cpu-opv/cpu-op.c ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) rseq.h rseq-*.h ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -lcpu-op -o $@
+
+$(OUTPUT)/param_test_skip_fastpath: param_test.c $(TEST_GEN_PROGS_EXTENDED) \
+					rseq.h rseq-*.h ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) -DSKIP_FASTPATH $< $(LDLIBS) -lrseq -lcpu-op -o $@
+
+$(OUTPUT)/param_test_benchmark: param_test.c $(TEST_GEN_PROGS_EXTENDED) \
+					rseq.h rseq-*.h ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) -DBENCHMARK $< $(LDLIBS) -lrseq -lcpu-op -o $@
diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
new file mode 100644
index 000000000000..e5f7fed06a03
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -0,0 +1,333 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "rseq.h"
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+	int reps;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu on which the lock was acquired. */
+int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_test_data *data = arg;
+	int i, cpu;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	for (i = 0; i < data->reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+	}
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = 200;
+	int i;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+
+	memset(&data, 0, sizeof(data));
+	data.reps = 5000;
+
+	for (i = 0; i < num_threads; i++)
+		pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &data);
+
+	for (i = 0; i < num_threads; i++)
+		pthread_join(test_threads[i], NULL);
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)data.reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike with a traditional lock-less linked list, the availability of an
+ * rseq primitive allows us to implement pop without concern for
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	int i;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	for (i = 0; i < 100000; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	int i, j;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[200];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < 200; i++)
+		assert(pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list) == 0);
+
+	for (i = 0; i < 200; i++)
+		pthread_join(test_threads[i], NULL);
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	printf("spinlock\n");
+	test_percpu_spinlock();
+	printf("percpu_list\n");
+	test_percpu_list();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	return 0;
+
+error:
+	return -1;
+}
+
diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
new file mode 100644
index 000000000000..e2086b3885d7
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_test.c
@@ -0,0 +1,55 @@
+/*
+ * Basic test coverage for critical regions and rseq_current_cpu().
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+
+#include "rseq.h"
+
+void test_cpu_pointer(void)
+{
+	cpu_set_t affinity, test_affinity;
+	int i;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	CPU_ZERO(&test_affinity);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (CPU_ISSET(i, &affinity)) {
+			CPU_SET(i, &test_affinity);
+			sched_setaffinity(0, sizeof(test_affinity),
+					&test_affinity);
+			assert(sched_getcpu() == i);
+			assert(rseq_current_cpu() == i);
+			assert(rseq_current_cpu_raw() == i);
+			assert(rseq_cpu_start() == i);
+			CPU_CLR(i, &test_affinity);
+		}
+	}
+	sched_setaffinity(0, sizeof(affinity), &affinity);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	printf("testing current cpu\n");
+	test_cpu_pointer();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	return 0;
+
+init_thread_error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
new file mode 100644
index 000000000000..c7a16b656a36
--- /dev/null
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -0,0 +1,1285 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <poll.h>
+#include <sys/types.h>
+#include <signal.h>
+#include <errno.h>
+#include <stddef.h>
+
+#include "cpu-op.h"
+
+static inline pid_t gettid(void)
+{
+	return syscall(__NR_gettid);
+}
+
+#define NR_INJECT	9
+static int loop_cnt[NR_INJECT + 1];
+
+static int opt_modulo, verbose;
+
+static int opt_yield, opt_signal, opt_sleep,
+		opt_disable_rseq, opt_threads = 200,
+		opt_disable_mod = 0, opt_test = 's', opt_mb = 0;
+
+static long long opt_reps = 5000;
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int signals_delivered;
+
+#ifndef BENCHMARK
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int yield_mod_cnt, nr_abort;
+
+#define printf_verbose(fmt, ...)			\
+	do {						\
+		if (verbose)				\
+			printf(fmt, ## __VA_ARGS__);	\
+	} while (0)
+
+#define RSEQ_INJECT_INPUT \
+	, [loop_cnt_1]"m"(loop_cnt[1]) \
+	, [loop_cnt_2]"m"(loop_cnt[2]) \
+	, [loop_cnt_3]"m"(loop_cnt[3]) \
+	, [loop_cnt_4]"m"(loop_cnt[4]) \
+	, [loop_cnt_5]"m"(loop_cnt[5]) \
+	, [loop_cnt_6]"m"(loop_cnt[6])
+
+#if defined(__x86_64__) || defined(__i386__)
+
+#define INJECT_ASM_REG	"eax"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"mov %[loop_cnt_" #n "], %%" INJECT_ASM_REG "\n\t" \
+	"test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \
+	"jz 333f\n\t" \
+	"222:\n\t" \
+	"dec %%" INJECT_ASM_REG "\n\t" \
+	"jnz 222b\n\t" \
+	"333:\n\t"
+
+#elif defined(__ARMEL__)
+
+#define INJECT_ASM_REG	"r4"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmp " INJECT_ASM_REG ", #0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subs " INJECT_ASM_REG ", #1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+
+#elif defined(__PPC__)
+#define INJECT_ASM_REG	"r18"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmpwi %%" INJECT_ASM_REG ", 0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+#else
+#error unsupported target
+#endif
+
+#define RSEQ_INJECT_FAILED \
+	nr_abort++;
+
+#define RSEQ_INJECT_C(n) \
+{ \
+	int loc_i, loc_nr_loops = loop_cnt[n]; \
+	\
+	for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \
+		barrier(); \
+	} \
+	if (loc_nr_loops == -1 && opt_modulo) { \
+		if (yield_mod_cnt == opt_modulo - 1) { \
+			if (opt_sleep > 0) \
+				poll(NULL, 0, opt_sleep); \
+			if (opt_yield) \
+				sched_yield(); \
+			if (opt_signal) \
+				raise(SIGUSR1); \
+			yield_mod_cnt = 0; \
+		} else { \
+			yield_mod_cnt++; \
+		} \
+	} \
+}
+
+#else
+
+#define printf_verbose(fmt, ...)
+
+#endif /* BENCHMARK */
+
+#include "rseq.h"
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct spinlock_thread_test_data {
+	struct spinlock_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct inc_test_data {
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct inc_thread_test_data {
+	struct inc_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+#define BUFFER_ITEM_PER_CPU	100
+
+struct percpu_buffer_node {
+	intptr_t data;
+};
+
+struct percpu_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_buffer_node **array;
+} __attribute__((aligned(128)));
+
+struct percpu_buffer {
+	struct percpu_buffer_entry c[CPU_SETSIZE];
+};
+
+#define MEMCPY_BUFFER_ITEM_PER_CPU	100
+
+struct percpu_memcpy_buffer_node {
+	intptr_t data1;
+	uint64_t data2;
+};
+
+struct percpu_memcpy_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_memcpy_buffer_node *array;
+} __attribute__((aligned(128)));
+
+struct percpu_memcpy_buffer {
+	struct percpu_memcpy_buffer_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu on which the lock was acquired. */
+static int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_thread_test_data *thread_data = arg;
+	struct spinlock_test_data *data = thread_data->data;
+	int cpu;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+#ifndef BENCHMARK
+		if (i != 0 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq aborts: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+	struct spinlock_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+void *test_percpu_inc_thread(void *arg)
+{
+	struct inc_thread_test_data *thread_data = arg;
+	struct inc_test_data *data = thread_data->data;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
+		if (likely(!ret))
+			goto next;
+#endif
+	slowpath:
+		__attribute__((unused));
+		for (;;) {
+			/* Fallback on cpu_opv system call. */
+			cpu = rseq_current_cpu();
+			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
+			if (likely(!ret))
+				break;
+			assert(ret >= 0 || errno == EAGAIN);
+		}
+	next:
+		__attribute__((unused));
+#ifndef BENCHMARK
+		if (i != 0 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq aborts: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+void test_percpu_inc(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct inc_test_data data;
+	struct inc_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_inc_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+slowpath:
+	__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike with a traditional lock-less linked list, the availability of an
+ * rseq primitive allows us to implement pop without concern for
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+slowpath:
+	__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	printf_verbose("tid %d: number of rseq aborts: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_buffer_push(struct percpu_buffer *buffer,
+		struct percpu_buffer_node *node)
+{
+	intptr_t *targetptr_spec, newval_spec;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	newval_spec = (intptr_t)node;
+	targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	if (opt_mb)
+		ret = rseq_cmpeqv_trystorev_storev_release(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trystorev_storev(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		newval_spec = (intptr_t)node;
+		targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_storev_mb_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_storev_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+struct percpu_buffer_node *percpu_buffer_pop(struct percpu_buffer *buffer)
+{
+	struct percpu_buffer_node *head;
+	intptr_t *targetptr, newval;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return NULL;
+	}
+	head = buffer->c[cpu].array[offset - 1];
+	newval = offset - 1;
+	targetptr = (intptr_t *)&buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset,
+		(intptr_t *)&buffer->c[cpu].array[offset - 1], (intptr_t)head,
+		newval, cpu);
+	if (likely(!ret))
+		return head;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return NULL;
+		head = buffer->c[cpu].array[offset - 1];
+		newval = offset - 1;
+		targetptr = (intptr_t *)&buffer->c[cpu].offset;
+		ret = cpu_op_cmpeqv_cmpeqv_storev(targetptr, offset,
+			(intptr_t *)&buffer->c[cpu].array[offset - 1],
+			(intptr_t)head, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_buffer *buffer = (struct percpu_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_buffer_node *node = percpu_buffer_pop(buffer);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node) {
+			if (!percpu_buffer_push(buffer, node)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq aborts: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst case: every item ends up on the same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) {
+			struct percpu_buffer_node *node;
+
+			expected_sum += j;
+
+			/*
+			 * We could theoretically put the word-sized
+			 * "data" directly in the buffer. However, we
+			 * want to model objects that would not fit
+			 * within a single word, so allocate an object
+			 * for each node.
+			 */
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			buffer.c[i].array[j - 1] = node;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_buffer_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_buffer_pop(&buffer))) {
+			sum += node->data;
+			free(node);
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)&buffer->c[cpu].array[offset];
+	srcptr = (char *)&item;
+	copylen = sizeof(item);
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	if (opt_mb)
+		ret = rseq_cmpeqv_trymemcpy_storev_release(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		destptr = (char *)&buffer->c[cpu].array[offset];
+		srcptr = (char *)&item;
+		copylen = sizeof(item);
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_memcpy_mb_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+bool percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node *item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)item;
+	srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+	copylen = sizeof(*item);
+	newval_final = offset - 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+		offset, destptr, srcptr, copylen,
+		newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return false;
+		destptr = (char *)item;
+		srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+		copylen = sizeof(*item);
+		newval_final = offset - 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+void *test_percpu_memcpy_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_memcpy_buffer_node item;
+		bool result;
+
+		result = percpu_memcpy_buffer_pop(buffer, &item);
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (result) {
+			if (!percpu_memcpy_buffer_push(buffer, item)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq aborts: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_memcpy_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_memcpy_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst case: every item ends up on the same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* MEMCPY_BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) {
+			expected_sum += 2 * j + 1;
+
+			/*
+			 * We could theoretically put the word-sized
+			 * "data" directly in the buffer. However, we
+			 * want to model objects that would not fit
+			 * within a single word, so use a two-field node
+			 * copied into the buffer.
+			 */
+			buffer.c[i].array[j - 1].data1 = j;
+			buffer.c[i].array[j - 1].data2 = j + 1;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_memcpy_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_memcpy_buffer_node item;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while (percpu_memcpy_buffer_pop(&buffer, &item)) {
+			sum += item.data1;
+			sum += item.data2;
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+static void test_signal_interrupt_handler(int signo)
+{
+	signals_delivered++;
+}
+
+static int set_signal_handler(void)
+{
+	int ret = 0;
+	struct sigaction sa;
+	sigset_t sigset;
+
+	ret = sigemptyset(&sigset);
+	if (ret < 0) {
+		perror("sigemptyset");
+		return ret;
+	}
+
+	sa.sa_handler = test_signal_interrupt_handler;
+	sa.sa_mask = sigset;
+	sa.sa_flags = 0;
+	ret = sigaction(SIGUSR1, &sa, NULL);
+	if (ret < 0) {
+		perror("sigaction");
+		return ret;
+	}
+
+	printf_verbose("Signal handler set for SIGUSR1\n");
+
+	return ret;
+}
+
+static void show_usage(int argc, char **argv)
+{
+	printf("Usage: %s <OPTIONS>\n",
+		argv[0]);
+	printf("OPTIONS:\n");
+	printf("	[-1 loops] Number of loops for delay injection 1\n");
+	printf("	[-2 loops] Number of loops for delay injection 2\n");
+	printf("	[-3 loops] Number of loops for delay injection 3\n");
+	printf("	[-4 loops] Number of loops for delay injection 4\n");
+	printf("	[-5 loops] Number of loops for delay injection 5\n");
+	printf("	[-6 loops] Number of loops for delay injection 6\n");
+	printf("	[-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n");
+	printf("	[-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n");
+	printf("	[-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n");
+	printf("	[-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n");
+	printf("	[-y] Yield\n");
+	printf("	[-k] Kill thread with signal\n");
+	printf("	[-s S] S: =0: disabled (default), >0: sleep time (ms)\n");
+	printf("	[-t N] Number of threads (default 200)\n");
+	printf("	[-r N] Number of repetitions per thread (default 5000)\n");
+	printf("	[-d] Disable rseq system call (no initialization)\n");
+	printf("	[-D M] Disable rseq for each M threads\n");
+	printf("	[-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n");
+	printf("	[-M] Push into buffer and memcpy buffer with memory barriers.\n");
+	printf("	[-v] Verbose output.\n");
+	printf("	[-h] Show this help.\n");
+	printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+
+	for (i = 1; i < argc; i++) {
+		if (argv[i][0] != '-')
+			continue;
+		switch (argv[i][1]) {
+		case '1':
+		case '2':
+		case '3':
+		case '4':
+		case '5':
+		case '6':
+		case '7':
+		case '8':
+		case '9':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]);
+			i++;
+			break;
+		case 'm':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_modulo = atol(argv[i + 1]);
+			if (opt_modulo < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 's':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_sleep = atol(argv[i + 1]);
+			if (opt_sleep < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'y':
+			opt_yield = 1;
+			break;
+		case 'k':
+			opt_signal = 1;
+			break;
+		case 'd':
+			opt_disable_rseq = 1;
+			break;
+		case 'D':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_disable_mod = atol(argv[i + 1]);
+			if (opt_disable_mod < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 't':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_threads = atol(argv[i + 1]);
+			if (opt_threads < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'r':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_reps = atoll(argv[i + 1]);
+			if (opt_reps < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'h':
+			show_usage(argc, argv);
+			goto end;
+		case 'T':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_test = *argv[i + 1];
+			switch (opt_test) {
+			case 's':
+			case 'l':
+			case 'i':
+			case 'b':
+			case 'm':
+				break;
+			default:
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'v':
+			verbose = 1;
+			break;
+		case 'M':
+			opt_mb = 1;
+			break;
+		default:
+			show_usage(argc, argv);
+			goto error;
+		}
+	}
+
+	if (set_signal_handler())
+		goto error;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		goto error;
+	switch (opt_test) {
+	case 's':
+		printf_verbose("spinlock\n");
+		test_percpu_spinlock();
+		break;
+	case 'l':
+		printf_verbose("linked list\n");
+		test_percpu_list();
+		break;
+	case 'b':
+		printf_verbose("buffer\n");
+		test_percpu_buffer();
+		break;
+	case 'm':
+		printf_verbose("memcpy buffer\n");
+		test_percpu_memcpy_buffer();
+		break;
+	case 'i':
+		printf_verbose("counter increment\n");
+		test_percpu_inc();
+		break;
+	}
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+end:
+	return 0;
+
+error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
new file mode 100644
index 000000000000..47953c0cef4f
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -0,0 +1,535 @@
+/*
+ * rseq-arm.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_wmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	RSEQ_WRITE_ONCE(*p, v);					\
+} while (0)
+
+#define RSEQ_ASM_DEFINE_TABLE(section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		RSEQ_INJECT_ASM(1)					\
+		"adr r0, " __rseq_str(cs_label) "\n\t"			\
+		"str r0, %[" __rseq_str(rseq_cs) "]\n\t"		\
+		__rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t"	\
+		"cmp %[" __rseq_str(cpu_id) "], r0\n\t"		\
+		"bne " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(table_label, label, section, sig,		\
+			teardown, abort_label, version, flags, start_ip,\
+			post_commit_offset, abort_ip)			\
+		__rseq_str(table_label) ":\n\t"				\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(abort_label) "]\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expectnot], r0\n\t"
+		"beq 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"str r0, %[load]\n\t"
+		"add r0, %[voffp]\n\t"
+		"ldr r0, [r0]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"Ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"add r0, %[count]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [count]"Ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"ldr r0, %[v2]\n\t"
+		"cmp %[expect2], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t"
+		"beq 333f\n\t"
+		"222:\n\t"
+		"ldrb %%r0, [%[src]]\n\t"
+		"strb %%r0, [%[dst]]\n\t"
+		"adds %[src], #1\n\t"
+		"adds %[dst], #1\n\t"
+		"subs %[len], #1\n\t"
+		"bne 222b\n\t"
+		"333:\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t"
+		"beq 333f\n\t"
+		"222:\n\t"
+		"ldrb %%r0, [%[src]]\n\t"
+		"strb %%r0, [%[dst]]\n\t"
+		"adds %[src], #1\n\t"
+		"adds %[dst], #1\n\t"
+		"subs %[len], #1\n\t"
+		"bne 222b\n\t"
+		"333:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h
new file mode 100644
index 000000000000..3db6be5ceffb
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-ppc.h
@@ -0,0 +1,567 @@
+/*
+ * rseq-ppc.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ * (C) Copyright 2016 - Boqun Feng <boqun.feng@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()		__asm__ __volatile__ ("sync" : : : "memory")
+#define rseq_smp_lwsync()	__asm__ __volatile__ ("lwsync" : : : "memory")
+#define rseq_smp_rmb()		rseq_smp_lwsync()
+#define rseq_smp_wmb()		rseq_smp_lwsync()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_lwsync();						\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_lwsync()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_lwsync();						\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * The __rseq_table section can be used by debuggers to better handle
+ * single-stepping through the restartable critical sections.
+ */
+
+#ifdef __PPC64__
+
+#define STORE_WORD	"std "
+#define LOAD_WORD	"ld "
+#define LOADX_WORD	"ldx "
+#define CMP_WORD	"cmpd "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t"		\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t"	\
+		"rldicr %%r17, %%r17, 32, 31\n\t"				\
+		"oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t"	\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"std %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
+		__rseq_str(label) ":\n\t"
+
+#else /* #ifdef __PPC64__ */
+
+#define STORE_WORD	"stw "
+#define LOAD_WORD	"lwz "
+#define LOADX_WORD	"lwzx "
+#define CMP_WORD	"cmpw "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		/* 32-bit only supported on BE */				\
+		".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t"			\
+		"addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t"			\
+		__rseq_str(label) ":\n\t"
+
+#endif /* #ifdef __PPC64__ */
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)			\
+		RSEQ_INJECT_ASM(2)						\
+		"lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t"		\
+		"cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		".long " __rseq_str(sig) "\n\t"					\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(abort_label) "]\n\t"			\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"			\
+		".popsection\n\t"
+
+
+/*
+ * RSEQ_ASM_OPs: asm operations for rseq
+ * 	RSEQ_ASM_OP_R_*: uses hard-coded registers
+ * 	RSEQ_ASM_OP_* (else): does not use hard-coded registers (other than cr7)
+ */
+#define RSEQ_ASM_OP_CMPEQ(var, expect, label)					\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_CMPNE(var, expectnot, label)				\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t"	\
+		"beq- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_STORE(value, var)						\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"
+
+/* Load @var to r17 */
+#define RSEQ_ASM_OP_R_LOAD(var)							\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Store r17 to @var */
+#define RSEQ_ASM_OP_R_STORE(var)						\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Add @count to r17 */
+#define RSEQ_ASM_OP_R_ADD(count)						\
+		"add %%r17, %[" __rseq_str(count) "], %%r17\n\t"
+
+/* Load (r17 + voffp) to r17 */
+#define RSEQ_ASM_OP_R_LOADX(voffp)						\
+		LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t"
+
+/* TODO: implement a faster memcpy. */
+#define RSEQ_ASM_OP_R_MEMCPY() \
+		"cmpdi %%r19, 0\n\t" \
+		"beq 333f\n\t" \
+		"addi %%r20, %%r20, -1\n\t" \
+		"addi %%r21, %%r21, -1\n\t" \
+		"222:\n\t" \
+		"lbzu %%r18, 1(%%r20)\n\t" \
+		"stbu %%r18, 1(%%r21)\n\t" \
+		"addi %%r19, %%r19, -1\n\t" \
+		"cmpdi %%r19, 0\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+
+#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label)			\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		__rseq_str(post_commit_label) ":\n\t"
+
+#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label)			\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"	\
+		__rseq_str(post_commit_label) ":\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v not equal to @expectnot */
+		RSEQ_ASM_OP_CMPNE(v, expectnot, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* store it in @load */
+		RSEQ_ASM_OP_R_STORE(load)
+		/* dereference voffp(v) */
+		RSEQ_ASM_OP_R_LOADX(voffp)
+		/* final store the value at voffp(v) */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"b"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* add @count to it */
+		RSEQ_ASM_OP_R_ADD(count)
+		/* final store */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"r"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* cmp @v2 equal to @expect2 */
+		RSEQ_ASM_OP_CMPEQ(v2, expect2, 5f)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#undef STORE_WORD
+#undef LOAD_WORD
+#undef LOADX_WORD
+#undef CMP_WORD
diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
new file mode 100644
index 000000000000..63e81d6c61fa
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-x86.h
@@ -0,0 +1,898 @@
+/*
+ * rseq-x86.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdint.h>
+
+#define RSEQ_SIG	0x53053053
+
+#ifdef __x86_64__
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
+#define rseq_smp_rmb()	barrier()
+#define rseq_smp_wmb()	barrier()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	barrier();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	barrier();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		RSEQ_INJECT_ASM(1)					\
+		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
+		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
+		__rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movq %[v], %%rax\n\t"
+		"movq %%rax, %[load]\n\t"
+		"addq %[voffp], %%rax\n\t"
+		"movq (%%rax), %%rax\n\t"
+		/* final store */
+		"movq %%rax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"er"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addq %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"er"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movq %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2,
+			newv, cpu);
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpq %[v2], %[expect2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint64_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movq %[src], %[rseq_scratch0]\n\t"
+		"movq %[dst], %[rseq_scratch1]\n\t"
+		"movq %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movq %[rseq_scratch2], %[len]\n\t"
+		"movq %[rseq_scratch1], %[dst]\n\t"
+		"movq %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src,
+			len, newv, cpu);
+}
+
+#elif __i386__
+
+/*
+ * Support older 32-bit architectures that do not implement fence
+ * instructions.
+ */
+#define rseq_smp_mb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_rmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_wmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * Use eax as a scratch register and take memory operands as input to
+ * lessen register pressure. Especially needed when compiling with -O0.
+ */
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		RSEQ_INJECT_ASM(1)					\
+		"movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t"	\
+		__rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>. */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movl %[v], %%eax\n\t"
+		"movl %%eax, %[load]\n\t"
+		"addl %[voffp], %%eax\n\t"
+		"movl (%%eax), %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addl %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %%eax\n\t"
+		"movl %%eax, %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"m"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %[v], %%eax\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpl %[expect2], %[v2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"m"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#endif
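
Similarly, the rseq_cmpnev_storeoffp_load() helper defined in these headers is easier to follow as C. The sketch below is illustrative only; reading @v as a per-CPU list head and @voffp as the offset of a "next" pointer inside a node is an assumption about the intended use (e.g. by the list test driven through param_test), not something enforced by the header, and the C version is not restartable.

#include <sys/types.h>
#include "rseq.h"	/* for __rseq_abi; illustration only, not part of the patch */

static int cmpnev_storeoffp_load_semantics(intptr_t *v, intptr_t expectnot,
		off_t voffp, intptr_t *load, int cpu)
{
	intptr_t head;

	if (cpu != (int)__rseq_abi.cpu_id)
		return -1;			/* abort path */
	head = *v;
	if (head == expectnot)
		return 1;			/* cmpfail, e.g. empty list */
	*load = head;				/* report the popped element */
	*v = *(intptr_t *)(head + voffp);	/* commit: head = head->next */
	return 0;
}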
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
new file mode 100644
index 000000000000..b83d3196c33e
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -0,0 +1,116 @@
+/*
+ * rseq.c
+ *
+ * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "rseq.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+__attribute__((tls_model("initial-exec"))) __thread
+volatile struct rseq __rseq_abi = {
+	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+static __attribute__((tls_model("initial-exec"))) __thread
+volatile int refcount;
+
+static void signal_off_save(sigset_t *oldset)
+{
+	sigset_t set;
+	int ret;
+
+	sigfillset(&set);
+	ret = pthread_sigmask(SIG_BLOCK, &set, oldset);
+	if (ret)
+		abort();
+}
+
+static void signal_restore(sigset_t oldset)
+{
+	int ret;
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+	if (ret)
+		abort();
+}
+
+static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
+		int flags, uint32_t sig)
+{
+	return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+int rseq_register_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (refcount++)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
+	if (!rc) {
+		assert(rseq_current_cpu_raw() >= 0);
+		goto end;
+	}
+	if (errno != EBUSY)
+		__rseq_abi.cpu_id = -2;
+	ret = -1;
+	refcount--;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int rseq_unregister_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (--refcount)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
+			RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+	if (!rc)
+		goto end;
+	ret = -1;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int32_t rseq_fallback_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
new file mode 100644
index 000000000000..26c8ea01e940
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -0,0 +1,154 @@
+/*
+ * rseq.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RSEQ_H
+#define RSEQ_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sched.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <linux/rseq.h>
+
+/*
+ * Empty code injection macros, override when testing.
+ * It is important to consider that the ASM injection macros need to be
+ * fully reentrant (e.g. do not modify the stack).
+ */
+#ifndef RSEQ_INJECT_ASM
+#define RSEQ_INJECT_ASM(n)
+#endif
+
+#ifndef RSEQ_INJECT_C
+#define RSEQ_INJECT_C(n)
+#endif
+
+#ifndef RSEQ_INJECT_INPUT
+#define RSEQ_INJECT_INPUT
+#endif
+
+#ifndef RSEQ_INJECT_CLOBBER
+#define RSEQ_INJECT_CLOBBER
+#endif
+
+#ifndef RSEQ_INJECT_FAILED
+#define RSEQ_INJECT_FAILED
+#endif
+
+extern __thread volatile struct rseq __rseq_abi;
+
+#define rseq_likely(x)		__builtin_expect(!!(x), 1)
+#define rseq_unlikely(x)	__builtin_expect(!!(x), 0)
+#define rseq_barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define RSEQ_ACCESS_ONCE(x)	(*(__volatile__  __typeof__(x) *)&(x))
+#define RSEQ_WRITE_ONCE(x, v)	__extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); })
+#define RSEQ_READ_ONCE(x)	RSEQ_ACCESS_ONCE(x)
+
+#define __rseq_str_1(x)	#x
+#define __rseq_str(x)		__rseq_str_1(x)
+
+#if defined(__x86_64__) || defined(__i386__)
+#include <rseq-x86.h>
+#elif defined(__ARMEL__)
+#include <rseq-arm.h>
+#elif defined(__PPC__)
+#include <rseq-ppc.h>
+#else
+#error unsupported target
+#endif
+
+/*
+ * Register rseq for the current thread. Each thread which uses
+ * restartable sequences needs to call this once, before it starts
+ * using them, so that its rseq critical sections can succeed. A
+ * restartable sequence executed from a non-registered thread will
+ * always fail.
+ */
+int rseq_register_current_thread(void);
+
+/*
+ * Unregister rseq for current thread.
+ */
+int rseq_unregister_current_thread(void);
+
+/*
+ * Restartable sequence fallback for reading the current CPU number.
+ */
+int32_t rseq_fallback_current_cpu(void);
+
+/*
+ * Values returned can be either the current CPU number, -1 (rseq is
+ * uninitialized), or -2 (rseq initialization has failed).
+ */
+static inline int32_t rseq_current_cpu_raw(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id);
+}
+
+/*
+ * Returns a possible CPU number, which is typically the current CPU.
+ * The returned CPU number can be used to prepare for an rseq critical
+ * section, which will confirm whether the cpu number is indeed the
+ * current one, and whether rseq is initialized.
+ *
+ * The CPU number returned by rseq_cpu_start should always be validated
+ * by passing it to a rseq asm sequence, or by comparing it to the
+ * return value of rseq_current_cpu_raw() if the rseq asm sequence
+ * does not need to be invoked.
+ */
+static inline uint32_t rseq_cpu_start(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start);
+}
+
+static inline uint32_t rseq_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = rseq_current_cpu_raw();
+	if (rseq_unlikely(cpu < 0))
+		cpu = rseq_fallback_current_cpu();
+	return cpu;
+}
+
+/*
+ * rseq_prepare_unload() should be invoked by each thread using rseq_finish*()
+ * at least once between their last rseq_finish*() and library unload of the
+ * library defining the rseq critical section (struct rseq_cs). This also
+ * applies to use of rseq in code generated by JIT: rseq_prepare_unload()
+ * should be invoked at least once by each thread using rseq_finish*() before
+ * reclaim of the memory holding the struct rseq_cs.
+ */
+static inline void rseq_prepare_unload(void)
+{
+	__rseq_abi.rseq_cs = 0;
+}
+
+#endif  /* RSEQ_H */
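
Putting the pieces together, a minimal user of this header could look like the sketch below. It is illustrative only: the per-CPU counter array, its CPU_SETSIZE sizing, and the simple retry loop are choices made for the example, not part of the patch, which exercises the API through the selftests instead.

#include <stdio.h>
#include "rseq.h"

static intptr_t per_cpu_count[CPU_SETSIZE];	/* assumed data layout */

static uint32_t per_cpu_increment(void)
{
	uint32_t cpu;
	int ret;

	do {
		cpu = rseq_cpu_start();		/* candidate CPU number */
		ret = rseq_cmpeqv_storev(&per_cpu_count[cpu],
				per_cpu_count[cpu],	/* expected old value */
				per_cpu_count[cpu] + 1,	/* new value */
				cpu);
	} while (ret);				/* retry on abort or compare failure */
	return cpu;
}

int main(void)
{
	if (rseq_register_current_thread())
		return 1;
	printf("incremented counter of cpu %u\n", per_cpu_increment());
	return rseq_unregister_current_thread() ? 1 : 0;
}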
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
new file mode 100755
index 000000000000..41c83d3f62a3
--- /dev/null
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -0,0 +1,126 @@
+#!/bin/bash
+
+EXTRA_ARGS=${@}
+
+OLDIFS="$IFS"
+IFS=$'\n'
+TEST_LIST=(
+	"-T s"
+	"-T l"
+	"-T b"
+	"-T b -M"
+	"-T m"
+	"-T m -M"
+	"-T i"
+)
+
+TEST_NAME=(
+	"spinlock"
+	"list"
+	"buffer"
+	"buffer with barrier"
+	"memcpy"
+	"memcpy with barrier"
+	"increment"
+)
+IFS="$OLDIFS"
+
+function do_tests()
+{
+	local i=0
+	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
+		echo "Running test ${TEST_NAME[$i]}"
+		./param_test ${TEST_LIST[$i]} ${@} ${EXTRA_ARGS} || exit 1
+		echo "Running skip fast-path test ${TEST_NAME[$i]}"
+		./param_test_skip_fastpath ${TEST_LIST[$i]} ${@} ${EXTRA_ARGS} || exit 1
+		let "i++"
+	done
+}
+
+echo "Default parameters"
+do_tests
+
+echo "Loop injection: 10000 loops"
+
+OLDIFS="$IFS"
+IFS=$'\n'
+INJECT_LIST=(
+	"1"
+	"2"
+	"3"
+	"4"
+	"5"
+	"6"
+	"7"
+	"8"
+	"9"
+)
+IFS="$OLDIFS"
+
+NR_LOOPS=10000
+
+i=0
+while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+	echo "Injecting at <${INJECT_LIST[$i]}>"
+	do_tests -${INJECT_LIST[i]} ${NR_LOOPS}
+	let "i++"
+done
+NR_LOOPS=
+
+function inject_blocking()
+{
+	OLDIFS="$IFS"
+	IFS=$'\n'
+	INJECT_LIST=(
+		"7"
+		"8"
+		"9"
+	)
+	IFS="$OLDIFS"
+
+	NR_LOOPS=-1
+
+	i=0
+	while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+		echo "Injecting at <${INJECT_LIST[$i]}>"
+		do_tests -${INJECT_LIST[i]} -1 ${@}
+		let "i++"
+	done
+	NR_LOOPS=
+}
+
+echo "Yield injection (25%)"
+inject_blocking -m 4 -y -r 100
+
+echo "Yield injection (50%)"
+inject_blocking -m 2 -y -r 100
+
+echo "Yield injection (100%)"
+inject_blocking -m 1 -y -r 100
+
+echo "Kill injection (25%)"
+inject_blocking -m 4 -k -r 100
+
+echo "Kill injection (50%)"
+inject_blocking -m 2 -k -r 100
+
+echo "Kill injection (100%)"
+inject_blocking -m 1 -k -r 100
+
+echo "Sleep injection (1ms, 25%)"
+inject_blocking -m 4 -s 1 -r 100
+
+echo "Sleep injection (1ms, 50%)"
+inject_blocking -m 2 -s 1 -r 100
+
+echo "Sleep injection (1ms, 100%)"
+inject_blocking -m 1 -s 1 -r 100
+
+echo "Disable rseq for 25% threads"
+do_tests -D 4
+
+echo "Disable rseq for 50% threads"
+do_tests -D 2
+
+echo "Disable rseq"
+do_tests -d
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 175+ messages in thread

* Re: [PATCH update for 4.15 1/3] selftests: lib.mk: Introduce OVERRIDE_TARGETS
  2017-11-21 22:19   ` mathieu.desnoyers
@ 2017-11-21 22:22     ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:22 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, shuah, linux-kselftest

----- On Nov 21, 2017, at 5:19 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> Introduce OVERRIDE_TARGETS to allow tests to express dependencies on
> header files and .so files, which requires overriding the selftests
> lib.mk targets.

Those 3 selftests update patches are still RFC (even though the subject tag
is missing).

Thanks,

Mathieu

> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul Turner <pjt@google.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
> tools/testing/selftests/lib.mk | 4 ++++
> 1 file changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
> LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
> endif
> 
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
> $(OUTPUT)/%:%.c
> 	$(LINK.c) $^ $(LDLIBS) -o $@
> 
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
> 
> $(OUTPUT)/%:%.S
> 	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
> 
> .PHONY: run_tests all clean install emit_tests
> --
> 2.11.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [PATCH update for 4.15 1/3] selftests: lib.mk: Introduce OVERRIDE_TARGETS
@ 2017-11-21 22:22     ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:22 UTC (permalink / raw)


----- On Nov 21, 2017,@5:19 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> Introduce OVERRIDE_TARGETS to allow tests to express dependencies on
> header files and .so, which require to override the selftests lib.mk
> targets.

Those 3 seftests update patches are still RFC (even though the subject tag
is missing).

Thanks,

Mathieu

> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> CC: Russell King <linux at arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas at arm.com>
> CC: Will Deacon <will.deacon at arm.com>
> CC: Thomas Gleixner <tglx at linutronix.de>
> CC: Paul Turner <pjt at google.com>
> CC: Andrew Hunter <ahh at google.com>
> CC: Peter Zijlstra <peterz at infradead.org>
> CC: Andy Lutomirski <luto at amacapital.net>
> CC: Andi Kleen <andi at firstfloor.org>
> CC: Dave Watson <davejwatson at fb.com>
> CC: Chris Lameter <cl at linux.com>
> CC: Ingo Molnar <mingo at redhat.com>
> CC: "H. Peter Anvin" <hpa at zytor.com>
> CC: Ben Maurer <bmaurer at fb.com>
> CC: Steven Rostedt <rostedt at goodmis.org>
> CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> CC: Josh Triplett <josh at joshtriplett.org>
> CC: Linus Torvalds <torvalds at linux-foundation.org>
> CC: Andrew Morton <akpm at linux-foundation.org>
> CC: Boqun Feng <boqun.feng at gmail.com>
> CC: Shuah Khan <shuah at kernel.org>
> CC: linux-kselftest at vger.kernel.org
> CC: linux-api at vger.kernel.org
> ---
> tools/testing/selftests/lib.mk | 4 ++++
> 1 file changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
> LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
> endif
> 
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
> $(OUTPUT)/%:%.c
> 	$(LINK.c) $^ $(LDLIBS) -o $@
> 
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
> 
> $(OUTPUT)/%:%.S
> 	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
> 
> .PHONY: run_tests all clean install emit_tests
> --
> 2.11.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH update for 4.15 1/3] selftests: lib.mk: Introduce OVERRIDE_TARGETS
@ 2017-11-21 22:22     ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-21 22:22 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, shuah, linux-kselftest

----- On Nov 21, 2017, at 5:19 PM, Mathieu Desnoyers mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org wrote:

> Introduce OVERRIDE_TARGETS to allow tests to express dependencies on
> header files and .so, which require to override the selftests lib.mk
> targets.

Those 3 seftests update patches are still RFC (even though the subject tag
is missing).

Thanks,

Mathieu

> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
> CC: Russell King <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
> CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
> CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> CC: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> CC: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
> CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
> CC: Chris Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
> CC: Ben Maurer <bmaurer-b10kYP2dOMg@public.gmane.org>
> CC: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
> CC: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> CC: Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>
> CC: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> CC: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> CC: Shuah Khan <shuah-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
> tools/testing/selftests/lib.mk | 4 ++++
> 1 file changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
> LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
> endif
> 
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
> $(OUTPUT)/%:%.c
> 	$(LINK.c) $^ $(LDLIBS) -o $@
> 
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
> 
> $(OUTPUT)/%:%.S
> 	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
> 
> .PHONY: run_tests all clean install emit_tests
> --
> 2.11.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-21 22:59       ` Thomas Gleixner
  0 siblings, 0 replies; 175+ messages in thread
From: Thomas Gleixner @ 2017-11-21 22:59 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson, linux-kernel, linux-api,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk

On Tue, 21 Nov 2017, Mathieu Desnoyers wrote:
> ----- On Nov 21, 2017, at 12:21 PM, Andi Kleen andi@firstfloor.org wrote:
> 
> > On Tue, Nov 21, 2017 at 09:18:38AM -0500, Mathieu Desnoyers wrote:
> >> Hi,
> >> 
> >> Following changes based on a thorough coding style and patch changelog
> >> review from Thomas Gleixner and Peter Zijlstra, I'm respinning this
> >> series for another RFC.
> >> 
> > My suggestion would be that you also split out the opv system call.
> That seems to be the main contention point currently, and the restartable
> > sequences should be useful without it.
> 
> I consider rseq to be incomplete and a pain to use in various scenarios
> without cpu_opv. 
> 
> About the contention point you refer to:
> 
> Using vDSO as an example of how things should be done is just wrong: the
> vDSO interaction with debugger instruction single-stepping is broken,
> as I detailed in my previous email.

Let me turn that around. You're lamenting about a conditional branch in
your rseq thing for performance reasons and at the same time you want to
force extra code into the VDSO? clock_gettime() is one of the hottest
vsyscalls in certain scenarios. So why would we want to have extra code
there? Just to make debuggers happy. You really can't be serious about
that.

> Thomas' proposal of handling single-stepping with a user-space locking
> fallback, which is pretty much what I had in 2016, pushes a lot of
> complexity to user-space, requires an extra branch in the fast-path,
> as well as additional store-release/load-acquire semantics for consistency.
> I don't plan to go down that route.
>
> Other than that, I have not received any concrete alternative proposal to
> properly handle single-stepping.

You provided the details today. Up to that point all we had was handwaving
and inconsistent information.

> The only opposition against cpu_opv is that there *should* be a hypothetical
> simpler solution. The rseq idea is not new: it's been presented by Paul Turner
> in 2012 at LPC. And so far, cpu_opv is the overall simplest and most
> efficient way I encountered to handle single-stepping, and it gives extra
> benefits, as described in my changelog.

That's how you define it, and that does not make cpu_opv less complex or
more debuggable. There is no way to debug that, and still you claim that it
removes complexity from user space. That ops stuff comes from user space and
is not magically constructed by the kernel. In some of your use cases it
even has different semantics than the rseq section code. So how is that
removing any complexity from user space? All it buys you is an extra branch
less in your rseq hotpath and that's your justification to shove that
thing into the kernel.

The version I reviewed was just indigestible. I did not have time to look
at the hastily cobbled-together version of today. Aside from that, the
scheduler portion of it has not seen any review from scheduler folks
either.

AFAICT there is not a single reviewed-by tag on the sys_rseq and the
sys_opv patches either.

Are you seriously expecting that new syscalls of that kind are going to be
merged without a deep and thorough review just based on your decision to
declare them ready?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-21 22:59       ` Thomas Gleixner
@ 2017-11-22 12:36         ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-22 12:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson, linux-kernel, linux-api,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk

----- On Nov 21, 2017, at 5:59 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Tue, 21 Nov 2017, Mathieu Desnoyers wrote:
>> ----- On Nov 21, 2017, at 12:21 PM, Andi Kleen andi@firstfloor.org wrote:
>> 
>> > On Tue, Nov 21, 2017 at 09:18:38AM -0500, Mathieu Desnoyers wrote:
>> >> Hi,
>> >> 
>> >> Following changes based on a thorough coding style and patch changelog
>> >> review from Thomas Gleixner and Peter Zijlstra, I'm respinning this
>> >> series for another RFC.
>> >> 
>> > My suggestion would be that you also split out the opv system call.
> > That seems to be the main contention point currently, and the restartable
>> > sequences should be useful without it.
>> 
>> I consider rseq to be incomplete and a pain to use in various scenarios
>> without cpu_opv.
>> 
>> About the contention point you refer to:
>> 
>> Using vDSO as an example of how things should be done is just wrong: the
>> vDSO interaction with debugger instruction single-stepping is broken,
>> as I detailed in my previous email.
> 
> Let me turn that around. You're lamenting about a conditional branch in
> your rseq thing for performance reasons and at the same time you want to
> force extra code into the VDSO? clock_gettime() is one of the hottest
> vsyscalls in certain scenarios. So why would we want to have extra code
> there? Just to make debuggers happy. You really can't be serious about
> that.

There is *already* an existing branch in the clock_gettime vsyscall:
it's a loop. It won't hurt the fast-path to use that branch and
make it do something else instead. It could even help the vDSO fast-path
for some non-x86 architectures where branch prediction assumes that
backward branches are always taken (adding an unlikely() does not help
in those cases).
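
For reference, the read side of that loop follows the classic seqcount
retry pattern, roughly like the schematic C below (an illustration of the
pattern only, with made-up field names, not the actual vDSO source):

#include <stdatomic.h>
#include <stdint.h>

struct timedata {
        atomic_uint seq;        /* odd while an update is in progress */
        uint64_t sec;
        uint64_t nsec;
};

static void read_time(struct timedata *td, uint64_t *sec, uint64_t *nsec)
{
        unsigned int start;

        do {
                start = atomic_load_explicit(&td->seq, memory_order_acquire);
                *sec = td->sec;
                *nsec = td->nsec;
                atomic_thread_fence(memory_order_acquire);
        } while ((start & 1) ||
                 start != atomic_load_explicit(&td->seq, memory_order_relaxed));
}

The backward branch taken when the sequence count changes mid-read is the
branch I am referring to above.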

> 
>> Thomas' proposal of handling single-stepping with a user-space locking
>> fallback, which is pretty much what I had in 2016, pushes a lot of
>> complexity to user-space, requires an extra branch in the fast-path,
>> as well as additional store-release/load-acquire semantics for consistency.
> I don't plan to go down that route.
>>
>> Other than that, I have not received any concrete alternative proposal to
>> properly handle single-stepping.
> 
> You provided the details today. Up to that point all we had was handwaving
> and inconsistent information.

I mistakenly presumed you took interest in the past 2 years of discussions.
It appears I was wrong, and that information needed to be summarized in
my changelog. This was my mistake and I fixed it.

> 
> The only opposition against cpu_opv is that there *should* be a hypothetical
>> simpler solution. The rseq idea is not new: it's been presented by Paul Turner
>> in 2012 at LPC. And so far, cpu_opv is the overall simplest and most
>> efficient way I encountered to handle single-stepping, and it gives extra
>> benefits, as described in my changelog.
> 
> That's how you define it, and that does not make cpu_opv less complex or
> more debuggable. There is no way to debug that, and still you claim that it
> removes complexity from user space.

So I should ask: what kind of observability within cpu_opv() do you want?
I can add a tracepoint for each operation, which would technically take care
of your concern. Your main counter-argument seems to be a tooling issue.

> That ops stuff comes from user space and
> is not magically constructed by the kernel. In some of your use cases it
> even has different semantics than the rseq section code. So how is that
> removing any complexity from user space? All it buys you is an extra branch
> less in your rseq hotpath and that's your justification to shove that
> thing into the kernel.

Actually, the cpu-op user-space library can hide this difference from the
user: I implemented the equivalent rseq algorithm using a compare-and-store:

/*
 * Compare-and-store: unless *v holds @expectnot, atomically (on @cpu)
 * replace *v with the value found at (*v + @voffp) and return the old
 * value of *v through @load.
 */
int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
                off_t voffp, intptr_t *load, int cpu)
{
        intptr_t oldv = READ_ONCE(*v);
        intptr_t *newp = (intptr_t *)(oldv + voffp);
        int ret;

        if (oldv == expectnot)
                return 1;       /* *v holds the "expectnot" value: nothing to do. */
        ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
        if (!ret) {
                *load = oldv;   /* Success: hand the old value back. */
                return 0;
        }
        if (ret > 0) {
                errno = EAGAIN; /* *v changed under us: caller may retry. */
                return -1;
        }
        return -1;              /* Unexpected error, errno set by the helper. */
}

So from a library user perspective, the fast-path and slow-path are
exactly the same.
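
To make that concrete, here is a sketch of how a caller-facing slow path
can be built on the helper above; the per-cpu list structures and the
wrapper name are hypothetical, for illustration only, and not part of the
posted series:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* The compare-and-store helper shown above. */
int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
                off_t voffp, intptr_t *load, int cpu);

struct percpu_list_node {
        struct percpu_list_node *next;
        intptr_t data;
};

struct percpu_list {
        struct percpu_list_node *head;
};

/*
 * Slow-path pop of the list head for @cpu: unless the list is empty,
 * atomically replace the head with head->next and return the old head.
 */
static struct percpu_list_node *
percpu_list_pop_slowpath(struct percpu_list *list, int cpu)
{
        intptr_t head;
        int ret;

        for (;;) {
                ret = cpu_op_cmpnev_storeoffp_load((intptr_t *)&list->head,
                                (intptr_t)NULL,
                                offsetof(struct percpu_list_node, next),
                                &head, cpu);
                if (ret == 0)
                        return (struct percpu_list_node *)head;
                if (ret == 1)
                        return NULL;    /* List was empty. */
                if (errno != EAGAIN)
                        return NULL;    /* Unexpected error. */
                /* EAGAIN: the head moved under us, retry. */
        }
}

A library would attempt the rseq fast path first and only fall back to a
wrapper like this when the rseq critical section is aborted (e.g. under
single-stepping), so callers see the same semantics on both paths.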

> 
> The version I reviewed was just indigestible.

Thanks for the thorough coding style review by the way.

> I did not have time to look
> at the hastily cobbled-together version of today. Aside from that, the
> scheduler portion of it has not seen any review from scheduler folks
> either.

True. It appears that it really takes a merge window to get some
people's attention. That's OK, you guys are really busy on other
stuff. It's just unfortunate that the feedback about the cpu_opv
concept did not come sooner, e.g. during the first rounds of patches
where the cpu_opv design was presented, or even at KS.

> 
> AFAICT there is not a single reviewed-by tag on the sys_rseq and the
> sys_opv patches either.

Very good point! Could anyone in CC who cares about getting this in
find time to do an official review?

> 
> Are you seriously expecting that new syscalls of that kind are going to be
> merged without a deep and thorough review just based on your decision to
> declare them ready?

In my reply to Andi, I merely state that I'm not willing to push a
half-baked user-space ABI into the kernel, and rseq without cpu_opv
is only part of the solution.

Let's see if others find time to do an official review.

Thanks,

Mathieu



> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH update for 4.15 1/3] selftests: lib.mk: Introduce OVERRIDE_TARGETS
  2017-11-21 22:19   ` mathieu.desnoyers
@ 2017-11-22 15:16     ` shuah
  -1 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-22 15:16 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, linux-kselftest, Shuah Khan,
	Shuah Khan

On 11/21/2017 03:19 PM, Mathieu Desnoyers wrote:
> Introduce OVERRIDE_TARGETS to allow tests to express dependencies on
> header files and .so, which require overriding the selftests lib.mk
> targets.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul Turner <pjt@google.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
>  tools/testing/selftests/lib.mk | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 5bef05d6ba39..441d7bc63bb7 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -105,6 +105,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
>  LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
>  endif
>  
> +# Selftest makefiles can override those targets by setting
> +# OVERRIDE_TARGETS = 1.
> +ifeq ($(OVERRIDE_TARGETS),)
>  $(OUTPUT)/%:%.c
>  	$(LINK.c) $^ $(LDLIBS) -o $@
>  
> @@ -113,5 +116,6 @@ $(OUTPUT)/%.o:%.S
>  
>  $(OUTPUT)/%:%.S
>  	$(LINK.S) $^ $(LDLIBS) -o $@
> +endif
>  
>  .PHONY: run_tests all clean install emit_tests
> 

Thanks for splitting this patch. It will make it easier since it
is a kselftest framework change.

Acked-by: Shuah Khan <shuahkh@osg.samsung.com>

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH update for 4.15 2/3] cpu_opv: selftests: Implement selftests (v4)
  2017-11-21 22:19   ` mathieu.desnoyers
@ 2017-11-22 15:20     ` shuah
  -1 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-22 15:20 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, linux-kselftest, Shuah Khan,
	Shuah Khan

On 11/21/2017 03:19 PM, Mathieu Desnoyers wrote:
> Implement cpu_opv selftests. It needs to express dependencies on
> header files and .so, which require overriding the selftests
> lib.mk targets. Use the OVERRIDE_TARGETS define for this.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul Turner <pjt@google.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
> Changes since v1:
> 
> - Expose a library API similar to rseq's:  Expose a library API closely
>   matching the rseq APIs, following removal of the event counter from
>   the rseq kernel API.
> - Update makefile to fix make run_tests dependency on "all".
> - Introduce an OVERRIDE_TARGETS variable.
> 
> Changes since v2:
> 
> - Test page faults.
> 
> Changes since v3:
> 
> - Move lib.mk OVERRIDE_TARGETS change to its own patch.
> - Printout TAP output.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/cpu-opv/.gitignore         |    1 +
>  tools/testing/selftests/cpu-opv/Makefile           |   17 +
>  .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1167 ++++++++++++++++++++
>  tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
>  tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
>  7 files changed, 1603 insertions(+)
>  create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
>  create mode 100644 tools/testing/selftests/cpu-opv/Makefile
>  create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
>  create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h
> 

Looks good.

Acked-by: Shuah Khan <shuahkh@osg.samsung.com>

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH update for 4.15 3/3] rseq: selftests: Provide self-tests (v4)
  2017-11-21 22:19 ` [PATCH update for 4.15 3/3] rseq: selftests: Provide self-tests (v4) Mathieu Desnoyers
  2017-11-22 15:23     ` shuah
@ 2017-11-22 15:23     ` shuah
  0 siblings, 0 replies; 175+ messages in thread
From: Shuah Khan @ 2017-11-22 15:23 UTC (permalink / raw)
  To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, linux-kselftest, Shuah Khan,
	Shuah Khan

On 11/21/2017 03:19 PM, Mathieu Desnoyers wrote:
> Implements two basic tests of RSEQ functionality, and one more
> exhaustive parameterizable test.
> 
> The first, "basic_test" only asserts that RSEQ works moderately
> correctly. E.g. that the CPUID pointer works.
> 
> "basic_percpu_ops_test" is a slightly more "realistic" variant,
> implementing a few simple per-cpu operations and testing their
> correctness.
> 
> "param_test" is a parametrizable restartable sequences test. See
> the "--help" output for usage.
> 
> A run_param_test.sh script runs many variants of the parametrizable
> tests.
> 
> As part of those tests, a helper library "rseq" implements a user-space
> API around restartable sequences. It uses the cpu_opv system call as
> fallback when single-stepped by a debugger. It exposes the instruction
> pointer addresses where the rseq assembly blocks begin and end, as well
> as the associated abort instruction pointer, in the __rseq_table
> section. This section allows debuggers to know where to place
> breakpoints when single-stepping through assembly blocks which may be
> aborted at any point by the kernel.
> 
> The rseq library exposes APIs that present the fast-path operations.
> Usage from userspace is, e.g. for a counter increment:
> 
>     cpu = rseq_cpu_start();
>     ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>     if (likely(!ret))
>         return 0;        /* Success. */
>     do {
>         cpu = rseq_current_cpu();
>         ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>         if (likely(!ret))
>             return 0;    /* Success. */
>     } while (ret > 0 || errno == EAGAIN);
>     perror("cpu_op_addv");
>     return -1;           /* Unexpected error. */
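
For illustration only, the fallback pattern above can be wrapped into a
self-contained helper along the following lines. This is a sketch built from
the quoted snippet: it assumes the rseq.h and cpu-op.h selftest headers from
the diffstat below, the likely() macro is guarded in case those headers do
not define it, and the per-CPU data layout (struct percpu_count) is made up
for the example.

    #include <errno.h>
    #include <sched.h>      /* CPU_SETSIZE */
    #include <stdint.h>     /* intptr_t */
    #include <stdio.h>      /* perror() */
    #include "rseq.h"       /* rseq_cpu_start(), rseq_addv(), rseq_current_cpu() */
    #include "cpu-op.h"     /* cpu_op_addv(): cpu_opv-based slow path */

    #ifndef likely
    #define likely(x)       __builtin_expect(!!(x), 1)
    #endif

    /* Hypothetical per-CPU counter layout: pad entries to avoid false sharing. */
    struct percpu_count_entry {
        intptr_t count;
    } __attribute__((aligned(128)));

    struct percpu_count {
        struct percpu_count_entry c[CPU_SETSIZE];
    };

    /* Increment the current CPU's counter; 0 on success, -1 on unexpected error. */
    static int percpu_count_inc(struct percpu_count *data)
    {
        int cpu, ret;

        /* Fast path: rseq critical section; fall through if it is aborted. */
        cpu = rseq_cpu_start();
        ret = rseq_addv(&data->c[cpu].count, 1, cpu);
        if (likely(!ret))
            return 0;
        /* Slow path: let the cpu_opv system call do the add on the right CPU. */
        do {
            cpu = rseq_current_cpu();
            ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
            if (likely(!ret))
                return 0;
        } while (ret > 0 || errno == EAGAIN);
        perror("cpu_op_addv");
        return -1;
    }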
> 
> PowerPC tests have been implemented by Boqun Feng.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Paul Turner <pjt@google.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: linux-kselftest@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
> Changes since v1:
> - Provide abort-ip signature: The abort-ip signature is located just
>   before the abort-ip target. It is currently hardcoded, but a
>   user-space application could use the __rseq_table to iterate on all
>   abort-ip targets and use a random value as signature if needed in the
>   future.
> - Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
>   sections need to issue rseq_prepare_unload() on each thread at least
>   once before reclaim of struct rseq_cs.
> - Use initial-exec TLS model, non-weak symbol: The initial-exec model is
>   signal-safe, whereas the global-dynamic model is not.  Remove the
>   "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
>   library will have ownership of that symbol, and there is no reason for
>   an application or user library to try to define that symbol.
>   The expected use is to link against librseq.so, which owns and provides
>   that symbol (a sketch follows this list).
> - Set cpu_id to -2 on register error
> - Add rseq_len syscall parameter, rseq_cs version
> - Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
>   hard time decoding the instruction stream after a bad instruction. Use
>   a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
> - Exercise parametrized test variants in a shell script.
> - Restartable sequences selftests: Remove use of event counter.
> - Use cpu_id_start field:  With the cpu_id_start field, the C
>   preparation phase of the fast-path does not need to compare cpu_id < 0
>   anymore.
> - Signal-safe registration and refcounting: Allow libraries using
>   librseq.so to register it from signal handlers.
> - Use OVERRIDE_TARGETS in makefile.
> - Use "m" constraints for rseq_cs field.
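
As a side note on the TLS item above, the symbol-ownership scheme boils down
to something like the sketch below. This is only an illustration: it assumes
the selftest rseq.h header for struct rseq and rseq_prepare_unload(), the
exact initializer used by the real rseq.c is omitted, and the teardown hook
name is made up.

    #include "rseq.h"   /* struct rseq, rseq_prepare_unload() */

    /*
     * In librseq.so (rseq.c): the library owns the TLS symbol. It is
     * non-weak, and uses the initial-exec model so that accesses from
     * signal handlers remain signal-safe.
     */
    __attribute__((tls_model("initial-exec"))) __thread
    volatile struct rseq __rseq_abi;

    /*
     * In a library or JIT emitting rseq critical sections: run once per
     * thread before any struct rseq_cs it published may be reclaimed.
     */
    static void example_thread_teardown(void)
    {
        rseq_prepare_unload();
    }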
> 
> Changes since v2:
> - Update based on Thomas Gleixner's comments.
> 
> Changes since v3:
> - Generate param_test_skip_fastpath and param_test_benchmark with
>   -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath
>   to run_param_test.sh.
> ---
>  MAINTAINERS                                        |    1 +
>  tools/testing/selftests/Makefile                   |    1 +
>  tools/testing/selftests/rseq/.gitignore            |    4 +
>  tools/testing/selftests/rseq/Makefile              |   33 +
>  .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
>  tools/testing/selftests/rseq/basic_test.c          |   55 +
>  tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
>  tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
>  tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
>  tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
>  tools/testing/selftests/rseq/rseq.c                |  116 ++
>  tools/testing/selftests/rseq/rseq.h                |  154 +++
>  tools/testing/selftests/rseq/run_param_test.sh     |  126 ++
>  13 files changed, 4108 insertions(+)
>  create mode 100644 tools/testing/selftests/rseq/.gitignore
>  create mode 100644 tools/testing/selftests/rseq/Makefile
>  create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
>  create mode 100644 tools/testing/selftests/rseq/basic_test.c
>  create mode 100644 tools/testing/selftests/rseq/param_test.c
>  create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
>  create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
>  create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
>  create mode 100644 tools/testing/selftests/rseq/rseq.c
>  create mode 100644 tools/testing/selftests/rseq/rseq.h
>  create mode 100755 tools/testing/selftests/rseq/run_param_test.sh
> 

Looks good.

Acked-by: Shuah Khan <shuahkh@osg.samsung.com>

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-22 15:25           ` Thomas Gleixner
  0 siblings, 0 replies; 175+ messages in thread
From: Thomas Gleixner @ 2017-11-22 15:25 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson, linux-kernel, linux-api,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk

On Wed, 22 Nov 2017, Mathieu Desnoyers wrote:
> ----- On Nov 21, 2017, at 5:59 PM, Thomas Gleixner tglx@linutronix.de wrote:
> >> Using vDSO as an example of how things should be done is just wrong: the
> >> vDSO interaction with debugger instruction single-stepping is broken,
> >> as I detailed in my previous email.
> > 
> > Let me turn that around. You're lamenting about a conditional branch in
> > your rseq thing for performance reasons and at the same time you want to
> > force extra code into the VDSO? clock_gettime() is one of the hottest
> > vsyscalls in certain scenarios. So why would we want to have extra code
> > there? Just to make debuggers happy. You really can't be serious about
> > that.
> 
> There is *already* an existing branch in the clock_gettime vsyscall:
> it's a loop. It won't hurt the fast-path to use that branch and
> make it do something else instead. It could even help the vDSO fast-path
> for some non-x86 architectures where branch prediction assumes that
> backward branches are always taken (adding an unlikely() does not help
> in those cases).

Yes, there is an existing branch, but forcing that thing into the real
syscall when the seqcount does not match is silly.

> >> Thomas' proposal of handling single-stepping with a user-space locking
> >> fallback, which is pretty much what I had in 2016, pushes a lot of
> >> complexity to user-space, requires an extra branch in the fast-path,
> >> as well as additional store-release/load-acquire semantics for consistency.
> >> I don't plan on going down that route.
> >>
> >> Other than that, I have not received any concrete alternative proposal to
> >> properly handle single-stepping.
> > 
> > You provided the details today. Up to that point all we had was handwaving
> > and inconsistent information.
> 
> I mistakenly presumed you took interest in the past 2 years of discussions.
> It appears I was wrong, and that information needed to be summarized in
> my changelog. This was my mistake and I fixed it.

I took interest, but I did not follow all the details. And it's not about
me. Anyone who looks at those patches for whatever reason should get enough
information.

> >> The only opposition against cpu_opv is that there *should* be a hypothetical
> >> simpler solution. The rseq idea is not new: it's been presented by Paul Turner
> >> in 2012 at LPC. And so far, cpu_opv is the overall simplest and most
> >> efficient way I encountered to handle single-stepping, and it gives extra
> >> benefits, as described in my changelog.
> > 
> > That's how you define it and that does not make cpu_opv less complex and
> > more debuggable. There is no way to debug that and still you claim that it
> > removes complexity from user space.
> 
> So I should ask: what kind of observability within cpu_opv() do you want?
> I can add a tracepoint for each operation, which would technically take care
> of your concern. Your main counter-argument seems to be a tooling issue.

Yes, it's a tooling issue, and this issue is not going to be solved
faster than gdb support for skipping rseq sections.

> > That ops stuff comes from user space and
> > is not magically constructed by the kernel. In some of your use cases it
> > even has different semantics than the rseq section code. So how is that
> > removing any complexity from user space? All it buys you is an extra branch
> > less in your rseq hotpath and that's your justification to shove that
> > thing into the kernel.
> 
> Actually, the cpu-op user-space library can hide this difference from the
> user: I implemented the equivalent rseq algorithm using a compare-and-store:

Yes, you implemented it and I don't want to know how long it took to get it
right. But syscalls are not tied to a particular library.

> So from a library user perspective, the fast-path and slow-path are
> exactly the same.
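
Purely as an illustration of that claim, a library wrapper hiding the two
paths behind a single call could look like the sketch below. The helper names
rseq_cmpeqv_storev() and cpu_op_cmpeqv_storev() are assumed to come from the
rseq/cpu-opv selftest libraries, and their exact return conventions may
differ from what the comments state.

    #include <stdint.h>
    #include "rseq.h"       /* rseq fast-path helpers (assumed) */
    #include "cpu-op.h"     /* cpu_opv slow-path helpers (assumed) */

    /*
     * Compare-and-store: store @newv into *@v if *@v == @expect, bound to @cpu.
     * Fast path uses an rseq critical section; if it does not complete (e.g.
     * while being single-stepped), the same operation is submitted through
     * the cpu_opv system call, so callers see a single API either way.
     */
    static int percpu_cmpeqv_storev(intptr_t *v, intptr_t expect,
                                    intptr_t newv, int cpu)
    {
        int ret;

        ret = rseq_cmpeqv_storev(v, expect, newv, cpu);
        if (!ret)
            return 0;       /* Fast path succeeded. */
        /* Slow path: equivalent compare-and-store via cpu_opv. */
        return cpu_op_cmpeqv_storev(v, expect, newv, cpu);
    }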

Aside from the fact that they are not the same in terms of implementation and
semantics. I'm looking at the syscall and not reviewing your magic user
space library.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-21 22:05     ` Mathieu Desnoyers
@ 2017-11-22 15:28       ` Andy Lutomirski
  -1 siblings, 0 replies; 175+ messages in thread
From: Andy Lutomirski @ 2017-11-22 15:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk

On Tue, Nov 21, 2017 at 2:05 PM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
> ----- On Nov 21, 2017, at 12:21 PM, Andi Kleen andi@firstfloor.org wrote:
>
>> On Tue, Nov 21, 2017 at 09:18:38AM -0500, Mathieu Desnoyers wrote:
>>> Hi,
>>>
>>> Following changes based on a thorough coding style and patch changelog
>>> review from Thomas Gleixner and Peter Zijlstra, I'm respinning this
>>> series for another RFC.
>>>
>> My suggestion would be that you also split out the opv system call.
>> That seems to be the main contention point currently, and the restartable
>> sequences should be useful without it.
>
> I consider rseq to be incomplete and a pain to use in various scenarios
> without cpu_opv.
>
> About the contention point you refer to:
>
> Using vDSO as an example of how things should be done is just wrong: the
> vDSO interaction with debugger instruction single-stepping is broken,
> as I detailed in my previous email.
>

If anyone ever reports that as a problem, I'll gladly fix it in the
kernel.  That's doable without an ABI change.  If rseq-like things
started breaking single-stepping, we can't just fix it in the kernel.

Also, there is one and only one vclock_gettime.  Debuggers can easily
special-case it.  For all I know, they already do.

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH update for 4.15 3/3] rseq: selftests: Provide self-tests (v4)
  2017-11-22 15:23     ` shuah
@ 2017-11-22 16:31       ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-22 16:31 UTC (permalink / raw)
  To: shuah
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, linux-kselftest, Shuah Khan

----- On Nov 22, 2017, at 10:23 AM, shuah shuah@kernel.org wrote:

> On 11/21/2017 03:19 PM, Mathieu Desnoyers wrote:
>> Implements two basic tests of RSEQ functionality, and one more
>> exhaustive parameterizable test.
>> 
>> The first, "basic_test" only asserts that RSEQ works moderately
>> correctly. E.g. that the CPUID pointer works.
>> 
>> "basic_percpu_ops_test" is a slightly more "realistic" variant,
>> implementing a few simple per-cpu operations and testing their
>> correctness.
>> 
>> "param_test" is a parametrizable restartable sequences test. See
>> the "--help" output for usage.
>> 
>> A run_param_test.sh script runs many variants of the parametrizable
>> tests.
>> 
>> As part of those tests, a helper library "rseq" implements a user-space
>> API around restartable sequences. It uses the cpu_opv system call as
>> fallback when single-stepped by a debugger. It exposes the instruction
>> pointer addresses where the rseq assembly blocks begin and end, as well
>> as the associated abort instruction pointer, in the __rseq_table
>> section. This section allows debuggers to know where to place
>> breakpoints when single-stepping through assembly blocks which may be
>> aborted at any point by the kernel.
>> 
>> The rseq library exposes APIs that present the fast-path operations.
>> Their use from userspace looks as follows, e.g. for a counter increment:
>> 
>>     cpu = rseq_cpu_start();
>>     ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>>     if (likely(!ret))
>>         return 0;        /* Success. */
>>     do {
>>         cpu = rseq_current_cpu();
>>         ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>>         if (likely(!ret))
>>             return 0;    /* Success. */
>>     } while (ret > 0 || errno == EAGAIN);
>>     perror("cpu_op_addv");
>>     return -1;           /* Unexpected error. */
>> 
>> PowerPC tests have been implemented by Boqun Feng.
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> CC: Russell King <linux@arm.linux.org.uk>
>> CC: Catalin Marinas <catalin.marinas@arm.com>
>> CC: Will Deacon <will.deacon@arm.com>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Paul Turner <pjt@google.com>
>> CC: Andrew Hunter <ahh@google.com>
>> CC: Peter Zijlstra <peterz@infradead.org>
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Andi Kleen <andi@firstfloor.org>
>> CC: Dave Watson <davejwatson@fb.com>
>> CC: Chris Lameter <cl@linux.com>
>> CC: Ingo Molnar <mingo@redhat.com>
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: Ben Maurer <bmaurer@fb.com>
>> CC: Steven Rostedt <rostedt@goodmis.org>
>> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>> CC: Josh Triplett <josh@joshtriplett.org>
>> CC: Linus Torvalds <torvalds@linux-foundation.org>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Boqun Feng <boqun.feng@gmail.com>
>> CC: Shuah Khan <shuah@kernel.org>
>> CC: linux-kselftest@vger.kernel.org
>> CC: linux-api@vger.kernel.org
>> ---
>> Changes since v1:
>> - Provide abort-ip signature: The abort-ip signature is located just
>>   before the abort-ip target. It is currently hardcoded, but a
>>   user-space application could use the __rseq_table to iterate on all
>>   abort-ip targets and use a random value as signature if needed in the
>>   future.
>> - Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
>>   sections need to issue rseq_prepare_unload() on each thread at least
>>   once before reclaim of struct rseq_cs.
>> - Use initial-exec TLS model, non-weak symbol: The initial-exec model is
>>   signal-safe, whereas the global-dynamic model is not.  Remove the
>>   "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
>>   library will have ownership of that symbol, and there is no reason for
>>   an application or user library to try to define that symbol.
>>   The expected use is to link against librseq.so, which owns and provides
>>   that symbol.
>> - Set cpu_id to -2 on register error
>> - Add rseq_len syscall parameter, rseq_cs version
>> - Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
>>   hard time decoding the instruction stream after a bad instruction. Use
>>   a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
>> - Exercise parametrized test variants in a shell script.
>> - Restartable sequences selftests: Remove use of event counter.
>> - Use cpu_id_start field:  With the cpu_id_start field, the C
>>   preparation phase of the fast-path does not need to compare cpu_id < 0
>>   anymore.
>> - Signal-safe registration and refcounting: Allow libraries using
>>   librseq.so to register it from signal handlers.
>> - Use OVERRIDE_TARGETS in makefile.
>> - Use "m" constraints for rseq_cs field.
>> 
>> Changes since v2:
>> - Update based on Thomas Gleixner's comments.
>> 
>> Changes since v3:
>> - Generate param_test_skip_fastpath and param_test_benchmark with
>>   -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath
>>   to run_param_test.sh.
>> ---
>>  MAINTAINERS                                        |    1 +
>>  tools/testing/selftests/Makefile                   |    1 +
>>  tools/testing/selftests/rseq/.gitignore            |    4 +
>>  tools/testing/selftests/rseq/Makefile              |   33 +
>>  .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
>>  tools/testing/selftests/rseq/basic_test.c          |   55 +
>>  tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
>>  tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
>>  tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
>>  tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
>>  tools/testing/selftests/rseq/rseq.c                |  116 ++
>>  tools/testing/selftests/rseq/rseq.h                |  154 +++
>>  tools/testing/selftests/rseq/run_param_test.sh     |  126 ++
>>  13 files changed, 4108 insertions(+)
>>  create mode 100644 tools/testing/selftests/rseq/.gitignore
>>  create mode 100644 tools/testing/selftests/rseq/Makefile
>>  create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
>>  create mode 100644 tools/testing/selftests/rseq/basic_test.c
>>  create mode 100644 tools/testing/selftests/rseq/param_test.c
>>  create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq.c
>>  create mode 100644 tools/testing/selftests/rseq/rseq.h
>>  create mode 100755 tools/testing/selftests/rseq/run_param_test.sh
>> 
> 
> Looks good.
> 
> Acked-by: Shuah Khan <shuahkh@osg.samsung.com>

Thanks for the reviews!

Mathieu

> 
> thanks,
> -- Shuah

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [PATCH update for 4.15 3/3] rseq: selftests: Provide self-tests (v4)
@ 2017-11-22 16:31       ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-22 16:31 UTC (permalink / raw)


----- On Nov 22, 2017, at 10:23 AM, shuah shuah@kernel.org wrote:

> On 11/21/2017 03:19 PM, Mathieu Desnoyers wrote:
>> Implements two basic tests of RSEQ functionality, and one more
>> exhaustive parameterizable test.
>> 
>> The first, "basic_test" only asserts that RSEQ works moderately
>> correctly. E.g. that the CPUID pointer works.
>> 
>> "basic_percpu_ops_test" is a slightly more "realistic" variant,
>> implementing a few simple per-cpu operations and testing their
>> correctness.
>> 
>> "param_test" is a parametrizable restartable sequences test. See
>> the "--help" output for usage.
>> 
>> A run_param_test.sh script runs many variants of the parametrizable
>> tests.
>> 
>> As part of those tests, a helper library "rseq" implements a user-space
>> API around restartable sequences. It uses the cpu_opv system call as
>> fallback when single-stepped by a debugger. It exposes the instruction
>> pointer addresses where the rseq assembly blocks begin and end, as well
>> as the associated abort instruction pointer, in the __rseq_table
>> section. This section allows debuggers to know where to place
>> breakpoints when single-stepping through assembly blocks which may be
>> aborted at any point by the kernel.
>> 
>> The rseq library exposes APIs that present the fast-path operations.
>> Their use from userspace looks as follows, e.g. for a counter increment:
>> 
>>     cpu = rseq_cpu_start();
>>     ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>>     if (likely(!ret))
>>         return 0;        /* Success. */
>>     do {
>>         cpu = rseq_current_cpu();
>>         ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>>         if (likely(!ret))
>>             return 0;    /* Success. */
>>     } while (ret > 0 || errno == EAGAIN);
>>     perror("cpu_op_addv");
>>     return -1;           /* Unexpected error. */
>> 
>> PowerPC tests have been implemented by Boqun Feng.
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>> CC: Russell King <linux at arm.linux.org.uk>
>> CC: Catalin Marinas <catalin.marinas at arm.com>
>> CC: Will Deacon <will.deacon at arm.com>
>> CC: Thomas Gleixner <tglx at linutronix.de>
>> CC: Paul Turner <pjt at google.com>
>> CC: Andrew Hunter <ahh at google.com>
>> CC: Peter Zijlstra <peterz at infradead.org>
>> CC: Andy Lutomirski <luto at amacapital.net>
>> CC: Andi Kleen <andi at firstfloor.org>
>> CC: Dave Watson <davejwatson at fb.com>
>> CC: Chris Lameter <cl at linux.com>
>> CC: Ingo Molnar <mingo at redhat.com>
>> CC: "H. Peter Anvin" <hpa at zytor.com>
>> CC: Ben Maurer <bmaurer at fb.com>
>> CC: Steven Rostedt <rostedt at goodmis.org>
>> CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
>> CC: Josh Triplett <josh at joshtriplett.org>
>> CC: Linus Torvalds <torvalds at linux-foundation.org>
>> CC: Andrew Morton <akpm at linux-foundation.org>
>> CC: Boqun Feng <boqun.feng at gmail.com>
>> CC: Shuah Khan <shuah at kernel.org>
>> CC: linux-kselftest at vger.kernel.org
>> CC: linux-api at vger.kernel.org
>> ---
>> Changes since v1:
>> - Provide abort-ip signature: The abort-ip signature is located just
>>   before the abort-ip target. It is currently hardcoded, but a
>>   user-space application could use the __rseq_table to iterate on all
>>   abort-ip targets and use a random value as signature if needed in the
>>   future.
>> - Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
>>   sections need to issue rseq_prepare_unload() on each thread at least
>>   once before reclaim of struct rseq_cs.
>> - Use initial-exec TLS model, non-weak symbol: The initial-exec model is
>>   signal-safe, whereas the global-dynamic model is not.  Remove the
>>   "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
>>   library will have ownership of that symbol, and there is not reason for
>>   an application or user library to try to define that symbol.
>>   The expected use is to link against libreq.so, which owns and provide
>>   that symbol.
>> - Set cpu_id to -2 on register error
>> - Add rseq_len syscall parameter, rseq_cs version
>> - Ensure disassember-friendly signature: x86 32/64 disassembler have a
>>   hard time decoding the instruction stream after a bad instruction. Use
>>   a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
>> - Exercise parametrized tests variants in a shell scripts.
>> - Restartable sequences selftests: Remove use of event counter.
>> - Use cpu_id_start field:  With the cpu_id_start field, the C
>>   preparation phase of the fast-path does not need to compare cpu_id < 0
>>   anymore.
>> - Signal-safe registration and refcounting: Allow libraries using
>>   librseq.so to register it from signal handlers.
>> - Use OVERRIDE_TARGETS in makefile.
>> - Use "m" constraints for rseq_cs field.
>> 
>> Changes since v2:
>> - Update based on Thomas Gleixner's comments.
>> 
>> Changes since v3:
>> - Generate param_test_skip_fastpath and param_test_benchmark with
>>   -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath
>>   to run_param_test.sh.
>> ---
>>  MAINTAINERS                                        |    1 +
>>  tools/testing/selftests/Makefile                   |    1 +
>>  tools/testing/selftests/rseq/.gitignore            |    4 +
>>  tools/testing/selftests/rseq/Makefile              |   33 +
>>  .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
>>  tools/testing/selftests/rseq/basic_test.c          |   55 +
>>  tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
>>  tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
>>  tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
>>  tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
>>  tools/testing/selftests/rseq/rseq.c                |  116 ++
>>  tools/testing/selftests/rseq/rseq.h                |  154 +++
>>  tools/testing/selftests/rseq/run_param_test.sh     |  126 ++
>>  13 files changed, 4108 insertions(+)
>>  create mode 100644 tools/testing/selftests/rseq/.gitignore
>>  create mode 100644 tools/testing/selftests/rseq/Makefile
>>  create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
>>  create mode 100644 tools/testing/selftests/rseq/basic_test.c
>>  create mode 100644 tools/testing/selftests/rseq/param_test.c
>>  create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq.c
>>  create mode 100644 tools/testing/selftests/rseq/rseq.h
>>  create mode 100755 tools/testing/selftests/rseq/run_param_test.sh
>> 
> 
> Looks good.
> 
> Acked-by: Shuah Khan <shuahkh at osg.samsung.com>

Thanks the the reviews!

Mathieu

> 
> thanks,
> -- Shuah

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH update for 4.15 3/3] rseq: selftests: Provide self-tests (v4)
@ 2017-11-22 16:31       ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-22 16:31 UTC (permalink / raw)
  To: shuah
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas

----- On Nov 22, 2017, at 10:23 AM, shuah shuah@kernel.org wrote:

> On 11/21/2017 03:19 PM, Mathieu Desnoyers wrote:
>> Implements two basic tests of RSEQ functionality, and one more
>> exhaustive parameterizable test.
>> 
>> The first, "basic_test" only asserts that RSEQ works moderately
>> correctly. E.g. that the CPUID pointer works.
>> 
>> "basic_percpu_ops_test" is a slightly more "realistic" variant,
>> implementing a few simple per-cpu operations and testing their
>> correctness.
>> 
>> "param_test" is a parametrizable restartable sequences test. See
>> the "--help" output for usage.
>> 
>> A run_param_test.sh script runs many variants of the parametrizable
>> tests.
>> 
>> As part of those tests, a helper library "rseq" implements a user-space
>> API around restartable sequences. It uses the cpu_opv system call as
>> fallback when single-stepped by a debugger. It exposes the instruction
>> pointer addresses where the rseq assembly blocks begin and end, as well
>> as the associated abort instruction pointer, in the __rseq_table
>> section. This section allows debuggers to know where to place
>> breakpoints when single-stepping through assembly blocks which may be
>> aborted at any point by the kernel.
>> 
>> The rseq library exposes APIs that present the fast-path operations.
>> The use from userspace is, e.g. for a counter increment:
>> 
>>     cpu = rseq_cpu_start();
>>     ret = rseq_addv(&data->c[cpu].count, 1, cpu);
>>     if (likely(!ret))
>>         return 0;        /* Success. */
>>     do {
>>         cpu = rseq_current_cpu();
>>         ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
>>         if (likely(!ret))
>>             return 0;    /* Success. */
>>     } while (ret > 0 || errno == EAGAIN);
>>     perror("cpu_op_addv");
>>     return -1;           /* Unexpected error. */
>> 
>> PowerPC tests have been implemented by Boqun Feng.
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> CC: Russell King <linux@arm.linux.org.uk>
>> CC: Catalin Marinas <catalin.marinas@arm.com>
>> CC: Will Deacon <will.deacon@arm.com>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Paul Turner <pjt@google.com>
>> CC: Andrew Hunter <ahh@google.com>
>> CC: Peter Zijlstra <peterz@infradead.org>
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Andi Kleen <andi@firstfloor.org>
>> CC: Dave Watson <davejwatson@fb.com>
>> CC: Chris Lameter <cl@linux.com>
>> CC: Ingo Molnar <mingo@redhat.com>
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: Ben Maurer <bmaurer@fb.com>
>> CC: Steven Rostedt <rostedt@goodmis.org>
>> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>> CC: Josh Triplett <josh@joshtriplett.org>
>> CC: Linus Torvalds <torvalds@linux-foundation.org>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Boqun Feng <boqun.feng@gmail.com>
>> CC: Shuah Khan <shuah@kernel.org>
>> CC: linux-kselftest@vger.kernel.org
>> CC: linux-api@vger.kernel.org
>> ---
>> Changes since v1:
>> - Provide abort-ip signature: The abort-ip signature is located just
>>   before the abort-ip target. It is currently hardcoded, but a
>>   user-space application could use the __rseq_table to iterate on all
>>   abort-ip targets and use a random value as signature if needed in the
>>   future.
>> - Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
>>   sections need to issue rseq_prepare_unload() on each thread at least
>>   once before reclaim of struct rseq_cs.
>> - Use initial-exec TLS model, non-weak symbol: The initial-exec model is
>>   signal-safe, whereas the global-dynamic model is not.  Remove the
>>   "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
>>   library will have ownership of that symbol, and there is no reason for
>>   an application or user library to try to define that symbol.
>>   The expected use is to link against librseq.so, which owns and provides
>>   that symbol.
>> - Set cpu_id to -2 on register error
>> - Add rseq_len syscall parameter, rseq_cs version
>> - Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
>>   hard time decoding the instruction stream after a bad instruction. Use
>>   a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
>> - Exercise parametrized test variants in a shell script.
>> - Restartable sequences selftests: Remove use of event counter.
>> - Use cpu_id_start field:  With the cpu_id_start field, the C
>>   preparation phase of the fast-path does not need to compare cpu_id < 0
>>   anymore.
>> - Signal-safe registration and refcounting: Allow libraries using
>>   librseq.so to register it from signal handlers.
>> - Use OVERRIDE_TARGETS in makefile.
>> - Use "m" constraints for rseq_cs field.
>> 
>> Changes since v2:
>> - Update based on Thomas Gleixner's comments.
>> 
>> Changes since v3:
>> - Generate param_test_skip_fastpath and param_test_benchmark with
>>   -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath
>>   to run_param_test.sh.
>> ---
>>  MAINTAINERS                                        |    1 +
>>  tools/testing/selftests/Makefile                   |    1 +
>>  tools/testing/selftests/rseq/.gitignore            |    4 +
>>  tools/testing/selftests/rseq/Makefile              |   33 +
>>  .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
>>  tools/testing/selftests/rseq/basic_test.c          |   55 +
>>  tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
>>  tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
>>  tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
>>  tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
>>  tools/testing/selftests/rseq/rseq.c                |  116 ++
>>  tools/testing/selftests/rseq/rseq.h                |  154 +++
>>  tools/testing/selftests/rseq/run_param_test.sh     |  126 ++
>>  13 files changed, 4108 insertions(+)
>>  create mode 100644 tools/testing/selftests/rseq/.gitignore
>>  create mode 100644 tools/testing/selftests/rseq/Makefile
>>  create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
>>  create mode 100644 tools/testing/selftests/rseq/basic_test.c
>>  create mode 100644 tools/testing/selftests/rseq/param_test.c
>>  create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
>>  create mode 100644 tools/testing/selftests/rseq/rseq.c
>>  create mode 100644 tools/testing/selftests/rseq/rseq.h
>>  create mode 100755 tools/testing/selftests/rseq/run_param_test.sh
>> 
> 
> Looks good.
> 
> Acked-by: Shuah Khan <shuahkh@osg.samsung.com>

Thanks for the reviews!

Mathieu

> 
> thanks,
> -- Shuah

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-22 16:43         ` Mathieu Desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-22 16:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andi Kleen, Peter Zijlstra, Paul E. McKenney, Boqun Feng,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk

----- On Nov 22, 2017, at 10:28 AM, Andy Lutomirski luto@amacapital.net wrote:

> On Tue, Nov 21, 2017 at 2:05 PM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>> ----- On Nov 21, 2017, at 12:21 PM, Andi Kleen andi@firstfloor.org wrote:
>>
>>> On Tue, Nov 21, 2017 at 09:18:38AM -0500, Mathieu Desnoyers wrote:
>>>> Hi,
>>>>
>>>> Following changes based on a thorough coding style and patch changelog
>>>> review from Thomas Gleixner and Peter Zijlstra, I'm respinning this
>>>> series for another RFC.
>>>>
>>> My suggestion would be that you also split out the opv system call.
>>> That seems to be main contention point currently, and the restartable
>>> sequences should be useful without it.
>>
>> I consider rseq to be incomplete and a pain to use in various scenarios
>> without cpu_opv.
>>
>> About the contention point you refer to:
>>
>> Using vDSO as an example of how things should be done is just wrong: the
>> vDSO interaction with debugger instruction single-stepping is broken,
>> as I detailed in my previous email.
>>
> 
> If anyone ever reports that as a problem, I'll gladly fix it in the
> kernel.  That's doable without an ABI change.  If rseq-like things
> started breaking single-stepping, we can't just fix it in the kernel.

Very true. And rseq does break both line-level and instruction-level
single-stepping.

> 
> Also, there is one and only one vclock_gettime.  Debuggers can easily
> special-case it.  For all I know, they already do.

As my tests demonstrate, they don't. clock_gettime() vDSO currently
breaks instruction-level single-stepping (istep) with gdb. I'll
forward you the writeup I did on that a few days ago.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-22 18:10           ` Andi Kleen
  0 siblings, 0 replies; 175+ messages in thread
From: Andi Kleen @ 2017-11-22 18:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andy Lutomirski, Andi Kleen, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Dave Watson, linux-kernel, linux-api, Paul Turner,
	Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk

> > If anyone ever reports that as a problem, I'll gladly fix it in the
> > kernel.  That's doable without an ABI change.  If rseq-like things
> > started breaking single-stepping, we can't just fix it in the kernel.

AFAIK nobody ever complained about it since we have vsyscalls and vDSOs.

> 
> Very true. And rseq does break both line-level and instruction-level
> single-stepping.

They can just set a breakpoint after it and continue.

In fact it could even be expressed to the debugger, so it does
that automatically based on some DWARF extension.

I also disagree that opv somehow "solves" debugging: it's a completely
different code path that has nothing to do with the original code path.
That's not debugging, that's at best a workaround. I don't think it's
any better than the breakpoint method.

-Andi

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-22 19:32       ` Peter Zijlstra
  0 siblings, 0 replies; 175+ messages in thread
From: Peter Zijlstra @ 2017-11-22 19:32 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk

On Tue, Nov 21, 2017 at 10:05:08PM +0000, Mathieu Desnoyers wrote:
> Other than that, I have not received any concrete alternative proposal to
> properly handle single-stepping.

That's not entirely true; amluto did have an alternative in Prague: do
full machine level instruction emulation till the end of the rseq when
it gets 'preempted too often'.

Yes, implementing that will be an absolute royal pain. But it does
remove the whole duplicate/dual program asm/bytecode thing and avoids
the syscall entirely.

And we don't need to do a full x86_64/arch-of-choice emulator for this
either; just as cpu_opv is fairly limited too. We can do a subset that
allows dealing with the known sequences and go from there -- it can
always fall back to not emulating and reverting to the pure rseq with
debug/fwd progress 'issues'.

So what exactly is the problem of leaving out the whole cpu_opv thing
for now? Pure rseq is usable -- albeit a bit cumbersome without
additional debugger support.

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-22 19:32       ` Peter Zijlstra
@ 2017-11-22 19:37         ` Will Deacon
  -1 siblings, 0 replies; 175+ messages in thread
From: Will Deacon @ 2017-11-22 19:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Andi Kleen, Paul E. McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson, linux-kernel, linux-api,
	Paul Turner, Andrew Morton, Russell King, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrew Hunter, Chris Lameter,
	Ben Maurer, rostedt, Josh Triplett, Linus Torvalds,
	Catalin Marinas, Michael Kerrisk

On Wed, Nov 22, 2017 at 08:32:19PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 21, 2017 at 10:05:08PM +0000, Mathieu Desnoyers wrote:
> > Other than that, I have not received any concrete alternative proposal to
> > properly handle single-stepping.
> 
> That's not entirely true; amluto did have an alternative in Prague: do
> full machine level instruction emulation till the end of the rseq when
> it gets 'preempted too often'.
> 
> Yes, implementing that will be an absolute royal pain. But it does
> remove the whole duplicate/dual program asm/bytecode thing and avoids
> the syscall entirely.
> 
> And we don't need to do a full x86_64/arch-of-choice emulator for this
> either; just as cpu_opv is fairly limited too. We can do a subset that
> allows dealing with the known sequences and go from there -- it can
> always fall back to not emulating and reverting to the pure rseq with
> debug/fwd progress 'issues'.
> 
> So what exactly is the problem of leaving out the whole cpu_opv thing
> for now? Pure rseq is usable -- albeit a bit cumbersome without
> additional debugger support.

Drive-by "ack" to that. I'd really like a working rseq implementation in
mainline, but I don't much care for another interpreter.

Will

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 14:18 ` [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests Mathieu Desnoyers
  2017-11-21 15:34     ` Shuah Khan
  2017-11-22 19:38     ` peterz
@ 2017-11-22 19:38     ` peterz
  2017-11-23  8:55     ` peterz
  3 siblings, 0 replies; 175+ messages in thread
From: Peter Zijlstra @ 2017-11-22 19:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Shuah Khan, linux-kselftest

On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
> Implements two basic tests of RSEQ functionality, and one more
> exhaustive parameterizable test.
> 
> The first, "basic_test" only asserts that RSEQ works moderately
> correctly. E.g. that the CPUID pointer works.
> 
> "basic_percpu_ops_test" is a slightly more "realistic" variant,
> implementing a few simple per-cpu operations and testing their
> correctness.
> 
> "param_test" is a parametrizable restartable sequences test. See
> the "--help" output for usage.
> 
> A run_param_test.sh script runs many variants of the parametrizable
> tests.
> 
> As part of those tests, a helper library "rseq" implements a user-space
> API around restartable sequences. It uses the cpu_opv system call as
> fallback when single-stepped by a debugger. It exposes the instruction
> pointer addresses where the rseq assembly blocks begin and end, as well
> as the associated abort instruction pointer, in the __rseq_table
> section. This section allows debuggers may know where to place
> breakpoints when single-stepping through assembly blocks which may be
> aborted at any point by the kernel.

Could I ask you to split this into smaller bits?

I'd start with just the rseq library, using only the rseq interface.
Then add the whole cpu_opv fallback stuff.
Then add the selftests using librseq.

As is this is a tad much to read in a single go.

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 14:18 ` [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests Mathieu Desnoyers
  2017-11-21 15:34     ` Shuah Khan
  2017-11-22 19:38     ` peterz
@ 2017-11-22 21:48     ` peterz
  2017-11-23  8:55     ` peterz
  3 siblings, 0 replies; 175+ messages in thread
From: Peter Zijlstra @ 2017-11-22 21:48 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Shuah Khan, linux-kselftest

On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
> diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
> new file mode 100644
> index 000000000000..63e81d6c61fa
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-x86.h
> @@ -0,0 +1,898 @@
> +/*
> + * rseq-x86.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <stdint.h>
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#ifdef __x86_64__
> +
> +#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")

See commit:

  450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")
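
Picking that up here could look like the line below; this is only a sketch
on my side, not code from this series.  A locked add is a full barrier on
x86-64, and adding zero writes back the same value, so the access below
%rsp is harmless even where the red zone holds live data (the displacement
is otherwise arbitrary):

	#define rseq_smp_mb()	\
		__asm__ __volatile__ ("lock; addl $0,-128(%%rsp)" : : : "memory", "cc")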

> +#define rseq_smp_rmb()	barrier()
> +#define rseq_smp_wmb()	barrier()
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	barrier();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	barrier();							\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		__rseq_str(label) ":\n\t"				\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"

OK, so this creates a table entry, but why is @section an argument? AFAICT
it's _always_ the same thing, no?
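
For readers following along: the entry those directives emit boils down to
the struct below (field names are mine, read straight off the .long/.quad
lines above; this is what anything walking the __rseq_table section sees,
one 32-byte-aligned entry per critical section):

	struct rseq_table_entry {
		uint32_t version;
		uint32_t flags;
		uint64_t start_ip;		/* first ip of the sequence */
		uint64_t post_commit_offset;	/* bytes from start_ip to just past the commit */
		uint64_t abort_ip;		/* where the kernel points the ip on abort */
	};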

> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
> +		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
> +		__rseq_str(label) ":\n\t"

And this sets the TLS variable to point to the table entry from the
previous macro, no? But again @rseq_cs seems to always be the very same,
why is that an argument?

> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
> +		"jnz " __rseq_str(label) "\n\t"

more things that are always the same it seems..

> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
> +		".long " __rseq_str(sig) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
> +		".popsection\n\t"

@section and @sig seem to always be the same...

> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
> +		".popsection\n\t"

Somewhat failing to see the point of this macro, it seems to just
obfuscate the normal failure path.

> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)

I find this a very confusing name for what is essentially
compare-and-exchange or compare-and-swap, no?
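
To spell out the semantics being named (an illustration only; the real
operation relies on rseq to run it unpreempted on @cpu, which plain C
cannot express):

	/* Sketch: what rseq_cmpeqv_storev() commits when it is not aborted. */
	static inline int cmpeqv_storev_sketch(intptr_t *v, intptr_t expect, intptr_t newv)
	{
		if (*v != expect)
			return 1;	/* cmpfail */
		*v = newv;		/* final store: the commit point */
		return 0;
	}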

> +{
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)

So we set up the section, but unreadably so... reducing the number of
arguments would help a lot.

Rename the current one to __RSEQ_ASM_DEFINE_TABLE() and then use:

#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \
	__RSEQ_ASM_DEFINE_TABLE(label, __rseq_table, 0x0, 0x0, start_ip, \
				(post_commit_ip - start_ip), abort_ip)

or something, such that we can write:

		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */

> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)

And here we start the rseq by storing the table entry pointer into
the TLS thingy.

> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"

		"jnz %l[cmpfail]\n\t"

was too complicated?

> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)

		: [cpu_id]         "r" (cpu),
		  [current_cpu_id] "m" (__rseq_abi.cpu_id),
		  [rseq_cs]        "m" (__rseq_abi.rseq_cs),
		  [v]              "m" (*v),
		  [expect]         "r" (expect),
		  [newv]           "r" (newv)

or something like that does read much better

> +		: "memory", "cc", "rax"
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)

so this thing does what now? It compares @v to @expectnot, when _not_
matching it will store @voffp into @v and load something..?

> +{
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		"cmpq %[v], %[expectnot]\n\t"
> +		"jz 5f\n\t"

So I would prefer "je" in this context, or rather:

		je %l[cmpfail]

> +		"movq %[v], %%rax\n\t"

loads @v in A

But it could already have changed since the previous load from cmp, no?
Would it not make sense to put this load before the cmp and use A
instead?

> +		"movq %%rax, %[load]\n\t"

stores A in @load

> +		"addq %[voffp], %%rax\n\t"

adds @off to A

> +		"movq (%%rax), %%rax\n\t"

loads (A) in A

> +		/* final store */
> +		"movq %%rax, %[v]\n\t"

stores A in @v


So the whole thing loads @v into @load, adds an offset, dereferences,
and stores that back into @v, provided @v doesn't match @expectnot.. whee.
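
Spelled out as C, my reading of the above is the sketch below (illustration
only, and modulo the separate compare/reload noted above; presumably this is
meant for things like popping the head of a per-cpu list, where @voffp would
be the offset of the next pointer inside the object @v points to):

	static inline int cmpnev_storeoffp_load_sketch(intptr_t *v, intptr_t expectnot,
			off_t voffp, intptr_t *load)
	{
		intptr_t old = *v;

		if (old == expectnot)
			return 1;			/* cmpfail */
		*load = old;				/* hand the old value back */
		*v = *(intptr_t *)(old + voffp);	/* commit: e.g. v = old->next */
		return 0;
	}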

> +		"2:\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"er"(voffp),
> +		  [load]"m"(*load)
> +		: "memory", "cc", "rax"
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	return -1;
> +cmpfail:
> +	return 1;
> +}

> +#elif __i386__
> +
> +/*
> + * Support older 32-bit architectures that do not implement fence
> + * instructions.
> + */
> +#define rseq_smp_mb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_rmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_wmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")

Oh shiny, you're supporting that OOSTORE and PPRO_FENCE nonsense?

Going by commit:

  09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE")

That smp_wmb() one was an 'optimization' (forced store buffer flush) but
not a correctness thing. And we dropped that stuff from the kernel a
_long_ time ago.

Ideally we'd kill that PPRO_FENCE crap too.
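
Concretely, dropping that would leave something like this for plain i386
(a sketch based on x86's ordering rules, not a patch): loads are not
reordered against loads and stores are not reordered against stores, so
only the full barrier needs a locked instruction.

	#define rseq_smp_mb()	\
		__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory", "cc")
	#define rseq_smp_rmb()	barrier()
	#define rseq_smp_wmb()	barrier()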

^ permalink raw reply	[flat|nested] 175+ messages in thread

* [Linux-kselftest-mirror] [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
@ 2017-11-22 21:48     ` peterz
  0 siblings, 0 replies; 175+ messages in thread
From: Peter Zijlstra @ 2017-11-22 21:48 UTC (permalink / raw)


On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
> diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
> new file mode 100644
> index 000000000000..63e81d6c61fa
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-x86.h
> @@ -0,0 +1,898 @@
> +/*
> + * rseq-x86.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <stdint.h>
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#ifdef __x86_64__
> +
> +#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")

See commit:

  450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")

> +#define rseq_smp_rmb()	barrier()
> +#define rseq_smp_wmb()	barrier()
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	barrier();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	barrier();							\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		__rseq_str(label) ":\n\t"				\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"

OK, so this creates table entry, but why is @section an argument, AFAICT
its _always_ the same thing, no?

> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
> +		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
> +		__rseq_str(label) ":\n\t"

And this sets the TLS variable to point to the table entry from the
previous macro, no? But again @rseq_cs seems to always be the very same,
why is that an argument?

> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
> +		"jnz " __rseq_str(label) "\n\t"

more things that are always the same it seems..

> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
> +		".long " __rseq_str(sig) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
> +		".popsection\n\t"

@section and @sig seem to always be the same...

> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
> +		".popsection\n\t"

Somewhat failing to see the point of this macro, it seems to just
obfuscate the normal failure path.

> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)

I find this a very confusing name for what is essentially
compare-and-exchange or compare-and-swap, no?

> +{
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)

So we set up the section, but unreadably so... reducing the number of
arguments would help a lot.

Rename the current one to __RSEQ_ASM_DEFINE_TABLE() and then use:

#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \
	__RSEQ_ASM_DEFINE_TABLE(label, __rseq_table, 0x0, 0x0, start_ip, \
				(post_commit_ip - start_ip), abort_ip)

or something, such that we can write:

		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */

> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)

And here we open start the rseq by storing the table entry pointer into
the TLS thingy.

> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"

		"jnz %l[cmpfail]\n\t"

was too complicated?

> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)

		: [cpu_id]         "r" (cpu),
		  [current_cpu_id] "m" (__rseq_abi.cpu_id),
		  [rseq_cs]        "m" (__rseq_abi.rseq_cs),
		  [v]              "m" (*v),
		  [expect]         "r" (expect),
		  [newv]           "r" (newv)

or something does read much better

> +		: "memory", "cc", "rax"
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)

so this thing does what now? It compares @v to @expectnot, when _not_
matching it will store @voffp into @v and load something..?

> +{
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		"cmpq %[v], %[expectnot]\n\t"
> +		"jz 5f\n\t"

So I would prefer "je" in this context, or rather:

		je %l[cmpfail]

> +		"movq %[v], %%rax\n\t"

loads @v in A

But it could already have changed since the previous load from cmp, no?
Would it not make sense to put this load before the cmp and use A
instead?

> +		"movq %%rax, %[load]\n\t"

stores A in @load

> +		"addq %[voffp], %%rax\n\t"

adds @off to A

> +		"movq (%%rax), %%rax\n\t"

loads (A) in A

> +		/* final store */
> +		"movq %%rax, %[v]\n\t"

stores A in @v


So the whole thing loads @v into @load, adds and offset, dereferences
and adds that back in @v, provided @v doesn't match @expected.. whee.

> +		"2:\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"er"(voffp),
> +		  [load]"m"(*load)
> +		: "memory", "cc", "rax"
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	return -1;
> +cmpfail:
> +	return 1;
> +}

> +#elif __i386__
> +
> +/*
> + * Support older 32-bit architectures that do not implement fence
> + * instructions.
> + */
> +#define rseq_smp_mb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_rmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_wmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")

Oh shiny, you're supporting that OOSTORE and PPRO_FENCE nonsense?

Going by commit:

  09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE")

That smp_wmb() one was an 'optimization' (forced store buffer flush) but
not a correctness thing. And we dropped that stuff from the kernel a
_long_ time ago.

Ideally we'd kill that PPRO_FENCE crap too.

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
@ 2017-11-22 21:48     ` peterz
  0 siblings, 0 replies; 175+ messages in thread
From: Peter Zijlstra @ 2017-11-22 21:48 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon

On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
> diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
> new file mode 100644
> index 000000000000..63e81d6c61fa
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/rseq-x86.h
> @@ -0,0 +1,898 @@
> +/*
> + * rseq-x86.h
> + *
> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <stdint.h>
> +
> +#define RSEQ_SIG	0x53053053
> +
> +#ifdef __x86_64__
> +
> +#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")

See commit:

  450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")
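
For illustration, a minimal sketch of what a LOCK-prefixed full barrier
could look like here, following that commit (not from the patch; adding
$0 leaves the stack slot unchanged, and the -4(%rsp) offset is only an
illustrative choice):

#define rseq_smp_mb()	\
	__asm__ __volatile__ ("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")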

> +#define rseq_smp_rmb()	barrier()
> +#define rseq_smp_wmb()	barrier()
> +
> +#define rseq_smp_load_acquire(p)					\
> +__extension__ ({							\
> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
> +	barrier();							\
> +	____p1;								\
> +})
> +
> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
> +
> +#define rseq_smp_store_release(p, v)					\
> +do {									\
> +	barrier();							\
> +	RSEQ_WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
> +			start_ip, post_commit_offset, abort_ip)		\
> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
> +		".balign 32\n\t"					\
> +		__rseq_str(label) ":\n\t"				\
> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
> +		".popsection\n\t"

OK, so this creates a table entry, but why is @section an argument? AFAICT
it's _always_ the same thing, no?
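
For reference, the entry this lays out corresponds to a structure along
the lines of the sketch below, inferred from the .long/.quad directives
and the macro's argument names (the struct name here is made up):

	struct rseq_table_entry {		/* hypothetical name */
		uint32_t version;
		uint32_t flags;
		uint64_t start_ip;
		uint64_t post_commit_offset;
		uint64_t abort_ip;
	} __attribute__((aligned(32)));		/* matches .balign 32 */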

> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
> +		RSEQ_INJECT_ASM(1)					\
> +		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
> +		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
> +		__rseq_str(label) ":\n\t"

And this sets the TLS variable to point to the table entry from the
previous macro, no? But again @rseq_cs seems to always be the very same,
why is that an argument?

> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
> +		RSEQ_INJECT_ASM(2)					\
> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
> +		"jnz " __rseq_str(label) "\n\t"

More things that are always the same, it seems..

> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
> +		".long " __rseq_str(sig) "\n\t"			\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
> +		".popsection\n\t"

@section and @sig seem to always be the same...

> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
> +		__rseq_str(label) ":\n\t"				\
> +		teardown						\
> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
> +		".popsection\n\t"

Somewhat failing to see the point of this macro, it seems to just
obfuscate the normal failure path.

> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)

I find this a very confusing name for what is essentially
compare-and-exchange or compare-and-swap, no?
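
For naming comparison only, the non-restartable, single-variable shape of
the operation is a plain compare-and-store; a sketch with a made-up name,
not a proposed replacement:

	static inline int cmpeqv_storev_equiv(intptr_t *v, intptr_t expect,
			intptr_t newv)
	{
		if (*v != expect)
			return 1;	/* cmpfail */
		*v = newv;		/* final store */
		return 0;
	}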

> +{
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)

So we set up the section, but unreadably so... reducing the number of
arguments would help a lot.

Rename the current one to __RSEQ_ASM_DEFINE_TABLE() and then use:

#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \
	__RSEQ_ASM_DEFINE_TABLE(label, __rseq_table, 0x0, 0x0, start_ip, \
				(post_commit_ip - start_ip), abort_ip)

or something, such that we can write:

		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */

> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)

And here we start the rseq by storing the table entry pointer into the
TLS thingy.

> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"

		"jnz %l[cmpfail]\n\t"

was too complicated?

> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)

		: [cpu_id]         "r" (cpu),
		  [current_cpu_id] "m" (__rseq_abi.cpu_id),
		  [rseq_cs]        "m" (__rseq_abi.rseq_cs),
		  [v]              "m" (*v),
		  [expect]         "r" (expect),
		  [newv]           "r" (newv)

or something like that; it does read much better

> +		: "memory", "cc", "rax"
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	return -1;
> +cmpfail:
> +	return 1;
> +}
> +
> +static inline __attribute__((always_inline))
> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
> +		off_t voffp, intptr_t *load, int cpu)

so this thing does what now? It compares @v to @expectnot, when _not_
matching it will store @voffp into @v and load something..?

> +{
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> +		"cmpq %[v], %[expectnot]\n\t"
> +		"jz 5f\n\t"

So I would prefer "je" in this context, or rather:

		je %l[cmpfail]

> +		"movq %[v], %%rax\n\t"

loads @v in A

But it could already have changed since the previous load from cmp, no?
Would it not make sense to put this load before the cmp and use A
instead?

> +		"movq %%rax, %[load]\n\t"

stores A in @load

> +		"addq %[voffp], %%rax\n\t"

adds @voffp to A

> +		"movq (%%rax), %%rax\n\t"

loads (A) in A

> +		/* final store */
> +		"movq %%rax, %[v]\n\t"

stores A in @v


So the whole thing loads @v into @load, adds an offset, dereferences,
and stores the result back into @v, provided @v doesn't match
@expectnot.. whee.
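
Rendered as plain C for clarity (illustration only, using the function's
own argument names; this loses the single-copy atomicity and the
restart-on-abort semantics the asm provides):

	intptr_t old;

	if (*v == expectnot)
		return 1;			/* cmpfail */
	old = *v;				/* rax */
	*load = old;
	*v = *(intptr_t *)(old + voffp);	/* final store */
	return 0;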

> +		"2:\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  /* final store input */
> +		  [v]"m"(*v),
> +		  [expectnot]"r"(expectnot),
> +		  [voffp]"er"(voffp),
> +		  [load]"m"(*load)
> +		: "memory", "cc", "rax"
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	return -1;
> +cmpfail:
> +	return 1;
> +}

> +#elif __i386__
> +
> +/*
> + * Support older 32-bit architectures that do not implement fence
> + * instructions.
> + */
> +#define rseq_smp_mb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_rmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> +#define rseq_smp_wmb()	\
> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")

Oh shiny, you're supporting that OOSTORE and PPRO_FENCE nonsense?

Going by commit:

  09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE")

That smp_wmb() one was an 'optimization' (forced store buffer flush) but
not a correctness thing. And we dropped that stuff from the kernel a
_long_ time ago.

Ideally we'd kill that PPRO_FENCE crap too.
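
If the OOSTORE/PPRO_FENCE cases are dropped, these could presumably
collapse to something like the sketch below, mirroring the x86-64
definitions earlier in this header (smp_mb() still needs a locked
operation on plain i386):

#define rseq_smp_mb()	\
	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
#define rseq_smp_rmb()	barrier()
#define rseq_smp_wmb()	barrier()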

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-21 14:18 ` [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests Mathieu Desnoyers
  2017-11-21 15:34     ` Shuah Khan
  2017-11-22 19:38     ` peterz
@ 2017-11-23  8:55     ` peterz
  2017-11-23  8:55     ` peterz
  3 siblings, 0 replies; 175+ messages in thread
From: Peter Zijlstra @ 2017-11-23  8:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Shuah Khan, linux-kselftest

On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
> +int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
> +{
> +	intptr_t *targetptr, newval, expect;
> +	int cpu, ret;
> +
> +	/* Try fast path. */
> +	cpu = rseq_cpu_start();

> +	/* Load list->c[cpu].head with single-copy atomicity. */
> +	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
> +	newval = (intptr_t)node;
> +	targetptr = (intptr_t *)&list->c[cpu].head;
> +	node->next = (struct percpu_list_node *)expect;

> +	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);

> +	if (likely(!ret))
> +		return cpu;

> +	return cpu;
> +}

> +static inline __attribute__((always_inline))
> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> +		int cpu)
> +{
> +	__asm__ __volatile__ goto (
> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)

> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)

So the actual C part of the RSEQ is subject to an ABA, right? We can get
migrated to another CPU and back again without then failing here.

It used to be that this was caught by the sequence count, but that is
now gone.

The thing that makes it work is the compare against @v:

> +		"cmpq %[v], %[expect]\n\t"
> +		"jnz 5f\n\t"

That then ensures things are still as we observed them before (although
this itself is also subject to ABA).

This means all RSEQ primitives that have a C part must have a
cmp-and-<something> form (a compare against current state right before
the final store), but I suppose that was already pretty much the case
anyway. I just
don't remember seeing that spelled out anywhere. Then again, I've not
yet read that manpage.
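
For completeness, the pure-rseq caller-side shape this implies is a retry
loop along the lines of the sketch below (declarations mirror
percpu_list_push above; the cpu_opv fallback path is ignored here):

	intptr_t *targetptr, newval, expect;
	int cpu, ret;

	do {
		cpu = rseq_cpu_start();
		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
		newval = (intptr_t)node;
		targetptr = (intptr_t *)&list->c[cpu].head;
		node->next = (struct percpu_list_node *)expect;
		ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
	} while (ret);	/* retry on abort (-1) or compare failure (1) */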

> +		/* final store */
> +		"movq %[newv], %[v]\n\t"
> +		"2:\n\t"
> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> +		: /* gcc asm goto does not allow outputs */
> +		: [cpu_id]"r"(cpu),
> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> +		  [v]"m"(*v),
> +		  [expect]"r"(expect),
> +		  [newv]"r"(newv)
> +		: "memory", "cc", "rax"
> +		: abort, cmpfail
> +	);
> +	return 0;
> +abort:
> +	return -1;
> +cmpfail:
> +	return 1;
> +}

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-23  8:55     ` peterz
  (?)
  (?)
@ 2017-11-23  8:57       ` peterz
  -1 siblings, 0 replies; 175+ messages in thread
From: Peter Zijlstra @ 2017-11-23  8:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Shuah Khan, linux-kselftest

On Thu, Nov 23, 2017 at 09:55:11AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
> > +static inline __attribute__((always_inline))
> > +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
> > +		int cpu)
> > +{
> > +	__asm__ __volatile__ goto (
> > +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> > +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> 
> > +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)

> > +		"cmpq %[v], %[expect]\n\t"
> > +		"jnz 5f\n\t"

Also, I'm confused between the abort and cmpfail cases.

I would expect the cpu_id compare to also result in cmpfail; that is, I
would only expect a kernel-initiated abort to result in abort.

> > +		/* final store */
> > +		"movq %[newv], %[v]\n\t"
> > +		"2:\n\t"
> > +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
> > +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
> > +		: /* gcc asm goto does not allow outputs */
> > +		: [cpu_id]"r"(cpu),
> > +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
> > +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
> > +		  [v]"m"(*v),
> > +		  [expect]"r"(expect),
> > +		  [newv]"r"(newv)
> > +		: "memory", "cc", "rax"
> > +		: abort, cmpfail
> > +	);
> > +	return 0;
> > +abort:
> > +	return -1;

Which then would suggest this be -EINTR or something like that.

> > +cmpfail:
> > +	return 1;
> > +}

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-22 19:32       ` Peter Zijlstra
@ 2017-11-23 21:13         ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-23 21:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk

----- On Nov 22, 2017, at 2:32 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Nov 21, 2017 at 10:05:08PM +0000, Mathieu Desnoyers wrote:
>> Other than that, I have not received any concrete alternative proposal to
>> properly handle single-stepping.
> 
> That's not entirely true; amluto did have an alternative in Prague: do
> full machine level instruction emulation till the end of the rseq when
> it gets 'preempted too often'.

Yes, that's right. Andy did propose that alternative at KS, and it is
also interpreter-based.

> 
> Yes, implementing that will be an absolute royal pain. But it does
> remove the whole duplicate/dual program asm/bytecode thing and avoids
> the syscall entirely.

Agreed on this being a royal pain that we'd have to do for each
architecture.

By the way, I came up with an interesting library API that would remove
the need for code duplication for end-users:

e.g.

static inline __attribute__((always_inline))
int percpu_addv(intptr_t *v, intptr_t count, int cpu)
{
        /* Fast path: rseq; fall back to cpu_opv (e.g. when single-stepped). */
        if (rseq_unlikely(rseq_addv(v, count, cpu)))
                return cpu_op_addv(v, count, cpu);
        return 0;
}

And the caller becomes:

                cpu = rseq_cpu_start();
                ret = percpu_addv(&data->c[cpu].count, 1, cpu);
                if (unlikely(ret)) {
                        perror("cpu_opv");
                        abort();
                }

So the caller does not even have to bother retrying in case of an rseq
error; it's all handled by the "percpu_*()" static inlines.

> 
> And we don't need to do a full x86_64/arch-of-choice emulator for this
> either; just as cpu_opv is fairly limited too. We can do a subset that
> allows dealing with the known sequences and go from there -- it can
> always fall back to not emulating and reverting to the pure rseq with
> debug/fwd progress 'issues'.

I think trying to make the kernel ABI "developer-friendly" is the wrong
approach. This kind of ease-of-use sugar should be provided by a library,
not by the kernel ABI. The futex system call is a good example of a low-level
syscall meant to be used by libraries rather than directly by end-users.

> So what exactly is the problem of leaving out the whole cpu_opv thing
> for now? Pure rseq is usable -- albeit a bit cumbersome without
> additional debugger support.

Then rseq will cover _some_ use-cases, but will miss many others.

One example is the reserve+commit pair of lttng-ust ring buffer
operations, where the commit _needs_ to run on the same CPU as the
reserve. rseq alone does not allow a tracer library to do that;
rseq+cpu_opv allows it just fine.

So if I introduce just rseq for now, then all those other use-cases will
also need to check whether the kernel supports cpu_opv, cache the result
in a variable, and add a forest of branches to those fast paths. No
thanks.

Also, turning both line-level and instruction-level single-stepping into
infinite loops looks pretty much like a new kernel facility that breaks
user-space. It's a no-go from my point of view.

Thanks,

Mathieu




-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-23 21:15           ` Mathieu Desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-23 21:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Andi Kleen, Paul E. McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson, linux-kernel, linux-api,
	Paul Turner, Andrew Morton, Russell King, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrew Hunter, Chris Lameter,
	Ben Maurer, rostedt, Josh Triplett, Linus Torvalds,
	Catalin Marinas, Michael Kerrisk

----- On Nov 22, 2017, at 2:37 PM, Will Deacon will.deacon@arm.com wrote:

> On Wed, Nov 22, 2017 at 08:32:19PM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 21, 2017 at 10:05:08PM +0000, Mathieu Desnoyers wrote:
>> > Other than that, I have not received any concrete alternative proposal to
>> > properly handle single-stepping.
>> 
>> That's not entirely true; amluto did have an alternative in Prague: do
>> full machine level instruction emulation till the end of the rseq when
>> it gets 'preempted too often'.
>> 
>> Yes, implementing that will be an absolute royal pain. But it does
>> remove the whole duplicate/dual program asm/bytecode thing and avoids
>> the syscall entirely.
>> 
>> And we don't need to do a full x86_64/arch-of-choice emulator for this
>> either; just as cpu_opv is fairly limited too. We can do a subset that
>> allows dealing with the known sequences and go from there -- it can
>> always fall back to not emulating and reverting to the pure rseq with
>> debug/fwd progress 'issues'.
>> 
>> So what exactly is the problem of leaving out the whole cpu_opv thing
>> for now? Pure rseq is usable -- albeit a bit cumbersome without
>> additional debugger support.
> 
> Drive-by "ack" to that. I'd really like a working rseq implementation in
> mainline, but I don't much care for another interpreter.

Considering the arm64 use-case of reading PMU counters from user-space,
using rseq to prevent migration, I understand that you're lucky enough to
already have a system call at your disposal that can perform the slow
path in case of single-stepping.

So yes, your particular case is already covered, but unfortunately that's
not the same situation for other use-cases that have been expressed.

Thanks,

Mathieu


> 
> Will

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-22 19:38     ` peterz
  (?)
  (?)
@ 2017-11-23 21:16       ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-23 21:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, shuah, linux-kselftest

----- On Nov 22, 2017, at 2:38 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
>> Implements two basic tests of RSEQ functionality, and one more
>> exhaustive parameterizable test.
>> 
>> The first, "basic_test" only asserts that RSEQ works moderately
>> correctly. E.g. that the CPUID pointer works.
>> 
>> "basic_percpu_ops_test" is a slightly more "realistic" variant,
>> implementing a few simple per-cpu operations and testing their
>> correctness.
>> 
>> "param_test" is a parametrizable restartable sequences test. See
>> the "--help" output for usage.
>> 
>> A run_param_test.sh script runs many variants of the parametrizable
>> tests.
>> 
>> As part of those tests, a helper library "rseq" implements a user-space
>> API around restartable sequences. It uses the cpu_opv system call as
>> fallback when single-stepped by a debugger. It exposes the instruction
>> pointer addresses where the rseq assembly blocks begin and end, as well
>> as the associated abort instruction pointer, in the __rseq_table
>> section. This section allows debuggers to know where to place
>> breakpoints when single-stepping through assembly blocks which may be
>> aborted at any point by the kernel.
> 
> Could I ask you to split this in smaller bits?
> 
> I'd start with just the rseq library, using only the rseq interface.
> Then add the whole cpu_opv fallback stuff.
> Then add the selftests using librseq.
> 
> As is this is a tad much to read in a single go.

Sure, will do! And I plan to change the selftests to use the new
"percpu_*()" API that removes the need to duplicate code in the
caller.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-23 21:13         ` Mathieu Desnoyers
@ 2017-11-23 21:49           ` Andi Kleen
  -1 siblings, 0 replies; 175+ messages in thread
From: Andi Kleen @ 2017-11-23 21:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Andi Kleen, Paul E. McKenney, Boqun Feng,
	Andy Lutomirski, Dave Watson, linux-kernel, linux-api,
	Paul Turner, Andrew Morton, Russell King, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Andrew Hunter, Chris Lameter,
	Ben Maurer, rostedt, Josh Triplett, Linus Torvalds,
	Catalin Marinas, Will Deacon, Michael Kerrisk

> Also, turning both line-level and instruction-level single-stepping into
> infinite loops looks pretty much like a new kernel facility that breaks
> user-space. It's a no-go from my point of view.

You could fix it at the debugger level with suitable annotation. Just
turn the whole rseq into an extended line, and make sure it is handled
for instruction stepping too.

-Andi

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-23 22:51             ` Thomas Gleixner
  0 siblings, 0 replies; 175+ messages in thread
From: Thomas Gleixner @ 2017-11-23 22:51 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Will Deacon, Peter Zijlstra, Andi Kleen, Paul E. McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel,
	linux-api, Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Michael Kerrisk

On Thu, 23 Nov 2017, Mathieu Desnoyers wrote:
> ----- On Nov 22, 2017, at 2:37 PM, Will Deacon will.deacon@arm.com wrote:
> > On Wed, Nov 22, 2017 at 08:32:19PM +0100, Peter Zijlstra wrote:
> >>
> >> So what exactly is the problem of leaving out the whole cpu_opv thing
> >> for now? Pure rseq is usable -- albeit a bit cumbersome without
> >> additional debugger support.
> > 
> > Drive-by "ack" to that. I'd really like a working rseq implementation in
> > mainline, but I don't much care for another interpreter.
> 
> Considering the arm 64 use-case of reading PMU counters from user-space
> using rseq to prevent migration, I understand that you're lucky enough to
> already have a system call at your disposal that can perform the slow-path
> in case of single-stepping.
> 
> So yes, your particular case is already covered, but unfortunately that's
> not the same situation for other use-cases that have been expressed.

If we have users of rseq which can do without the other muck, then what's
the reason not to support it?

The sysops thing can be sorted out on top and the use cases which need both
will have to test for both syscalls being available anyway.
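
A runtime check along those lines could look like this (a sketch only,
not from this patch set; __NR_rseq and __NR_cpu_opv stand for whatever
numbers the syscalls end up with):

#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Probe whether a syscall number is wired up on the running kernel. */
static int sys_available(long nr)
{
	/*
	 * Called with NULL/0 arguments: a kernel without the syscall
	 * fails with ENOSYS, a kernel with it fails with some other
	 * error (e.g. EINVAL) or succeeds.
	 */
	return !(syscall(nr, 0, 0, 0, 0) == -1 && errno == ENOSYS);
}

/* e.g.: if (sys_available(__NR_rseq) && sys_available(__NR_cpu_opv)) ... */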

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-22 21:48     ` peterz
@ 2017-11-23 22:53       ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-23 22:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, shuah, linux-kselftest

----- On Nov 22, 2017, at 4:48 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
>> diff --git a/tools/testing/selftests/rseq/rseq-x86.h
>> b/tools/testing/selftests/rseq/rseq-x86.h
>> new file mode 100644
>> index 000000000000..63e81d6c61fa
>> --- /dev/null
>> +++ b/tools/testing/selftests/rseq/rseq-x86.h
>> @@ -0,0 +1,898 @@
>> +/*
>> + * rseq-x86.h
>> + *
>> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to
>> deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> THE
>> + * SOFTWARE.
>> + */
>> +
>> +#include <stdint.h>
>> +
>> +#define RSEQ_SIG	0x53053053
>> +
>> +#ifdef __x86_64__
>> +
>> +#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
> 
> See commit:
> 
>  450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")

OK, will use:

#define rseq_smp_mb()   \
        __asm__ __volatile__ ("lock; addl $0,-128(%%rsp)" ::: "memory", "cc")

as done in tools/virtio/ringtest/main.h

> 
>> +#define rseq_smp_rmb()	barrier()
>> +#define rseq_smp_wmb()	barrier()
>> +
>> +#define rseq_smp_load_acquire(p)					\
>> +__extension__ ({							\
>> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
>> +	barrier();							\
>> +	____p1;								\
>> +})
>> +
>> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
>> +
>> +#define rseq_smp_store_release(p, v)					\
>> +do {									\
>> +	barrier();							\
>> +	RSEQ_WRITE_ONCE(*p, v);						\
>> +} while (0)
>> +
>> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
>> +			start_ip, post_commit_offset, abort_ip)		\
>> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
>> +		".balign 32\n\t"					\
>> +		__rseq_str(label) ":\n\t"				\
>> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
>> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", "
>> __rseq_str(abort_ip) "\n\t" \
>> +		".popsection\n\t"
> 
> OK, so this creates table entry, but why is @section an argument, AFAICT
> its _always_ the same thing, no?

I agree that section names don't need to be passed as arguments, since it
is not useful information to understand the flow of asm using those
macros. Will remove.

> 
>> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
>> +		RSEQ_INJECT_ASM(1)					\
>> +		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
>> +		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
>> +		__rseq_str(label) ":\n\t"
> 
> And this sets the TLS variable to point to the table entry from the
> previous macro, no? But again @rseq_cs seems to always be the very same,
> why is that an argument?

I don't want to hide anything within macros that would prevent someone
looking at the overall assembly from understanding the flow.

So all labels and references to named input operands are passed as
arguments to those macros.

> 
>> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
>> +		RSEQ_INJECT_ASM(2)					\
>> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
>> +		"jnz " __rseq_str(label) "\n\t"
> 
> more things that are always the same it seems..

See this as passing arguments to functions. It makes the overall
assembly using those macros easier to read.

> 
>> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
>> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
>> +		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
>> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
>> +		".long " __rseq_str(sig) "\n\t"			\
>> +		__rseq_str(label) ":\n\t"				\
>> +		teardown						\
>> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
>> +		".popsection\n\t"
> 
> @section and @sig seem to always be the same...

Good point for the @section, will move it into the macro because it is not
useful to understand the flow.

Same for @sig. Will move into the macro, given that it's not useful to
understand the flow.

> 
>> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
>> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
>> +		__rseq_str(label) ":\n\t"				\
>> +		teardown						\
>> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
>> +		".popsection\n\t"
> 
> Somewhat failing to see the point of this macro, it seems to just
> obfuscate the normal failure path.

It's needed to hold the "teardown" code needed on error, for the
memcpy loops. You're right that I don't need it for all the other
cases, and I can directly jump to the cmpfail_label. Will do.

> 
>> +static inline __attribute__((always_inline))
>> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
>> +		int cpu)
> 
> I find this a very confusing name for what is essentially
> compare-and-exchange or compare-and-swap, no?

A compare-and-exchange will load the original value and return it.
With rseq, it appears that all we need is to compare a value, and
store the new value if the comparison succeeded. We seldom care
about returning the old value, so we might as well skip that load.
But then it's _not_ a cmpxchg, because it does not return the prior
value.
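
To make the distinction concrete, here is a plain-C sketch (illustrative
only; the real helper runs as an rseq critical section, so the compare
and the store cannot be interleaved with another context on the same
CPU):

/* What rseq_cmpeqv_storev() provides, minus the restart machinery: */
static int cmpeqv_storev_sketch(intptr_t *v, intptr_t expect, intptr_t newv)
{
	if (*v != expect)
		return 1;	/* comparison failed */
	*v = newv;		/* final store; the old value is never returned */
	return 0;
}

/* A cmpxchg, by contrast, has to hand back the prior value: */
static intptr_t cmpxchg_sketch(intptr_t *v, intptr_t old, intptr_t newv)
{
	intptr_t prev = *v;	/* extra load, only needed for the return value */

	if (prev == old)
		*v = newv;
	return prev;
}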


> 
>> +{
>> +	__asm__ __volatile__ goto (
>> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> 
> So we set up the section, but unreadably so... reducing the number of
> arguments would help a lot.
> 
> Rename the current one to __RSEQ_ASM_DEFINE_TABLE() and then use:
> 
> #define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \
>	__RSEQ_ASM_DEFINE_TABLE(label, __rseq_table, 0x0, 0x0, start_ip, \
>				(post_commit_ip - start_ip), abort_ip)
> 
> or something, such that we can write:
> 
>		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
> 

Good point, it removes @version, @flags, and the 2f-1f calculation from the
asm, which distract from the flow.


>> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> 
> And here we open start the rseq by storing the table entry pointer into
> the TLS thingy.

I'll add a comment if that's what you're pointing to.
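
In C terms the macro amounts to roughly this (type details elided;
"cs_descriptor" is an illustrative name for the table entry emitted by
RSEQ_ASM_DEFINE_TABLE):

	/* Arm the critical section: point the TLS rseq_cs field at the
	 * descriptor. The label placed right after this store is the
	 * start_ip recorded in that descriptor, which is why the store
	 * must stay inside the asm block. */
	__rseq_abi.rseq_cs = (uintptr_t)&cs_descriptor;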

> 
>> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
>> +		"cmpq %[v], %[expect]\n\t"
>> +		"jnz 5f\n\t"
> 
>		"jnz %l[cmpfail]\n\t"
> 
> was too complicated?

Good point. It's a leftover from the prior iterations where I needed
to clear the rseq_cs field before exiting from the critical section.
Now that the kernel does it lazily, I don't need a "teardown" code
in those cases, and we can directly jump to the cmpfail label. Will
do.


> 
>> +		/* final store */
>> +		"movq %[newv], %[v]\n\t"
>> +		"2:\n\t"
>> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
>> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
>> +		: /* gcc asm goto does not allow outputs */
>> +		: [cpu_id]"r"(cpu),
>> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
>> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
>> +		  [v]"m"(*v),
>> +		  [expect]"r"(expect),
>> +		  [newv]"r"(newv)
> 
>		: [cpu_id]         "r" (cpu),
>		  [current_cpu_id] "m" (__rseq_abi.cpu_id),
>		  [rseq_cs]        "m" (__rseq_abi.rseq_cs),
>		  [v]              "m" (*v),
>		  [expect]         "r" (expect),
>		  [newv]           "r" (newv)
> 
> or something does read much better

done.

> 
>> +		: "memory", "cc", "rax"
>> +		: abort, cmpfail
>> +	);
>> +	return 0;
>> +abort:
>> +	return -1;
>> +cmpfail:
>> +	return 1;
>> +}
>> +
>> +static inline __attribute__((always_inline))
>> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
>> +		off_t voffp, intptr_t *load, int cpu)
> 
> so this thing does what now? It compares @v to @expectnot, when _not_
> matching it will store @voffp into @v and load something..?

Not quite. I'll add this comment:

/*
 * Compare @v against @expectnot. When it does _not_ match, load @v
 * into @load, and store the content of *@v + voffp into @v.
 */
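
In plain C the sequence amounts to roughly this (illustrative only,
ignoring the restart machinery; in the selftests it is typically used to
pop the head of a per-cpu list, with voffp being the offset of the next
pointer within a node):

static int cmpnev_storeoffp_load_sketch(intptr_t *v, intptr_t expectnot,
		off_t voffp, intptr_t *load)
{
	intptr_t head = *v;

	if (head == expectnot)
		return 1;			/* comparison failed */
	*load = head;				/* hand back the old head */
	*v = *(intptr_t *)(head + voffp);	/* e.g. v = head->next */
	return 0;
}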

> 
>> +{
>> +	__asm__ __volatile__ goto (
>> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
>> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
>> +		"cmpq %[v], %[expectnot]\n\t"
>> +		"jz 5f\n\t"
> 
> So I would prefer "je" in this context, or rather:
> 
>		je %l[cmpfail]

ok

> 
>> +		"movq %[v], %%rax\n\t"
> 
> loads @v in A
> 
> But it could already have changed since the previous load from cmp, no?

No, given that it should only be touched by rseq or cpu_opv, which
don't allow concurrent accesses on the same per-cpu data.

> Would it not make sense to put this load before the cmp and use A
> instead?

Sure, I could do the load once, and compare against a register
instead. It's not needed for correctness, but seems cleaner
nevertheless. I'll need to use some other register than rax though,
because I don't want to clash with the RSEQ_INJECT_ASM() delay-injection
loops, which are using eax. Perhaps rbx then.

> 
>> +		"movq %%rax, %[load]\n\t"
> 
> stores A in @load
> 
>> +		"addq %[voffp], %%rax\n\t"
> 
> adds @off to A
> 
>> +		"movq (%%rax), %%rax\n\t"
> 
> loads (A) in A
> 
>> +		/* final store */
>> +		"movq %%rax, %[v]\n\t"
> 
> stores A in @v
> 
> 
> So the whole thing loads @v into @load, adds and offset, dereferences
> and adds that back in @v, provided @v doesn't match @expected.. whee.

Yes, exactly! ;)


> 
>> +		"2:\n\t"
>> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
>> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
>> +		: /* gcc asm goto does not allow outputs */
>> +		: [cpu_id]"r"(cpu),
>> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
>> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
>> +		  /* final store input */
>> +		  [v]"m"(*v),
>> +		  [expectnot]"r"(expectnot),
>> +		  [voffp]"er"(voffp),
>> +		  [load]"m"(*load)
>> +		: "memory", "cc", "rax"
>> +		: abort, cmpfail
>> +	);
>> +	return 0;
>> +abort:
>> +	return -1;
>> +cmpfail:
>> +	return 1;
>> +}
> 
>> +#elif __i386__
>> +
>> +/*
>> + * Support older 32-bit architectures that do not implement fence
>> + * instructions.
>> + */
>> +#define rseq_smp_mb()	\
>> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
>> +#define rseq_smp_rmb()	\
>> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
>> +#define rseq_smp_wmb()	\
>> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> 
> Oh shiny, you're supporting that OOSTORE and PPRO_FENCE nonsense?

Well, what's the point in supporting only a subset of the x86 architecture ? ;)

> 
> Going by commit:
> 
>  09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE")
> 
> That smp_wmb() one was an 'optimization' (forced store buffer flush) but
> not a correctness thing. And we dropped that stuff from the kernel a
> _long_ time ago.

Perhaps from a kernel code perspective it's not a correctness thing, but
this is a correctness issue for user-space code.

I have the following comments in liburcu near the x86-32 barriers:

 * We leave smp_rmb/smp_wmb as full barriers for processors that do not have
 * fence instructions.
 *
 * An empty cmm_smp_rmb() may not be enough on old PentiumPro multiprocessor
 * systems, due to an erratum.  The Linux kernel says that "Even distro
 * kernels should think twice before enabling this", but for now let's
 * be conservative and leave the full barrier on 32-bit processors.  Also,
 * IDT WinChip supports weak store ordering, and the kernel may enable it
 * under our feet; cmm_smp_wmb() ceases to be a nop for these processors.

So it looks like, from a user-space perspective, the full smp_wmb() is
needed for correctness on older kernels running on IDT WinChip processors.

Now you could argue that the recent kernel that will implement the "rseq"
system call will _never_ enable this IDT WinChip OOSTORE stuff. So we would
end up relying on success of the rseq cpu_id field comparison to ensure
rseq is indeed available, and in those cases the OOSTORE-aware smp_wmb is not
needed on x86-32.

Am I understanding this correctly ?

In that case, we should really document that those rseq_smp_wmb() are only
valid when running on kernels that provide the rseq system call.
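
For instance (a sketch only; the config knob is made up), it could be
expressed as:

/*
 * A kernel that implements the rseq system call is recent enough that
 * CONFIG_X86_OOSTORE no longer exists, so a compiler barrier suffices.
 * Keep the locked op when older kernels must also be supported.
 */
#ifdef RSEQ_ASSUME_RSEQ_SYSCALL		/* hypothetical knob */
#define rseq_smp_wmb()	barrier()
#else
#define rseq_smp_wmb()	\
	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
#endif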

> Ideally we'd kill that PPRO_FENCE crap too.

AFAIU, this one is part of the PPRO architecture, not really enabled explicitly
by the kernel, right ? I'm not sure what we can do about this other than
politely asking everyone to throw away these old pieces of hardware... ;)

Thanks for the review!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
@ 2017-11-23 22:53       ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-23 22:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon

----- On Nov 22, 2017, at 4:48 PM, Peter Zijlstra peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org wrote:

> On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
>> diff --git a/tools/testing/selftests/rseq/rseq-x86.h
>> b/tools/testing/selftests/rseq/rseq-x86.h
>> new file mode 100644
>> index 000000000000..63e81d6c61fa
>> --- /dev/null
>> +++ b/tools/testing/selftests/rseq/rseq-x86.h
>> @@ -0,0 +1,898 @@
>> +/*
>> + * rseq-x86.h
>> + *
>> + * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to
>> deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> THE
>> + * SOFTWARE.
>> + */
>> +
>> +#include <stdint.h>
>> +
>> +#define RSEQ_SIG	0x53053053
>> +
>> +#ifdef __x86_64__
>> +
>> +#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
> 
> See commit:
> 
>  450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")

OK, will use:

#define rseq_smp_mb()   \
        __asm__ __volatile__ ("lock; addl $0,-128(%%rsp)" ::: "memory", "cc")

as done in tools/virtio/ringtest/main.h

> 
>> +#define rseq_smp_rmb()	barrier()
>> +#define rseq_smp_wmb()	barrier()
>> +
>> +#define rseq_smp_load_acquire(p)					\
>> +__extension__ ({							\
>> +	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
>> +	barrier();							\
>> +	____p1;								\
>> +})
>> +
>> +#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
>> +
>> +#define rseq_smp_store_release(p, v)					\
>> +do {									\
>> +	barrier();							\
>> +	RSEQ_WRITE_ONCE(*p, v);						\
>> +} while (0)
>> +
>> +#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
>> +			start_ip, post_commit_offset, abort_ip)		\
>> +		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
>> +		".balign 32\n\t"					\
>> +		__rseq_str(label) ":\n\t"				\
>> +		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
>> +		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", "
>> __rseq_str(abort_ip) "\n\t" \
>> +		".popsection\n\t"
> 
> OK, so this creates table entry, but why is @section an argument, AFAICT
> its _always_ the same thing, no?

I agree that section names don't need to be passed as arguments, since it
is not useful information to understand the flow of asm using those
macros. Will remove.

> 
>> +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
>> +		RSEQ_INJECT_ASM(1)					\
>> +		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
>> +		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"		\
>> +		__rseq_str(label) ":\n\t"
> 
> And this sets the TLS variable to point to the table entry from the
> previous macro, no? But again @rseq_cs seems to always be the very same,
> why is that an argument?

I don't want to hide anything within macros that would prevent someone
looking at the overall assembly from understanding the flow.

So all labels and references to named input operands are passed as
arguments to those macros.

> 
>> +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
>> +		RSEQ_INJECT_ASM(2)					\
>> +		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
>> +		"jnz " __rseq_str(label) "\n\t"
> 
> more things that are always the same it seems..

See this as passing arguments to functions. It makes the overall
assembly using those macros easier to read.

> 
>> +#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
>> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
>> +		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
>> +		".byte 0x0f, 0x1f, 0x05\n\t"				\
>> +		".long " __rseq_str(sig) "\n\t"			\
>> +		__rseq_str(label) ":\n\t"				\
>> +		teardown						\
>> +		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
>> +		".popsection\n\t"
> 
> @section and @sig seem to always be the same...

Good point for the @section, will move it into the macro because it is not
useful to understand the flow.

Same for @sig. Will move into the macro, given that it's not useful to
understand the flow.

> 
>> +#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
>> +		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
>> +		__rseq_str(label) ":\n\t"				\
>> +		teardown						\
>> +		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
>> +		".popsection\n\t"
> 
> Somewhat failing to see the point of this macro, it seems to just
> obfuscate the normal failure path.

It's needed to hold the "teardown" code needed on error, for the
memcpy loops. You're right that I don't need it for all the other
cases, and I can directly jump to the cmpfail_label. Will do.

> 
>> +static inline __attribute__((always_inline))
>> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
>> +		int cpu)
> 
> I find this a very confusing name for what is essentially
> compare-and-exchange or compare-and-swap, no?

A compare-and-exchange will load the original value and return it.
With rseq, it appears that all we need is to compare a value, and
store the new value if the comparison succeeded. We seldom care
about returning the old value, so we might as well skip that load.
But then it's _not_ a cmpxchg, because it does not return the prior
value.


> 
>> +{
>> +	__asm__ __volatile__ goto (
>> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
> 
> So we set up the section, but unreadably so... reducing the number of
> arguments would help a lot.
> 
> Rename the current one to __RSEQ_ASM_DEFINE_TABLE() and then use:
> 
> #define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \
>	__RSEQ_ASM_DEFINE_TABLE(label, __rseq_table, 0x0, 0x0, start_ip, \
>				(post_commit_ip - start_ip), abort_ip)
> 
> or something, such that we can write:
> 
>		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
> 

Good point, it removes @version, @flags, and the 2f-1f calculation from the
asm, which distract from the flow.


>> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> 
> And here we open start the rseq by storing the table entry pointer into
> the TLS thingy.

I'll add a comment if that's what you're pointing to.

> 
>> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
>> +		"cmpq %[v], %[expect]\n\t"
>> +		"jnz 5f\n\t"
> 
>		"jnz %l[cmpfail]\n\t"
> 
> was too complicated?

Good point. It's a leftover from the prior iterations where I needed
to clear the rseq_cs field before exiting from the critical section.
Now that the kernel does it lazily, I don't need a "teardown" code
in those cases, and we can directly jump to the cmpfail label. Will
do.


> 
>> +		/* final store */
>> +		"movq %[newv], %[v]\n\t"
>> +		"2:\n\t"
>> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
>> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
>> +		: /* gcc asm goto does not allow outputs */
>> +		: [cpu_id]"r"(cpu),
>> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
>> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
>> +		  [v]"m"(*v),
>> +		  [expect]"r"(expect),
>> +		  [newv]"r"(newv)
> 
>		: [cpu_id]         "r" (cpu),
>		  [current_cpu_id] "m" (__rseq_abi.cpu_id),
>		  [rseq_cs]        "m" (__rseq_abi.rseq_cs),
>		  [v]              "m" (*v),
>		  [expect]         "r" (expect),
>		  [newv]           "r" (newv)
> 
> or something does read much better

done.

> 
>> +		: "memory", "cc", "rax"
>> +		: abort, cmpfail
>> +	);
>> +	return 0;
>> +abort:
>> +	return -1;
>> +cmpfail:
>> +	return 1;
>> +}
>> +
>> +static inline __attribute__((always_inline))
>> +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
>> +		off_t voffp, intptr_t *load, int cpu)
> 
> so this thing does what now? It compares @v to @expectnot, when _not_
> matching it will store @voffp into @v and load something..?

Not quite. I'll add this comment:

/*
 * Compare @v against @expectnot. When it does _not_ match, load the
 * current value of @v into @load, and store the value found at
 * address (*@v + voffp) back into @v.
 */
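
i.e., in rough C-equivalent terms (sketch; only meaningful when the
whole sequence executes uninterrupted on @cpu):

	if (__rseq_abi.cpu_id != cpu)
		return -1;			/* abort path */
	if (*v == expectnot)
		return 1;			/* cmpfail path */
	*load = *v;				/* save the old value */
	*v = *(intptr_t *)(*v + voffp);		/* commit */
	return 0;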

> 
>> +{
>> +	__asm__ __volatile__ goto (
>> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
>> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
>> +		"cmpq %[v], %[expectnot]\n\t"
>> +		"jz 5f\n\t"
> 
> So I would prefer "je" in this context, or rather:
> 
>		je %l[cmpfail]

ok

> 
>> +		"movq %[v], %%rax\n\t"
> 
> loads @v in A
> 
> But it could already have changed since the previous load from cmp, no?

No, given that it should only be touched by rseq or cpu_opv, which
don't allow concurrent accesses on the same per-cpu data.

> Would it not make sense to put this load before the cmp and use A
> instead?

Sure, I could do the load once and compare against a register
instead. It's not needed for correctness, but it is cleaner
nevertheless. I'll need to use a register other than rax though,
because I don't want to clash with the RSEQ_INJECT_ASM() delay-injection
loops, which use eax. Perhaps rbx then.
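
Something like this, then (sketch, with "rbx" replacing "rax" in the
clobber list, and the direct jump to the cmpfail label discussed above):

		"movq %[v], %%rbx\n\t"
		"cmpq %%rbx, %[expectnot]\n\t"
		"je %l[cmpfail]\n\t"
		"movq %%rbx, %[load]\n\t"
		"addq %[voffp], %%rbx\n\t"
		"movq (%%rbx), %%rbx\n\t"
		/* final store */
		"movq %%rbx, %[v]\n\t"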

> 
>> +		"movq %%rax, %[load]\n\t"
> 
> stores A in @load
> 
>> +		"addq %[voffp], %%rax\n\t"
> 
> adds @off to A
> 
>> +		"movq (%%rax), %%rax\n\t"
> 
> loads (A) in A
> 
>> +		/* final store */
>> +		"movq %%rax, %[v]\n\t"
> 
> stores A in @v
> 
> 
> So the whole thing loads @v into @load, adds an offset, dereferences
> and adds that back in @v, provided @v doesn't match @expected.. whee.

Yes, exactly! ;)


> 
>> +		"2:\n\t"
>> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
>> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
>> +		: /* gcc asm goto does not allow outputs */
>> +		: [cpu_id]"r"(cpu),
>> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
>> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
>> +		  /* final store input */
>> +		  [v]"m"(*v),
>> +		  [expectnot]"r"(expectnot),
>> +		  [voffp]"er"(voffp),
>> +		  [load]"m"(*load)
>> +		: "memory", "cc", "rax"
>> +		: abort, cmpfail
>> +	);
>> +	return 0;
>> +abort:
>> +	return -1;
>> +cmpfail:
>> +	return 1;
>> +}
> 
>> +#elif __i386__
>> +
>> +/*
>> + * Support older 32-bit architectures that do not implement fence
>> + * instructions.
>> + */
>> +#define rseq_smp_mb()	\
>> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
>> +#define rseq_smp_rmb()	\
>> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
>> +#define rseq_smp_wmb()	\
>> +	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
> 
> Oh shiny, you're supporting that OOSTORE and PPRO_FENCE nonsense?

Well, what's the point in supporting only a subset of the x86 architecture? ;)

> 
> Going by commit:
> 
>  09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE")
> 
> That smp_wmb() one was an 'optimization' (forced store buffer flush) but
> not a correctness thing. And we dropped that stuff from the kernel a
> _long_ time ago.

Perhaps from a kernel code perspective it's not a correctness thing, but
this is a correctness issue for user-space code.

I have the following comments in liburcu near the x86-32 barriers:

 * We leave smp_rmb/smp_wmb as full barriers for processors that do not have
 * fence instructions.
 *
 * An empty cmm_smp_rmb() may not be enough on old PentiumPro multiprocessor
 * systems, due to an erratum.  The Linux kernel says that "Even distro
 * kernels should think twice before enabling this", but for now let's
 * be conservative and leave the full barrier on 32-bit processors.  Also,
 * IDT WinChip supports weak store ordering, and the kernel may enable it
 * under our feet; cmm_smp_wmb() ceases to be a nop for these processors.

So it looks like, from a user-space perspective, keeping the full smp_wmb()
is a correctness requirement on older kernels running on IDT WinChip
processors.

Now you could argue that any kernel recent enough to implement the "rseq"
system call will _never_ enable this IDT WinChip OOSTORE stuff. So we would
end up relying on success of the rseq cpu_id field comparison to ensure
rseq is indeed available, and in those cases the OOSTORE-aware smp_wmb is not
needed on x86-32.

Am I understanding this correctly?

In that case, we should really document that rseq_smp_wmb() is only
valid when running on kernels that provide the rseq system call.
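
e.g. something like this (my assumption, not what the current patch does):

/*
 * Assumption on my side, not in the current series: when correctness is
 * only claimed on kernels providing sys_rseq (which never enable
 * CONFIG_X86_OOSTORE), rseq_smp_wmb() can be relaxed to a compiler
 * barrier on x86-32, while rseq_smp_mb()/rseq_smp_rmb() keep the
 * "lock; addl" form for the PentiumPro ordering erratum.
 */
#define rseq_smp_wmb()	__asm__ __volatile__ ("" : : : "memory")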

> Ideally we'd kill that PPRO_FENCE crap too.

AFAIU, this one is part of the PPRO architecture itself, not something the
kernel enables explicitly, right? I'm not sure what we can do about it other
than politely asking everyone to throw away these old pieces of hardware... ;)

Thanks for the review!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-23 22:51             ` Thomas Gleixner
@ 2017-11-23 23:01               ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-23 23:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Will Deacon, Peter Zijlstra, Andi Kleen, Paul E. McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel,
	linux-api, Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Michael Kerrisk

----- On Nov 23, 2017, at 5:51 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Thu, 23 Nov 2017, Mathieu Desnoyers wrote:
>> ----- On Nov 22, 2017, at 2:37 PM, Will Deacon will.deacon@arm.com wrote:
>> > On Wed, Nov 22, 2017 at 08:32:19PM +0100, Peter Zijlstra wrote:
>> >>
>> >> So what exactly is the problem of leaving out the whole cpu_opv thing
>> >> for now? Pure rseq is usable -- albeit a bit cumbersome without
>> >> additional debugger support.
>> > 
>> > Drive-by "ack" to that. I'd really like a working rseq implementation in
>> > mainline, but I don't much care for another interpreter.
>> 
>> Considering the arm 64 use-case of reading PMU counters from user-space
>> using rseq to prevent migration, I understand that you're lucky enough to
>> already have a system call at your disposal that can perform the slow-path
>> in case of single-stepping.
>> 
>> So yes, your particular case is already covered, but unfortunately that's
>> not the same situation for other use-cases that have been expressed.
> 
> If we have users of rseq which can do without the other muck, then what's
> the reason not to support it?
> 
> The sysops thing can be sorted out on top and the use cases which need both
> will have to test for both syscalls being available anyway.

I'm currently making sure CONFIG_RSEQ selects both CONFIG_CPU_OPV and
CONFIG_MEMBARRIER, so the user-space fast-paths don't end up with
various ways of doing the fallback/single-stepping/memory barrier handling
depending on whether the kernel supports each of those individually.
So first of all, it reduces complexity from a user-space perspective.
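
Roughly (sketch; the point is the two "select" lines, the rest follows
the series' existing Kconfig entry):

# Sketch only: prompt and dependencies as already present in the series.
config RSEQ
	bool "Enable rseq() system call" if EXPERT
	default y
	depends on HAVE_RSEQ
	select CPU_OPV
	select MEMBARRIER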

Moreover, with the single cpu_id vs cpu_id_start field comparison that the
rseq fast-path already needs, user-space knows that it can rely on having
rseq, cpu_opv, and membarrier. Without this guarantee, user-space would have
to detect individually whether each of those system calls is available, and
test flags on the fast-path, adding overhead.

Those are my main concerns about pushing an incomplete solution at this
stage.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-23 23:38                 ` Thomas Gleixner
  0 siblings, 0 replies; 175+ messages in thread
From: Thomas Gleixner @ 2017-11-23 23:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Will Deacon, Peter Zijlstra, Andi Kleen, Paul E. McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel,
	linux-api, Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Michael Kerrisk

On Thu, 23 Nov 2017, Mathieu Desnoyers wrote:
> ----- On Nov 23, 2017, at 5:51 PM, Thomas Gleixner tglx@linutronix.de wrote:
> > On Thu, 23 Nov 2017, Mathieu Desnoyers wrote:
> >> ----- On Nov 22, 2017, at 2:37 PM, Will Deacon will.deacon@arm.com wrote:
> >> > On Wed, Nov 22, 2017 at 08:32:19PM +0100, Peter Zijlstra wrote:
> >> >>
> >> >> So what exactly is the problem of leaving out the whole cpu_opv thing
> >> >> for now? Pure rseq is usable -- albeit a bit cumbersome without
> >> >> additional debugger support.
> >> > 
> >> > Drive-by "ack" to that. I'd really like a working rseq implementation in
> >> > mainline, but I don't much care for another interpreter.
> >> 
> >> Considering the arm 64 use-case of reading PMU counters from user-space
> >> using rseq to prevent migration, I understand that you're lucky enough to
> >> already have a system call at your disposal that can perform the slow-path
> >> in case of single-stepping.
> >> 
> >> So yes, your particular case is already covered, but unfortunately that's
> >> not the same situation for other use-cases that have been expressed.
> > 
> > If we have users of rseq which can do without the other muck, then what's
> > the reason not to support it?
> > 
> > The sysops thing can be sorted out on top and the use cases which need both
> > will have to test for both syscalls being available anyway.
> 
> I'm currently making sure CONFIG_RSEQ selects both CONFIG_CPU_OPV and
> CONFIG_MEMBARRIER, so the user-space fast-paths don't end up with
> various ways of doing the fallback/single-stepping/memory barrier handling
> depending on whether the kernel support each of those individually.
> So first of all, it reduces complexity from a user-space perspective.
>
> Moreover, with a single already needed cpu_id vs cpu_id_start field comparison
> in the rseq fast-path, user-space knows that it can rely on having rseq,
> cpu_opv, and membarrier. Without this guarantee, user-space would have to
> detect individually whether each of those system calls is available, and
> test flags on the fast-path, for additional overhead.

You have to test for sys_rseq somewhere in the init code. So you can test
for the other two being fully functional as well.

If one of them is missing then you can avoid the rseq fastpath either
completely or, because you never registered that rseq muck, your rseq will
just contain stale init data which kicks you into some slowpath fallback
code.

You need something like this anyway unless you plan to ship code which
cannot run on systems w/o rseq support at all.

Either you designed your thing wrong or you try to create an artificial
dependency for political reasons.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
  2017-11-23 23:38                 ` Thomas Gleixner
@ 2017-11-24  0:04                   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-24  0:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Will Deacon, Peter Zijlstra, Andi Kleen, Paul E. McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel,
	linux-api, Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Michael Kerrisk

----- On Nov 23, 2017, at 6:38 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Thu, 23 Nov 2017, Mathieu Desnoyers wrote:
>> ----- On Nov 23, 2017, at 5:51 PM, Thomas Gleixner tglx@linutronix.de wrote:
>> > On Thu, 23 Nov 2017, Mathieu Desnoyers wrote:
>> >> ----- On Nov 22, 2017, at 2:37 PM, Will Deacon will.deacon@arm.com wrote:
>> >> > On Wed, Nov 22, 2017 at 08:32:19PM +0100, Peter Zijlstra wrote:
>> >> >>
>> >> >> So what exactly is the problem of leaving out the whole cpu_opv thing
>> >> >> for now? Pure rseq is usable -- albeit a bit cumbersome without
>> >> >> additional debugger support.
>> >> > 
>> >> > Drive-by "ack" to that. I'd really like a working rseq implementation in
>> >> > mainline, but I don't much care for another interpreter.
>> >> 
>> >> Considering the arm 64 use-case of reading PMU counters from user-space
>> >> using rseq to prevent migration, I understand that you're lucky enough to
>> >> already have a system call at your disposal that can perform the slow-path
>> >> in case of single-stepping.
>> >> 
>> >> So yes, your particular case is already covered, but unfortunately that's
>> >> not the same situation for other use-cases that have been expressed.
>> > 
>> > If we have users of rseq which can do without the other muck, then what's
>> > the reason not to support it?
>> > 
>> > The sysops thing can be sorted out on top and the use cases which need both
>> > will have to test for both syscalls being available anyway.
>> 
>> I'm currently making sure CONFIG_RSEQ selects both CONFIG_CPU_OPV and
>> CONFIG_MEMBARRIER, so the user-space fast-paths don't end up with
>> various ways of doing the fallback/single-stepping/memory barrier handling
>> depending on whether the kernel support each of those individually.
>> So first of all, it reduces complexity from a user-space perspective.
>>
>> Moreover, with a single already needed cpu_id vs cpu_id_start field comparison
>> in the rseq fast-path, user-space knows that it can rely on having rseq,
>> cpu_opv, and membarrier. Without this guarantee, user-space would have to
>> detect individually whether each of those system calls is available, and
>> test flags on the fast-path, for additional overhead.
> 
> You have to test for sys_rseq somewhere in the init code. So you can test
> for the other two being fully functional as well.
> 
> If one of them is missing then you can avoid that rseq fastpath either
> completely or because you never registered that rseq muck your rseq will
> just contain stale init data which kicks you into some slowpath fallback
> code.

That would work if we could have more than one rseq TLS entry per thread.
If that were the case, then e.g. lttng-ust could own its own rseq
TLS and do exactly as you explain above.

That's not the case with the current proposal. This means multiple user
libraries will have to share the same cpu_id and cpu_id_start fields,
which breaks your proposed new-app/old-kernel backward compatibility
check.

For instance, if glibc's librseq.so happily registers rseq (and does not
care about testing for cpu_opv or membarrier availability), then
lttng-ust cannot rely on stale rseq init data to kick it into its slowpath
fallback.

> 
> You need something like this anyway unless you plan to ship code which
> cannot run on systems w/o rseq support at all.

My plan is to ensure that testing

  (TLS::rseq->cpu_id_start == TLS::rseq->cpu_id)

is enough for fast-paths to guarantee that (see the sketch below):

- rseq is available and registered for the current thread,
- cpu_opv is available as fallback,
- membarrier private_expedited and shared_expedited are available.
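
In other words, the fast-path gate would be a single comparison (sketch;
the helper name is made up for illustration):

static inline int rseq_fast_path_available(void)	/* name made up */
{
	/*
	 * cpu_id stays at its negative "uninitialized" value until this
	 * thread successfully registers rseq on a kernel providing the
	 * system call, while cpu_id_start always holds a valid CPU number.
	 * With CONFIG_RSEQ selecting CPU_OPV and MEMBARRIER, equality
	 * therefore also implies those two system calls are present.
	 */
	return __rseq_abi.cpu_id_start == __rseq_abi.cpu_id;
}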

> 
> Either you designed your thing wrong or you try to create an artificial
> dependency for political reasons.

Having the rseq TLS shared across multiple library/app users within a
single process does limit our options there. :-/

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-23  8:55     ` peterz
  (?)
  (?)
@ 2017-11-24 13:55       ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-24 13:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, shuah, linux-kselftest

----- On Nov 23, 2017, at 3:55 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
>> +int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
>> +{
>> +	intptr_t *targetptr, newval, expect;
>> +	int cpu, ret;
>> +
>> +	/* Try fast path. */
>> +	cpu = rseq_cpu_start();
> 
>> +	/* Load list->c[cpu].head with single-copy atomicity. */
>> +	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
>> +	newval = (intptr_t)node;
>> +	targetptr = (intptr_t *)&list->c[cpu].head;
>> +	node->next = (struct percpu_list_node *)expect;
> 
>> +	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
> 
>> +	if (likely(!ret))
>> +		return cpu;
> 
>> +	return cpu;
>> +}
> 
>> +static inline __attribute__((always_inline))
>> +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
>> +		int cpu)
>> +{
>> +	__asm__ __volatile__ goto (
>> +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>> +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
> 
>> +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> 
> So the actual C part of the RSEQ is subject to an ABA, right? We can get
> migrated to another CPU and back again without then failing here.

Yes, that's correct. All algorithms preparing something in C and
then using a compare-and-other-stuff sequence need to ensure they
do not have ABA situations. For instance, a list push does not care
if the list head is reclaimed and re-inserted concurrently, because
none of the preparation steps in C involve the head next pointer.

> 
> It used to be that this was caught by the sequence count, but that is
> now gone.

The sequence count introduced other weirdness: although it would catch
those migration cases, it is a sequence read-lock, which means
the C code "protected" by this sequence read-lock needed to be
extremely careful about not accessing reclaimed memory.
The sequence lock ensures consistency of the data when the comparison
matches, but it does not protect against other side-effects.

So removing this sequence lock is actually a good thing: it removes
any expectation that users may have about that sequence counter being
anything stronger than a read seqlock.

> 
> The thing that makes it work is the compare against @v:
> 
>> +		"cmpq %[v], %[expect]\n\t"
>> +		"jnz 5f\n\t"
> 
> That then ensures things are still as we observed them before (although
> this itself is also subject to ABA).

Yes.

> 
> This means all RSEQ primitives that have a C part must have a cmp-and-
> form, but I suppose that was already pretty much the case anyway. I just
> don't remember seeing that spelled out anywhere. Then again, I've not
> yet read that manpage.

Yes, pretty much. The only primitives that don't have the compare are
things like "rseq_addv()", which does not have much in the C part
(it's just incrementing a counter).
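
Its C-equivalent boils down to (sketch; parameter names assumed to match
the other helpers, and only meaningful when run uninterrupted on @cpu):

	if (__rseq_abi.cpu_id != cpu)
		return -1;	/* abort: caller retries or uses cpu_opv */
	*v += count;		/* single commit instruction */
	return 0;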

I did not state anything like "typical rseq c.s. do a compare and other stuff"
in rseq(2), given that the role of this man page, AFAIU, is to explain
how to interact with the kernel system call, not to document user-space
implementation guidelines.

But let me know if I should expand it with user-space sequence implementation
guidelines, which would include notes about being careful with ABA. I'm
not sure it belongs there though.

Thanks!

Mathieu

> 
>> +		/* final store */
>> +		"movq %[newv], %[v]\n\t"
>> +		"2:\n\t"
>> +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
>> +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
>> +		: /* gcc asm goto does not allow outputs */
>> +		: [cpu_id]"r"(cpu),
>> +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
>> +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
>> +		  [v]"m"(*v),
>> +		  [expect]"r"(expect),
>> +		  [newv]"r"(newv)
>> +		: "memory", "cc", "rax"
>> +		: abort, cmpfail
>> +	);
>> +	return 0;
>> +abort:
>> +	return -1;
>> +cmpfail:
>> +	return 1;
> > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
  2017-11-23  8:57       ` peterz
  (?)
  (?)
@ 2017-11-24 14:15         ` mathieu.desnoyers
  -1 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-24 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, shuah, linux-kselftest

----- On Nov 23, 2017, at 3:57 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Thu, Nov 23, 2017 at 09:55:11AM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
>> > +static inline __attribute__((always_inline))
>> > +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
>> > +		int cpu)
>> > +{
>> > +	__asm__ __volatile__ goto (
>> > +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>> > +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
>> 
>> > +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> 
>> > +		"cmpq %[v], %[expect]\n\t"
>> > +		"jnz 5f\n\t"
> 
> Also, I'm confused between the abort and cmpfail cases.
> 
> I would expect the cpu_id compare to also result in cmpfail, that is, I
> would only expect the kernel to result in abort.

Let's take the per-cpu spinlock as an example to explain why we need
the "compare fail" and "cpu_id compare fail" to return different
values.

Getting this lock involves doing:

cpu = rseq_cpu_start();
ret = rseq_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);

Now based on the "ret" value:

if ret == 0, it means that @v was indeed 0, and that rseq
executed the commit (stored 1).

if ret > 0, it means the comparison of @v against 0 failed,
which means the lock was already held. We therefore need to
postpone and retry later. A "try_lock" operation would return
that the lock is currently busy.

if ret < 0, then we have either been aborted by the kernel,
or the comparison of @cpu against cpu_id failed. If we think
about it, having @cpu != cpu_id will happen if we are migrated
before we enter the rseq critical section, which is pretty
similar to being aborted by the kernel within the critical
section. So I don't see any reason to make the branch target
of the cpu_id comparison anything other than the abort_ip. In
that situation, the caller needs to either re-try with an
updated @cpu value (except for multi-part algorithms e.g.
reserve+commit, which don't allow changing the @cpu number on
commit), or use cpu_opv to perform the operation.

Note that another reason the @cpu == cpu_id test may fail is
that rseq is not registered for the current thread. Again, just
branching to the abort_ip and letting the caller fall back to
cpu_opv solves this.
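
Putting it together, the caller side of the lock example handles the
three outcomes roughly as follows (sketch, assuming the selftests'
struct percpu_lock layout; a real trylock would report "busy" instead
of spinning when ret > 0):

static void percpu_lock_acquire(struct percpu_lock *lock)	/* sketch */
{
	for (;;) {
		int cpu = rseq_cpu_start();
		int ret = rseq_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);

		if (ret == 0)
			break;		/* @v was 0, we now own the lock */
		if (ret > 0)
			continue;	/* lock already held: retry/back off */
		/*
		 * ret < 0: kernel abort, migration before entering the
		 * critical section, or rseq not registered. Retry with an
		 * updated @cpu, or use cpu_opv for this attempt.
		 */
	}
}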

> 
>> > +		/* final store */
>> > +		"movq %[newv], %[v]\n\t"
>> > +		"2:\n\t"
>> > +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
>> > +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
>> > +		: /* gcc asm goto does not allow outputs */
>> > +		: [cpu_id]"r"(cpu),
>> > +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
>> > +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
>> > +		  [v]"m"(*v),
>> > +		  [expect]"r"(expect),
>> > +		  [newv]"r"(newv)
>> > +		: "memory", "cc", "rax"
>> > +		: abort, cmpfail
>> > +	);
>> > +	return 0;
>> > +abort:
>> > +	return -1;
> 
> Which then would suggest this be -EINTR or something like that.

I'm not so sure returning kernel error codes is the expected
practice for user-space libraries.

Thoughts?

Thanks!

Mathieu


> 
>> > +cmpfail:
>> > +	return 1;
> > > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests
@ 2017-11-24 14:15         ` mathieu.desnoyers
  0 siblings, 0 replies; 175+ messages in thread
From: Mathieu Desnoyers @ 2017-11-24 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon

----- On Nov 23, 2017, at 3:57 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Thu, Nov 23, 2017 at 09:55:11AM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 21, 2017 at 09:18:53AM -0500, Mathieu Desnoyers wrote:
>> > +static inline __attribute__((always_inline))
>> > +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
>> > +		int cpu)
>> > +{
>> > +	__asm__ __volatile__ goto (
>> > +		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
>> > +		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
>> 
>> > +		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
> 
>> > +		"cmpq %[v], %[expect]\n\t"
>> > +		"jnz 5f\n\t"
> 
> Also, I'm confused between the abort and cmpfail cases.
> 
> I would expect the cpu_id compare to also result in cmpfail; that is, I
> would only expect the kernel to result in abort.

Let's take the per-cpu spinlock as an example to explain why we need
the "compare fail" and "cpu_id compare fail" to return different
values.

Getting this lock involves doing:

cpu = rseq_cpu_start();
ret = rseq_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);

Now based on the "ret" value:

if ret == 0, it means that @v was indeed 0, and that rseq
executed the commit (stored 1).

if ret > 0, it means the comparison of @v against 0 failed,
which means the lock was already held. We therefore need to
back off and retry later. A "try_lock" operation would simply
report that the lock is currently busy.

if ret < 0, then we have either been aborted by the kernel,
or the comparison of @cpu against cpu_id failed. If we think
about it, having @cpu != cpu_id will happen if we are migrated
before we enter the rseq critical section, which is pretty
similar to being aborted by the kernel within the critical
section. So I don't see any reason for making the branch target
of the cpu_id comparison anything other than the abort_ip. In
that situation, the caller needs to either retry with an
updated @cpu value (except for multi-part algorithms, e.g.
reserve+commit, which don't allow changing the @cpu number on
commit), or use cpu_opv to perform the operation.

Note that another reason the @cpu == cpu_id test may fail is
that rseq is not registered for the current thread. Again, just
branching to the abort_ip and letting the caller fall back to
cpu_opv solves this.
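
To make the calling convention concrete, here is a minimal trylock
sketch built on those three return values. Only rseq_cpu_start() and
rseq_cmpeqv_storev() come from the selftests; the lock layout, the
retry bound and the "use cpu_opv" return code are assumptions added
for illustration:

#define _GNU_SOURCE
#include <sched.h>	/* CPU_SETSIZE */
#include <stdint.h>

#include "rseq.h"	/* selftests helper header (assumed path) */

struct percpu_lock_entry {
	intptr_t v;
} __attribute__((aligned(128)));	/* avoid false sharing (assumption) */

struct percpu_lock {
	struct percpu_lock_entry c[CPU_SETSIZE];
};

/*
 * Returns the cpu on which the lock was taken, -1 if the lock is
 * busy, or -2 if the caller should use the cpu_opv slow path
 * (not shown here).
 */
static int percpu_trylock(struct percpu_lock *lock)
{
	int attempts;

	for (attempts = 0; attempts < 16; attempts++) {
		int cpu = rseq_cpu_start();
		int ret = rseq_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);

		if (ret == 0)
			return cpu;	/* commit done: we own the per-cpu lock */
		if (ret > 0)
			return -1;	/* @v was not 0: lock already held */
		/*
		 * ret < 0: kernel abort, migration before entering the
		 * critical section, or rseq not registered for this
		 * thread; retry with an updated @cpu value.
		 */
	}
	return -2;	/* persistent aborts: fall back to cpu_opv */
}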

> 
>> > +		/* final store */
>> > +		"movq %[newv], %[v]\n\t"
>> > +		"2:\n\t"
>> > +		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
>> > +		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
>> > +		: /* gcc asm goto does not allow outputs */
>> > +		: [cpu_id]"r"(cpu),
>> > +		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
>> > +		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
>> > +		  [v]"m"(*v),
>> > +		  [expect]"r"(expect),
>> > +		  [newv]"r"(newv)
>> > +		: "memory", "cc", "rax"
>> > +		: abort, cmpfail
>> > +	);
>> > +	return 0;
>> > +abort:
>> > +	return -1;
> 
> Which then would suggest this be -EINTR or something like that.

I'm not so sure returning kernel error codes is the expected
practice for user-space libraries.

Thoughts?

Thanks!

Mathieu


> 
>> > +cmpfail:
>> > +	return 1;
>> > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector
@ 2017-11-24 14:47                     ` Thomas Gleixner
  0 siblings, 0 replies; 175+ messages in thread
From: Thomas Gleixner @ 2017-11-24 14:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Will Deacon, Peter Zijlstra, Andi Kleen, Paul E. McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel,
	linux-api, Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andrew Hunter, Chris Lameter, Ben Maurer,
	rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Michael Kerrisk

On Fri, 24 Nov 2017, Mathieu Desnoyers wrote:
> ----- On Nov 23, 2017, at 6:38 PM, Thomas Gleixner tglx@linutronix.de wrote:
> > You have to test for sys_rseq somewhere in the init code. So you can test
> > for the other two being fully functional as well.
> > 
> > If one of them is missing then you can avoid that rseq fastpath either
> > completely, or, because you never registered that rseq muck, your rseq
> > will just contain stale init data which kicks you into some slowpath
> > fallback code.
> 
> That would work if we could have more than one rseq TLS entry per thread.
> If that were the case, then e.g. lttng-ust could own its own rseq
> TLS and do just as you explain above.
> 
> It's not the case with the current proposal. This means multiple user
> libraries will have to share the same cpu_id and cpu_id_start fields,
> which breaks your proposed new-app/old-kernel backward compatibility
> check.
> 
> For instance, if glibc librseq.so happily registers rseq (and does not
> care about testing for cpu_opv or membarrier availability), then
> lttng-ust cannot rely on stale rseq init data to kick in its slowpath
> fallback.

You have to make sure that _ALL_ prerequisites are there before you start
using it, whether you have a shared rseq or not. If a setup has rseq working
but the other syscalls are blocked by a stupid mistake in a security filter,
then the assumption that testing rseq alone is sufficient is already broken
and stuff will explode in hard-to-debug ways.

You CANNOT make such assumptions ever. Robustness is the first thing to
look at and after that you can optimize the hell out of it, without
violating robustness while doing that.
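
As an illustration, a minimal sketch of such an init-time check,
assuming hypothetical probe helpers for the three syscalls (none of
these names come from the patch series; each probe would typically
issue a trial syscall and check for -ENOSYS/-EPERM):

#include <stdbool.h>

/*
 * Hypothetical probes: each one verifies that the corresponding
 * syscall is present and permitted (i.e. not blocked by a
 * security filter).
 */
extern bool probe_rseq_works(void);
extern bool probe_cpu_opv_works(void);
extern bool probe_membarrier_works(void);

static bool rseq_fastpath_enabled;

/* Run once at library init, before any fast path is taken. */
void percpu_lib_init(void)
{
	/*
	 * Enable the rseq fast path only when _all_ prerequisites are
	 * functional; otherwise stay on the generic slow path, so a
	 * single blocked syscall cannot leave the library half-working.
	 */
	rseq_fastpath_enabled = probe_rseq_works() &&
				probe_cpu_opv_works() &&
				probe_membarrier_works();
}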

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 175+ messages in thread

Thread overview: 175+ messages
2017-11-21 14:18 [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector Mathieu Desnoyers
2017-11-21 14:18 ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 01/22] uapi headers: Provide types_32_64.h Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 v12 02/22] rseq: Introduce restartable sequences system call Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 03/22] arm: Add restartable sequences support Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 04/22] arm: Wire up restartable sequences system call Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 05/22] x86: Add support for restartable sequences Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 06/22] x86: Wire up restartable sequence system call Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 07/22] powerpc: Add support for restartable sequences Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 08/22] powerpc: Wire up restartable sequences system call Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 09/22] sched: Implement push_task_to_cpu Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 v4 10/22] cpu_opv: Provide cpu_opv system call Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 11/22] x86: Wire up " Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 12/22] powerpc: " Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 13/22] arm: " Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 v3 14/22] cpu_opv: selftests: Implement selftests Mathieu Desnoyers
2017-11-21 14:18   ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 14:18   ` mathieu.desnoyers
2017-11-21 15:17   ` Shuah Khan
2017-11-21 15:17     ` Shuah Khan
2017-11-21 15:17     ` [Linux-kselftest-mirror] " Shuah Khan
2017-11-21 15:17     ` shuah
2017-11-21 16:46     ` Mathieu Desnoyers
2017-11-21 16:46       ` Mathieu Desnoyers
2017-11-21 16:46       ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 16:46       ` mathieu.desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 v3 15/22] rseq: selftests: Provide self-tests Mathieu Desnoyers
2017-11-21 15:34   ` Shuah Khan
2017-11-21 15:34     ` Shuah Khan
2017-11-21 17:05     ` Mathieu Desnoyers
2017-11-21 17:05       ` Mathieu Desnoyers
2017-11-21 17:05       ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 17:05       ` mathieu.desnoyers
2017-11-21 17:40       ` Shuah Khan
2017-11-21 17:40         ` Shuah Khan
2017-11-21 17:40         ` [Linux-kselftest-mirror] " Shuah Khan
2017-11-21 17:40         ` shuah
2017-11-21 21:22         ` Mathieu Desnoyers
2017-11-21 21:22           ` Mathieu Desnoyers
2017-11-21 21:22           ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 21:22           ` mathieu.desnoyers
2017-11-21 21:24           ` Shuah Khan
2017-11-21 21:24             ` Shuah Khan
2017-11-21 21:24             ` [Linux-kselftest-mirror] " Shuah Khan
2017-11-21 21:24             ` shuahkh
2017-11-21 21:44             ` Mathieu Desnoyers
2017-11-21 21:44               ` Mathieu Desnoyers
2017-11-21 21:44               ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 21:44               ` mathieu.desnoyers
2017-11-22 19:38   ` Peter Zijlstra
2017-11-22 19:38     ` Peter Zijlstra
2017-11-22 19:38     ` [Linux-kselftest-mirror] " Peter Zijlstra
2017-11-22 19:38     ` peterz
2017-11-23 21:16     ` Mathieu Desnoyers
2017-11-23 21:16       ` Mathieu Desnoyers
2017-11-23 21:16       ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-23 21:16       ` mathieu.desnoyers
2017-11-22 21:48   ` Peter Zijlstra
2017-11-22 21:48     ` Peter Zijlstra
2017-11-22 21:48     ` [Linux-kselftest-mirror] " Peter Zijlstra
2017-11-22 21:48     ` peterz
2017-11-23 22:53     ` Mathieu Desnoyers
2017-11-23 22:53       ` Mathieu Desnoyers
2017-11-23 22:53       ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-23 22:53       ` mathieu.desnoyers
2017-11-23  8:55   ` Peter Zijlstra
2017-11-23  8:55     ` Peter Zijlstra
2017-11-23  8:55     ` [Linux-kselftest-mirror] " Peter Zijlstra
2017-11-23  8:55     ` peterz
2017-11-23  8:57     ` Peter Zijlstra
2017-11-23  8:57       ` Peter Zijlstra
2017-11-23  8:57       ` [Linux-kselftest-mirror] " Peter Zijlstra
2017-11-23  8:57       ` peterz
2017-11-24 14:15       ` Mathieu Desnoyers
2017-11-24 14:15         ` Mathieu Desnoyers
2017-11-24 14:15         ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-24 14:15         ` mathieu.desnoyers
2017-11-24 13:55     ` Mathieu Desnoyers
2017-11-24 13:55       ` Mathieu Desnoyers
2017-11-24 13:55       ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-24 13:55       ` mathieu.desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 16/22] rseq: selftests: arm: workaround gcc asm size guess Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18   ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 14:18   ` mathieu.desnoyers
2017-11-21 15:39   ` Shuah Khan
2017-11-21 15:39     ` Shuah Khan
2017-11-21 15:39     ` [Linux-kselftest-mirror] " Shuah Khan
2017-11-21 15:39     ` shuah
2017-11-21 14:18 ` [RFC PATCH for 4.15 17/22] Fix: membarrier: add missing preempt off around smp_call_function_many Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 18/22] membarrier: selftest: Test private expedited cmd Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18   ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 14:18   ` mathieu.desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 v7 19/22] powerpc: membarrier: Skip memory barrier in switch_mm() Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 v5 20/22] membarrier: Document scheduler barrier requirements Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:18 ` [RFC PATCH for 4.15 v2 21/22] membarrier: provide SHARED_EXPEDITED command Mathieu Desnoyers
2017-11-21 14:18   ` Mathieu Desnoyers
2017-11-21 14:19 ` [RFC PATCH for 4.15 22/22] membarrier: selftest: Test shared expedited cmd Mathieu Desnoyers
2017-11-21 14:19   ` Mathieu Desnoyers
2017-11-21 14:19   ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 14:19   ` mathieu.desnoyers
2017-11-21 17:21 ` [RFC PATCH for 4.15 v12 00/22] Restartable sequences and CPU op vector Andi Kleen
2017-11-21 17:21   ` Andi Kleen
2017-11-21 22:05   ` Mathieu Desnoyers
2017-11-21 22:05     ` Mathieu Desnoyers
2017-11-21 22:59     ` Thomas Gleixner
2017-11-21 22:59       ` Thomas Gleixner
2017-11-22 12:36       ` Mathieu Desnoyers
2017-11-22 12:36         ` Mathieu Desnoyers
2017-11-22 15:25         ` Thomas Gleixner
2017-11-22 15:25           ` Thomas Gleixner
2017-11-22 15:28     ` Andy Lutomirski
2017-11-22 15:28       ` Andy Lutomirski
2017-11-22 16:43       ` Mathieu Desnoyers
2017-11-22 16:43         ` Mathieu Desnoyers
2017-11-22 18:10         ` Andi Kleen
2017-11-22 18:10           ` Andi Kleen
2017-11-22 19:32     ` Peter Zijlstra
2017-11-22 19:32       ` Peter Zijlstra
2017-11-22 19:37       ` Will Deacon
2017-11-22 19:37         ` Will Deacon
2017-11-23 21:15         ` Mathieu Desnoyers
2017-11-23 21:15           ` Mathieu Desnoyers
2017-11-23 22:51           ` Thomas Gleixner
2017-11-23 22:51             ` Thomas Gleixner
2017-11-23 23:01             ` Mathieu Desnoyers
2017-11-23 23:01               ` Mathieu Desnoyers
2017-11-23 23:38               ` Thomas Gleixner
2017-11-23 23:38                 ` Thomas Gleixner
2017-11-24  0:04                 ` Mathieu Desnoyers
2017-11-24  0:04                   ` Mathieu Desnoyers
2017-11-24 14:47                   ` Thomas Gleixner
2017-11-24 14:47                     ` Thomas Gleixner
2017-11-23 21:13       ` Mathieu Desnoyers
2017-11-23 21:13         ` Mathieu Desnoyers
2017-11-23 21:49         ` Andi Kleen
2017-11-23 21:49           ` Andi Kleen
2017-11-21 22:19 ` [PATCH update for 4.15 1/3] selftests: lib.mk: Introduce OVERRIDE_TARGETS Mathieu Desnoyers
2017-11-21 22:19   ` Mathieu Desnoyers
2017-11-21 22:19   ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 22:19   ` mathieu.desnoyers
2017-11-21 22:22   ` Mathieu Desnoyers
2017-11-21 22:22     ` Mathieu Desnoyers
2017-11-21 22:22     ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 22:22     ` mathieu.desnoyers
2017-11-22 15:16   ` Shuah Khan
2017-11-22 15:16     ` Shuah Khan
2017-11-22 15:16     ` [Linux-kselftest-mirror] " Shuah Khan
2017-11-22 15:16     ` shuah
2017-11-21 22:19 ` [PATCH update for 4.15 2/3] cpu_opv: selftests: Implement selftests (v4) Mathieu Desnoyers
2017-11-21 22:19   ` Mathieu Desnoyers
2017-11-21 22:19   ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-21 22:19   ` mathieu.desnoyers
2017-11-22 15:20   ` Shuah Khan
2017-11-22 15:20     ` Shuah Khan
2017-11-22 15:20     ` [Linux-kselftest-mirror] " Shuah Khan
2017-11-22 15:20     ` shuah
2017-11-21 22:19 ` [PATCH update for 4.15 3/3] rseq: selftests: Provide self-tests (v4) Mathieu Desnoyers
2017-11-22 15:23   ` Shuah Khan
2017-11-22 15:23     ` Shuah Khan
2017-11-22 15:23     ` [Linux-kselftest-mirror] " Shuah Khan
2017-11-22 15:23     ` shuah
2017-11-22 16:31     ` Mathieu Desnoyers
2017-11-22 16:31       ` Mathieu Desnoyers
2017-11-22 16:31       ` [Linux-kselftest-mirror] " Mathieu Desnoyers
2017-11-22 16:31       ` mathieu.desnoyers
