* [RFC PATCH for 4.18 00/14] Restartable Sequences @ 2018-04-30 22:44 Mathieu Desnoyers 2018-04-30 22:44 ` [PATCH 01/14] uapi headers: Provide types_32_64.h (v2) Mathieu Desnoyers ` (14 more replies) 0 siblings, 15 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers Hi, Here is an updated RFC round of the Restartable Sequences patchset based on kernel 4.17-rc3. Based on feedback from Linus, I'm introducing only the rseq system call, keeping the rest for later. This already enables speeding up the Facebook jemalloc and arm64 PMC read from user-space use-cases, as well as speedup of use-cases relying on getting the current cpu number from user-space. We'll have to wait until a more complete solution is introduced before the LTTng-UST tracer can replace its ring buffer atomic instructions with rseq though. But let's proceed one step at a time. The main change introduced by the removal of cpu_opv from this series in terms of library use from user-space is that APIs that previously took a CPU number as argument now only act on the current CPU. So for instance, this turns: int cpu = rseq_per_cpu_lock(lock, target_cpu); [...] rseq_per_cpu_unlock(lock, cpu); into int cpu = rseq_this_cpu_lock(lock); [...] rseq_per_cpu_unlock(lock, cpu); and: per_cpu_list_push(list, node, target_cpu); [...] per_cpu_list_pop(list, node, target_cpu); into this_cpu_list_push(list, node, &cpu); /* cpu is an output parameter. */ [...] node = this_cpu_list_pop(list, &cpu); /* cpu is an output parameter. 
*/ Eventually integrating cpu_opv or some alternative will allow passing the cpu number as parameter rather than requiring the algorithm to work on the current CPU. The second effect of not having the cpu_opv fallback is that line and instruction single-stepping with a debugger transforms rseq critical sections based on retry loops into never-ending loops. Debuggers need to use the __rseq_table section to skip those critical sections in order to correctly behave when single-stepping a thread which uses rseq in a retry loop. However, applications which use an alternative fallback method rather than retrying on rseq fast-path abort won't be affected by this kind of single-stepping issue. Feedback is welcome! Thanks, Mathieu Boqun Feng (2): powerpc: Add support for restartable sequences powerpc: Wire up restartable sequences system call Mathieu Desnoyers (12): uapi headers: Provide types_32_64.h (v2) rseq: Introduce restartable sequences system call (v13) arm: Add restartable sequences support arm: Wire up restartable sequences system call x86: Add support for restartable sequences (v2) x86: Wire up restartable sequence system call selftests: lib.mk: Introduce OVERRIDE_TARGETS rseq: selftests: Provide rseq library (v5) rseq: selftests: Provide basic test rseq: selftests: Provide basic percpu ops test (v2) rseq: selftests: Provide parametrized tests (v2) rseq: selftests: Provide Makefile, scripts, gitignore (v2) MAINTAINERS | 12 + arch/Kconfig | 7 + arch/arm/Kconfig | 1 + arch/arm/kernel/signal.c | 7 + arch/arm/tools/syscall.tbl | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/systbl.h | 1 + arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 1 + arch/powerpc/kernel/signal.c | 3 + arch/x86/Kconfig | 1 + arch/x86/entry/common.c | 3 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/kernel/signal.c | 6 + fs/exec.c | 1 + include/linux/sched.h | 134 +++ include/linux/syscalls.h | 4 +- 
include/trace/events/rseq.h | 56 + include/uapi/linux/rseq.h | 150 +++ include/uapi/linux/types_32_64.h | 67 ++ init/Kconfig | 23 + kernel/Makefile | 1 + kernel/fork.c | 2 + kernel/rseq.c | 366 ++++++ kernel/sched/core.c | 2 + kernel/sys_ni.c | 3 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/lib.mk | 4 + tools/testing/selftests/rseq/.gitignore | 6 + tools/testing/selftests/rseq/Makefile | 29 + .../testing/selftests/rseq/basic_percpu_ops_test.c | 312 +++++ tools/testing/selftests/rseq/basic_test.c | 55 + tools/testing/selftests/rseq/param_test.c | 1259 ++++++++++++++++++++ tools/testing/selftests/rseq/rseq-arm.h | 732 ++++++++++++ tools/testing/selftests/rseq/rseq-ppc.h | 688 +++++++++++ tools/testing/selftests/rseq/rseq-skip.h | 82 ++ tools/testing/selftests/rseq/rseq-x86.h | 1149 ++++++++++++++++++ tools/testing/selftests/rseq/rseq.c | 116 ++ tools/testing/selftests/rseq/rseq.h | 164 +++ tools/testing/selftests/rseq/run_param_test.sh | 120 ++ 41 files changed, 5572 insertions(+), 2 deletions(-) create mode 100644 include/trace/events/rseq.h create mode 100644 include/uapi/linux/rseq.h create mode 100644 include/uapi/linux/types_32_64.h create mode 100644 kernel/rseq.c create mode 100644 tools/testing/selftests/rseq/.gitignore create mode 100644 tools/testing/selftests/rseq/Makefile create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c create mode 100644 tools/testing/selftests/rseq/basic_test.c create mode 100644 tools/testing/selftests/rseq/param_test.c create mode 100644 tools/testing/selftests/rseq/rseq-arm.h create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h create mode 100644 tools/testing/selftests/rseq/rseq-skip.h create mode 100644 tools/testing/selftests/rseq/rseq-x86.h create mode 100644 tools/testing/selftests/rseq/rseq.c create mode 100644 tools/testing/selftests/rseq/rseq.h create mode 100755 tools/testing/selftests/rseq/run_param_test.sh -- 2.11.0 ^ permalink raw reply [flat|nested] 105+ messages in thread
* [PATCH 01/14] uapi headers: Provide types_32_64.h (v2) 2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers @ 2018-04-30 22:44 ` Mathieu Desnoyers 2018-04-30 22:44 ` Mathieu Desnoyers ` (13 subsequent siblings) 14 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers Provide helper macros for fields which represent pointers in kernel-userspace ABI. This facilitates handling of 32-bit user-space by 64-bit kernels by defining those fields as 32-bit 0-padding and 32-bit integer on 32-bit architectures, which allows the kernel to treat those as 64-bit integers. The order of padding and 32-bit integer depends on the endianness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Paul Turner <pjt@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Andrew Hunter <ahh@google.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Andi Kleen <andi@firstfloor.org> CC: Dave Watson <davejwatson@fb.com> CC: Chris Lameter <cl@linux.com> CC: Ingo Molnar <mingo@redhat.com> CC: "H. 
Peter Anvin" <hpa@zytor.com> CC: Ben Maurer <bmaurer@fb.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Josh Triplett <josh@joshtriplett.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Russell King <linux@arm.linux.org.uk> CC: Catalin Marinas <catalin.marinas@arm.com> CC: Will Deacon <will.deacon@arm.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: linux-api@vger.kernel.org --- Changes since v1: - Public uapi headers use __u32 and __u64 rather than uint32_t and uint64_t. --- include/uapi/linux/types_32_64.h | 50 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 include/uapi/linux/types_32_64.h diff --git a/include/uapi/linux/types_32_64.h b/include/uapi/linux/types_32_64.h new file mode 100644 index 000000000000..0a87ace34a57 --- /dev/null +++ b/include/uapi/linux/types_32_64.h @@ -0,0 +1,50 @@ +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ +#ifndef _UAPI_LINUX_TYPES_32_64_H +#define _UAPI_LINUX_TYPES_32_64_H + +/* + * linux/types_32_64.h + * + * Integer type declaration for pointers across 32-bit and 64-bit systems. 
+ * + * Copyright (c) 2015-2018 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#ifdef __KERNEL__ +# include <linux/types.h> +#else +# include <stdint.h> +#endif + +#include <asm/byteorder.h> + +#ifdef __BYTE_ORDER +# if (__BYTE_ORDER == __BIG_ENDIAN) +# define LINUX_BYTE_ORDER_BIG_ENDIAN +# else +# define LINUX_BYTE_ORDER_LITTLE_ENDIAN +# endif +#else +# ifdef __BIG_ENDIAN +# define LINUX_BYTE_ORDER_BIG_ENDIAN +# else +# define LINUX_BYTE_ORDER_LITTLE_ENDIAN +# endif +#endif + +#ifdef __LP64__ +# define LINUX_FIELD_u32_u64(field) __u64 field +# define LINUX_FIELD_u32_u64_INIT_ONSTACK(field, v) field = (intptr_t)v +#else +# ifdef LINUX_BYTE_ORDER_BIG_ENDIAN +# define LINUX_FIELD_u32_u64(field) __u32 field ## _padding, field +# define LINUX_FIELD_u32_u64_INIT_ONSTACK(field, v) \ + field ## _padding = 0, field = (intptr_t)v +# else +# define LINUX_FIELD_u32_u64(field) __u32 field, field ## _padding +# define LINUX_FIELD_u32_u64_INIT_ONSTACK(field, v) \ + field = (intptr_t)v, field ## _padding = 0 +# endif +#endif + +#endif /* _UAPI_LINUX_TYPES_32_64_H */ -- 2.11.0 ^ permalink raw reply related [flat|nested] 105+ messages in thread
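As a rough illustration of what the macros in this patch provide (a sketch, not part of the patch): DEMO_FIELD_u32_u64, struct demo_abi and demo_roundtrip() below are hypothetical stand-ins that use <stdint.h> types and the compiler's __BYTE_ORDER__ rather than __u32/__u64 and <asm/byteorder.h>. Assuming an ILP32 or LP64 Linux target, the field always occupies 64 bits, and a reader that loads it as a single 64-bit integer sees the zero-extended pointer value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for LINUX_FIELD_u32_u64(); not the uapi macro itself. */
#ifdef __LP64__
# define DEMO_FIELD_u32_u64(field)	uint64_t field
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
# define DEMO_FIELD_u32_u64(field)	uint32_t field ## _padding, field
#else
# define DEMO_FIELD_u32_u64(field)	uint32_t field, field ## _padding
#endif

/* Illustrative ABI structure holding one pointer-sized field. */
struct demo_abi {
	DEMO_FIELD_u32_u64(ptr);
};

/*
 * Store a pointer the way 32-bit or 64-bit user-space would, then read the
 * same bytes back as one 64-bit integer, the way a 64-bit kernel would.
 */
uint64_t demo_roundtrip(const void *p)
{
	struct demo_abi abi;
	uint64_t v;

	assert(sizeof(abi) == sizeof(v));
	memset(&abi, 0, sizeof(abi));	/* zeroes the padding word on 32-bit */
	abi.ptr = (uintptr_t)p;
	memcpy(&v, &abi, sizeof(v));
	return v;
}
```

The memset() plays the role that LINUX_FIELD_u32_u64_INIT_ONSTACK() plays in the patch: without it, the padding word of an on-stack structure would hold garbage on 32-bit targets.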
* [PATCH 02/14] rseq: Introduce restartable sequences system call (v13) 2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers @ 2018-04-30 22:44 ` Mathieu Desnoyers 2018-04-30 22:44 ` Mathieu Desnoyers ` (13 subsequent siblings) 14 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)
To: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Alexander Viro

Expose a new system call allowing each thread to register one userspace memory area to be used as an ABI between kernel and user-space for two purposes: user-space restartable sequences and quick access to read the current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartable sequences allow user-space to perform update operations on per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work was started by Paul Turner and Andrew Hunter. It lets the kernel handle restart of critical sections. [1] [2] The re-implementation proposed here brings a few simplifications to the ABI which facilitate porting to other architectures and speed up the user-space fast path.

Here are benchmarks of various rseq use-cases.

Test hardware:

arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

The following benchmarks were all performed on a single thread.

* Per-CPU statistic counter increment

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                344.0                 31.4          11.0
x86-64:                15.3                  2.0           7.7

* LTTng-UST: write event 32-bit header, 32-bit payload into tracer per-cpu buffer

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:               2502.0               2250.0           1.1
x86-64:               117.4                 98.0           1.2

* liburcu percpu: lock-unlock pair, dereference, read/compare word

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                751.0                128.5           5.8
x86-64:                53.4                 28.6           1.9

* jemalloc memory allocator adapted to use rseq

Using rseq with per-cpu memory pools in jemalloc at Facebook (based on rseq 2016 implementation):

The production workload's average response latency improves by 1-2%, and the P99 overall latency drops by 2-3%.

* Reading the current CPU number

Speeding up reading the current CPU number on which the caller thread is running is done by keeping the current CPU number up to date within the cpu_id field of the memory area registered by the thread. This is done by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space, a notify-resume handler updates the current CPU value within the registered user-space memory area. User-space can then read the current CPU number directly from memory.

Keeping the current cpu id in a memory area shared between kernel and user-space improves on the mechanisms currently available to read the current CPU number, with the following benefits over alternative approaches:

- 35x speedup on ARM vs system call through glibc,
- 20x speedup on x86 compared to calling glibc, which calls vdso executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from inline assembly, which makes it a useful building block for restartable sequences.
- The approach of reading the cpu id through memory mapping shared between kernel and user-space is portable (e.g. ARM), which is not the case for the lsl-based x86 vdso.

On x86, yet another possible approach would be to use the gs segment selector to point to user-space per-cpu data. This approach performs similarly to the cpu id cache, but it has two disadvantages: it is not portable, and it is incompatible with existing applications already using the gs segment selector for other purposes. Benchmarking various approaches for reading the current CPU number: ARMv7 Processor rev 4 (v7l) Machine model: Cubietruck - Baseline (empty loop): 8.4 ns - Read CPU from rseq cpu_id: 16.7 ns - Read CPU from rseq cpu_id (lazy register): 19.8 ns - glibc 2.19-0ubuntu6.6 getcpu: 301.8 ns - getcpu system call: 234.9 ns x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz: - Baseline (empty loop): 0.8 ns - Read CPU from rseq cpu_id: 0.8 ns - Read CPU from rseq cpu_id (lazy register): 0.8 ns - Read using gs segment selector: 0.8 ns - "lsl" inline assembly: 13.0 ns - glibc 2.19-0ubuntu6 getcpu: 16.6 ns - getcpu system call: 53.9 ns - Speed (benchmark taken on v8 of patchset) Running 10 runs of hackbench -l 100000 seems to indicate, contrary to expectations, that enabling CONFIG_RSEQ slightly accelerates the scheduler: Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1 kernel parameter), with a Linux v4.6 defconfig+localyesconfig, restartable sequences series applied. * CONFIG_RSEQ=n avg.: 41.37 s std.dev.: 0.36 s * CONFIG_RSEQ=y avg.: 40.46 s std.dev.: 0.33 s - Size On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is 567 bytes, and the data size increase of vmlinux is 5696 bytes. 
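As a sketch of how user-space might consume the cpu_id cache described above (not taken from the patch): struct demo_rseq is a hypothetical local mirror that holds only the two CPU fields named in this series, and demo_current_cpu() is an illustrative helper. Registration through the rseq system call is deliberately left out, so the helper falls back to sched_getcpu() whenever cpu_id holds one of the negative "not registered" values.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for sched_getcpu() */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two CPU fields; the real layout is in linux/rseq.h. */
struct demo_rseq {
	uint32_t cpu_id_start;	/* always a possible CPU number */
	uint32_t cpu_id;	/* current CPU; (uint32_t)-1 or -2 if unusable */
};

/* Return the cached CPU number, or ask the kernel when the cache is not live. */
int demo_current_cpu(const volatile struct demo_rseq *rs)
{
	int32_t cpu;

	assert(rs != NULL);
	cpu = (int32_t)rs->cpu_id;	/* one plain load on the fast path */
	if (cpu < 0)
		return sched_getcpu();	/* slow path: vdso or system call */
	return cpu;
}
```

With a registered area, the fast path is a single memory load, which is where the figures quoted above for "Read CPU from rseq cpu_id" come from; the fallback corresponds to the glibc getcpu rows.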
[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-api@vger.kernel.org
---

Changes since v1:
- Return -1, errno=EINVAL if cpu_cache pointer is not aligned on sizeof(int32_t).
- Update man page to describe the pointer alignment requirements and update atomicity guarantees.
- Add MAINTAINERS file GETCPU_CACHE entry.
- Remove dynamic memory allocation: go back to having a single getcpu_cache entry per thread. Update documentation accordingly.
- Rebased on Linux 4.4.

Changes since v2:
- Introduce a "cmd" argument, along with an enum with GETCPU_CACHE_GET and GETCPU_CACHE_SET. Introduce a uapi header linux/getcpu_cache.h defining this enumeration.
- Split resume notifier architecture implementation from the system call wire up in the following arch-specific patches. - Man pages updates. - Handle 32-bit compat pointers. - Simplify handling of getcpu_cache GETCPU_CACHE_SET compiler barrier: set the current cpu cache pointer before doing the cache update, and set it back to NULL if the update fails. Setting it back to NULL on error ensures that no resume notifier will trigger a SIGSEGV if a migration happened concurrently. Changes since v3: - Fix __user annotations in compat code, - Update memory ordering comments. - Rebased on kernel v4.5-rc5. Changes since v4: - Inline getcpu_cache_fork, getcpu_cache_execve, and getcpu_cache_exit. - Add new line between if() and switch() to improve readability. - Added sched switch benchmarks (hackbench) and size overhead comparison to change log. Changes since v5: - Rename "getcpu_cache" to "thread_local_abi", allowing to extend this system call to cover future features such as restartable critical sections. Generalizing this system call ensures that we can add features similar to the cpu_id field within the same cache-line without having to track one pointer per feature within the task struct. - Add a tlabi_nr parameter to the system call, thus allowing to extend the ABI beyond the initial 64-byte structure by registering structures with tlabi_nr greater than 0. The initial ABI structure is associated with tlabi_nr 0. - Rebased on kernel v4.5. Changes since v6: - Integrate "restartable sequences" v2 patchset from Paul Turner. - Add handling of single-stepping purely in user-space, with a fallback to locking after 2 rseq failures to ensure progress, and by exposing a __rseq_table section to debuggers so they know where to put breakpoints when dealing with rseq assembly blocks which can be aborted at any point. 
- make the code and ABI generic: porting the kernel implementation simply requires wiring up the signal handler and return to user-space hooks, and allocating the syscall number.
- extend testing with a fully configurable test program. See param_spinlock_test -h for details.
- handling of rseq ENOSYS in user-space, also with a fallback to locking.
- modify Paul Turner's rseq ABI to only require a single TLS store on the user-space fast-path, removing the need to populate two additional registers. This is made possible by introducing struct rseq_cs into the ABI to describe a critical section start_ip, post_commit_ip, and abort_ip.
- Rebased on kernel v4.7-rc7.

Changes since v7:
- Documentation updates.
- Integrated powerpc architecture support.
- Compare rseq critical section start_ip, which allows shrinking the user-space fast-path code size.
- Added Peter Zijlstra, Paul E. McKenney and Boqun Feng as co-maintainers.
- Added do_rseq2 and do_rseq_memcpy to test program helper library.
- Code cleanup based on review from Peter Zijlstra, Andy Lutomirski and Boqun Feng.
- Rebase on kernel v4.8-rc2.

Changes since v8:
- clear rseq_cs even if non-nested. Speeds up user-space fast path by removing the final "rseq_cs=NULL" assignment.
- add enum rseq_flags: critical sections and threads can set migration, preemption and signal "disable" flags to inhibit rseq behavior.
- rseq_event_counter needs to be updated with a pre-increment: Otherwise misses an increment after exec (when TLS and in-kernel states are initially 0).

Changes since v9:
- Update changelog.
- Fold instrumentation patch.
- check abort-ip signature: Add a signature before the abort-ip landing address. This signature is also received as a new parameter to the rseq system call. The kernel uses it to ensure that rseq cannot be used as an exploit vector to redirect execution to arbitrary code.
- Use rseq pointer for both register and unregister.
  This is more symmetric, and would eventually allow supporting a linked list of rseq struct per thread if needed in the future.
- Unregistration of a rseq structure is now done with RSEQ_FLAG_UNREGISTER.
- Remove reference counting. Return "EBUSY" to the caller if rseq is already registered for the current thread. This simplifies implementation while still allowing user-space to perform lazy registration in multi-lib use-cases. (suggested by Ben Maurer)
- Clear rseq_cs upon unregister.
- Set cpu_id back to -1 on unregister, so if rseq user libraries follow an unregister, and they expect to lazily register rseq, they can do so.
- Document rseq_cs clear requirement: JIT should reset the rseq_cs pointer before reclaiming memory of rseq_cs structure.
- Introduce rseq_len syscall parameter, rseq_cs version field: Allow keeping track of the registered rseq struct length, for future extensions. Add rseq_cs version as first field. Will allow future extensions.
- Use offset and unsigned arithmetic to save a branch: Save a conditional branch when comparing instruction pointer against a rseq_cs descriptor's address range by having post_commit_ip as an offset from start_ip, and using unsigned integer comparison. Suggested by Ben Maurer.
- Remove event counter from ABI. Suggested by Andy Lutomirski.
- Add INIT_ONSTACK macro: Introduce the RSEQ_FIELD_u32_u64_INIT_ONSTACK() macros to ensure that users correctly initialize the upper bits of RSEQ_FIELD_u32_u64() on their stack to 0 on 32-bit architectures.
- Select MEMBARRIER: Allows user-space rseq fast-paths to use the value of cpu_id field (inherently required by the rseq algorithm) to figure out whether membarrier can be expected to be available. This effectively allows user-space fast-paths to remove extra comparisons and branch testing whether membarrier is enabled, and thus whether a full barrier is required (e.g. in userspace RCU implementation after rcu_read_lock/before rcu_read_unlock).
- Expose cpu_id_start field: Checking whether (cpu_id < 0) in the C preparation part of the rseq fast-path brings significant overhead at least on arm32. We can remove this extra comparison by exposing two distinct cpu_id fields in the rseq TLS: The field cpu_id_start always contains a *possible* cpu number, although it may not be the current one if, for instance, rseq is not initialized for the current thread. cpu_id_start is meant to be used in the C code for the pointer chasing to figure out which per-cpu data structure should be passed to the rseq asm sequence. For the cpu_id field, the value -1 means rseq is not initialized, and -2 means initialization failed. That field is used in the rseq asm sequence to confirm that the cpu_id_start value was indeed the current cpu number. It also ends up confirming that rseq is initialized for the current thread, because values -1 and -2 will never match the cpu_id_start possible cpu number values. This allows checking the current CPU number and rseq initialization state with a single comparison on the fast-path.

Changes since v10:
- Update rseq.c comment, removing reference to event_counter.

Changes since v11:
- Replace task struct rseq_preempt, rseq_signal, and rseq_migrate bool by u32 rseq_event_mask.
- Add missing sys_rseq() asmlinkage declaration to include/linux/syscalls.h.
- Copy event mask on process fork, set to 0 on exec and thread-fork.
- Cleanups based on review from Peter Zijlstra.
- Cleanups based on review from Thomas Gleixner.
- Fix: rseq_cs needs to be cleared only when:
  - Nested over non-critical-section userspace code,
  - Nested over rseq_cs _and_ handling abort.
  Basically, we should never clear rseq_cs when the rseq resume to userspace handler is called and it is not handling abort: the problematic case is if any of the __get_user()/__put_user() done by the handler trigger a page fault (e.g. page protection done by NUMA page migration work), which triggers preemption: the next call to the rseq resume to userspace handler needs to perform the abort.
- Perform rseq event mask updates atomically wrt preemption,
- Move rseq_migrate to __set_task_cpu(), thus catching migration scenarios that bypass set_task_cpu(): fork and wake_up_new_task.
- Merge content of rseq_sched_out into rseq_preempt. There is no need to have two hook sites. Both setting the rseq event mask preempt bit and setting the notify resume thread flag can be done from rseq_preempt().
- Issue rseq_preempt() from fork(), thus ensuring that we handle abort if needed.

Changes since v12:
- Disallow syscalls from rseq critical sections,
- Introduce CONFIG_DEBUG_RSEQ, which terminates processes misusing rseq (e.g. doing a system call within a rseq critical section) with SIGSEGV,
- Coding style cleanups based on feedback from Boqun Feng and Peter Zijlstra.

Man page associated:

RSEQ(2)                Linux Programmer's Manual               RSEQ(2)

NAME
       rseq - Restartable sequences and cpu number cache

SYNOPSIS
       #include <linux/rseq.h>

       int rseq(struct rseq * rseq, uint32_t rseq_len, int flags, uint32_t sig);

DESCRIPTION
       The rseq() ABI accelerates user-space operations on per-cpu data by defining a shared data structure ABI between each user-space thread and the kernel.

       It allows user-space to perform update operations on per-cpu data without requiring heavy-weight atomic operations.

       The term CPU used in this documentation refers to a hardware execution context.

       Restartable sequences are atomic with respect to preemption (making them atomic with respect to other threads running on the same CPU), as well as signal delivery (user-space execution contexts nested over the same thread). They are suited for update operations on per-cpu data. They can be used on data structures shared between threads within a process, and on data structures shared between threads across different processes.

       Some examples of operations that can be accelerated or improved by this ABI:

       · Memory allocator per-cpu free-lists,
       · Querying the current CPU number,
       · Incrementing per-CPU counters,
       · Modifying data protected by per-CPU spinlocks,
       · Inserting/removing elements in per-CPU linked-lists,
       · Writing/reading per-CPU ring buffers content.
       · Accurately reading performance monitoring unit counters with respect to thread migration.

       Restartable sequences must not perform system calls. Doing so may result in termination of the process by a segmentation fault.

       The rseq argument is a pointer to the thread-local rseq structure to be shared between kernel and user-space. A NULL rseq value unregisters the current thread rseq structure.

       The layout of struct rseq is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This structure is extensible. Its size is passed as parameter to the rseq system call.

       Fields

           cpu_id_start
              Optimistic cache of the CPU number on which the current thread is running. Its value is guaranteed to always be a possible CPU number, even when rseq is not initialized. The value it contains should always be confirmed by reading the cpu_id field.

           cpu_id
              Cache of the CPU number on which the current thread is running. -1 if uninitialized.

           rseq_cs
              The rseq_cs field is a pointer to a struct rseq_cs. It is NULL when no rseq assembly block critical section is active for the current thread. Setting it to point to a critical section descriptor (struct rseq_cs) marks the beginning of the critical section.

           flags
              Flags indicating the restart behavior for the current thread. This is mainly used for debugging purposes. Can be either:

              · RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
              · RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
              · RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

       The layout of struct rseq_cs version 0 is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This structure has a fixed size of 32 bytes.

       Fields

           version
              Version of this structure.

           flags
              Flags indicating the restart behavior of this structure. Can be either:

              · RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
              · RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
              · RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

           start_ip
              Instruction pointer address of the first instruction of the sequence of consecutive assembly instructions.

           post_commit_offset
              Offset (from start_ip address) of the address after the last instruction of the sequence of consecutive assembly instructions.

           abort_ip
              Instruction pointer address where to move the execution flow in case of abort of the sequence of consecutive assembly instructions.

       The rseq_len argument is the size of the struct rseq to register.

       The flags argument is 0 for registration, and RSEQ_FLAG_UNREGISTER for unregistration.

       The sig argument is the 32-bit signature to be expected before the abort handler code.

       A single library per process should keep the rseq structure in a thread-local storage variable. The cpu_id field should be initialized to -1, and the cpu_id_start field should be initialized to a possible CPU value (typically 0).

       Each thread is responsible for registering and unregistering its rseq structure. No more than one rseq structure address can be registered per thread at a given time.

       In a typical usage scenario, the thread registering the rseq structure will be performing loads and stores from/to that structure. It is however also allowed to read that structure from other threads. The rseq field updates performed by the kernel provide relaxed atomicity semantics, which guarantee that other threads performing relaxed atomic reads of the cpu number cache will always observe a consistent value.

RETURN VALUE
       A return value of 0 indicates success. On error, -1 is returned, and errno is set appropriately.

ERRORS
       EINVAL Either flags contains an invalid value, or rseq contains an address which is not appropriately aligned, or rseq_len contains a size that does not match the size received on registration.

       ENOSYS The rseq() system call is not implemented by this kernel.

       EFAULT rseq is an invalid address.

       EBUSY  Restartable sequence is already registered for this thread.

       EPERM  The sig argument on unregistration does not match the signature received on registration.

VERSIONS
       The rseq() system call was added in Linux 4.X (TODO).

CONFORMING TO
       rseq() is Linux-specific.

SEE ALSO
       sched_getcpu(3)

Linux                         2017-11-06                       RSEQ(2)
---
 MAINTAINERS | 11 ++
 arch/Kconfig | 7 +
 fs/exec.c | 1 +
 include/linux/sched.h | 134 +++++++++++++++++
 include/linux/syscalls.h | 4 +-
 include/trace/events/rseq.h | 57 +++++++
 include/uapi/linux/rseq.h | 133 +++++++++++++++++
 init/Kconfig | 23 +++
 kernel/Makefile | 1 +
 kernel/fork.c | 2 +
 kernel/rseq.c | 357 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c | 2 +
 kernel/sys_ni.c | 3 +
 13 files changed, 734 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/rseq.h
 create mode 100644 include/uapi/linux/rseq.h
 create mode 100644 kernel/rseq.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 79bb02ff812f..4d61ce154dfc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11981,6 +11981,17 @@ F: include/dt-bindings/reset/
 F: include/linux/reset.h
 F: include/linux/reset-controller.h

+RESTARTABLE SEQUENCES SUPPORT
+M: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+M: Peter Zijlstra <peterz@infradead.org>
+M: "Paul E.
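The start_ip and post_commit_offset fields described above make it possible to test whether an instruction pointer lies inside a critical section with a single unsigned comparison, as the v9 changelog notes. A minimal sketch of that check follows, assuming a hypothetical struct demo_rseq_cs that holds only the three address fields and two illustrative helpers; it ignores the flags and the abort signature check, and is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the critical section descriptor described above. */
struct demo_rseq_cs {
	uint64_t start_ip;		/* first instruction of the sequence */
	uint64_t post_commit_offset;	/* length of the sequence, from start_ip */
	uint64_t abort_ip;		/* where execution resumes on abort */
};

/*
 * True when ip is in [start_ip, start_ip + post_commit_offset).
 * If ip < start_ip, the unsigned subtraction wraps to a very large value,
 * so one comparison covers both the lower and the upper bound.
 */
bool demo_in_rseq_cs(uint64_t ip, const struct demo_rseq_cs *cs)
{
	assert(cs != NULL);
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* Resume at abort_ip if the thread was interrupted inside the section. */
uint64_t demo_resume_ip(uint64_t ip, const struct demo_rseq_cs *cs)
{
	return demo_in_rseq_cs(ip, cs) ? cs->abort_ip : ip;
}
```

Storing an offset rather than an absolute post-commit address is what removes the second conditional branch: with two absolute bounds, the check would need both ip >= start and ip < end.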
McKenney" <paulmck@linux.vnet.ibm.com> +M: Boqun Feng <boqun.feng@gmail.com> +L: linux-kernel@vger.kernel.org +S: Supported +F: kernel/rseq.c +F: include/uapi/linux/rseq.h +F: include/trace/events/rseq.h + RFKILL M: Johannes Berg <johannes@sipsolutions.net> L: linux-wireless@vger.kernel.org diff --git a/arch/Kconfig b/arch/Kconfig index 8e0d665c8d53..43b5e103c1b2 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -272,6 +272,13 @@ config HAVE_REGS_AND_STACK_ACCESS_API declared in asm/ptrace.h For example the kprobes-based event tracer needs this API. +config HAVE_RSEQ + bool + depends on HAVE_REGS_AND_STACK_ACCESS_API + help + This symbol should be selected by an architecture if it + supports an implementation of restartable sequences. + config HAVE_CLK bool help diff --git a/fs/exec.c b/fs/exec.c index 183059c427b9..2c3911612b22 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1822,6 +1822,7 @@ static int do_execveat_common(int fd, struct filename *filename, current->fs->in_exec = 0; current->in_execve = 0; membarrier_execve(current); + rseq_execve(current); acct_update_integrals(current); task_numa_free(current); free_bprm(bprm); diff --git a/include/linux/sched.h b/include/linux/sched.h index b3d697f3b573..496a0b25a42d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -27,6 +27,7 @@ #include <linux/signal_types.h> #include <linux/mm_types_task.h> #include <linux/task_io_accounting.h> +#include <linux/rseq.h> /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; @@ -1007,6 +1008,17 @@ struct task_struct { unsigned long numa_pages_migrated; #endif /* CONFIG_NUMA_BALANCING */ +#ifdef CONFIG_RSEQ + struct rseq __user *rseq; + u32 rseq_len; + u32 rseq_sig; + /* + * RmW on rseq_event_mask must be performed atomically + * with respect to preemption. 
+	 */
+	unsigned long rseq_event_mask;
+#endif
+
 	struct tlbflush_unmap_batch	tlb_ubc;
 
 	struct rcu_head			rcu;
@@ -1716,4 +1728,126 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_RSEQ
+
+/*
+ * Map the event mask on the user-space ABI enum rseq_cs_flags
+ * for direct mask checks.
+ */
+enum rseq_event_mask_bits {
+	RSEQ_EVENT_PREEMPT_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT,
+	RSEQ_EVENT_SIGNAL_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT,
+	RSEQ_EVENT_MIGRATE_BIT	= RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT,
+};
+
+enum rseq_event_mask {
+	RSEQ_EVENT_PREEMPT	= (1U << RSEQ_EVENT_PREEMPT_BIT),
+	RSEQ_EVENT_SIGNAL	= (1U << RSEQ_EVENT_SIGNAL_BIT),
+	RSEQ_EVENT_MIGRATE	= (1U << RSEQ_EVENT_MIGRATE_BIT),
+};
+
+static inline void rseq_set_notify_resume(struct task_struct *t)
+{
+	if (t->rseq)
+		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+}
+
+void __rseq_handle_notify_resume(struct pt_regs *regs);
+
+static inline void rseq_handle_notify_resume(struct pt_regs *regs)
+{
+	if (current->rseq)
+		__rseq_handle_notify_resume(regs);
+}
+
+static inline void rseq_signal_deliver(struct pt_regs *regs)
+{
+	preempt_disable();
+	__set_bit(RSEQ_EVENT_SIGNAL_BIT, &current->rseq_event_mask);
+	preempt_enable();
+	rseq_handle_notify_resume(regs);
+}
+
+/* rseq_preempt() requires preemption to be disabled. */
+static inline void rseq_preempt(struct task_struct *t)
+{
+	__set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
+	rseq_set_notify_resume(t);
+}
+
+/* rseq_migrate() requires preemption to be disabled. */
+static inline void rseq_migrate(struct task_struct *t)
+{
+	__set_bit(RSEQ_EVENT_MIGRATE_BIT, &t->rseq_event_mask);
+	rseq_set_notify_resume(t);
+}
+
+/*
+ * If parent process has a registered restartable sequences area, the
+ * child inherits. Only applies when forking a process, not a thread.
In + * case a parent fork() in the middle of a restartable sequence, set the + * resume notifier to force the child to retry. + */ +static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) +{ + if (clone_flags & CLONE_THREAD) { + t->rseq = NULL; + t->rseq_len = 0; + t->rseq_sig = 0; + t->rseq_event_mask = 0; + } else { + t->rseq = current->rseq; + t->rseq_len = current->rseq_len; + t->rseq_sig = current->rseq_sig; + t->rseq_event_mask = current->rseq_event_mask; + rseq_preempt(t); + } +} + +static inline void rseq_execve(struct task_struct *t) +{ + t->rseq = NULL; + t->rseq_len = 0; + t->rseq_sig = 0; + t->rseq_event_mask = 0; +} + +#else + +static inline void rseq_set_notify_resume(struct task_struct *t) +{ +} +static inline void rseq_handle_notify_resume(struct pt_regs *regs) +{ +} +static inline void rseq_signal_deliver(struct pt_regs *regs) +{ +} +static inline void rseq_preempt(struct task_struct *t) +{ +} +static inline void rseq_migrate(struct task_struct *t) +{ +} +static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) +{ +} +static inline void rseq_execve(struct task_struct *t) +{ +} + +#endif + +#ifdef CONFIG_DEBUG_RSEQ + +void rseq_syscall(struct pt_regs *regs); + +#else + +static inline void rseq_syscall(struct pt_regs *regs) +{ +} + +#endif + #endif diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 70fcda1a9049..a16d72c70f28 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -66,6 +66,7 @@ struct old_linux_dirent; struct perf_event_attr; struct file_handle; struct sigaltstack; +struct rseq; union bpf_attr; #include <linux/types.h> @@ -890,7 +891,8 @@ asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val); asmlinkage long sys_pkey_free(int pkey); asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags, unsigned mask, struct statx __user *buffer); - +asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len, + int 
flags, uint32_t sig); /* * Architecture-specific system calls diff --git a/include/trace/events/rseq.h b/include/trace/events/rseq.h new file mode 100644 index 000000000000..a04a64bc1a00 --- /dev/null +++ b/include/trace/events/rseq.h @@ -0,0 +1,57 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM rseq + +#if !defined(_TRACE_RSEQ_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_RSEQ_H + +#include <linux/tracepoint.h> +#include <linux/types.h> + +TRACE_EVENT(rseq_update, + + TP_PROTO(struct task_struct *t), + + TP_ARGS(t), + + TP_STRUCT__entry( + __field(s32, cpu_id) + ), + + TP_fast_assign( + __entry->cpu_id = raw_smp_processor_id(); + ), + + TP_printk("cpu_id=%d", __entry->cpu_id) +); + +TRACE_EVENT(rseq_ip_fixup, + + TP_PROTO(unsigned long regs_ip, unsigned long start_ip, + unsigned long post_commit_offset, unsigned long abort_ip), + + TP_ARGS(regs_ip, start_ip, post_commit_offset, abort_ip), + + TP_STRUCT__entry( + __field(unsigned long, regs_ip) + __field(unsigned long, start_ip) + __field(unsigned long, post_commit_offset) + __field(unsigned long, abort_ip) + ), + + TP_fast_assign( + __entry->regs_ip = regs_ip; + __entry->start_ip = start_ip; + __entry->post_commit_offset = post_commit_offset; + __entry->abort_ip = abort_ip; + ), + + TP_printk("regs_ip=0x%lx start_ip=0x%lx post_commit_offset=%lu abort_ip=0x%lx", + __entry->regs_ip, __entry->start_ip, + __entry->post_commit_offset, __entry->abort_ip) +); + +#endif /* _TRACE_SOCK_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h new file mode 100644 index 000000000000..d620fa43756c --- /dev/null +++ b/include/uapi/linux/rseq.h @@ -0,0 +1,133 @@ +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ +#ifndef _UAPI_LINUX_RSEQ_H +#define _UAPI_LINUX_RSEQ_H + +/* + * linux/rseq.h + * + * Restartable sequences system call API + * + * Copyright (c) 2015-2018 Mathieu 
Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#ifdef __KERNEL__ +# include <linux/types.h> +#else +# include <stdint.h> +#endif + +#include <linux/types_32_64.h> + +enum rseq_cpu_id_state { + RSEQ_CPU_ID_UNINITIALIZED = -1, + RSEQ_CPU_ID_REGISTRATION_FAILED = -2, +}; + +enum rseq_flags { + RSEQ_FLAG_UNREGISTER = (1 << 0), +}; + +enum rseq_cs_flags_bit { + RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT = 0, + RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT = 1, + RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT = 2, +}; + +enum rseq_cs_flags { + RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT = + (1U << RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT), + RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL = + (1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT), + RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE = + (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), +}; + +/* + * struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always + * contained within a single cache-line. It is usually declared as + * link-time constant data. + */ +struct rseq_cs { + /* Version of this structure. */ + __u32 version; + /* enum rseq_cs_flags */ + __u32 flags; + LINUX_FIELD_u32_u64(start_ip); + /* Offset from start_ip. */ + LINUX_FIELD_u32_u64(post_commit_offset); + LINUX_FIELD_u32_u64(abort_ip); +} __attribute__((aligned(4 * sizeof(__u64)))); + +/* + * struct rseq is aligned on 4 * 8 bytes to ensure it is always + * contained within a single cache-line. + * + * A single struct rseq per thread is allowed. + */ +struct rseq { + /* + * Restartable sequences cpu_id_start field. Updated by the + * kernel, and read by user-space with single-copy atomicity + * semantics. Aligned on 32-bit. Always contains a value in the + * range of possible CPUs, although the value may not be the + * actual current CPU (e.g. if rseq is not initialized). This + * CPU number value should always be compared against the value + * of the cpu_id field before performing a rseq commit or + * returning a value read from a data structure indexed using + * the cpu_id_start value. 
+ */ + __u32 cpu_id_start; + /* + * Restartable sequences cpu_id field. Updated by the kernel, + * and read by user-space with single-copy atomicity semantics. + * Aligned on 32-bit. Values RSEQ_CPU_ID_UNINITIALIZED and + * RSEQ_CPU_ID_REGISTRATION_FAILED have a special semantic: the + * former means "rseq uninitialized", and latter means "rseq + * initialization failed". This value is meant to be read within + * rseq critical sections and compared with the cpu_id_start + * value previously read, before performing the commit instruction, + * or read and compared with the cpu_id_start value before returning + * a value loaded from a data structure indexed using the + * cpu_id_start value. + */ + __u32 cpu_id; + /* + * Restartable sequences rseq_cs field. + * + * Contains NULL when no critical section is active for the current + * thread, or holds a pointer to the currently active struct rseq_cs. + * + * Updated by user-space, which sets the address of the currently + * active rseq_cs at the beginning of assembly instruction sequence + * block, and set to NULL by the kernel when it restarts an assembly + * instruction sequence block, as well as when the kernel detects that + * it is preempting or delivering a signal outside of the range + * targeted by the rseq_cs. Also needs to be set to NULL by user-space + * before reclaiming memory that contains the targeted struct rseq_cs. + * + * Read and set by the kernel with single-copy atomicity semantics. + * Set by user-space with single-copy atomicity semantics. Aligned + * on 64-bit. + */ + LINUX_FIELD_u32_u64(rseq_cs); + /* + * - RSEQ_DISABLE flag: + * + * Fallback fast-track flag for single-stepping. + * Set by user-space if lack of progress is detected. + * Cleared by user-space after rseq finish. + * Read by the kernel. + * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT + * Inhibit instruction sequence block restart and event + * counter increment on preemption for this thread. 
+ * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL + * Inhibit instruction sequence block restart and event + * counter increment on signal delivery for this thread. + * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE + * Inhibit instruction sequence block restart and event + * counter increment on migration for this thread. + */ + __u32 flags; +} __attribute__((aligned(4 * sizeof(__u64)))); + +#endif /* _UAPI_LINUX_RSEQ_H */ diff --git a/init/Kconfig b/init/Kconfig index f013afc74b11..2f7ff760870e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1417,6 +1417,29 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool +config RSEQ + bool "Enable rseq() system call" if EXPERT + default y + depends on HAVE_RSEQ + select MEMBARRIER + help + Enable the restartable sequences system call. It provides a + user-space cache for the current CPU number value, which + speeds up getting the current CPU number from user-space, + as well as an ABI to speed up user-space operations on + per-CPU data. + + If unsure, say Y. + +config DEBUG_RSEQ + default n + bool "Enabled debugging of rseq() system call" if EXPERT + depends on RSEQ && DEBUG_KERNEL + help + Enable extra debugging checks for the rseq system call. + + If unsure, say N. 
+ config EMBEDDED bool "Embedded system" option allnoconfig_y diff --git a/kernel/Makefile b/kernel/Makefile index f85ae5dfa474..7085c841c413 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -113,6 +113,7 @@ obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o obj-$(CONFIG_TORTURE_TEST) += torture.o obj-$(CONFIG_HAS_IOMEM) += memremap.o +obj-$(CONFIG_RSEQ) += rseq.o $(obj)/configs.o: $(obj)/config_data.h diff --git a/kernel/fork.c b/kernel/fork.c index a5d21c42acfc..70992bfeba81 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1899,6 +1899,8 @@ static __latent_entropy struct task_struct *copy_process( */ copy_seccomp(p); + rseq_fork(p, clone_flags); + /* * Process group and session signals need to be delivered to just the * parent before the fork or both the parent and the child after the diff --git a/kernel/rseq.c b/kernel/rseq.c new file mode 100644 index 000000000000..ae306f90c514 --- /dev/null +++ b/kernel/rseq.c @@ -0,0 +1,357 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Restartable sequences system call + * + * Copyright (C) 2015, Google, Inc., + * Paul Turner <pjt@google.com> and Andrew Hunter <ahh@google.com> + * Copyright (C) 2015-2018, EfficiOS Inc., + * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#include <linux/sched.h> +#include <linux/uaccess.h> +#include <linux/syscalls.h> +#include <linux/rseq.h> +#include <linux/types.h> +#include <asm/ptrace.h> + +#define CREATE_TRACE_POINTS +#include <trace/events/rseq.h> + +#define RSEQ_CS_PREEMPT_MIGRATE_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE | \ + RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT) + +/* + * + * Restartable sequences are a lightweight interface that allows + * user-level code to be executed atomically relative to scheduler + * preemption and signal delivery. Typically used for implementing + * per-cpu operations. + * + * It allows user-space to perform update operations on per-cpu data + * without requiring heavy-weight atomic operations. 
+ * + * Detailed algorithm of rseq user-space assembly sequences: + * + * init(rseq_cs) + * cpu = TLS->rseq::cpu_id_start + * [1] TLS->rseq::rseq_cs = rseq_cs + * [start_ip] ---------------------------- + * [2] if (cpu != TLS->rseq::cpu_id) + * goto abort_ip; + * [3] <last_instruction_in_cs> + * [post_commit_ip] ---------------------------- + * + * The address of jump target abort_ip must be outside the critical + * region, i.e.: + * + * [abort_ip] < [start_ip] || [abort_ip] >= [post_commit_ip] + * + * Steps [2]-[3] (inclusive) need to be a sequence of instructions in + * userspace that can handle being interrupted between any of those + * instructions, and then resumed to the abort_ip. + * + * 1. Userspace stores the address of the struct rseq_cs assembly + * block descriptor into the rseq_cs field of the registered + * struct rseq TLS area. This update is performed through a single + * store within the inline assembly instruction sequence. + * [start_ip] + * + * 2. Userspace tests to check whether the current cpu_id field match + * the cpu number loaded before start_ip, branching to abort_ip + * in case of a mismatch. + * + * If the sequence is preempted or interrupted by a signal + * at or after start_ip and before post_commit_ip, then the kernel + * clears TLS->__rseq_abi::rseq_cs, and sets the user-space return + * ip to abort_ip before returning to user-space, so the preempted + * execution resumes at abort_ip. + * + * 3. Userspace critical section final instruction before + * post_commit_ip is the commit. The critical section is + * self-terminating. + * [post_commit_ip] + * + * 4. <success> + * + * On failure at [2], or if interrupted by preempt or signal delivery + * between [1] and [3]: + * + * [abort_ip] + * F1. 
<failure> + */ + +static int rseq_update_cpu_id(struct task_struct *t) +{ + u32 cpu_id = raw_smp_processor_id(); + + if (__put_user(cpu_id, &t->rseq->cpu_id_start)) + return -EFAULT; + if (__put_user(cpu_id, &t->rseq->cpu_id)) + return -EFAULT; + trace_rseq_update(t); + return 0; +} + +static int rseq_reset_rseq_cpu_id(struct task_struct *t) +{ + u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED; + + /* + * Reset cpu_id_start to its initial state (0). + */ + if (__put_user(cpu_id_start, &t->rseq->cpu_id_start)) + return -EFAULT; + /* + * Reset cpu_id to RSEQ_CPU_ID_UNINITIALIZED, so any user coming + * in after unregistration can figure out that rseq needs to be + * registered again. + */ + if (__put_user(cpu_id, &t->rseq->cpu_id)) + return -EFAULT; + return 0; +} + +static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs) +{ + struct rseq_cs __user *urseq_cs; + unsigned long ptr; + u32 __user *usig; + u32 sig; + int ret; + + ret = __get_user(ptr, &t->rseq->rseq_cs); + if (ret) + return ret; + if (!ptr) { + memset(rseq_cs, 0, sizeof(*rseq_cs)); + return 0; + } + urseq_cs = (struct rseq_cs __user *)ptr; + if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs))) + return -EFAULT; + if (rseq_cs->version > 0) + return -EINVAL; + + /* Ensure that abort_ip is not in the critical section. */ + if (rseq_cs->abort_ip - rseq_cs->start_ip < rseq_cs->post_commit_offset) + return -EINVAL; + + usig = (u32 __user *)(rseq_cs->abort_ip - sizeof(u32)); + ret = get_user(sig, usig); + if (ret) + return ret; + + if (current->rseq_sig != sig) { + printk_ratelimited(KERN_WARNING + "Possible attack attempt. Unexpected rseq signature 0x%x, expecting 0x%x (pid=%d, addr=%p).\n", + sig, current->rseq_sig, current->pid, usig); + return -EPERM; + } + return 0; +} + +static int rseq_need_restart(struct task_struct *t, u32 cs_flags) +{ + u32 flags, event_mask; + int ret; + + /* Get thread flags. 
*/ + ret = __get_user(flags, &t->rseq->flags); + if (ret) + return ret; + + /* Take critical section flags into account. */ + flags |= cs_flags; + + /* + * Restart on signal can only be inhibited when restart on + * preempt and restart on migrate are inhibited too. Otherwise, + * a preempted signal handler could fail to restart the prior + * execution context on sigreturn. + */ + if (unlikely((flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL) && + (flags & RSEQ_CS_PREEMPT_MIGRATE_FLAGS) != + RSEQ_CS_PREEMPT_MIGRATE_FLAGS)) + return -EINVAL; + + /* + * Load and clear event mask atomically with respect to + * scheduler preemption. + */ + preempt_disable(); + event_mask = t->rseq_event_mask; + t->rseq_event_mask = 0; + preempt_enable(); + + return !!(event_mask & ~flags); +} + +static int clear_rseq_cs(struct task_struct *t) +{ + /* + * The rseq_cs field is set to NULL on preemption or signal + * delivery on top of rseq assembly block, as well as on top + * of code outside of the rseq assembly block. This performs + * a lazy clear of the rseq_cs field. + * + * Set rseq_cs to NULL with single-copy atomicity. + */ + return __put_user(0UL, &t->rseq->rseq_cs); +} + +/* + * Unsigned comparison will be true when ip >= start_ip, and when + * ip < start_ip + post_commit_offset. + */ +static bool in_rseq_cs(unsigned long ip, struct rseq_cs *rseq_cs) +{ + return ip - rseq_cs->start_ip < rseq_cs->post_commit_offset; +} + +static int rseq_ip_fixup(struct pt_regs *regs) +{ + unsigned long ip = instruction_pointer(regs); + struct task_struct *t = current; + struct rseq_cs rseq_cs; + int ret; + + ret = rseq_get_rseq_cs(t, &rseq_cs); + if (ret) + return ret; + + /* + * Handle potentially not being within a critical section. + * If not nested over a rseq critical section, restart is useless. + * Clear the rseq_cs pointer and return. 
+ */ + if (!in_rseq_cs(ip, &rseq_cs)) + return clear_rseq_cs(t); + ret = rseq_need_restart(t, rseq_cs.flags); + if (ret <= 0) + return ret; + ret = clear_rseq_cs(t); + if (ret) + return ret; + trace_rseq_ip_fixup(ip, rseq_cs.start_ip, rseq_cs.post_commit_offset, + rseq_cs.abort_ip); + instruction_pointer_set(regs, (unsigned long)rseq_cs.abort_ip); + return 0; +} + +/* + * This resume handler must always be executed between any of: + * - preemption, + * - signal delivery, + * and return to user-space. + * + * This is how we can ensure that the entire rseq critical section, + * consisting of both the C part and the assembly instruction sequence, + * will issue the commit instruction only if executed atomically with + * respect to other threads scheduled on the same CPU, and with respect + * to signal handlers. + */ +void __rseq_handle_notify_resume(struct pt_regs *regs) +{ + struct task_struct *t = current; + int ret; + + if (unlikely(t->flags & PF_EXITING)) + return; + if (unlikely(!access_ok(VERIFY_WRITE, t->rseq, sizeof(*t->rseq)))) + goto error; + ret = rseq_ip_fixup(regs); + if (unlikely(ret < 0)) + goto error; + if (unlikely(rseq_update_cpu_id(t))) + goto error; + return; + +error: + force_sig(SIGSEGV, t); +} + +#ifdef CONFIG_DEBUG_RSEQ + +/* + * Terminate the process if a syscall is issued within a restartable + * sequence. + */ +void rseq_syscall(struct pt_regs *regs) +{ + unsigned long ip = instruction_pointer(regs); + struct task_struct *t = current; + struct rseq_cs rseq_cs; + + if (!t->rseq) + return; + if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) || + rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) + force_sig(SIGSEGV, t); +} + +#endif + +/* + * sys_rseq - setup restartable sequences for caller thread. + */ +SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, + int, flags, u32, sig) +{ + int ret; + + if (flags & RSEQ_FLAG_UNREGISTER) { + /* Unregister rseq for current thread. 
*/ + if (current->rseq != rseq || !current->rseq) + return -EINVAL; + if (current->rseq_len != rseq_len) + return -EINVAL; + if (current->rseq_sig != sig) + return -EPERM; + ret = rseq_reset_rseq_cpu_id(current); + if (ret) + return ret; + current->rseq = NULL; + current->rseq_len = 0; + current->rseq_sig = 0; + return 0; + } + + if (unlikely(flags)) + return -EINVAL; + + if (current->rseq) { + /* + * If rseq is already registered, check whether + * the provided address differs from the prior + * one. + */ + if (current->rseq != rseq || current->rseq_len != rseq_len) + return -EINVAL; + if (current->rseq_sig != sig) + return -EPERM; + /* Already registered. */ + return -EBUSY; + } + + /* + * If there was no rseq previously registered, + * ensure the provided rseq is properly aligned and valid. + */ + if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || + rseq_len != sizeof(*rseq)) + return -EINVAL; + if (!access_ok(VERIFY_WRITE, rseq, rseq_len)) + return -EFAULT; + current->rseq = rseq; + current->rseq_len = rseq_len; + current->rseq_sig = sig; + /* + * If rseq was previously inactive, and has just been + * registered, ensure the cpu_id_start and cpu_id fields + * are updated before returning to user-space. 
+	 */
+	rseq_set_notify_resume(current);
+
+	return 0;
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e10aaeebfcc..76d452ef2f0d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1169,6 +1169,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 		if (p->sched_class->migrate_task_rq)
 			p->sched_class->migrate_task_rq(p);
 		p->se.nr_migrations++;
+		rseq_migrate(p);
 		perf_event_task_migrate(p);
 	}
 
@@ -2634,6 +2635,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 {
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
+	rseq_preempt(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_task(next);
 	prepare_arch_switch(next);
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 9791364925dc..22f4ef269959 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -430,3 +430,6 @@ COND_SYSCALL(setresgid16);
 COND_SYSCALL(setresuid16);
 COND_SYSCALL(setreuid16);
 COND_SYSCALL(setuid16);
+
+/* restartable sequence */
+COND_SYSCALL(rseq);
-- 
2.11.0
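The range test used by in_rseq_cs() in the patch, and reused to reject an abort_ip that lies inside the critical section, relies on unsigned wrap-around so that one subtraction and one comparison cover both bounds. The stand-alone sketch below illustrates that arithmetic in user-space C. The struct and function names are hypothetical and exist only for this example, and the two-comparison variant is equivalent only under the assumption that start_ip + post_commit_offset does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the address fields of struct rseq_cs. */
struct cs_range_sketch {
	uintptr_t start_ip;
	uintptr_t post_commit_offset;
	uintptr_t abort_ip;
};

/*
 * Single-comparison form: if ip < start_ip, the unsigned subtraction
 * wraps to a very large value, which cannot be below the offset.
 */
static bool ip_in_cs_sketch(uintptr_t ip, const struct cs_range_sketch *cs)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* Two-comparison form, shown only to compare results against. */
static bool ip_in_cs_two_compares_sketch(uintptr_t ip,
					 const struct cs_range_sketch *cs)
{
	return ip >= cs->start_ip &&
	       ip < cs->start_ip + cs->post_commit_offset;
}

/* The abort target must not land inside [start_ip, post_commit_ip). */
static bool abort_ip_outside_cs_sketch(const struct cs_range_sketch *cs)
{
	return !ip_in_cs_sketch(cs->abort_ip, cs);
}
```

With start_ip = 0x1000 and post_commit_offset = 0x20, addresses 0x1000 through 0x101f are inside the section, while 0x0fff and 0x1020 are outside; an abort_ip of exactly 0x1020 (the post-commit address) is therefore acceptable.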
* [PATCH 02/14] rseq: Introduce restartable sequences system call (v13)
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC
To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
    Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
    Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
    Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
    Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
    Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Alexander Viro

Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartable sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work was started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI that facilitate porting to other
architectures and speed up the user-space fast path.

Here are benchmarks of various rseq use-cases.

Test hardware:

arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

The following benchmarks were all performed on a single thread.

* Per-CPU statistic counter increment

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                344.0                 31.4          11.0
x86-64:                15.3                  2.0           7.7

* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
             per-cpu buffer

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:               2502.0               2250.0           1.1
x86-64:               117.4                 98.0           1.2

* liburcu percpu: lock-unlock pair, dereference, read/compare word

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                751.0                128.5           5.8
x86-64:                53.4                 28.6           1.9

* jemalloc memory allocator adapted to use rseq

Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):

The production workload response time shows a 1-2% gain in average
latency, and the P99 overall latency drops by 2-3%.

* Reading the current CPU number

Reading the current CPU number on which the caller thread is running is
sped up by keeping the current CPU number up to date within the cpu_id
field of the memory area registered by the thread. This is done by
making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.

Keeping the current cpu id in a memory area shared between kernel and
user-space improves on the mechanisms currently available to read the
current CPU number. It has the following benefits over alternative
approaches:

- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
  executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
  assembly, which makes it a useful building block for restartable
  sequences.
- The approach of reading the cpu id through memory mapping shared
  between kernel and user-space is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.

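As an illustration of the user-space side of this mechanism, the sketch below reads the cached CPU number with a relaxed atomic load and falls back to a slower path while cpu_id still holds a negative "uninitialized" or "registration failed" value. It is a minimal sketch under assumptions: struct rseq_area_sketch is a local mirror of the proposed layout, the function names are hypothetical, fallback_cpu_sketch() is a placeholder for a real fallback such as sched_getcpu(3), and __atomic_load_n is the GCC/Clang builtin rather than anything defined by this patch.

```c
#include <stdint.h>

/* Assumption: local mirror of the proposed struct rseq layout. */
struct rseq_area_sketch {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
} __attribute__((aligned(32)));

/*
 * Placeholder slow path so the sketch stays self-contained. Real code
 * would ask the kernel or the C library for the current CPU here.
 */
static int32_t fallback_cpu_sketch(void)
{
	return 0;
}

/*
 * Fast path: one relaxed load from the registered area. Negative
 * values (-1 uninitialized, -2 registration failed) mean the cache
 * cannot be trusted, so the slow path is used instead.
 */
static int32_t read_cpu_cached_sketch(const struct rseq_area_sketch *area,
				      int32_t (*slow_path)(void))
{
	int32_t cpu = (int32_t)__atomic_load_n(&area->cpu_id,
					       __ATOMIC_RELAXED);

	if (cpu < 0)
		return slow_path();
	return cpu;
}
```

The value returned by the fast path is only a hint unless it is read and re-checked inside a restartable sequence, since the thread may migrate right after the load.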
On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop):                                    8.4 ns
- Read CPU from rseq cpu_id:                               16.7 ns
- Read CPU from rseq cpu_id (lazy register):               19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
- getcpu system call:                                     234.9 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                                    0.8 ns
- Read CPU from rseq cpu_id:                                0.8 ns
- Read CPU from rseq cpu_id (lazy register):                0.8 ns
- Read using gs segment selector:                           0.8 ns
- "lsl" inline assembly:                                   13.0 ns
- glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
- getcpu system call:                                      53.9 ns

- Speed (benchmark taken on v8 of patchset)

Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:

Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.

* CONFIG_RSEQ=n

avg.:      41.37 s
std.dev.:   0.36 s

* CONFIG_RSEQ=y

avg.:      40.46 s
std.dev.:   0.33 s

- Size

On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.

[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-api@vger.kernel.org
---

Changes since v1:
- Return -1, errno=EINVAL if cpu_cache pointer is not aligned on
  sizeof(int32_t).
- Update man page to describe the pointer alignment requirements and
  update atomicity guarantees.
- Add MAINTAINERS file GETCPU_CACHE entry.
- Remove dynamic memory allocation: go back to having a single
  getcpu_cache entry per thread. Update documentation accordingly.
- Rebased on Linux 4.4.

Changes since v2:
- Introduce a "cmd" argument, along with an enum with GETCPU_CACHE_GET
  and GETCPU_CACHE_SET. Introduce a uapi header linux/getcpu_cache.h
  defining this enumeration.
- Split resume notifier architecture implementation from the system call
  wire up in the following arch-specific patches.
- Man pages updates.
- Handle 32-bit compat pointers.
- Simplify handling of getcpu_cache GETCPU_CACHE_SET compiler barrier:
  set the current cpu cache pointer before doing the cache update, and
  set it back to NULL if the update fails. Setting it back to NULL on
  error ensures that no resume notifier will trigger a SIGSEGV if a
  migration happened concurrently.

Changes since v3:
- Fix __user annotations in compat code.
- Update memory ordering comments.
- Rebased on kernel v4.5-rc5.

Changes since v4:
- Inline getcpu_cache_fork, getcpu_cache_execve, and getcpu_cache_exit.
- Add new line between if() and switch() to improve readability.
- Added sched switch benchmarks (hackbench) and size overhead comparison
  to change log.

Changes since v5:
- Rename "getcpu_cache" to "thread_local_abi", allowing this system
  call to be extended to cover future features such as restartable
  critical sections. Generalizing this system call ensures that we can
  add features similar to the cpu_id field within the same cache-line
  without having to track one pointer per feature within the task
  struct.
- Add a tlabi_nr parameter to the system call, thus allowing the ABI to
  be extended beyond the initial 64-byte structure by registering
  structures with tlabi_nr greater than 0. The initial ABI structure is
  associated with tlabi_nr 0.
- Rebased on kernel v4.5.

Changes since v6:
- Integrate "restartable sequences" v2 patchset from Paul Turner.
- Add handling of single-stepping purely in user-space, with a
  fallback to locking after 2 rseq failures to ensure progress, and
  by exposing a __rseq_table section to debuggers so they know where
  to put breakpoints when dealing with rseq assembly blocks which can
  be aborted at any point.
- make the code and ABI generic: porting the kernel implementation
  simply requires wiring up the signal handler and return to user-space
  hooks, and allocating the syscall number.
- extend testing with a fully configurable test program. See
  param_spinlock_test -h for details.
- handling of rseq ENOSYS in user-space, also with a fallback to
  locking.
- modify Paul Turner's rseq ABI to only require a single TLS store on
  the user-space fast-path, removing the need to populate two
  additional registers. This is made possible by introducing struct
  rseq_cs into the ABI to describe a critical section start_ip,
  post_commit_ip, and abort_ip.
- Rebased on kernel v4.7-rc7.

Changes since v7:
- Documentation updates.
- Integrated powerpc architecture support.
- Compare rseq critical section start_ip, which allows shrinking the
  user-space fast-path code size.
- Added Peter Zijlstra, Paul E. McKenney and Boqun Feng as
  co-maintainers.
- Added do_rseq2 and do_rseq_memcpy to test program helper library.
- Code cleanup based on review from Peter Zijlstra, Andy Lutomirski and
  Boqun Feng.
- Rebase on kernel v4.8-rc2.

Changes since v8:
- clear rseq_cs even if non-nested. Speeds up user-space fast path by
  removing the final "rseq_cs=NULL" assignment.
- add enum rseq_flags: critical sections and threads can set migration,
  preemption and signal "disable" flags to inhibit rseq behavior.
- rseq_event_counter needs to be updated with a pre-increment:
  otherwise an increment is missed after exec (when TLS and in-kernel
  states are initially 0).

Changes since v9:
- Update changelog.
- Fold instrumentation patch.
- check abort-ip signature: Add a signature before the abort-ip landing
  address. This signature is also received as a new parameter to the
  rseq system call. The kernel uses it to ensure that rseq cannot be
  used as an exploit vector to redirect execution to arbitrary code.
- Use rseq pointer for both register and unregister.
  This is more symmetric, and eventually allows supporting a linked
  list of rseq struct per thread if needed in the future.
- Unregistration of a rseq structure is now done with
  RSEQ_FLAG_UNREGISTER.
- Remove reference counting. Return "EBUSY" to the caller if rseq is
  already registered for the current thread. This simplifies
  implementation while still allowing user-space to perform lazy
  registration in multi-lib use-cases. (suggested by Ben Maurer)
- Clear rseq_cs upon unregister.
- Set cpu_id back to -1 on unregister, so if rseq user libraries follow
  an unregister, and they expect to lazily register rseq, they can do
  so.
- Document rseq_cs clear requirement: JIT should reset the rseq_cs
  pointer before reclaiming memory of rseq_cs structure.
- Introduce rseq_len syscall parameter, rseq_cs version field: Allow
  keeping track of the registered rseq struct length, for future
  extensions. Add rseq_cs version as first field. Will allow future
  extensions.
- Use offset and unsigned arithmetic to save a branch: Save a
  conditional branch when comparing instruction pointer against a
  rseq_cs descriptor's address range by having post_commit_ip as an
  offset from start_ip, and using unsigned integer comparison.
  Suggested by Ben Maurer.
- Remove event counter from ABI. Suggested by Andy Lutomirski.
- Add INIT_ONSTACK macro: Introduce the
  RSEQ_FIELD_u32_u64_INIT_ONSTACK() macros to ensure that users
  correctly initialize the upper bits of RSEQ_FIELD_u32_u64() on their
  stack to 0 on 32-bit architectures.
- Select MEMBARRIER: Allows user-space rseq fast-paths to use the value
  of cpu_id field (inherently required by the rseq algorithm) to figure
  out whether membarrier can be expected to be available. This
  effectively allows user-space fast-paths to remove extra comparisons
  and branch testing whether membarrier is enabled, and thus whether a
  full barrier is required (e.g. in userspace RCU implementation after
  rcu_read_lock/before rcu_read_unlock).
- Expose cpu_id_start field: Checking whether (cpu_id < 0) in the C
  preparation part of the rseq fast-path brings significant overhead at
  least on arm32. We can remove this extra comparison by exposing two
  distinct cpu_id fields in the rseq TLS:

  The field cpu_id_start always contains a *possible* cpu number,
  although it may not be the current one if, for instance, rseq is not
  initialized for the current thread. cpu_id_start is meant to be used
  in the C code for the pointer chasing to figure out which per-cpu
  data structure should be passed to the rseq asm sequence.

  In the cpu_id field, value -1 means rseq is not initialized, and -2
  means initialization failed. That field is used in the rseq asm
  sequence to confirm that the cpu_id_start value was indeed the
  current cpu number. It also ends up confirming that rseq is
  initialized for the current thread, because values -1 and -2 will
  never match the cpu_id_start possible cpu number values.

  This allows checking the current CPU number and rseq initialization
  state with a single comparison on the fast-path.

Changes since v10:
- Update rseq.c comment, removing reference to event_counter.

Changes since v11:
- Replace task struct rseq_preempt, rseq_signal, and rseq_migrate bool
  by u32 rseq_event_mask.
- Add missing sys_rseq() asmlinkage declaration to
  include/linux/syscalls.h.
- Copy event mask on process fork, set to 0 on exec and thread-fork.
- Cleanups based on review from Peter Zijlstra.
- Cleanups based on review from Thomas Gleixner.
- Fix: rseq_cs needs to be cleared only when:
  - Nested over non-critical-section userspace code,
  - Nested over rseq_cs _and_ handling abort.
  Basically, we should never clear rseq_cs when the rseq resume to
  userspace handler is called and it is not handling abort: the
  problematic case is if any of the __get_user()/__put_user done by the
  handler trigger a page fault (e.g.
  page protection done by NUMA page migration work), which triggers
  preemption: the next call to the rseq resume to userspace handler
  needs to perform the abort.
- Perform rseq event mask updates atomically wrt preemption,
- Move rseq_migrate to __set_task_cpu(), thus catching migration
  scenarios that bypass set_task_cpu(): fork and wake_up_new_task.
- Merge content of rseq_sched_out into rseq_preempt. There is no need
  to have two hook sites. Both setting the rseq event mask preempt bit
  and setting the notify resume thread flag can be done from
  rseq_preempt().
- Issue rseq_preempt() from fork(), thus ensuring that we handle abort
  if needed.

Changes since v12:
- Disallow syscalls from rseq critical sections,
- Introduce CONFIG_DEBUG_RSEQ, which terminates processes misusing rseq
  (e.g. doing a system call within a rseq critical section) with
  SIGSEGV,
- Coding style cleanups based on feedback from Boqun Feng and Peter
  Zijlstra.

Man page associated:

RSEQ(2)                Linux Programmer's Manual               RSEQ(2)

NAME
       rseq - Restartable sequences and cpu number cache

SYNOPSIS
       #include <linux/rseq.h>

       int rseq(struct rseq * rseq, uint32_t rseq_len, int flags, uint32_t sig);

DESCRIPTION
       The rseq() ABI accelerates user-space operations on per-cpu data
       by defining a shared data structure ABI between each user-space
       thread and the kernel.

       It allows user-space to perform update operations on per-cpu
       data without requiring heavy-weight atomic operations.

       The term CPU used in this documentation refers to a hardware
       execution context.

       Restartable sequences are atomic with respect to preemption
       (making them atomic with respect to other threads running on the
       same CPU), as well as signal delivery (user-space execution
       contexts nested over the same thread).

       They are suited for update operations on per-cpu data.

       They can be used on data structures shared between threads
       within a process, and on data structures shared between threads
       across different processes.
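
The split between cpu_id_start and cpu_id described in the "Expose
cpu_id_start field" changelog entry can be illustrated in plain C. This
is only a minimal sketch, not the selftests library: struct cpu_fields,
pick_slot() and cpu_confirmed() are hypothetical names, plain loads are
used where real code needs single-copy atomic reads, and the comparison
in cpu_confirmed() merely stands in for the one the real fast-path
performs inside the rseq assembly sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Special cpu_id values, as seen through an unsigned 32-bit field. */
#define CPU_ID_UNINITIALIZED		((uint32_t)-1)
#define CPU_ID_REGISTRATION_FAILED	((uint32_t)-2)

/* Hypothetical mirror of the two cpu number fields of struct rseq. */
struct cpu_fields {
	uint32_t cpu_id_start;	/* always a possible CPU number */
	uint32_t cpu_id;	/* current CPU, or one of the values above */
};

/*
 * C preparation step: index per-cpu data with cpu_id_start. No
 * (cpu_id < 0) test is needed, because this field is assumed to always
 * hold a possible CPU number, even before registration.
 */
long *pick_slot(long *per_cpu, size_t nr_cpus, uint32_t cpu_start)
{
	assert(cpu_start < nr_cpus);
	return &per_cpu[cpu_start];
}

/*
 * Stand-in for the check done inside the asm sequence: one comparison
 * confirms both "still on that CPU" and "rseq is registered", since
 * the special values can never equal a possible CPU number.
 */
bool cpu_confirmed(uint32_t cpu_start, const struct cpu_fields *f)
{
	return cpu_start == f->cpu_id;
}
```

With cpu_id_start = 0 and cpu_id still at its uninitialized value, the
slot lookup succeeds but the confirmation fails, which is the point at
which a caller would take its fallback or registration path.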
       Some examples of operations that can be accelerated or improved
       by this ABI:

       · Memory allocator per-cpu free-lists,
       · Querying the current CPU number,
       · Incrementing per-CPU counters,
       · Modifying data protected by per-CPU spinlocks,
       · Inserting/removing elements in per-CPU linked-lists,
       · Writing/reading per-CPU ring buffers content,
       · Accurately reading performance monitoring unit counters with
         respect to thread migration.

       Restartable sequences must not perform system calls. Doing so
       may result in termination of the process by a segmentation
       fault.

       The rseq argument is a pointer to the thread-local rseq
       structure to be shared between kernel and user-space. A NULL
       rseq value unregisters the current thread rseq structure.

       The layout of struct rseq is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This structure is extensible. Its size is passed as
              parameter to the rseq system call.

       Fields

           cpu_id_start
                  Optimistic cache of the CPU number on which the
                  current thread is running. Its value is guaranteed to
                  always be a possible CPU number, even when rseq is
                  not initialized. The value it contains should always
                  be confirmed by reading the cpu_id field.

           cpu_id
                  Cache of the CPU number on which the current thread
                  is running. -1 if uninitialized.

           rseq_cs
                  The rseq_cs field is a pointer to a struct rseq_cs.
                  It is NULL when no rseq assembly block critical
                  section is active for the current thread. Setting it
                  to point to a critical section descriptor (struct
                  rseq_cs) marks the beginning of the critical section.

           flags
                  Flags indicating the restart behavior for the current
                  thread. This is mainly used for debugging purposes.
                  Can be either:

                  · RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
                  · RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
                  · RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

       The layout of struct rseq_cs version 0 is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This structure has a fixed size of 32 bytes.
       Fields

           version
                  Version of this structure.

           flags
                  Flags indicating the restart behavior of this
                  structure. Can be either:

                  · RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
                  · RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
                  · RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

           start_ip
                  Instruction pointer address of the first instruction
                  of the sequence of consecutive assembly instructions.

           post_commit_offset
                  Offset (from start_ip address) of the address after
                  the last instruction of the sequence of consecutive
                  assembly instructions.

           abort_ip
                  Instruction pointer address where to move the
                  execution flow in case of abort of the sequence of
                  consecutive assembly instructions.

       The rseq_len argument is the size of the struct rseq to
       register.

       The flags argument is 0 for registration, and
       RSEQ_FLAG_UNREGISTER for unregistration.

       The sig argument is the 32-bit signature to be expected before
       the abort handler code.

       A single library per process should keep the rseq structure in a
       thread-local storage variable. The cpu_id field should be
       initialized to -1, and the cpu_id_start field should be
       initialized to a possible CPU value (typically 0).

       Each thread is responsible for registering and unregistering its
       rseq structure. No more than one rseq structure address can be
       registered per thread at a given time.

       In a typical usage scenario, the thread registering the rseq
       structure will be performing loads and stores from/to that
       structure. It is however also allowed to read that structure
       from other threads. The rseq field updates performed by the
       kernel provide relaxed atomicity semantics, which guarantee that
       other threads performing relaxed atomic reads of the cpu number
       cache will always observe a consistent value.

RETURN VALUE
       A return value of 0 indicates success. On error, -1 is returned,
       and errno is set appropriately.
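
Encoding the end of the critical section as post_commit_offset (an
offset from start_ip) is what lets the range test be a single unsigned
comparison, as the "Use offset and unsigned arithmetic to save a branch"
changelog entry notes. A minimal sketch under simplifying assumptions:
struct cs_range is a reduced stand-in for struct rseq_cs (not the uapi
layout), and cs_make(), ip_in_cs() and abort_ip_valid() are illustrative
names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rseq_cs. */
struct cs_range {
	uint64_t start_ip;
	uint64_t post_commit_offset;	/* offset from start_ip */
	uint64_t abort_ip;
};

/* Build a descriptor from absolute addresses. */
struct cs_range cs_make(uint64_t start_ip, uint64_t post_commit_ip,
			uint64_t abort_ip)
{
	struct cs_range cs;

	assert(post_commit_ip >= start_ip);
	cs.start_ip = start_ip;
	cs.post_commit_offset = post_commit_ip - start_ip;
	cs.abort_ip = abort_ip;
	return cs;
}

/*
 * True when start_ip <= ip < start_ip + post_commit_offset. If ip is
 * below start_ip, the unsigned subtraction wraps to a very large value
 * and the comparison fails, so no separate lower-bound test is needed.
 */
bool ip_in_cs(uint64_t ip, const struct cs_range *cs)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* The abort target must land outside the critical section. */
bool abort_ip_valid(const struct cs_range *cs)
{
	return !ip_in_cs(cs->abort_ip, cs);
}
```

For a section spanning 0x1000 to 0x1010, an instruction pointer of
0x0fff wraps around in the subtraction and is rejected by the same
comparison that rejects 0x1010.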
ERRORS
       EINVAL Either flags contains an invalid value, or rseq contains
              an address which is not appropriately aligned, or
              rseq_len contains a size that does not match the size
              received on registration.

       ENOSYS The rseq() system call is not implemented by this kernel.

       EFAULT rseq is an invalid address.

       EBUSY  Restartable sequence is already registered for this
              thread.

       EPERM  The sig argument on unregistration does not match the
              signature received on registration.

VERSIONS
       The rseq() system call was added in Linux 4.X (TODO).

CONFORMING TO
       rseq() is Linux-specific.

SEE ALSO
       sched_getcpu(3)

Linux                         2017-11-06                       RSEQ(2)
---
 MAINTAINERS                 |  11 ++
 arch/Kconfig                |   7 +
 fs/exec.c                   |   1 +
 include/linux/sched.h       | 134 +++++++++++++++++
 include/linux/syscalls.h    |   4 +-
 include/trace/events/rseq.h |  57 +++++++
 include/uapi/linux/rseq.h   | 133 +++++++++++++++++
 init/Kconfig                |  23 +++
 kernel/Makefile             |   1 +
 kernel/fork.c               |   2 +
 kernel/rseq.c               | 357 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c         |   2 +
 kernel/sys_ni.c             |   3 +
 13 files changed, 734 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/rseq.h
 create mode 100644 include/uapi/linux/rseq.h
 create mode 100644 kernel/rseq.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 79bb02ff812f..4d61ce154dfc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11981,6 +11981,17 @@ F:	include/dt-bindings/reset/
 F:	include/linux/reset.h
 F:	include/linux/reset-controller.h
 
+RESTARTABLE SEQUENCES SUPPORT
+M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+M:	Peter Zijlstra <peterz@infradead.org>
+M:	"Paul E.
McKenney" <paulmck@linux.vnet.ibm.com> +M: Boqun Feng <boqun.feng@gmail.com> +L: linux-kernel@vger.kernel.org +S: Supported +F: kernel/rseq.c +F: include/uapi/linux/rseq.h +F: include/trace/events/rseq.h + RFKILL M: Johannes Berg <johannes@sipsolutions.net> L: linux-wireless@vger.kernel.org diff --git a/arch/Kconfig b/arch/Kconfig index 8e0d665c8d53..43b5e103c1b2 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -272,6 +272,13 @@ config HAVE_REGS_AND_STACK_ACCESS_API declared in asm/ptrace.h For example the kprobes-based event tracer needs this API. +config HAVE_RSEQ + bool + depends on HAVE_REGS_AND_STACK_ACCESS_API + help + This symbol should be selected by an architecture if it + supports an implementation of restartable sequences. + config HAVE_CLK bool help diff --git a/fs/exec.c b/fs/exec.c index 183059c427b9..2c3911612b22 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1822,6 +1822,7 @@ static int do_execveat_common(int fd, struct filename *filename, current->fs->in_exec = 0; current->in_execve = 0; membarrier_execve(current); + rseq_execve(current); acct_update_integrals(current); task_numa_free(current); free_bprm(bprm); diff --git a/include/linux/sched.h b/include/linux/sched.h index b3d697f3b573..496a0b25a42d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -27,6 +27,7 @@ #include <linux/signal_types.h> #include <linux/mm_types_task.h> #include <linux/task_io_accounting.h> +#include <linux/rseq.h> /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; @@ -1007,6 +1008,17 @@ struct task_struct { unsigned long numa_pages_migrated; #endif /* CONFIG_NUMA_BALANCING */ +#ifdef CONFIG_RSEQ + struct rseq __user *rseq; + u32 rseq_len; + u32 rseq_sig; + /* + * RmW on rseq_event_mask must be performed atomically + * with respect to preemption. 
+ */ + unsigned long rseq_event_mask; +#endif + struct tlbflush_unmap_batch tlb_ubc; struct rcu_head rcu; @@ -1716,4 +1728,126 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask); #define TASK_SIZE_OF(tsk) TASK_SIZE #endif +#ifdef CONFIG_RSEQ + +/* + * Map the event mask on the user-space ABI enum rseq_cs_flags + * for direct mask checks. + */ +enum rseq_event_mask_bits { + RSEQ_EVENT_PREEMPT_BIT = RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT, + RSEQ_EVENT_SIGNAL_BIT = RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT, + RSEQ_EVENT_MIGRATE_BIT = RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT, +}; + +enum rseq_event_mask { + RSEQ_EVENT_PREEMPT = (1U << RSEQ_EVENT_PREEMPT_BIT), + RSEQ_EVENT_SIGNAL = (1U << RSEQ_EVENT_SIGNAL_BIT), + RSEQ_EVENT_MIGRATE = (1U << RSEQ_EVENT_MIGRATE_BIT), +}; + +static inline void rseq_set_notify_resume(struct task_struct *t) +{ + if (t->rseq) + set_tsk_thread_flag(t, TIF_NOTIFY_RESUME); +} + +void __rseq_handle_notify_resume(struct pt_regs *regs); + +static inline void rseq_handle_notify_resume(struct pt_regs *regs) +{ + if (current->rseq) + __rseq_handle_notify_resume(regs); +} + +static inline void rseq_signal_deliver(struct pt_regs *regs) +{ + preempt_disable(); + __set_bit(RSEQ_EVENT_SIGNAL_BIT, &current->rseq_event_mask); + preempt_enable(); + rseq_handle_notify_resume(regs); +} + +/* rseq_preempt() requires preemption to be disabled. */ +static inline void rseq_preempt(struct task_struct *t) +{ + __set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask); + rseq_set_notify_resume(t); +} + +/* rseq_migrate() requires preemption to be disabled. */ +static inline void rseq_migrate(struct task_struct *t) +{ + __set_bit(RSEQ_EVENT_MIGRATE_BIT, &t->rseq_event_mask); + rseq_set_notify_resume(t); +} + +/* + * If parent process has a registered restartable sequences area, the + * child inherits. Only applies when forking a process, not a thread.
In + * case a parent fork() in the middle of a restartable sequence, set the + * resume notifier to force the child to retry. + */ +static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) +{ + if (clone_flags & CLONE_THREAD) { + t->rseq = NULL; + t->rseq_len = 0; + t->rseq_sig = 0; + t->rseq_event_mask = 0; + } else { + t->rseq = current->rseq; + t->rseq_len = current->rseq_len; + t->rseq_sig = current->rseq_sig; + t->rseq_event_mask = current->rseq_event_mask; + rseq_preempt(t); + } +} + +static inline void rseq_execve(struct task_struct *t) +{ + t->rseq = NULL; + t->rseq_len = 0; + t->rseq_sig = 0; + t->rseq_event_mask = 0; +} + +#else + +static inline void rseq_set_notify_resume(struct task_struct *t) +{ +} +static inline void rseq_handle_notify_resume(struct pt_regs *regs) +{ +} +static inline void rseq_signal_deliver(struct pt_regs *regs) +{ +} +static inline void rseq_preempt(struct task_struct *t) +{ +} +static inline void rseq_migrate(struct task_struct *t) +{ +} +static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) +{ +} +static inline void rseq_execve(struct task_struct *t) +{ +} + +#endif + +#ifdef CONFIG_DEBUG_RSEQ + +void rseq_syscall(struct pt_regs *regs); + +#else + +static inline void rseq_syscall(struct pt_regs *regs) +{ +} + +#endif + #endif diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 70fcda1a9049..a16d72c70f28 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -66,6 +66,7 @@ struct old_linux_dirent; struct perf_event_attr; struct file_handle; struct sigaltstack; +struct rseq; union bpf_attr; #include <linux/types.h> @@ -890,7 +891,8 @@ asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val); asmlinkage long sys_pkey_free(int pkey); asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags, unsigned mask, struct statx __user *buffer); - +asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len, + int 
flags, uint32_t sig); /* * Architecture-specific system calls diff --git a/include/trace/events/rseq.h b/include/trace/events/rseq.h new file mode 100644 index 000000000000..a04a64bc1a00 --- /dev/null +++ b/include/trace/events/rseq.h @@ -0,0 +1,57 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM rseq + +#if !defined(_TRACE_RSEQ_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_RSEQ_H + +#include <linux/tracepoint.h> +#include <linux/types.h> + +TRACE_EVENT(rseq_update, + + TP_PROTO(struct task_struct *t), + + TP_ARGS(t), + + TP_STRUCT__entry( + __field(s32, cpu_id) + ), + + TP_fast_assign( + __entry->cpu_id = raw_smp_processor_id(); + ), + + TP_printk("cpu_id=%d", __entry->cpu_id) +); + +TRACE_EVENT(rseq_ip_fixup, + + TP_PROTO(unsigned long regs_ip, unsigned long start_ip, + unsigned long post_commit_offset, unsigned long abort_ip), + + TP_ARGS(regs_ip, start_ip, post_commit_offset, abort_ip), + + TP_STRUCT__entry( + __field(unsigned long, regs_ip) + __field(unsigned long, start_ip) + __field(unsigned long, post_commit_offset) + __field(unsigned long, abort_ip) + ), + + TP_fast_assign( + __entry->regs_ip = regs_ip; + __entry->start_ip = start_ip; + __entry->post_commit_offset = post_commit_offset; + __entry->abort_ip = abort_ip; + ), + + TP_printk("regs_ip=0x%lx start_ip=0x%lx post_commit_offset=%lu abort_ip=0x%lx", + __entry->regs_ip, __entry->start_ip, + __entry->post_commit_offset, __entry->abort_ip) +); + +#endif /* _TRACE_SOCK_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h new file mode 100644 index 000000000000..d620fa43756c --- /dev/null +++ b/include/uapi/linux/rseq.h @@ -0,0 +1,133 @@ +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ +#ifndef _UAPI_LINUX_RSEQ_H +#define _UAPI_LINUX_RSEQ_H + +/* + * linux/rseq.h + * + * Restartable sequences system call API + * + * Copyright (c) 2015-2018 Mathieu 
Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#ifdef __KERNEL__ +# include <linux/types.h> +#else +# include <stdint.h> +#endif + +#include <linux/types_32_64.h> + +enum rseq_cpu_id_state { + RSEQ_CPU_ID_UNINITIALIZED = -1, + RSEQ_CPU_ID_REGISTRATION_FAILED = -2, +}; + +enum rseq_flags { + RSEQ_FLAG_UNREGISTER = (1 << 0), +}; + +enum rseq_cs_flags_bit { + RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT = 0, + RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT = 1, + RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT = 2, +}; + +enum rseq_cs_flags { + RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT = + (1U << RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT), + RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL = + (1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT), + RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE = + (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), +}; + +/* + * struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always + * contained within a single cache-line. It is usually declared as + * link-time constant data. + */ +struct rseq_cs { + /* Version of this structure. */ + __u32 version; + /* enum rseq_cs_flags */ + __u32 flags; + LINUX_FIELD_u32_u64(start_ip); + /* Offset from start_ip. */ + LINUX_FIELD_u32_u64(post_commit_offset); + LINUX_FIELD_u32_u64(abort_ip); +} __attribute__((aligned(4 * sizeof(__u64)))); + +/* + * struct rseq is aligned on 4 * 8 bytes to ensure it is always + * contained within a single cache-line. + * + * A single struct rseq per thread is allowed. + */ +struct rseq { + /* + * Restartable sequences cpu_id_start field. Updated by the + * kernel, and read by user-space with single-copy atomicity + * semantics. Aligned on 32-bit. Always contains a value in the + * range of possible CPUs, although the value may not be the + * actual current CPU (e.g. if rseq is not initialized). This + * CPU number value should always be compared against the value + * of the cpu_id field before performing a rseq commit or + * returning a value read from a data structure indexed using + * the cpu_id_start value. 
+ */ + __u32 cpu_id_start; + /* + * Restartable sequences cpu_id field. Updated by the kernel, + * and read by user-space with single-copy atomicity semantics. + * Aligned on 32-bit. Values RSEQ_CPU_ID_UNINITIALIZED and + * RSEQ_CPU_ID_REGISTRATION_FAILED have a special semantic: the + * former means "rseq uninitialized", and latter means "rseq + * initialization failed". This value is meant to be read within + * rseq critical sections and compared with the cpu_id_start + * value previously read, before performing the commit instruction, + * or read and compared with the cpu_id_start value before returning + * a value loaded from a data structure indexed using the + * cpu_id_start value. + */ + __u32 cpu_id; + /* + * Restartable sequences rseq_cs field. + * + * Contains NULL when no critical section is active for the current + * thread, or holds a pointer to the currently active struct rseq_cs. + * + * Updated by user-space, which sets the address of the currently + * active rseq_cs at the beginning of assembly instruction sequence + * block, and set to NULL by the kernel when it restarts an assembly + * instruction sequence block, as well as when the kernel detects that + * it is preempting or delivering a signal outside of the range + * targeted by the rseq_cs. Also needs to be set to NULL by user-space + * before reclaiming memory that contains the targeted struct rseq_cs. + * + * Read and set by the kernel with single-copy atomicity semantics. + * Set by user-space with single-copy atomicity semantics. Aligned + * on 64-bit. + */ + LINUX_FIELD_u32_u64(rseq_cs); + /* + * - RSEQ_DISABLE flag: + * + * Fallback fast-track flag for single-stepping. + * Set by user-space if lack of progress is detected. + * Cleared by user-space after rseq finish. + * Read by the kernel. + * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT + * Inhibit instruction sequence block restart and event + * counter increment on preemption for this thread. 
+ * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL + * Inhibit instruction sequence block restart and event + * counter increment on signal delivery for this thread. + * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE + * Inhibit instruction sequence block restart and event + * counter increment on migration for this thread. + */ + __u32 flags; +} __attribute__((aligned(4 * sizeof(__u64)))); + +#endif /* _UAPI_LINUX_RSEQ_H */ diff --git a/init/Kconfig b/init/Kconfig index f013afc74b11..2f7ff760870e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1417,6 +1417,29 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS config ARCH_HAS_MEMBARRIER_SYNC_CORE bool +config RSEQ + bool "Enable rseq() system call" if EXPERT + default y + depends on HAVE_RSEQ + select MEMBARRIER + help + Enable the restartable sequences system call. It provides a + user-space cache for the current CPU number value, which + speeds up getting the current CPU number from user-space, + as well as an ABI to speed up user-space operations on + per-CPU data. + + If unsure, say Y. + +config DEBUG_RSEQ + default n + bool "Enabled debugging of rseq() system call" if EXPERT + depends on RSEQ && DEBUG_KERNEL + help + Enable extra debugging checks for the rseq system call. + + If unsure, say N. 
+ config EMBEDDED bool "Embedded system" option allnoconfig_y diff --git a/kernel/Makefile b/kernel/Makefile index f85ae5dfa474..7085c841c413 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -113,6 +113,7 @@ obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o obj-$(CONFIG_TORTURE_TEST) += torture.o obj-$(CONFIG_HAS_IOMEM) += memremap.o +obj-$(CONFIG_RSEQ) += rseq.o $(obj)/configs.o: $(obj)/config_data.h diff --git a/kernel/fork.c b/kernel/fork.c index a5d21c42acfc..70992bfeba81 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1899,6 +1899,8 @@ static __latent_entropy struct task_struct *copy_process( */ copy_seccomp(p); + rseq_fork(p, clone_flags); + /* * Process group and session signals need to be delivered to just the * parent before the fork or both the parent and the child after the diff --git a/kernel/rseq.c b/kernel/rseq.c new file mode 100644 index 000000000000..ae306f90c514 --- /dev/null +++ b/kernel/rseq.c @@ -0,0 +1,357 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Restartable sequences system call + * + * Copyright (C) 2015, Google, Inc., + * Paul Turner <pjt@google.com> and Andrew Hunter <ahh@google.com> + * Copyright (C) 2015-2018, EfficiOS Inc., + * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#include <linux/sched.h> +#include <linux/uaccess.h> +#include <linux/syscalls.h> +#include <linux/rseq.h> +#include <linux/types.h> +#include <asm/ptrace.h> + +#define CREATE_TRACE_POINTS +#include <trace/events/rseq.h> + +#define RSEQ_CS_PREEMPT_MIGRATE_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE | \ + RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT) + +/* + * + * Restartable sequences are a lightweight interface that allows + * user-level code to be executed atomically relative to scheduler + * preemption and signal delivery. Typically used for implementing + * per-cpu operations. + * + * It allows user-space to perform update operations on per-cpu data + * without requiring heavy-weight atomic operations. 
+ * + * Detailed algorithm of rseq user-space assembly sequences: + * + * init(rseq_cs) + * cpu = TLS->rseq::cpu_id_start + * [1] TLS->rseq::rseq_cs = rseq_cs + * [start_ip] ---------------------------- + * [2] if (cpu != TLS->rseq::cpu_id) + * goto abort_ip; + * [3] <last_instruction_in_cs> + * [post_commit_ip] ---------------------------- + * + * The address of jump target abort_ip must be outside the critical + * region, i.e.: + * + * [abort_ip] < [start_ip] || [abort_ip] >= [post_commit_ip] + * + * Steps [2]-[3] (inclusive) need to be a sequence of instructions in + * userspace that can handle being interrupted between any of those + * instructions, and then resumed to the abort_ip. + * + * 1. Userspace stores the address of the struct rseq_cs assembly + * block descriptor into the rseq_cs field of the registered + * struct rseq TLS area. This update is performed through a single + * store within the inline assembly instruction sequence. + * [start_ip] + * + * 2. Userspace tests to check whether the current cpu_id field match + * the cpu number loaded before start_ip, branching to abort_ip + * in case of a mismatch. + * + * If the sequence is preempted or interrupted by a signal + * at or after start_ip and before post_commit_ip, then the kernel + * clears TLS->__rseq_abi::rseq_cs, and sets the user-space return + * ip to abort_ip before returning to user-space, so the preempted + * execution resumes at abort_ip. + * + * 3. Userspace critical section final instruction before + * post_commit_ip is the commit. The critical section is + * self-terminating. + * [post_commit_ip] + * + * 4. <success> + * + * On failure at [2], or if interrupted by preempt or signal delivery + * between [1] and [3]: + * + * [abort_ip] + * F1. 
<failure>
+ */
+
+static int rseq_update_cpu_id(struct task_struct *t)
+{
+	u32 cpu_id = raw_smp_processor_id();
+
+	if (__put_user(cpu_id, &t->rseq->cpu_id_start))
+		return -EFAULT;
+	if (__put_user(cpu_id, &t->rseq->cpu_id))
+		return -EFAULT;
+	trace_rseq_update(t);
+	return 0;
+}
+
+static int rseq_reset_rseq_cpu_id(struct task_struct *t)
+{
+	u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
+
+	/*
+	 * Reset cpu_id_start to its initial state (0).
+	 */
+	if (__put_user(cpu_id_start, &t->rseq->cpu_id_start))
+		return -EFAULT;
+	/*
+	 * Reset cpu_id to RSEQ_CPU_ID_UNINITIALIZED, so any user coming
+	 * in after unregistration can figure out that rseq needs to be
+	 * registered again.
+	 */
+	if (__put_user(cpu_id, &t->rseq->cpu_id))
+		return -EFAULT;
+	return 0;
+}
+
+static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
+{
+	struct rseq_cs __user *urseq_cs;
+	unsigned long ptr;
+	u32 __user *usig;
+	u32 sig;
+	int ret;
+
+	ret = __get_user(ptr, &t->rseq->rseq_cs);
+	if (ret)
+		return ret;
+	if (!ptr) {
+		memset(rseq_cs, 0, sizeof(*rseq_cs));
+		return 0;
+	}
+	urseq_cs = (struct rseq_cs __user *)ptr;
+	if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs)))
+		return -EFAULT;
+	if (rseq_cs->version > 0)
+		return -EINVAL;
+
+	/* Ensure that abort_ip is not in the critical section. */
+	if (rseq_cs->abort_ip - rseq_cs->start_ip < rseq_cs->post_commit_offset)
+		return -EINVAL;
+
+	usig = (u32 __user *)(rseq_cs->abort_ip - sizeof(u32));
+	ret = get_user(sig, usig);
+	if (ret)
+		return ret;
+
+	if (current->rseq_sig != sig) {
+		printk_ratelimited(KERN_WARNING
+			"Possible attack attempt. Unexpected rseq signature 0x%x, expecting 0x%x (pid=%d, addr=%p).\n",
+			sig, current->rseq_sig, current->pid, usig);
+		return -EPERM;
+	}
+	return 0;
+}
+
+static int rseq_need_restart(struct task_struct *t, u32 cs_flags)
+{
+	u32 flags, event_mask;
+	int ret;
+
+	/* Get thread flags. */
+	ret = __get_user(flags, &t->rseq->flags);
+	if (ret)
+		return ret;
+
+	/* Take critical section flags into account. */
+	flags |= cs_flags;
+
+	/*
+	 * Restart on signal can only be inhibited when restart on
+	 * preempt and restart on migrate are inhibited too. Otherwise,
+	 * a preempted signal handler could fail to restart the prior
+	 * execution context on sigreturn.
+	 */
+	if (unlikely((flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL) &&
+		     (flags & RSEQ_CS_PREEMPT_MIGRATE_FLAGS) !=
+		     RSEQ_CS_PREEMPT_MIGRATE_FLAGS))
+		return -EINVAL;
+
+	/*
+	 * Load and clear event mask atomically with respect to
+	 * scheduler preemption.
+	 */
+	preempt_disable();
+	event_mask = t->rseq_event_mask;
+	t->rseq_event_mask = 0;
+	preempt_enable();
+
+	return !!(event_mask & ~flags);
+}
+
+static int clear_rseq_cs(struct task_struct *t)
+{
+	/*
+	 * The rseq_cs field is set to NULL on preemption or signal
+	 * delivery on top of rseq assembly block, as well as on top
+	 * of code outside of the rseq assembly block. This performs
+	 * a lazy clear of the rseq_cs field.
+	 *
+	 * Set rseq_cs to NULL with single-copy atomicity.
+	 */
+	return __put_user(0UL, &t->rseq->rseq_cs);
+}
+
+/*
+ * Unsigned comparison will be true when ip >= start_ip, and when
+ * ip < start_ip + post_commit_offset.
+ */
+static bool in_rseq_cs(unsigned long ip, struct rseq_cs *rseq_cs)
+{
+	return ip - rseq_cs->start_ip < rseq_cs->post_commit_offset;
+}
+
+static int rseq_ip_fixup(struct pt_regs *regs)
+{
+	unsigned long ip = instruction_pointer(regs);
+	struct task_struct *t = current;
+	struct rseq_cs rseq_cs;
+	int ret;
+
+	ret = rseq_get_rseq_cs(t, &rseq_cs);
+	if (ret)
+		return ret;
+
+	/*
+	 * Handle potentially not being within a critical section.
+	 * If not nested over a rseq critical section, restart is useless.
+	 * Clear the rseq_cs pointer and return.
+	 */
+	if (!in_rseq_cs(ip, &rseq_cs))
+		return clear_rseq_cs(t);
+	ret = rseq_need_restart(t, rseq_cs.flags);
+	if (ret <= 0)
+		return ret;
+	ret = clear_rseq_cs(t);
+	if (ret)
+		return ret;
+	trace_rseq_ip_fixup(ip, rseq_cs.start_ip, rseq_cs.post_commit_offset,
+			    rseq_cs.abort_ip);
+	instruction_pointer_set(regs, (unsigned long)rseq_cs.abort_ip);
+	return 0;
+}
+
+/*
+ * This resume handler must always be executed between any of:
+ * - preemption,
+ * - signal delivery,
+ * and return to user-space.
+ *
+ * This is how we can ensure that the entire rseq critical section,
+ * consisting of both the C part and the assembly instruction sequence,
+ * will issue the commit instruction only if executed atomically with
+ * respect to other threads scheduled on the same CPU, and with respect
+ * to signal handlers.
+ */
+void __rseq_handle_notify_resume(struct pt_regs *regs)
+{
+	struct task_struct *t = current;
+	int ret;
+
+	if (unlikely(t->flags & PF_EXITING))
+		return;
+	if (unlikely(!access_ok(VERIFY_WRITE, t->rseq, sizeof(*t->rseq))))
+		goto error;
+	ret = rseq_ip_fixup(regs);
+	if (unlikely(ret < 0))
+		goto error;
+	if (unlikely(rseq_update_cpu_id(t)))
+		goto error;
+	return;
+
+error:
+	force_sig(SIGSEGV, t);
+}
+
+#ifdef CONFIG_DEBUG_RSEQ
+
+/*
+ * Terminate the process if a syscall is issued within a restartable
+ * sequence.
+ */
+void rseq_syscall(struct pt_regs *regs)
+{
+	unsigned long ip = instruction_pointer(regs);
+	struct task_struct *t = current;
+	struct rseq_cs rseq_cs;
+
+	if (!t->rseq)
+		return;
+	if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
+	    rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
+		force_sig(SIGSEGV, t);
+}
+
+#endif
+
+/*
+ * sys_rseq - setup restartable sequences for caller thread.
+ */
+SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
+		int, flags, u32, sig)
+{
+	int ret;
+
+	if (flags & RSEQ_FLAG_UNREGISTER) {
+		/* Unregister rseq for current thread. */
+		if (current->rseq != rseq || !current->rseq)
+			return -EINVAL;
+		if (current->rseq_len != rseq_len)
+			return -EINVAL;
+		if (current->rseq_sig != sig)
+			return -EPERM;
+		ret = rseq_reset_rseq_cpu_id(current);
+		if (ret)
+			return ret;
+		current->rseq = NULL;
+		current->rseq_len = 0;
+		current->rseq_sig = 0;
+		return 0;
+	}
+
+	if (unlikely(flags))
+		return -EINVAL;
+
+	if (current->rseq) {
+		/*
+		 * If rseq is already registered, check whether
+		 * the provided address differs from the prior
+		 * one.
+		 */
+		if (current->rseq != rseq || current->rseq_len != rseq_len)
+			return -EINVAL;
+		if (current->rseq_sig != sig)
+			return -EPERM;
+		/* Already registered. */
+		return -EBUSY;
+	}
+
+	/*
+	 * If there was no rseq previously registered,
+	 * ensure the provided rseq is properly aligned and valid.
+	 */
+	if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) ||
+	    rseq_len != sizeof(*rseq))
+		return -EINVAL;
+	if (!access_ok(VERIFY_WRITE, rseq, rseq_len))
+		return -EFAULT;
+	current->rseq = rseq;
+	current->rseq_len = rseq_len;
+	current->rseq_sig = sig;
+	/*
+	 * If rseq was previously inactive, and has just been
+	 * registered, ensure the cpu_id_start and cpu_id fields
+	 * are updated before returning to user-space.
+	 */
+	rseq_set_notify_resume(current);
+
+	return 0;
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e10aaeebfcc..76d452ef2f0d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1169,6 +1169,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 		if (p->sched_class->migrate_task_rq)
 			p->sched_class->migrate_task_rq(p);
 		p->se.nr_migrations++;
+		rseq_migrate(p);
 		perf_event_task_migrate(p);
 	}
 
@@ -2634,6 +2635,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 {
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
+	rseq_preempt(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_task(next);
 	prepare_arch_switch(next);
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 9791364925dc..22f4ef269959 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -430,3 +430,6 @@ COND_SYSCALL(setresgid16);
 COND_SYSCALL(setresuid16);
 COND_SYSCALL(setreuid16);
 COND_SYSCALL(setuid16);
+
+/* restartable sequence */
+COND_SYSCALL(rseq);
-- 
2.11.0
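Note on the range checks in the patch above: both in_rseq_cs() and the abort_ip validation in rseq_get_rseq_cs() rely on a single unsigned subtraction to test start_ip <= ip < start_ip + post_commit_offset. When ip is below start_ip the subtraction wraps to a very large value, so the one comparison covers both bounds. What follows is a minimal user-space sketch of that idea only, not the kernel code: struct cs_range, in_range() and abort_ip_valid() are hypothetical names chosen for illustration, and the fields are assumed to be 64-bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the three descriptor fields involved. */
struct cs_range {
	uint64_t start_ip;           /* first instruction of the critical section */
	uint64_t post_commit_offset; /* length of the critical section in bytes */
	uint64_t abort_ip;           /* where execution resumes on abort */
};

/*
 * True iff start_ip <= ip < start_ip + post_commit_offset.
 * If ip < start_ip, the unsigned subtraction wraps around to a huge
 * value, which can never be below post_commit_offset.
 */
static inline bool in_range(uint64_t ip, const struct cs_range *cs)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* The abort handler must live outside the critical section it aborts. */
static inline bool abort_ip_valid(const struct cs_range *cs)
{
	return !in_range(cs->abort_ip, cs);
}
```

With start_ip = 0x1000 and post_commit_offset = 0x20, addresses 0x1000 through 0x101f are inside the range, while 0x0fff and 0x1020 are outside; an abort_ip of 0x1010 would be rejected.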
* Re: [PATCH 02/14] rseq: Introduce restartable sequences system call (v13)
  2018-04-30 22:44 ` Mathieu Desnoyers
@ 2018-05-16 16:24 ` Peter Zijlstra
From: Peter Zijlstra @ 2018-05-16 16:24 UTC
To: Mathieu Desnoyers
Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Alexander Viro

On Mon, Apr 30, 2018 at 06:44:21PM -0400, Mathieu Desnoyers wrote:
> Expose a new system call allowing each thread to register one userspace
> memory area to be used as an ABI between kernel and user-space for two
> purposes: user-space restartable sequences and quick access to read the
> current CPU number value from user-space.

> ---
> MAINTAINERS | 11 ++
> arch/Kconfig | 7 +
> fs/exec.c | 1 +
> include/linux/sched.h | 134 +++++++++++++++++
> include/linux/syscalls.h | 4 +-
> include/trace/events/rseq.h | 57 +++++++
> include/uapi/linux/rseq.h | 133 +++++++++++++++++
> init/Kconfig | 23 +++
> kernel/Makefile | 1 +
> kernel/fork.c | 2 +
> kernel/rseq.c | 357 ++++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/core.c | 2 +
> kernel/sys_ni.c | 3 +
> 13 files changed, 734 insertions(+), 1 deletion(-)
> create mode 100644 include/trace/events/rseq.h
> create mode 100644 include/uapi/linux/rseq.h
> create mode 100644 kernel/rseq.c
>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
* Re: [PATCH 02/14] rseq: Introduce restartable sequences system call (v13)
  2018-05-16 16:24 ` Peter Zijlstra
@ 2018-05-16 20:18 ` Mathieu Desnoyers
From: Mathieu Desnoyers @ 2018-05-16 20:18 UTC
To: Peter Zijlstra
Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
	Joel Fernandes, Alexander Viro

----- On May 16, 2018, at 12:24 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Mon, Apr 30, 2018 at 06:44:21PM -0400, Mathieu Desnoyers wrote:
>> Expose a new system call allowing each thread to register one userspace
>> memory area to be used as an ABI between kernel and user-space for two
>> purposes: user-space restartable sequences and quick access to read the
>> current CPU number value from user-space.
>
>> ---
>> MAINTAINERS | 11 ++
>> arch/Kconfig | 7 +
>> fs/exec.c | 1 +
>> include/linux/sched.h | 134 +++++++++++++++++
>> include/linux/syscalls.h | 4 +-
>> include/trace/events/rseq.h | 57 +++++++
>> include/uapi/linux/rseq.h | 133 +++++++++++++++++
>> init/Kconfig | 23 +++
>> kernel/Makefile | 1 +
>> kernel/fork.c | 2 +
>> kernel/rseq.c | 357 ++++++++++++++++++++++++++++++++++++++++++++
>> kernel/sched/core.c | 2 +
>> kernel/sys_ni.c | 3 +
>> 13 files changed, 734 insertions(+), 1 deletion(-)
>> create mode 100644 include/trace/events/rseq.h
>> create mode 100644 include/uapi/linux/rseq.h
>> create mode 100644 kernel/rseq.c
>>
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks Peter !

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* [PATCH 03/14] arm: Add restartable sequences support
  2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers
  2018-04-30 22:44 ` [PATCH 01/14] uapi headers: Provide types_32_64.h (v2) Mathieu Desnoyers
  2018-04-30 22:44 ` Mathieu Desnoyers
@ 2018-04-30 22:44 ` Mathieu Desnoyers
  2018-05-16 16:18 ` Peter Zijlstra
  2018-04-30 22:44 ` [PATCH 04/14] arm: Wire up restartable sequences system call Mathieu Desnoyers
  ` (11 subsequent siblings)
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC
To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Perform fixup on the pre-signal frame when a signal is delivered on top
of a restartable sequence critical section.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/arm/Kconfig         | 1 +
 arch/arm/kernel/signal.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7f8e7f4b88f..4f5c386631d4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -91,6 +91,7 @@ config ARM
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
 	select HAVE_VIRT_CPU_ACCOUNTING_GEN
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index bd8810d4acb3..5879ab3f53c1 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -541,6 +541,12 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	int ret;
 
 	/*
+	 * Increment event counter and perform fixup for the pre-signal
+	 * frame.
+	 */
+	rseq_signal_deliver(regs);
+
+	/*
 	 * Set up the stack frame
 	 */
 	if (ksig->ka.sa.sa_flags & SA_SIGINFO)
@@ -660,6 +666,7 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 			} else {
 				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
+				rseq_handle_notify_resume(regs);
 			}
 		}
 		local_irq_disable();
-- 
2.11.0
* Re: [PATCH 03/14] arm: Add restartable sequences support
  2018-04-30 22:44 ` [PATCH 03/14] arm: Add restartable sequences support Mathieu Desnoyers
@ 2018-05-16 16:18 ` Peter Zijlstra
From: Peter Zijlstra @ 2018-05-16 16:18 UTC
To: Mathieu Desnoyers
Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes

On Mon, Apr 30, 2018 at 06:44:22PM -0400, Mathieu Desnoyers wrote:
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index a7f8e7f4b88f..4f5c386631d4 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -91,6 +91,7 @@ config ARM
>  	select HAVE_PERF_USER_STACK_DUMP
>  	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
>  	select HAVE_REGS_AND_STACK_ACCESS_API
> +	select HAVE_RSEQ
>  	select HAVE_SYSCALL_TRACEPOINTS
>  	select HAVE_UID16
>  	select HAVE_VIRT_CPU_ACCOUNTING_GEN
> diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
> index bd8810d4acb3..5879ab3f53c1 100644
> --- a/arch/arm/kernel/signal.c
> +++ b/arch/arm/kernel/signal.c
> @@ -541,6 +541,12 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>  	int ret;
>
>  	/*
> +	 * Increment event counter and perform fixup for the pre-signal
> +	 * frame.
> +	 */
> +	rseq_signal_deliver(regs);
> +
> +	/*
>  	 * Set up the stack frame
>  	 */
>  	if (ksig->ka.sa.sa_flags & SA_SIGINFO)
> @@ -660,6 +666,7 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
>  			} else {
>  				clear_thread_flag(TIF_NOTIFY_RESUME);
>  				tracehook_notify_resume(regs);
> +				rseq_handle_notify_resume(regs);
>  			}
>  		}
>  		local_irq_disable();

I think you forgot to hook up rseq_syscall() checking.
* Re: [PATCH 03/14] arm: Add restartable sequences support
  2018-05-16 16:18 ` Peter Zijlstra
@ 2018-05-16 20:13 ` Mathieu Desnoyers
From: Mathieu Desnoyers @ 2018-05-16 20:13 UTC
To: Peter Zijlstra
Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
	Joel Fernandes

----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Mon, Apr 30, 2018 at 06:44:22PM -0400, Mathieu Desnoyers wrote:
>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> index a7f8e7f4b88f..4f5c386631d4 100644
>> --- a/arch/arm/Kconfig
>> +++ b/arch/arm/Kconfig
>> @@ -91,6 +91,7 @@ config ARM
>>  	select HAVE_PERF_USER_STACK_DUMP
>>  	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
>>  	select HAVE_REGS_AND_STACK_ACCESS_API
>> +	select HAVE_RSEQ
>>  	select HAVE_SYSCALL_TRACEPOINTS
>>  	select HAVE_UID16
>>  	select HAVE_VIRT_CPU_ACCOUNTING_GEN
>> diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
>> index bd8810d4acb3..5879ab3f53c1 100644
>> --- a/arch/arm/kernel/signal.c
>> +++ b/arch/arm/kernel/signal.c
>> @@ -541,6 +541,12 @@ static void handle_signal(struct ksignal *ksig, struct
>> pt_regs *regs)
>>  	int ret;
>>
>>  	/*
>> +	 * Increment event counter and perform fixup for the pre-signal
>> +	 * frame.
>> +	 */
>> +	rseq_signal_deliver(regs);
>> +
>> +	/*
>>  	 * Set up the stack frame
>>  	 */
>>  	if (ksig->ka.sa.sa_flags & SA_SIGINFO)
>> @@ -660,6 +666,7 @@ do_work_pending(struct pt_regs *regs, unsigned int
>> thread_flags, int syscall)
>>  			} else {
>>  				clear_thread_flag(TIF_NOTIFY_RESUME);
>>  				tracehook_notify_resume(regs);
>> +				rseq_handle_notify_resume(regs);
>>  			}
>>  		}
>>  		local_irq_disable();
>
> I think you forgot to hook up rseq_syscall() checking.

Considering that rseq_syscall is implemented as follows:

+void rseq_syscall(struct pt_regs *regs)
+{
+	unsigned long ip = instruction_pointer(regs);
+	struct task_struct *t = current;
+	struct rseq_cs rseq_cs;
+
+	if (!t->rseq)
+		return;
+	if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
+	    rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
+		force_sig(SIGSEGV, t);
+}

and that x86 calls it from syscall_return_slowpath() (which AFAIU is
now used in the fast-path since KPTI), I wonder where we should call
this on ARM ? I was under the impression that ARM return to userspace
fast-path was not calling C code unless work flags were set, but I might
be wrong.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH 03/14] arm: Add restartable sequences support
  2018-05-16 20:13 ` Mathieu Desnoyers
@ 2018-05-17 13:32 ` Will Deacon
From: Will Deacon @ 2018-05-17 13:32 UTC
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
	Joel Fernandes

On Wed, May 16, 2018 at 04:13:13PM -0400, Mathieu Desnoyers wrote:
> ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:
>
> > On Mon, Apr 30, 2018 at 06:44:22PM -0400, Mathieu Desnoyers wrote:
> >> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> >> index a7f8e7f4b88f..4f5c386631d4 100644
> >> --- a/arch/arm/Kconfig
> >> +++ b/arch/arm/Kconfig
> >> @@ -91,6 +91,7 @@ config ARM
> >>  	select HAVE_PERF_USER_STACK_DUMP
> >>  	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
> >>  	select HAVE_REGS_AND_STACK_ACCESS_API
> >> +	select HAVE_RSEQ
> >>  	select HAVE_SYSCALL_TRACEPOINTS
> >>  	select HAVE_UID16
> >>  	select HAVE_VIRT_CPU_ACCOUNTING_GEN
> >> diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
> >> index bd8810d4acb3..5879ab3f53c1 100644
> >> --- a/arch/arm/kernel/signal.c
> >> +++ b/arch/arm/kernel/signal.c
> >> @@ -541,6 +541,12 @@ static void handle_signal(struct ksignal *ksig, struct
> >> pt_regs *regs)
> >>  	int ret;
> >>
> >>  	/*
> >> +	 * Increment event counter and perform fixup for the pre-signal
> >> +	 * frame.
> >> +	 */
> >> +	rseq_signal_deliver(regs);
> >> +
> >> +	/*
> >>  	 * Set up the stack frame
> >>  	 */
> >>  	if (ksig->ka.sa.sa_flags & SA_SIGINFO)
> >> @@ -660,6 +666,7 @@ do_work_pending(struct pt_regs *regs, unsigned int
> >> thread_flags, int syscall)
> >>  			} else {
> >>  				clear_thread_flag(TIF_NOTIFY_RESUME);
> >>  				tracehook_notify_resume(regs);
> >> +				rseq_handle_notify_resume(regs);
> >>  			}
> >>  		}
> >>  		local_irq_disable();
> >
> > I think you forgot to hook up rseq_syscall() checking.
>
> Considering that rseq_syscall is implemented as follows:
>
> +void rseq_syscall(struct pt_regs *regs)
> +{
> +	unsigned long ip = instruction_pointer(regs);
> +	struct task_struct *t = current;
> +	struct rseq_cs rseq_cs;
> +
> +	if (!t->rseq)
> +		return;
> +	if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
> +	    rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
> +		force_sig(SIGSEGV, t);
> +}
>
> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
> now used in the fast-path since KPTI), I wonder where we should call
> this on ARM ? I was under the impression that ARM return to userspace
> fast-path was not calling C code unless work flags were set, but I might
> be wrong.
>
> Thoughts ?

Since this only matters for CONFIG_DEBUG_RSEQ, can we just force the
slowpath for rseq tasks when that option is set?

Will
* Re: [PATCH 03/14] arm: Add restartable sequences support
  2018-05-17 13:32 ` Will Deacon
@ 2018-05-17 15:30 ` Mathieu Desnoyers
From: Mathieu Desnoyers @ 2018-05-17 15:30 UTC
To: Will Deacon
Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
	Joel Fernandes

----- On May 17, 2018, at 9:32 AM, Will Deacon will.deacon@arm.com wrote:

> On Wed, May 16, 2018 at 04:13:13PM -0400, Mathieu Desnoyers wrote:
>> ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:
>>
>> > On Mon, Apr 30, 2018 at 06:44:22PM -0400, Mathieu Desnoyers wrote:
>> >> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> >> index a7f8e7f4b88f..4f5c386631d4 100644
>> >> --- a/arch/arm/Kconfig
>> >> +++ b/arch/arm/Kconfig
>> >> @@ -91,6 +91,7 @@ config ARM
>> >>  	select HAVE_PERF_USER_STACK_DUMP
>> >>  	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
>> >>  	select HAVE_REGS_AND_STACK_ACCESS_API
>> >> +	select HAVE_RSEQ
>> >>  	select HAVE_SYSCALL_TRACEPOINTS
>> >>  	select HAVE_UID16
>> >>  	select HAVE_VIRT_CPU_ACCOUNTING_GEN
>> >> diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
>> >> index bd8810d4acb3..5879ab3f53c1 100644
>> >> --- a/arch/arm/kernel/signal.c
>> >> +++ b/arch/arm/kernel/signal.c
>> >> @@ -541,6 +541,12 @@ static void handle_signal(struct ksignal *ksig, struct
>> >> pt_regs *regs)
>> >>  	int ret;
>> >>
>> >>  	/*
>> >> +	 * Increment event counter and perform fixup for the pre-signal
>> >> +	 * frame.
>> >> +	 */
>> >> +	rseq_signal_deliver(regs);
>> >> +
>> >> +	/*
>> >>  	 * Set up the stack frame
>> >>  	 */
>> >>  	if (ksig->ka.sa.sa_flags & SA_SIGINFO)
>> >> @@ -660,6 +666,7 @@ do_work_pending(struct pt_regs *regs, unsigned int
>> >> thread_flags, int syscall)
>> >>  			} else {
>> >>  				clear_thread_flag(TIF_NOTIFY_RESUME);
>> >>  				tracehook_notify_resume(regs);
>> >> +				rseq_handle_notify_resume(regs);
>> >>  			}
>> >>  		}
>> >>  		local_irq_disable();
>> >
>> > I think you forgot to hook up rseq_syscall() checking.
>>
>> Considering that rseq_syscall is implemented as follows:
>>
>> +void rseq_syscall(struct pt_regs *regs)
>> +{
>> +	unsigned long ip = instruction_pointer(regs);
>> +	struct task_struct *t = current;
>> +	struct rseq_cs rseq_cs;
>> +
>> +	if (!t->rseq)
>> +		return;
>> +	if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
>> +	    rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
>> +		force_sig(SIGSEGV, t);
>> +}
>>
>> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
>> now used in the fast-path since KPTI), I wonder where we should call
>> this on ARM ? I was under the impression that ARM return to userspace
>> fast-path was not calling C code unless work flags were set, but I might
>> be wrong.
>>
>> Thoughts ?
>
> Since this only matters for CONFIG_DEBUG_RSEQ, can we just force the
> slowpath for rseq tasks when that option is set?

Or as proposed by Boqun, we can simply call rseq_syscall in a
CONFIG_DEBUG_RSEQ ifdef. Given that this is a debug option, is it
worth it to add the current->rseq test for NULL in assembly before
the call, or do we want to favor simplicity ?

Thanks,

Mathieu

>
> Will

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
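Note on the option discussed above: calling the check only under a debug config symbol is the usual compile-out pattern, where the non-debug build gets an empty inline stub and the call site needs no ifdef of its own. The following is a user-space sketch of that pattern only, not ARM entry code and not what the series ended up doing. DEBUG_RSEQ_SKETCH, struct task_sketch and syscall_exit_check() are hypothetical names, and the boolean return value merely stands in for the force_sig(SIGSEGV, t) call in the real rseq_syscall().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch standing in for CONFIG_DEBUG_RSEQ; remove to compile the check away. */
#define DEBUG_RSEQ_SKETCH 1

/* Hypothetical stand-in for the per-task state the check looks at. */
struct task_sketch {
	const void *rseq;            /* NULL when the thread never registered */
	uint64_t start_ip;           /* start of the active critical section */
	uint64_t post_commit_offset; /* its length in bytes */
};

#ifdef DEBUG_RSEQ_SKETCH
/* Debug build: report a system call issued from inside a critical section. */
static inline bool syscall_exit_check(const struct task_sketch *t, uint64_t ip)
{
	if (!t->rseq)
		return false; /* cheap early exit for unregistered threads */
	return ip - t->start_ip < t->post_commit_offset;
}
#else
/* Non-debug build: the hook is an empty stub the compiler can drop. */
static inline bool syscall_exit_check(const struct task_sketch *t, uint64_t ip)
{
	(void)t;
	(void)ip;
	return false;
}
#endif
```

The early return on a NULL rseq pointer corresponds to the test Mathieu asks about; doing it in C inside the hook favors simplicity, at the cost of one function call per system call exit in debug builds.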
* Re: [PATCH 03/14] arm: Add restartable sequences support 2018-05-17 15:30 ` Mathieu Desnoyers @ 2018-05-22 18:19 ` Mathieu Desnoyers -1 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-05-22 18:19 UTC (permalink / raw) To: Will Deacon, Russell King Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk, Joel Fernandes ----- On May 17, 2018, at 11:30 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: [...] > > Or as proposed by Boqun, we can simply call rseq_syscall in a CONFIG_DEBUG_RSEQ > ifdef. Given that this is a debug option, is it worth it to add the > current->rseq > test for NULL in assembly before the call, or do we want to favor simplicity ? > Based on advice from Will Deacon, I alternatively tried to add a new TIF_RSEQ thread flag, but unfortunately bits 1 through 8 are already used, and this is all that fits in an immediate operand on arm32 for the fast-path thread flag syscall work mask check in assembly. So considering that this is a kernel debug option, I took the approach of adding a call at the very beginning of the return-from-syscall fast and slow paths, which is only compiled in if CONFIG_DEBUG_RSEQ=y. Does the following approach make sense ? arm: Add syscall detection for restartable sequences Syscalls are not allowed inside restartable sequences, so add a call to rseq_syscall() at the very beginning of the system call exit path for CONFIG_DEBUG_RSEQ=y kernels. This helps detect whether a syscall is issued inside a restartable sequence. 
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 3c4f887..b427ef8 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -39,12 +39,13 @@ saved_pc .req lr .section .entry.text,"ax",%progbits .align 5 -#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING)) +#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING) || \ + IS_ENABLED(CONFIG_DEBUG_RSEQ)) /* * This is the fast syscall return path. We do as little as possible here, * such as avoiding writing r0 to the stack. We only use this path if we - * have tracing and context tracking disabled - the overheads from those - * features make this path too inefficient. + * have tracing, context tracking and rseq debug disabled - the overheads + * from those features make this path too inefficient. */ ret_fast_syscall: UNWIND(.fnstart ) @@ -71,14 +72,20 @@ fast_work_pending: /* fall through to work_pending */ #else /* - * The "replacement" ret_fast_syscall for when tracing or context tracking - * is enabled. As we will need to call out to some C functions, we save - * r0 first to avoid needing to save registers around each C function call. + * The "replacement" ret_fast_syscall for when tracing, context tracking, + * or rseq debug is enabled. As we will need to call out to some C functions, + * we save r0 first to avoid needing to save registers around each C function + * call. */ ret_fast_syscall: UNWIND(.fnstart ) UNWIND(.cantunwind ) str r0, [sp, #S_R0 + S_OFF]! @ save returned r0 +#if IS_ENABLED(CONFIG_DEBUG_RSEQ) + /* do_rseq_syscall needs interrupts enabled. 
*/ + mov r0, sp @ 'regs' + bl do_rseq_syscall +#endif disable_irq_notrace @ disable interrupts ldr r2, [tsk, #TI_ADDR_LIMIT] cmp r2, #TASK_SIZE @@ -113,6 +120,12 @@ ENDPROC(ret_fast_syscall) */ ENTRY(ret_to_user) ret_slow_syscall: +#if IS_ENABLED(CONFIG_DEBUG_RSEQ) + /* do_rseq_syscall needs interrupts enabled. */ + enable_irq_notrace @ enable interrupts + mov r0, sp @ 'regs' + bl do_rseq_syscall +#endif disable_irq_notrace @ disable interrupts ENTRY(ret_to_user_from_irq) ldr r2, [tsk, #TI_ADDR_LIMIT] diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index 5879ab3..f09e9d66 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -710,3 +710,10 @@ asmlinkage void addr_limit_check_failed(void) { addr_limit_user_check(); } + +#ifdef CONFIG_DEBUG_RSEQ +asmlinkage void do_rseq_syscall(struct pt_regs *regs) +{ + rseq_syscall(regs); +} +#endif -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply related [flat|nested] 105+ messages in thread
* [PATCH 04/14] arm: Wire up restartable sequences system call 2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers ` (2 preceding siblings ...) 2018-04-30 22:44 ` [PATCH 03/14] arm: Add restartable sequences support Mathieu Desnoyers @ 2018-04-30 22:44 ` Mathieu Desnoyers 2018-04-30 22:44 ` [PATCH 05/14] x86: Add support for restartable sequences (v2) Mathieu Desnoyers ` (10 subsequent siblings) 14 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers Wire up the rseq system call on 32-bit ARM. This provides an ABI improving the speed of a user-space getcpu operation on ARM by skipping the getcpu system call on the fast path, as well as improving the speed of user-space operations on per-cpu data compared to using load-linked/store-conditional. TODO: wire up rseq_syscall() on return from system call. It is used with CONFIG_DEBUG_RSEQ=y to ensure system calls are not issued within rseq critical sections. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Russell King <linux@arm.linux.org.uk> CC: Catalin Marinas <catalin.marinas@arm.com> CC: Will Deacon <will.deacon@arm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Paul Turner <pjt@google.com> CC: Andrew Hunter <ahh@google.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Andi Kleen <andi@firstfloor.org> CC: Dave Watson <davejwatson@fb.com> CC: Chris Lameter <cl@linux.com> CC: Ingo Molnar <mingo@redhat.com> CC: Ben Maurer <bmaurer@fb.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Boqun Feng <boqun.feng@gmail.com> CC: linux-api@vger.kernel.org --- arch/arm/tools/syscall.tbl | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 0bb0e9c6376c..fbc74b5fa3ed 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -412,3 +412,4 @@ 395 common pkey_alloc sys_pkey_alloc 396 common pkey_free sys_pkey_free 397 common statx sys_statx +398 common rseq sys_rseq -- 2.11.0 ^ permalink raw reply related [flat|nested] 105+ messages in thread
* [PATCH 05/14] x86: Add support for restartable sequences (v2) 2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers ` (3 preceding siblings ...) 2018-04-30 22:44 ` [PATCH 04/14] arm: Wire up restartable sequences system call Mathieu Desnoyers @ 2018-04-30 22:44 ` Mathieu Desnoyers 2018-04-30 22:44 ` [PATCH 06/14] x86: Wire up restartable sequence system call Mathieu Desnoyers ` (9 subsequent siblings) 14 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers Call the rseq_handle_notify_resume() function on return to userspace if the TIF_NOTIFY_RESUME thread flag is set. Perform fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. Check that system calls are not invoked from within rseq critical sections by invoking rseq_syscall() from syscall_return_slowpath(). With CONFIG_DEBUG_RSEQ, such behavior results in termination of the process with SIGSEGV. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Russell King <linux@arm.linux.org.uk> CC: Catalin Marinas <catalin.marinas@arm.com> CC: Will Deacon <will.deacon@arm.com> CC: Paul Turner <pjt@google.com> CC: Andrew Hunter <ahh@google.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Andi Kleen <andi@firstfloor.org> CC: Dave Watson <davejwatson@fb.com> CC: Chris Lameter <cl@linux.com> CC: Ingo Molnar <mingo@redhat.com> CC: "H. 
Peter Anvin" <hpa@zytor.com> CC: Ben Maurer <bmaurer@fb.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Boqun Feng <boqun.feng@gmail.com> CC: linux-api@vger.kernel.org --- Changes since v1: - Call rseq_syscall() when returning from a system call. --- arch/x86/Kconfig | 1 + arch/x86/entry/common.c | 3 +++ arch/x86/kernel/signal.c | 6 ++++++ 3 files changed, 10 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c07f492b871a..62e00a1a7cf7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -180,6 +180,7 @@ config X86 select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION select HAVE_STACK_VALIDATION if X86_64 + select HAVE_RSEQ select HAVE_SYSCALL_TRACEPOINTS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index fbf6a6c3fd2d..92190879b228 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -164,6 +164,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags) if (cached_flags & _TIF_NOTIFY_RESUME) { clear_thread_flag(TIF_NOTIFY_RESUME); tracehook_notify_resume(regs); + rseq_handle_notify_resume(regs); } if (cached_flags & _TIF_USER_RETURN_NOTIFY) @@ -254,6 +255,8 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs) WARN(irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax)) local_irq_enable(); + rseq_syscall(regs); + /* * First do one-time work. If these work items are enabled, we * want to run them exactly once per syscall exit with IRQs on. 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index da270b95fe4d..445ca11ff863 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -688,6 +688,12 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs) sigset_t *set = sigmask_to_save(); compat_sigset_t *cset = (compat_sigset_t *) set; + /* + * Increment event counter and perform fixup for the pre-signal + * frame. + */ + rseq_signal_deliver(regs); + /* Set up the stack frame */ if (is_ia32_frame(ksig)) { if (ksig->ka.sa.sa_flags & SA_SIGINFO) -- 2.11.0 ^ permalink raw reply related [flat|nested] 105+ messages in thread
* [PATCH 06/14] x86: Wire up restartable sequence system call 2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers ` (4 preceding siblings ...) 2018-04-30 22:44 ` [PATCH 05/14] x86: Add support for restartable sequences (v2) Mathieu Desnoyers @ 2018-04-30 22:44 ` Mathieu Desnoyers 2018-04-30 22:44 ` Mathieu Desnoyers ` (8 subsequent siblings) 14 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers Wire up the rseq system call on x86 32/64. This provides an ABI improving the speed of a user-space getcpu operation on x86 by removing the need to perform a function call, "lsl" instruction, or system call on the fast path, as well as improving the speed of user-space operations on per-cpu data. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Russell King <linux@arm.linux.org.uk> CC: Catalin Marinas <catalin.marinas@arm.com> CC: Will Deacon <will.deacon@arm.com> CC: Paul Turner <pjt@google.com> CC: Andrew Hunter <ahh@google.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Andi Kleen <andi@firstfloor.org> CC: Dave Watson <davejwatson@fb.com> CC: Chris Lameter <cl@linux.com> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Ben Maurer <bmaurer@fb.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Boqun Feng <boqun.feng@gmail.com> CC: linux-api@vger.kernel.org --- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index d6b27dab1b30..db346da64947 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -396,3 +396,4 @@ 382 i386 pkey_free sys_pkey_free __ia32_sys_pkey_free 383 i386 statx sys_statx __ia32_sys_statx 384 i386 arch_prctl sys_arch_prctl __ia32_compat_sys_arch_prctl +385 i386 rseq sys_rseq __ia32_sys_rseq diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 4dfe42666d0c..41b082b125c3 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -341,6 +341,7 @@ 330 common pkey_alloc __x64_sys_pkey_alloc 331 common pkey_free __x64_sys_pkey_free 332 common statx __x64_sys_statx +333 common rseq __x64_sys_rseq # # x32-specific system call numbers start at 512 to avoid cache impact -- 2.11.0 ^ permalink raw reply related [flat|nested] 105+ messages in thread
* [PATCH 07/14] powerpc: Add support for restartable sequences 2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers @ 2018-04-30 22:44 ` Mathieu Desnoyers 2018-04-30 22:44 ` Mathieu Desnoyers ` (13 subsequent siblings) 14 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev From: Boqun Feng <boqun.feng@gmail.com> Call the rseq_handle_notify_resume() function on return to userspace if the TIF_NOTIFY_RESUME thread flag is set. Perform fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Michael Ellerman <mpe@ellerman.id.au> CC: Peter Zijlstra <peterz@infradead.org> CC: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> CC: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/signal.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c32a181a7cbb..ed21a777e8c6 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -223,6 +223,7 @@ config PPC select HAVE_SYSCALL_TRACEPOINTS select HAVE_VIRT_CPU_ACCOUNTING select HAVE_IRQ_TIME_ACCOUNTING + select HAVE_RSEQ select IRQ_DOMAIN select IRQ_FORCED_THREADING select MODULES_USE_ELF_RELA diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index 61db86ecd318..d3bb3aaaf5ac 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk) /* Re-enable the breakpoints for the signal stack */ thread_change_pc(tsk, tsk->thread.regs); + rseq_signal_deliver(tsk->thread.regs); + if (is32) { if (ksig.ka.sa.sa_flags & SA_SIGINFO) ret = handle_rt_signal32(&ksig, oldset, tsk); @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) if (thread_info_flags & _TIF_NOTIFY_RESUME) { clear_thread_flag(TIF_NOTIFY_RESUME); tracehook_notify_resume(regs); + rseq_handle_notify_resume(regs); } user_enter(); -- 2.11.0 ^ permalink raw reply related [flat|nested] 105+ messages in thread
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences 2018-04-30 22:44 ` Mathieu Desnoyers @ 2018-05-16 16:18 ` Peter Zijlstra -1 siblings, 0 replies; 105+ messages in thread From: Peter Zijlstra @ 2018-05-16 16:18 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote: > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index c32a181a7cbb..ed21a777e8c6 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -223,6 +223,7 @@ config PPC > select HAVE_SYSCALL_TRACEPOINTS > select HAVE_VIRT_CPU_ACCOUNTING > select HAVE_IRQ_TIME_ACCOUNTING > + select HAVE_RSEQ > select IRQ_DOMAIN > select IRQ_FORCED_THREADING > select MODULES_USE_ELF_RELA > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c > index 61db86ecd318..d3bb3aaaf5ac 100644 > --- a/arch/powerpc/kernel/signal.c > +++ b/arch/powerpc/kernel/signal.c > @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk) > /* Re-enable the breakpoints for the signal stack */ > thread_change_pc(tsk, tsk->thread.regs); > > + rseq_signal_deliver(tsk->thread.regs); > + > if (is32) { > if (ksig.ka.sa.sa_flags & SA_SIGINFO) > ret = handle_rt_signal32(&ksig, oldset, tsk); > @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) > if (thread_info_flags & _TIF_NOTIFY_RESUME) { > clear_thread_flag(TIF_NOTIFY_RESUME); > tracehook_notify_resume(regs); > + rseq_handle_notify_resume(regs); > } > > user_enter(); Again no rseq_syscall(). 
^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences 2018-05-16 16:18 ` Peter Zijlstra @ 2018-05-16 20:13 ` Mathieu Desnoyers -1 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-05-16 20:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote: > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote: >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig >> index c32a181a7cbb..ed21a777e8c6 100644 >> --- a/arch/powerpc/Kconfig >> +++ b/arch/powerpc/Kconfig >> @@ -223,6 +223,7 @@ config PPC >> select HAVE_SYSCALL_TRACEPOINTS >> select HAVE_VIRT_CPU_ACCOUNTING >> select HAVE_IRQ_TIME_ACCOUNTING >> + select HAVE_RSEQ >> select IRQ_DOMAIN >> select IRQ_FORCED_THREADING >> select MODULES_USE_ELF_RELA >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c >> index 61db86ecd318..d3bb3aaaf5ac 100644 >> --- a/arch/powerpc/kernel/signal.c >> +++ b/arch/powerpc/kernel/signal.c >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk) >> /* Re-enable the breakpoints for the signal stack */ >> thread_change_pc(tsk, tsk->thread.regs); >> >> + rseq_signal_deliver(tsk->thread.regs); >> + >> if (is32) { >> if (ksig.ka.sa.sa_flags & SA_SIGINFO) >> ret = handle_rt_signal32(&ksig, oldset, tsk); >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long >> thread_info_flags) >> if (thread_info_flags & _TIF_NOTIFY_RESUME) { >> clear_thread_flag(TIF_NOTIFY_RESUME); >> tracehook_notify_resume(regs); >> + 
rseq_handle_notify_resume(regs); >> } >> >> user_enter(); > > Again no rseq_syscall(). Same question for PowerPC as for ARM: Considering that rseq_syscall is implemented as follows: +void rseq_syscall(struct pt_regs *regs) +{ + unsigned long ip = instruction_pointer(regs); + struct task_struct *t = current; + struct rseq_cs rseq_cs; + + if (!t->rseq) + return; + if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) || + rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) + force_sig(SIGSEGV, t); +} and that x86 calls it from syscall_return_slowpath() (which AFAIU is now used in the fast-path since KPTI), I wonder where we should call this on PowerPC ? I was under the impression that PowerPC return to userspace fast-path was not calling C code unless work flags were set, but I might be wrong. Thoughts ? Thanks! Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-16 20:13 ` Mathieu Desnoyers (?)
@ 2018-05-17 1:19 ` Boqun Feng
  -1 siblings, 0 replies; 105+ messages in thread
From: Boqun Feng @ 2018-05-17 1:19 UTC
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
    linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
    Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
    Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
    Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
    Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras,
    Michael Ellerman, linuxppc-dev

On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote:
> ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:
>
> > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote:
> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> >> index c32a181a7cbb..ed21a777e8c6 100644
> >> --- a/arch/powerpc/Kconfig
> >> +++ b/arch/powerpc/Kconfig
> >> @@ -223,6 +223,7 @@ config PPC
> >>      select HAVE_SYSCALL_TRACEPOINTS
> >>      select HAVE_VIRT_CPU_ACCOUNTING
> >>      select HAVE_IRQ_TIME_ACCOUNTING
> >> +    select HAVE_RSEQ
> >>      select IRQ_DOMAIN
> >>      select IRQ_FORCED_THREADING
> >>      select MODULES_USE_ELF_RELA
> >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> >> index 61db86ecd318..d3bb3aaaf5ac 100644
> >> --- a/arch/powerpc/kernel/signal.c
> >> +++ b/arch/powerpc/kernel/signal.c
> >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
> >>      /* Re-enable the breakpoints for the signal stack */
> >>      thread_change_pc(tsk, tsk->thread.regs);
> >>
> >> +    rseq_signal_deliver(tsk->thread.regs);
> >> +
> >>      if (is32) {
> >>          if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> >>              ret = handle_rt_signal32(&ksig, oldset, tsk);
> >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long
> >> thread_info_flags)
> >>      if (thread_info_flags & _TIF_NOTIFY_RESUME) {
> >>          clear_thread_flag(TIF_NOTIFY_RESUME);
> >>          tracehook_notify_resume(regs);
> >> +        rseq_handle_notify_resume(regs);
> >>      }
> >>
> >>      user_enter();
> >
> > Again no rseq_syscall().
>
> Same question for PowerPC as for ARM:
>
> Considering that rseq_syscall is implemented as follows:
>
> +void rseq_syscall(struct pt_regs *regs)
> +{
> +    unsigned long ip = instruction_pointer(regs);
> +    struct task_struct *t = current;
> +    struct rseq_cs rseq_cs;
> +
> +    if (!t->rseq)
> +        return;
> +    if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
> +        rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
> +        force_sig(SIGSEGV, t);
> +}
>
> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
> now used in the fast-path since KPTI), I wonder where we should call

So we actually detect this after the syscall takes effect, right? I
wonder whether this could be problematic, because "disallowing syscall"
in rseq areas may mean the syscall won't take effect to some people, I
guess?

> this on PowerPC ? I was under the impression that PowerPC return to
> userspace fast-path was not calling C code unless work flags were set,
> but I might be wrong.
>

I think you're right. So we have to introduce a call site for
rseq_syscall() in the syscall path, something like:

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 51695608c68b..a25734a96640 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -222,6 +222,9 @@ system_call_exit:
     mtmsrd  r11,1
 #endif /* CONFIG_PPC_BOOK3E */

+    addi    r3,r1,STACK_FRAME_OVERHEAD
+    bl      rseq_syscall
+
     ld      r9,TI_FLAGS(r12)
     li      r11,-MAX_ERRNO
     andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)

But I think it's important for us to first decide where (before or after
the syscall) we do the detection.

Regards,
Boqun

> Thoughts ?
>
> Thanks!
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-17 1:19 ` Boqun Feng
@ 2018-05-17 7:43 ` Peter Zijlstra
  -1 siblings, 0 replies; 105+ messages in thread
From: Peter Zijlstra @ 2018-05-17 7:43 UTC
To: Boqun Feng
Cc: Mathieu Desnoyers, Paul E. McKenney, Andy Lutomirski, Dave Watson,
    linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
    Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
    Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
    Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
    Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras,
    Michael Ellerman, linuxppc-dev

On Thu, May 17, 2018 at 09:19:49AM +0800, Boqun Feng wrote:
> On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote:

> > and that x86 calls it from syscall_return_slowpath() (which AFAIU is
> > now used in the fast-path since KPTI), I wonder where we should call
>
> So we actually detect this after the syscall takes effect, right? I
> wonder whether this could be problematic, because "disallowing syscall"
> in rseq areas may mean the syscall won't take effect to some people, I
> guess?

It doesn't really matter, I suspect; the important part is the program
getting killed. I agree that doing it on sysenter is slightly nicer, but
I'll take sysexit if that's what it takes.

> > this on PowerPC ? I was under the impression that PowerPC return to
> > userspace fast-path was not calling C code unless work flags were set,
> > but I might be wrong.
> >
>
> I think you're right. So we have to introduce a call site for
> rseq_syscall() in the syscall path, something like:
>
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 51695608c68b..a25734a96640 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -222,6 +222,9 @@ system_call_exit:
>      mtmsrd  r11,1
>  #endif /* CONFIG_PPC_BOOK3E */
>
> +    addi    r3,r1,STACK_FRAME_OVERHEAD
> +    bl      rseq_syscall
> +
>      ld      r9,TI_FLAGS(r12)
>      li      r11,-MAX_ERRNO
>      andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>
> But I think it's important for us to first decide where (before or after
> the syscall) we do the detection.

The important thing is the process getting very dead. Either sysenter or
sysexit gets that done.
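
Whichever side of the syscall the hook ends up on, the decision it makes is the same range test on the saved instruction pointer. The following is a minimal userspace sketch of that test, not the kernel code: it assumes in_rseq_cs() is a half-open range check built from start_ip and post_commit_offset, and the model_ names and the reduced two-field descriptor are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a critical-section descriptor.
 * Only the two fields needed by the range test are kept.
 */
struct model_rseq_cs {
    uint64_t start_ip;            /* address of the first instruction */
    uint64_t post_commit_offset;  /* length of the section in bytes */
};

/*
 * True when ip lies in [start_ip, start_ip + post_commit_offset).
 * For ip < start_ip the unsigned subtraction wraps to a huge value,
 * which fails the comparison, so one compare covers both bounds.
 */
static bool model_in_rseq_cs(uint64_t ip, const struct model_rseq_cs *cs)
{
    return ip - cs->start_ip < cs->post_commit_offset;
}
```

In this model an all-zero descriptor never matches any address, so a thread that is not inside a critical section would pass the check and only a syscall issued from within the described range would be flagged.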
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-17 1:19 ` Boqun Feng
@ 2018-05-17 15:28 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-05-17 15:28 UTC
To: Boqun Feng, Will Deacon
Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
    linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
    Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
    Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
    Linus Torvalds, Catalin Marinas, Michael Kerrisk, Joel Fernandes,
    Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
    linuxppc-dev

----- On May 16, 2018, at 9:19 PM, Boqun Feng boqun.feng@gmail.com wrote:

> On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote:
>> ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:
>>
>> > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote:
>> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> >> index c32a181a7cbb..ed21a777e8c6 100644
>> >> --- a/arch/powerpc/Kconfig
>> >> +++ b/arch/powerpc/Kconfig
>> >> @@ -223,6 +223,7 @@ config PPC
>> >>      select HAVE_SYSCALL_TRACEPOINTS
>> >>      select HAVE_VIRT_CPU_ACCOUNTING
>> >>      select HAVE_IRQ_TIME_ACCOUNTING
>> >> +    select HAVE_RSEQ
>> >>      select IRQ_DOMAIN
>> >>      select IRQ_FORCED_THREADING
>> >>      select MODULES_USE_ELF_RELA
>> >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
>> >> index 61db86ecd318..d3bb3aaaf5ac 100644
>> >> --- a/arch/powerpc/kernel/signal.c
>> >> +++ b/arch/powerpc/kernel/signal.c
>> >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
>> >>      /* Re-enable the breakpoints for the signal stack */
>> >>      thread_change_pc(tsk, tsk->thread.regs);
>> >>
>> >> +    rseq_signal_deliver(tsk->thread.regs);
>> >> +
>> >>      if (is32) {
>> >>          if (ksig.ka.sa.sa_flags & SA_SIGINFO)
>> >>              ret = handle_rt_signal32(&ksig, oldset, tsk);
>> >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long
>> >> thread_info_flags)
>> >>      if (thread_info_flags & _TIF_NOTIFY_RESUME) {
>> >>          clear_thread_flag(TIF_NOTIFY_RESUME);
>> >>          tracehook_notify_resume(regs);
>> >> +        rseq_handle_notify_resume(regs);
>> >>      }
>> >>
>> >>      user_enter();
>> >
>> > Again no rseq_syscall().
>>
>> Same question for PowerPC as for ARM:
>>
>> Considering that rseq_syscall is implemented as follows:
>>
>> +void rseq_syscall(struct pt_regs *regs)
>> +{
>> +    unsigned long ip = instruction_pointer(regs);
>> +    struct task_struct *t = current;
>> +    struct rseq_cs rseq_cs;
>> +
>> +    if (!t->rseq)
>> +        return;
>> +    if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
>> +        rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
>> +        force_sig(SIGSEGV, t);
>> +}
>>
>> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
>> now used in the fast-path since KPTI), I wonder where we should call
>
> So we actually detect this after the syscall takes effect, right? I
> wonder whether this could be problematic, because "disallowing syscall"
> in rseq areas may mean the syscall won't take effect to some people, I
> guess?
>
>> this on PowerPC ? I was under the impression that PowerPC return to
>> userspace fast-path was not calling C code unless work flags were set,
>> but I might be wrong.
>>
>
> I think you're right. So we have to introduce a call site for
> rseq_syscall() in the syscall path, something like:
>
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 51695608c68b..a25734a96640 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -222,6 +222,9 @@ system_call_exit:
>      mtmsrd  r11,1
>  #endif /* CONFIG_PPC_BOOK3E */
>
> +    addi    r3,r1,STACK_FRAME_OVERHEAD
> +    bl      rseq_syscall
> +
>      ld      r9,TI_FLAGS(r12)
>      li      r11,-MAX_ERRNO
>      andi.
>      r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>
> But I think it's important for us to first decide where (before or after
> the syscall) we do the detection.

As Peter said, we don't really care whether it's on syscall entry or exit,
as long as the process gets killed when the erroneous use is detected. I
think doing it on syscall exit is a bit easier because we can clearly
access the userspace TLS, which AFAIU may be less straightforward on
syscall entry.

We may want to add #ifdef CONFIG_DEBUG_RSEQ / #endif around the code you
proposed above, so it's only compiled in if CONFIG_DEBUG_RSEQ=y.

On the ARM leg of the email thread, Will Deacon suggests testing whether
current->rseq is non-NULL before calling rseq_syscall(). I wonder if this
added check is justified at the assembly level, considering that this is
just a debugging option. We already do that check at the very beginning of
rseq_syscall().

Thoughts ?

Thanks,

Mathieu

>
> Regards,
> Boqun
>
>> Thoughts ?
>>
>> Thanks!
>>
>> Mathieu
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
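
The two points raised in this message (compile the hook only for debug builds, and keep the NULL test inside the C function rather than in the assembly caller) can be sketched in plain C. This is a userspace model under stated assumptions, not kernel code: MODEL_DEBUG_RSEQ stands in for CONFIG_DEBUG_RSEQ, and struct model_task with its checks_run counter is a hypothetical stand-in for the per-task state.

```c
#include <stddef.h>

/* Hypothetical stand-in for per-task state; not the kernel's task_struct. */
struct model_task {
    const void *rseq;  /* NULL until the thread registers an rseq area */
    int checks_run;    /* counts how often the full validation ran */
};

/* Stand-in for CONFIG_DEBUG_RSEQ; enabled by default in this sketch. */
#ifndef MODEL_DEBUG_RSEQ
#define MODEL_DEBUG_RSEQ 1
#endif

#if MODEL_DEBUG_RSEQ
/*
 * Debug build: the caller invokes the hook unconditionally and the
 * early return handles threads that never registered rseq.
 */
static void model_rseq_syscall(struct model_task *t)
{
    if (!t->rseq)
        return;
    t->checks_run++;  /* the real validation would happen here */
}
#else
/* Non-debug build: the hook is an empty inline and costs nothing. */
static inline void model_rseq_syscall(struct model_task *t)
{
    (void)t;
}
#endif
```

With this shape the call site needs no conditional of its own, which is the trade-off being weighed: one extra function call per syscall in debug builds, in exchange for a simpler caller.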
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-17 15:28 ` Mathieu Desnoyers
@ 2018-05-17 23:50 ` Boqun Feng
  -1 siblings, 0 replies; 105+ messages in thread
From: Boqun Feng @ 2018-05-17 23:50 UTC (permalink / raw)
  To: Mathieu Desnoyers, Will Deacon
  Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Michael Kerrisk, Joel Fernandes,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

On Thu, May 17, 2018, at 11:28 PM, Mathieu Desnoyers wrote:
> ----- On May 16, 2018, at 9:19 PM, Boqun Feng boqun.feng@gmail.com wrote:
>
> > On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote:
> >> ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:
> >>
> >> > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote:
> >> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> >> >> index c32a181a7cbb..ed21a777e8c6 100644
> >> >> --- a/arch/powerpc/Kconfig
> >> >> +++ b/arch/powerpc/Kconfig
> >> >> @@ -223,6 +223,7 @@ config PPC
> >> >>  	select HAVE_SYSCALL_TRACEPOINTS
> >> >>  	select HAVE_VIRT_CPU_ACCOUNTING
> >> >>  	select HAVE_IRQ_TIME_ACCOUNTING
> >> >> +	select HAVE_RSEQ
> >> >>  	select IRQ_DOMAIN
> >> >>  	select IRQ_FORCED_THREADING
> >> >>  	select MODULES_USE_ELF_RELA
> >> >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> >> >> index 61db86ecd318..d3bb3aaaf5ac 100644
> >> >> --- a/arch/powerpc/kernel/signal.c
> >> >> +++ b/arch/powerpc/kernel/signal.c
> >> >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
> >> >>  	/* Re-enable the breakpoints for the signal stack */
> >> >>  	thread_change_pc(tsk, tsk->thread.regs);
> >> >>
> >> >> +	rseq_signal_deliver(tsk->thread.regs);
> >> >> +
> >> >>  	if (is32) {
> >> >>  		if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> >> >>  			ret = handle_rt_signal32(&ksig, oldset, tsk);
> >> >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
> >> >>  	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
> >> >>  		clear_thread_flag(TIF_NOTIFY_RESUME);
> >> >>  		tracehook_notify_resume(regs);
> >> >> +		rseq_handle_notify_resume(regs);
> >> >>  	}
> >> >>
> >> >>  	user_enter();
> >> >
> >> > Again no rseq_syscall().
> >>
> >> Same question for PowerPC as for ARM:
> >>
> >> Considering that rseq_syscall is implemented as follows:
> >>
> >> +void rseq_syscall(struct pt_regs *regs)
> >> +{
> >> +	unsigned long ip = instruction_pointer(regs);
> >> +	struct task_struct *t = current;
> >> +	struct rseq_cs rseq_cs;
> >> +
> >> +	if (!t->rseq)
> >> +		return;
> >> +	if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
> >> +	    rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
> >> +		force_sig(SIGSEGV, t);
> >> +}
> >>
> >> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
> >> now used in the fast-path since KPTI), I wonder where we should call
> >
> > So we actually detect this after the syscall takes effect, right? I
> > wonder whether this could be problematic, because "disallowing syscall"
> > in rseq areas may mean the syscall won't take effect to some people, I
> > guess?
> >
> >> this on PowerPC ? I was under the impression that PowerPC return to
> >> userspace fast-path was not calling C code unless work flags were set,
> >> but I might be wrong.
> >>
> >
> > I think you're right. So we have to introduce callsite to rseq_syscall()
> > in syscall path, something like:
> >
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 51695608c68b..a25734a96640 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -222,6 +222,9 @@ system_call_exit:
> >  	mtmsrd	r11,1
> >  #endif /* CONFIG_PPC_BOOK3E */
> >
> > +	addi    r3,r1,STACK_FRAME_OVERHEAD
> > +	bl      rseq_syscall
> > +
> >  	ld	r9,TI_FLAGS(r12)
> >  	li	r11,-MAX_ERRNO
> >  	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> >
> > But I think it's important for us to first decide where (before or after
> > the syscall) we do the detection.
>
> As Peter said, we don't really care whether it's on syscall entry or
> exit, as long as the process gets killed when the erroneous use is
> detected. I think doing it on syscall exit is a bit easier because we
> can clearly access the userspace TLS, which AFAIU may be less
> straightforward on syscall entry.
>

Fair enough.

> We may want to add #ifdef CONFIG_DEBUG_RSEQ / #endif around the code you
> proposed above, so it's only compiled in if CONFIG_DEBUG_RSEQ=y.
>

OK.

> On the ARM leg of the email thread, Will Deacon suggests to test whether
> current->rseq is non-NULL before calling rseq_syscall(). I wonder if this
> added check is justified at the assembly level, considering that this is
> just a debugging option. We already do that check at the very beginning
> of rseq_syscall().
>

Yes, I think it's better to do the check in rseq_syscall(), leaving the
asm code a bit cleaner.

Regards,
Boqun

> Thoughts ?
>
> Thanks,
>
> Mathieu
>
> >
> > Regards,
> > Boqun
> >
> >> Thoughts ?
> >>
> >> Thanks!
> >>
> >> Mathieu
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> http://www.efficios.com
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 105+ messages in thread
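The decision logic of the rseq_syscall() check quoted in this message can be modelled in plain user-space C. This is only a sketch under stated assumptions, not the kernel code: struct cs_model, ip_in_cs() and model_rseq_syscall() are hypothetical names, the critical section is assumed to be described by a start address plus a length (the quoted code does not show how in_rseq_cs() is implemented), and the access_ok() and rseq_get_rseq_cs() failures are folded into a single "area_readable" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a critical-section descriptor (an assumption). */
struct cs_model {
	uintptr_t start_ip;           /* first instruction of the section */
	uintptr_t post_commit_offset; /* length of the section in bytes */
};

/*
 * Assumed range test: start_ip <= ip < start_ip + post_commit_offset.
 * Unsigned subtraction wraps when ip < start_ip, so a single comparison
 * rejects addresses on both sides of the section.
 */
static bool ip_in_cs(uintptr_t ip, const struct cs_model *cs)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

enum check_result { CHECK_OK, CHECK_KILL };

/*
 * Model of the debug check run on syscall exit:
 *  - a thread that never registered rseq is left alone;
 *  - an unreadable rseq area, or a syscall instruction located inside
 *    the critical section, gets the thread killed (SIGSEGV in the
 *    quoted kernel code).
 */
static enum check_result model_rseq_syscall(bool registered,
					    bool area_readable,
					    uintptr_t ip,
					    const struct cs_model *cs)
{
	if (!registered)
		return CHECK_OK;
	if (!area_readable || ip_in_cs(ip, cs))
		return CHECK_KILL;
	return CHECK_OK;
}

/* Small self-check of the model with made-up addresses. */
void model_selftest(void)
{
	const struct cs_model cs = { 0x1000, 0x20 };

	assert(model_rseq_syscall(false, true, 0x1010, &cs) == CHECK_OK);
	assert(model_rseq_syscall(true, true, 0x1010, &cs) == CHECK_KILL);
	assert(model_rseq_syscall(true, true, 0x1020, &cs) == CHECK_OK);
	assert(model_rseq_syscall(true, false, 0x2000, &cs) == CHECK_KILL);
}
```

Because the check only looks at the saved instruction pointer and the registered area, the model behaves the same whether it runs on syscall entry or exit, which is consistent with the point made above that either placement is acceptable as long as the misuse is caught.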
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-17 23:50 ` Boqun Feng
@ 2018-05-18 18:17 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-05-18 18:17 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Peter Zijlstra, Paul E. McKenney, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
	Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linuxppc-dev

----- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.feng@gmail.com wrote:
[...]
>> > I think you're right. So we have to introduce callsite to rseq_syscall()
>> > in syscall path, something like:
>> >
>> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> > index 51695608c68b..a25734a96640 100644
>> > --- a/arch/powerpc/kernel/entry_64.S
>> > +++ b/arch/powerpc/kernel/entry_64.S
>> > @@ -222,6 +222,9 @@ system_call_exit:
>> >  	mtmsrd	r11,1
>> >  #endif /* CONFIG_PPC_BOOK3E */
>> >
>> > +	addi    r3,r1,STACK_FRAME_OVERHEAD
>> > +	bl      rseq_syscall
>> > +
>> >  	ld	r9,TI_FLAGS(r12)
>> >  	li	r11,-MAX_ERRNO
>> >  	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>> >

By the way, I think this is not the right spot to call rseq_syscall, because
interrupts are disabled. I think we should move this hunk right after
system_call_exit.

Would you like to implement and test an updated patch adding those calls
for ppc 32 and 64 ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 105+ messages in thread
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-18 18:17 ` Mathieu Desnoyers
  (?)
@ 2018-05-20 14:08 ` Boqun Feng
  -1 siblings, 0 replies; 105+ messages in thread
From: Boqun Feng @ 2018-05-20 14:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Will Deacon, Peter Zijlstra, Paul E. McKenney, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
	Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linuxppc-dev

On Fri, May 18, 2018 at 02:17:17PM -0400, Mathieu Desnoyers wrote:
> ----- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.feng@gmail.com wrote:
> [...]
> >> > I think you're right. So we have to introduce callsite to rseq_syscall()
> >> > in syscall path, something like:
> >> >
> >> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> >> > index 51695608c68b..a25734a96640 100644
> >> > --- a/arch/powerpc/kernel/entry_64.S
> >> > +++ b/arch/powerpc/kernel/entry_64.S
> >> > @@ -222,6 +222,9 @@ system_call_exit:
> >> >  	mtmsrd	r11,1
> >> >  #endif /* CONFIG_PPC_BOOK3E */
> >> >
> >> > +	addi    r3,r1,STACK_FRAME_OVERHEAD
> >> > +	bl      rseq_syscall
> >> > +
> >> >  	ld	r9,TI_FLAGS(r12)
> >> >  	li	r11,-MAX_ERRNO
> >> >  	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> >> >
>
> By the way, I think this is not the right spot to call rseq_syscall, because
> interrupts are disabled. I think we should move this hunk right after
> system_call_exit.
>

Good point.

> Would you like to implement and test an updated patch adding those calls
> for ppc 32 and 64 ?
>

I'd like to help, but I don't have a handy ppc environment for test...
So I made the below patch which has only been build-tested, hope it
could be somewhat helpful.

Regards,
Boqun

--------------------------------->8
Subject: [PATCH] powerpc: Add syscall detection for restartable sequences

Syscalls are not allowed inside restartable sequences, so add a call to
rseq_syscall() at the very beginning of system call exiting path for
CONFIG_DEBUG_RSEQ=y kernel. This could help us to detect whether there
is a syscall issued inside restartable sequences.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/kernel/entry_32.S | 5 +++++
 arch/powerpc/kernel/entry_64.S | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index eb8d01bae8c6..2f134eebe7ed 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -365,6 +365,11 @@ syscall_dotrace_cont:
 	blrl			/* Call handler */
 	.globl	ret_from_syscall
 ret_from_syscall:
+#ifdef CONFIG_DEBUG_RSEQ
+	/* Check whether the syscall is issued inside a restartable sequence */
+	addi    r3,r1,STACK_FRAME_OVERHEAD
+	bl      rseq_syscall
+#endif
 	mr	r6,r3
 	CURRENT_THREAD_INFO(r12, r1)
 	/* disable interrupts so current_thread_info()->flags can't change */
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2cb5109a7ea3..2e2d59bb45d0 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -204,6 +204,11 @@ system_call:			/* label this so stack traces look sane */
  * This is blacklisted from kprobes further below with _ASM_NOKPROBE_SYMBOL().
  */
 system_call_exit:
+#ifdef CONFIG_DEBUG_RSEQ
+	/* Check whether the syscall is issued inside a restartable sequence */
+	addi    r3,r1,STACK_FRAME_OVERHEAD
+	bl      rseq_syscall
+#endif
 	/*
 	 * Disable interrupts so current_thread_info()->flags can't change,
 	 * and so that we don't get interrupted after loading SRR0/1.
--
2.16.2

^ permalink raw reply related	[flat|nested] 105+ messages in thread
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-20 14:08 ` Boqun Feng
@ 2018-05-23 20:14 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-05-23 20:14 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Peter Zijlstra, Paul E. McKenney, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
	Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linuxppc-dev

----- On May 20, 2018, at 10:08 AM, Boqun Feng boqun.feng@gmail.com wrote:

> On Fri, May 18, 2018 at 02:17:17PM -0400, Mathieu Desnoyers wrote:
>> ----- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.feng@gmail.com wrote:
>> [...]
>> >> > I think you're right. So we have to introduce callsite to rseq_syscall()
>> >> > in syscall path, something like:
>> >> >
>> >> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> >> > index 51695608c68b..a25734a96640 100644
>> >> > --- a/arch/powerpc/kernel/entry_64.S
>> >> > +++ b/arch/powerpc/kernel/entry_64.S
>> >> > @@ -222,6 +222,9 @@ system_call_exit:
>> >> >  	mtmsrd	r11,1
>> >> >  #endif /* CONFIG_PPC_BOOK3E */
>> >> >
>> >> > +	addi    r3,r1,STACK_FRAME_OVERHEAD
>> >> > +	bl      rseq_syscall
>> >> > +
>> >> >  	ld	r9,TI_FLAGS(r12)
>> >> >  	li	r11,-MAX_ERRNO
>> >> >  	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>> >> >
>>
>> By the way, I think this is not the right spot to call rseq_syscall, because
>> interrupts are disabled. I think we should move this hunk right after
>> system_call_exit.
>>
>
> Good point.
>
>> Would you like to implement and test an updated patch adding those calls for ppc
>> 32 and 64 ?
>>
>
> I'd like to help, but I don't have a handy ppc environment for test...
> So I made the below patch which has only been build-tested, hope it
> could be somewhat helpful.

Hi Boqun,

I tried your patch in a ppc64 le environment, and it does not survive boot
with CONFIG_DEBUG_RSEQ=y. init gets killed right away.

Moreover, I'm not sure that the r3 register doesn't contain something worth
saving before the call on ppc32. Just after there is a "mr" instruction
which AFAIU takes r3 as input register.

Can you look into it ?

Thanks,

Mathieu

>
> Regards,
> Boqun
>
> --------------------------------->8
> Subject: [PATCH] powerpc: Add syscall detection for restartable sequences
>
> Syscalls are not allowed inside restartable sequences, so add a call to
> rseq_syscall() at the very beginning of system call exiting path for
> CONFIG_DEBUG_RSEQ=y kernel. This could help us to detect whether there
> is a syscall issued inside restartable sequences.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
> arch/powerpc/kernel/entry_32.S | 5 +++++
> arch/powerpc/kernel/entry_64.S | 5 +++++
> 2 files changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index eb8d01bae8c6..2f134eebe7ed 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -365,6 +365,11 @@ syscall_dotrace_cont:
>  	blrl			/* Call handler */
>  	.globl	ret_from_syscall
>  ret_from_syscall:
> +#ifdef CONFIG_DEBUG_RSEQ
> +	/* Check whether the syscall is issued inside a restartable sequence */
> +	addi    r3,r1,STACK_FRAME_OVERHEAD
> +	bl      rseq_syscall
> +#endif
>  	mr	r6,r3
>  	CURRENT_THREAD_INFO(r12, r1)
>  	/* disable interrupts so current_thread_info()->flags can't change */
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 2cb5109a7ea3..2e2d59bb45d0 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -204,6 +204,11 @@ system_call:			/* label this so stack traces look sane */
>   * This is blacklisted from kprobes further below with _ASM_NOKPROBE_SYMBOL().
>   */
>  system_call_exit:
> +#ifdef CONFIG_DEBUG_RSEQ
> +	/* Check whether the syscall is issued inside a restartable sequence */
> +	addi    r3,r1,STACK_FRAME_OVERHEAD
> +	bl      rseq_syscall
> +#endif
>  	/*
>  	 * Disable interrupts so current_thread_info()->flags can't change,
>  	 * and so that we don't get interrupted after loading SRR0/1.
> --
> 2.16.2

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 105+ messages in thread
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-23 20:14 ` Mathieu Desnoyers
@ 2018-05-23 20:46 ` Paul E. McKenney
From: Paul E. McKenney @ 2018-05-23 20:46 UTC
To: Mathieu Desnoyers
Cc: Boqun Feng, Will Deacon, Peter Zijlstra, Andy Lutomirski, Dave Watson,
    linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
    Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen,
    Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds,
    Catalin Marinas, Michael Kerrisk, Joel Fernandes, Benjamin Herrenschmidt,
    Paul Mackerras, Michael Ellerman, linuxppc-dev

On Wed, May 23, 2018 at 04:14:39PM -0400, Mathieu Desnoyers wrote:
> ----- On May 20, 2018, at 10:08 AM, Boqun Feng boqun.feng@gmail.com wrote:
>
> > On Fri, May 18, 2018 at 02:17:17PM -0400, Mathieu Desnoyers wrote:
> >> ----- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.feng@gmail.com wrote:
> >> [...]
> >> >> > I think you're right. So we have to introduce callsite to rseq_syscall()
> >> >> > in syscall path, something like:
> >> >> >
> >> >> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> >> >> > index 51695608c68b..a25734a96640 100644
> >> >> > --- a/arch/powerpc/kernel/entry_64.S
> >> >> > +++ b/arch/powerpc/kernel/entry_64.S
> >> >> > @@ -222,6 +222,9 @@ system_call_exit:
> >> >> >  	mtmsrd r11,1
> >> >> >  #endif /* CONFIG_PPC_BOOK3E */
> >> >> >
> >> >> > +	addi r3,r1,STACK_FRAME_OVERHEAD
> >> >> > +	bl rseq_syscall
> >> >> > +
> >> >> >  	ld r9,TI_FLAGS(r12)
> >> >> >  	li r11,-MAX_ERRNO
> >> >> >  	andi.
> >> >> > r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> >> >> >
> >>
> >> By the way, I think this is not the right spot to call rseq_syscall, because
> >> interrupts are disabled. I think we should move this hunk right after
> >> system_call_exit.
> >>
> >
> > Good point.
> >
> >> Would you like to implement and test an updated patch adding those calls for ppc
> >> 32 and 64 ?
> >>
> >
> > I'd like to help, but I don't have a handy ppc environment for test...
> > So I made the below patch which has only been build-tested, hope it
> > could be somewhat helpful.
>
> Hi Boqun,
>
> I tried your patch in a ppc64 le environment, and it does not survive boot
> with CONFIG_DEBUG_RSEQ=y. init gets killed right away.
>
> Moreover, I'm not sure that the r3 register don't contain something worth
> saving before the call on ppc32. Just after there is a "mr" instruction
> which AFAIU takes r3 as input register.
>
> Can you look into it ?

Hello, Boqun,

You can also request access to a ppc64 environment here:

http://osuosl.org/services/powerdev/request_hosting/

							Thanx, Paul

> Thanks,
>
> Mathieu
>
> >
> > Regards,
> > Boqun
> >
> > --------------------------------->8
> > Subject: [PATCH] powerpc: Add syscall detection for restartable sequences
> >
> > Syscalls are not allowed inside restartable sequences, so add a call to
> > rseq_syscall() at the very beginning of system call exiting path for
> > CONFIG_DEBUG_RSEQ=y kernel. This could help us to detect whether there
> > is a syscall issued inside restartable sequences.
> >
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  arch/powerpc/kernel/entry_32.S | 5 +++++
> >  arch/powerpc/kernel/entry_64.S | 5 +++++
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> > index eb8d01bae8c6..2f134eebe7ed 100644
> > --- a/arch/powerpc/kernel/entry_32.S
> > +++ b/arch/powerpc/kernel/entry_32.S
> > @@ -365,6 +365,11 @@ syscall_dotrace_cont:
> >  	blrl /* Call handler */
> >  	.globl ret_from_syscall
> >  ret_from_syscall:
> > +#ifdef CONFIG_DEBUG_RSEQ
> > +	/* Check whether the syscall is issued inside a restartable sequence */
> > +	addi r3,r1,STACK_FRAME_OVERHEAD
> > +	bl rseq_syscall
> > +#endif
> >  	mr r6,r3
> >  	CURRENT_THREAD_INFO(r12, r1)
> >  	/* disable interrupts so current_thread_info()->flags can't change */
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 2cb5109a7ea3..2e2d59bb45d0 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -204,6 +204,11 @@ system_call: /* label this so stack traces look sane */
> >   * This is blacklisted from kprobes further below with _ASM_NOKPROBE_SYMBOL().
> >   */
> >  system_call_exit:
> > +#ifdef CONFIG_DEBUG_RSEQ
> > +	/* Check whether the syscall is issued inside a restartable sequence */
> > +	addi r3,r1,STACK_FRAME_OVERHEAD
> > +	bl rseq_syscall
> > +#endif
> >  	/*
> >  	 * Disable interrupts so current_thread_info()->flags can't change,
> >  	 * and so that we don't get interrupted after loading SRR0/1.
> > --
> > 2.16.2
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-23 20:14 ` Mathieu Desnoyers
@ 2018-05-23 21:29 ` Mathieu Desnoyers
From: Mathieu Desnoyers @ 2018-05-23 21:29 UTC
To: Boqun Feng
Cc: Will Deacon, Peter Zijlstra, Paul E. McKenney, Andy Lutomirski,
    Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
    Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
    Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
    Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
    Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras,
    Michael Ellerman, linuxppc-dev

----- On May 23, 2018, at 4:14 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On May 20, 2018, at 10:08 AM, Boqun Feng boqun.feng@gmail.com wrote:
>
>> On Fri, May 18, 2018 at 02:17:17PM -0400, Mathieu Desnoyers wrote:
>>> ----- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.feng@gmail.com wrote:
>>> [...]
>>> >> > I think you're right. So we have to introduce callsite to rseq_syscall()
>>> >> > in syscall path, something like:
>>> >> >
>>> >> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>>> >> > index 51695608c68b..a25734a96640 100644
>>> >> > --- a/arch/powerpc/kernel/entry_64.S
>>> >> > +++ b/arch/powerpc/kernel/entry_64.S
>>> >> > @@ -222,6 +222,9 @@ system_call_exit:
>>> >> >  	mtmsrd r11,1
>>> >> >  #endif /* CONFIG_PPC_BOOK3E */
>>> >> >
>>> >> > +	addi r3,r1,STACK_FRAME_OVERHEAD
>>> >> > +	bl rseq_syscall
>>> >> > +
>>> >> >  	ld r9,TI_FLAGS(r12)
>>> >> >  	li r11,-MAX_ERRNO
>>> >> >  	andi.
>>> >> > r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>>> >> >
>>>
>>> By the way, I think this is not the right spot to call rseq_syscall, because
>>> interrupts are disabled. I think we should move this hunk right after
>>> system_call_exit.
>>>
>>
>> Good point.
>>
>>> Would you like to implement and test an updated patch adding those calls for ppc
>>> 32 and 64 ?
>>>
>>
>> I'd like to help, but I don't have a handy ppc environment for test...
>> So I made the below patch which has only been build-tested, hope it
>> could be somewhat helpful.
>
> Hi Boqun,
>
> I tried your patch in a ppc64 le environment, and it does not survive boot
> with CONFIG_DEBUG_RSEQ=y. init gets killed right away.

The following fixup gets ppc64 to work:

--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -208,6 +208,7 @@ system_call_exit:
 	/* Check whether the syscall is issued inside a restartable sequence */
 	addi r3,r1,STACK_FRAME_OVERHEAD
 	bl rseq_syscall
+	ld r3,RESULT(r1)
 #endif
 	/*
 	 * Disable interrupts so current_thread_info()->flags can't change,

> Moreover, I'm not sure that the r3 register don't contain something worth
> saving before the call on ppc32. Just after there is a "mr" instruction
> which AFAIU takes r3 as input register.

I'll start testing on ppc32 now.

Thanks,

Mathieu

>
> Can you look into it ?
>
> Thanks,
>
> Mathieu
>
>>
>> Regards,
>> Boqun
>>
>> --------------------------------->8
>> Subject: [PATCH] powerpc: Add syscall detection for restartable sequences
>>
>> Syscalls are not allowed inside restartable sequences, so add a call to
>> rseq_syscall() at the very beginning of system call exiting path for
>> CONFIG_DEBUG_RSEQ=y kernel. This could help us to detect whether there
>> is a syscall issued inside restartable sequences.
>>
>> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
>> ---
>>  arch/powerpc/kernel/entry_32.S | 5 +++++
>>  arch/powerpc/kernel/entry_64.S | 5 +++++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
>> index eb8d01bae8c6..2f134eebe7ed 100644
>> --- a/arch/powerpc/kernel/entry_32.S
>> +++ b/arch/powerpc/kernel/entry_32.S
>> @@ -365,6 +365,11 @@ syscall_dotrace_cont:
>>  	blrl /* Call handler */
>>  	.globl ret_from_syscall
>>  ret_from_syscall:
>> +#ifdef CONFIG_DEBUG_RSEQ
>> +	/* Check whether the syscall is issued inside a restartable sequence */
>> +	addi r3,r1,STACK_FRAME_OVERHEAD
>> +	bl rseq_syscall
>> +#endif
>>  	mr r6,r3
>>  	CURRENT_THREAD_INFO(r12, r1)
>>  	/* disable interrupts so current_thread_info()->flags can't change */
>> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> index 2cb5109a7ea3..2e2d59bb45d0 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -204,6 +204,11 @@ system_call: /* label this so stack traces look sane */
>>   * This is blacklisted from kprobes further below with _ASM_NOKPROBE_SYMBOL().
>>   */
>>  system_call_exit:
>> +#ifdef CONFIG_DEBUG_RSEQ
>> +	/* Check whether the syscall is issued inside a restartable sequence */
>> +	addi r3,r1,STACK_FRAME_OVERHEAD
>> +	bl rseq_syscall
>> +#endif
>>  	/*
>>  	 * Disable interrupts so current_thread_info()->flags can't change,
>>  	 * and so that we don't get interrupted after loading SRR0/1.
>> --
>> 2.16.2
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-23 21:29 ` Mathieu Desnoyers
@ 2018-05-24 1:03 ` Michael Ellerman
From: Michael Ellerman @ 2018-05-24 1:03 UTC
To: Mathieu Desnoyers, Boqun Feng
Cc: Will Deacon, Peter Zijlstra, Paul E. McKenney, Andy Lutomirski,
    Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
    Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
    Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
    Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
    Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev

Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
> ----- On May 23, 2018, at 4:14 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
...
>>
>> Hi Boqun,
>>
>> I tried your patch in a ppc64 le environment, and it does not survive boot
>> with CONFIG_DEBUG_RSEQ=y. init gets killed right away.

Sorry this code is super gross and hard to deal with.

> The following fixup gets ppc64 to work:
>
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -208,6 +208,7 @@ system_call_exit:
>  	/* Check whether the syscall is issued inside a restartable sequence */
>  	addi r3,r1,STACK_FRAME_OVERHEAD
>  	bl rseq_syscall
> +	ld r3,RESULT(r1)
>  #endif
>  	/*
>  	 * Disable interrupts so current_thread_info()->flags can't change,

I don't think that's safe.

If you look above that, we have r3, r8 and r12 all live:

.Lsyscall_exit:
	std r3,RESULT(r1)
	CURRENT_THREAD_INFO(r12, r1)

	ld r8,_MSR(r1)
#ifdef CONFIG_PPC_BOOK3S
	/* No MSR:RI on BookE */
	andi. r10,r8,MSR_RI
	beq- .Lunrecov_restore
#endif

They're all volatile across function calls:

  http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655240_68174.html

The system_call_exit symbol is actually there for kprobes and cosmetic
purposes. The actual syscall return flow starts at .Lsyscall_exit.

So I think this would work:

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index db4df061c33a..e19f377a25e0 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -184,6 +184,14 @@ system_call: /* label this so stack traces look sane */
 
 .Lsyscall_exit:
 	std r3,RESULT(r1)
+
+#ifdef CONFIG_DEBUG_RSEQ
+	/* Check whether the syscall is issued inside a restartable sequence */
+	addi r3,r1,STACK_FRAME_OVERHEAD
+	bl rseq_syscall
+	ld r3,RESULT(r1)
+#endif
+
 	CURRENT_THREAD_INFO(r12, r1)
 
 	ld r8,_MSR(r1)

I'll try and get this series into my test setup at some point, been a
bit busy lately :)

cheers
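An illustrative aside on the ordering argued for above: the point is that a C function call may overwrite any volatile register, so the syscall return value has to be stored before the call and reloaded after it, and r12/r8 should only be set up once the call has returned. The toy model below is a sketch under assumptions, not kernel code: `struct toy_cpu`, `toy_call()` and both `exit_path_*()` helpers are hypothetical names, and the "call" simply overwrites r3..r12 to stand in for a callee that uses them.

```c
#include <assert.h>
#include <stdint.h>

enum { R3 = 3, R8 = 8, R12 = 12, NREGS = 32 };

/* Hypothetical register file plus one stack slot standing in for RESULT(r1). */
struct toy_cpu {
	uint64_t gpr[NREGS];
	uint64_t result_slot;
};

/* Stand-in for "bl rseq_syscall": a callee is free to overwrite r3..r12. */
static void toy_call(struct toy_cpu *cpu)
{
	for (int r = R3; r <= R12; r++)
		cpu->gpr[r] = 0xdeadbeefULL;
}

/* Store r3, call, reload r3: the syscall return value survives the call. */
static uint64_t exit_path_with_reload(struct toy_cpu *cpu, uint64_t syscall_ret)
{
	cpu->gpr[R3] = syscall_ret;
	cpu->result_slot = cpu->gpr[R3];	/* std r3,RESULT(r1) */
	toy_call(cpu);				/* bl rseq_syscall */
	cpu->gpr[R3] = cpu->result_slot;	/* ld r3,RESULT(r1) */
	return cpu->gpr[R3];
}

/* Same path without the reload: r3 holds whatever the callee left behind. */
static uint64_t exit_path_without_reload(struct toy_cpu *cpu, uint64_t syscall_ret)
{
	cpu->gpr[R3] = syscall_ret;
	toy_call(cpu);
	return cpu->gpr[R3];
}
```

In this model, anything loaded into r8 or r12 before `toy_call()` would be lost in the same way as r3, which is why the proposed hunk sits ahead of the `CURRENT_THREAD_INFO(r12, r1)` and `ld r8,_MSR(r1)` lines.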
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-24 1:03 ` Michael Ellerman
@ 2018-05-28 7:00 ` Mathieu Desnoyers
From: Mathieu Desnoyers @ 2018-05-28 7:00 UTC
To: Michael Ellerman, Boqun Feng
Cc: Will Deacon, Peter Zijlstra, Paul E. McKenney, Andy Lutomirski,
    Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
    Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
    Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
    Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
    Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev

----- On May 24, 2018, at 3:03 AM, Michael Ellerman mpe@ellerman.id.au wrote:

> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
>> ----- On May 23, 2018, at 4:14 PM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
> ...
>>>
>>> Hi Boqun,
>>>
>>> I tried your patch in a ppc64 le environment, and it does not survive boot
>>> with CONFIG_DEBUG_RSEQ=y. init gets killed right away.
>
>
> Sorry this code is super gross and hard to deal with.
>
>> The following fixup gets ppc64 to work:
>>
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -208,6 +208,7 @@ system_call_exit:
>>  	/* Check whether the syscall is issued inside a restartable sequence */
>>  	addi r3,r1,STACK_FRAME_OVERHEAD
>>  	bl rseq_syscall
>> +	ld r3,RESULT(r1)
>>  #endif
>>  	/*
>>  	 * Disable interrupts so current_thread_info()->flags can't change,
>
> I don't think that's safe.
>
> If you look above that, we have r3, r8 and r12 all live:
>
> .Lsyscall_exit:
> 	std r3,RESULT(r1)
> 	CURRENT_THREAD_INFO(r12, r1)
>
> 	ld r8,_MSR(r1)
> #ifdef CONFIG_PPC_BOOK3S
> 	/* No MSR:RI on BookE */
> 	andi. r10,r8,MSR_RI
> 	beq- .Lunrecov_restore
> #endif
>
>
> They're all volatile across function calls:
>
> http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655240_68174.html
>
>
> The system_call_exit symbol is actually there for kprobes and cosmetic
> purposes. The actual syscall return flow starts at .Lsyscall_exit.
>
> So I think this would work:
>
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index db4df061c33a..e19f377a25e0 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -184,6 +184,14 @@ system_call: /* label this so stack traces look sane */
>
>  .Lsyscall_exit:
>  	std r3,RESULT(r1)
> +
> +#ifdef CONFIG_DEBUG_RSEQ
> +	/* Check whether the syscall is issued inside a restartable sequence */
> +	addi r3,r1,STACK_FRAME_OVERHEAD
> +	bl rseq_syscall
> +	ld r3,RESULT(r1)
> +#endif
> +
>  	CURRENT_THREAD_INFO(r12, r1)
>
>  	ld r8,_MSR(r1)
>
>
> I'll try and get this series into my test setup at some point, been a
> bit busy lately :)

Yes, this was needed. I had this in my tree already, but there is still
a kernel OOPS when running the rseq selftests on ppc64 with
CONFIG_DEBUG_RSEQ=y.

My current dev tree is at:
https://github.com/compudj/linux-percpu-dev/tree/rseq/dev-local

So considering we are at rc7 now, should I plan on removing the powerpc
bits for merge window submission, or is there someone planning to spend
time on fixing and testing ppc integration before the merge window opens ?

Thanks,

Mathieu

>
> cheers

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
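For reference, the condition that the CONFIG_DEBUG_RSEQ check discussed in this thread enforces can be modelled in plain user-space C. This is only a sketch under assumptions: `struct cs_range`, `ip_in_cs()` and `syscall_violates_rseq()` are illustrative names rather than kernel symbols, and the single unsigned comparison is assumed to mirror the range test behind `in_rseq_cs()`. The real `rseq_syscall()` also validates the user-space rseq area and sends SIGSEGV instead of returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative descriptor of one critical section (field names are assumptions). */
struct cs_range {
	uintptr_t start_ip;		/* address of the first instruction */
	uintptr_t post_commit_offset;	/* length in bytes of the section */
};

/*
 * True when ip lies in [start_ip, start_ip + post_commit_offset).
 * The unsigned subtraction wraps around for ip < start_ip, so a single
 * comparison covers both the lower and the upper bound.
 */
static bool ip_in_cs(uintptr_t ip, const struct cs_range *cs)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/*
 * Model of the debug check: a system call whose return address falls
 * inside a registered critical section is reported as a violation.
 * A NULL descriptor stands for "no critical section is active".
 */
static bool syscall_violates_rseq(uintptr_t return_ip, const struct cs_range *cs)
{
	if (cs == NULL)
		return false;
	return ip_in_cs(return_ip, cs);
}

/* Small self-check exercising the boundaries of the range test. */
static void demo(void)
{
	const struct cs_range cs = { .start_ip = 0x1000, .post_commit_offset = 0x20 };

	assert(syscall_violates_rseq(0x1000, &cs));	/* first byte: inside */
	assert(syscall_violates_rseq(0x101f, &cs));	/* last byte: inside */
	assert(!syscall_violates_rseq(0x1020, &cs));	/* one past the end */
	assert(!syscall_violates_rseq(0x0fff, &cs));	/* just before the start */
}
```

Because the check runs on the syscall exit path, it can only report the violation after the system call has already executed, which is the concern raised elsewhere in this thread.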
* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
  2018-05-17 15:28 ` Mathieu Desnoyers
@ 2018-05-18 12:38 ` Michael Ellerman
  -1 siblings, 0 replies; 105+ messages in thread
From: Michael Ellerman @ 2018-05-18 12:38 UTC (permalink / raw)
  To: Mathieu Desnoyers, Boqun Feng, Will Deacon
  Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Michael Kerrisk, Joel Fernandes,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev

Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
> ----- On May 16, 2018, at 9:19 PM, Boqun Feng boqun.feng@gmail.com wrote:
>> On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote:
>>> ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:
>>> > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote:
>>> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> >> index c32a181a7cbb..ed21a777e8c6 100644
>>> >> --- a/arch/powerpc/Kconfig
>>> >> +++ b/arch/powerpc/Kconfig
>>> >> @@ -223,6 +223,7 @@ config PPC
>>> >>  	select HAVE_SYSCALL_TRACEPOINTS
>>> >>  	select HAVE_VIRT_CPU_ACCOUNTING
>>> >>  	select HAVE_IRQ_TIME_ACCOUNTING
>>> >> +	select HAVE_RSEQ
>>> >>  	select IRQ_DOMAIN
>>> >>  	select IRQ_FORCED_THREADING
>>> >>  	select MODULES_USE_ELF_RELA
>>> >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
>>> >> index 61db86ecd318..d3bb3aaaf5ac 100644
>>> >> --- a/arch/powerpc/kernel/signal.c
>>> >> +++ b/arch/powerpc/kernel/signal.c
>>> >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
>>> >>  	/* Re-enable the breakpoints for the signal stack */
>>> >>  	thread_change_pc(tsk, tsk->thread.regs);
>>> >>
>>> >> +	rseq_signal_deliver(tsk->thread.regs);
>>> >> +
>>> >>  	if (is32) {
>>> >>          	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
>>> >>  			ret = handle_rt_signal32(&ksig, oldset, tsk);
>>> >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long
>>> >> thread_info_flags)
>>> >>  	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
>>> >>  		clear_thread_flag(TIF_NOTIFY_RESUME);
>>> >>  		tracehook_notify_resume(regs);
>>> >> +		rseq_handle_notify_resume(regs);
>>> >>  	}
>>> >>
>>> >>  	user_enter();
>>> >
>>> > Again no rseq_syscall().
>>>
>>> Same question for PowerPC as for ARM:
>>>
>>> Considering that rseq_syscall is implemented as follows:
>>>
>>> +void rseq_syscall(struct pt_regs *regs)
>>> +{
>>> +	unsigned long ip = instruction_pointer(regs);
>>> +	struct task_struct *t = current;
>>> +	struct rseq_cs rseq_cs;
>>> +
>>> +	if (!t->rseq)
>>> +		return;
>>> +	if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
>>> +	    rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
>>> +		force_sig(SIGSEGV, t);
>>> +}
>>>
>>> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
>>> now used in the fast-path since KPTI), I wonder where we should call
>>
>> So we actually detect this after the syscall takes effect, right? I
>> wonder whether this could be problematic, because "disallowing syscall"
>> in rseq areas may mean the syscall won't take effect to some people, I
>> guess?
>>
>>> this on PowerPC ?  I was under the impression that PowerPC return to
>>> userspace fast-path was not calling C code unless work flags were set,
>>> but I might be wrong.
>>>
>>
>> I think you're right. So we have to introduce callsite to rseq_syscall()
>> in syscall path, something like:
>>
>> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> index 51695608c68b..a25734a96640 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -222,6 +222,9 @@ system_call_exit:
>>  	mtmsrd	r11,1
>>  #endif /* CONFIG_PPC_BOOK3E */
>>
>> +	addi    r3,r1,STACK_FRAME_OVERHEAD
>> +	bl	rseq_syscall
>> +
>>  	ld	r9,TI_FLAGS(r12)
>>  	li	r11,-MAX_ERRNO
>>  	andi.
>>  	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>>
>> But I think it's important for us to first decide where (before or after
>> the syscall) we do the detection.
>
> As Peter said, we don't really care whether it's on syscall entry or exit, as
> long as the process gets killed when the erroneous use is detected. I think doing
> it on syscall exit is a bit easier because we can clearly access the userspace
> TLS, which AFAIU may be less straightforward on syscall entry.

Coming in to the thread late, sorry if I'm missing the point.

> We may want to add #ifdef CONFIG_DEBUG_RSEQ / #endif around the code you
> proposed above, so it's only compiled in if CONFIG_DEBUG_RSEQ=y.

That sounds good. A function call is not free even if it returns immediately.

> On the ARM leg of the email thread, Will Deacon suggests to test whether current->rseq
> is non-NULL before calling rseq_syscall(). I wonder if this added check is justified
> at the assembly level, considering that this is just a debugging option. We already do
> that check at the very beginning of rseq_syscall().

I guess it depends if this is one of those "debugging options" that's
going to end up turned on in distro kernels?

I think in that code we'd need to check paca->current->rseq, so that
wouldn't be free either.

cheers

^ permalink raw reply	[flat|nested] 105+ messages in thread
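For readers following the rseq_syscall() discussion above, here is a minimal user-space sketch of the kind of range test the debug check relies on. The thread only shows the call to in_rseq_cs(), not its body, so the comparison below is an assumption about how such a test could be written. The struct demo_rseq_cs type and the demo_* helpers are hypothetical names for illustration; the field list follows the version, flags, start_ip, post_commit_offset, abort_ip layout used by the selftests' __RSEQ_ASM_DEFINE_TABLE macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space mirror of a critical section descriptor.
 * Field names follow the table entries emitted by the selftests; the
 * struct itself is an illustration, not the kernel's definition.
 */
struct demo_rseq_cs {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/*
 * Assumed check: ip is inside the critical section when it lies in
 * [start_ip, start_ip + post_commit_offset). The unsigned subtraction
 * wraps to a huge value when ip < start_ip, so one comparison covers
 * both bounds.
 */
static bool demo_in_rseq_cs(uint64_t ip, const struct demo_rseq_cs *cs)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* Boundary cases for a 0x20-byte section starting at 0x1000. */
static void demo_self_check(void)
{
	struct demo_rseq_cs cs = { 0, 0, 0x1000, 0x20, 0x1040 };

	assert(!demo_in_rseq_cs(0x0fff, &cs));	/* just before start */
	assert(demo_in_rseq_cs(0x1000, &cs));	/* first instruction */
	assert(demo_in_rseq_cs(0x101f, &cs));	/* last byte before commit */
	assert(!demo_in_rseq_cs(0x1020, &cs));	/* post-commit address */
}
```

Under that assumption, a system call whose return address still falls inside the range means user code issued the call from within the critical section, which is the erroneous use the CONFIG_DEBUG_RSEQ=y path reports with SIGSEGV.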
* [PATCH 08/14] powerpc: Wire up restartable sequences system call
  2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers
@ 2018-04-30 22:44 ` Mathieu Desnoyers
  2018-04-30 22:44 ` Mathieu Desnoyers
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

From: Boqun Feng <boqun.feng@gmail.com>

Wire up the rseq system call on powerpc.

This provides an ABI improving the speed of a user-space getcpu
operation on powerpc by skipping the getcpu system call on the fast
path, as well as improving the speed of user-space operations on per-cpu
data compared to using load-reservation/store-conditional atomics.

TODO: wire up rseq_syscall() on return from system call. It is used with
CONFIG_DEBUG_RSEQ=y to ensure system calls are not issued within rseq critical
section

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h      | 1 +
 arch/powerpc/include/asm/unistd.h      | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index d61f9c96d916..45d4d37495fd 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -392,3 +392,4 @@ SYSCALL(statx)
 SYSCALL(pkey_alloc)
 SYSCALL(pkey_free)
 SYSCALL(pkey_mprotect)
+SYSCALL(rseq)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index daf1ba97a00c..1e9708632dce 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include <uapi/asm/unistd.h>


-#define NR_syscalls		387
+#define NR_syscalls		388

 #define __NR__exit __NR_exit

diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 389c36fd8299..ac5ba55066dd 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -398,5 +398,6 @@
 #define __NR_pkey_alloc		384
 #define __NR_pkey_free		385
 #define __NR_pkey_mprotect	386
+#define __NR_rseq		387

 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 105+ messages in thread
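As an illustration of the "skipping the getcpu system call on the fast path" point in the commit message above, the following sketch reads a per-thread cpu_id field and only falls back to the system call when the field holds a negative value. Everything named demo_* is hypothetical: the real per-thread area is registered with the kernel through rseq(), which this sketch deliberately does not do, and the initial value of -1 for an unregistered thread is an assumption (the changelog in this series only states that cpu_id is set to -2 on registration error).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-in for the two CPU fields of the per-thread rseq area. */
struct demo_rseq_abi {
	uint32_t cpu_id_start;
	int32_t cpu_id;		/* negative: not registered (assumed convention) */
};

/* Never registered with the kernel in this sketch, so it stays at -1. */
static __thread volatile struct demo_rseq_abi demo_rseq_abi = { 0, -1 };

/* Slow path: ask the kernel through the getcpu system call. */
static int demo_fallback_current_cpu(void)
{
	unsigned int cpu = 0;

	if (syscall(SYS_getcpu, &cpu, NULL, NULL) < 0)
		return -1;
	return (int)cpu;
}

/* Fast path: a plain TLS load when the kernel keeps cpu_id up to date. */
static int demo_current_cpu(void)
{
	int32_t cpu = demo_rseq_abi.cpu_id;

	if (cpu >= 0)
		return cpu;
	return demo_fallback_current_cpu();
}
```

With a registered area the kernel would update cpu_id whenever the thread migrates, so the common case costs one load and one branch; the fallback keeps the helper usable on threads where registration failed or never happened.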
* [PATCH 09/14] selftests: lib.mk: Introduce OVERRIDE_TARGETS
  2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers
  2018-04-30 22:44 ` [PATCH 01/14] uapi headers: Provide types_32_64.h (v2) Mathieu Desnoyers
  2018-04-30 22:44 ` Mathieu Desnoyers
@ 2018-04-30 22:44 ` mathieu.desnoyers
  2018-04-30 22:44 ` [PATCH 04/14] arm: Wire up restartable sequences system call Mathieu Desnoyers
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers,
	linux-kselftest

Introduce OVERRIDE_TARGETS to allow tests to express dependencies on
header files and .so, which requires overriding the selftests lib.mk
targets.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 tools/testing/selftests/lib.mk | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 195e9d4739a9..9fd57efae439 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -106,6 +106,9 @@ COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
 LINK.S = $(CC) $(ASFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
 endif

+# Selftest makefiles can override those targets by setting
+# OVERRIDE_TARGETS = 1.
+ifeq ($(OVERRIDE_TARGETS),)
 $(OUTPUT)/%:%.c
 	$(LINK.c) $^ $(LDLIBS) -o $@

@@ -114,5 +117,6 @@ $(OUTPUT)/%.o:%.S

 $(OUTPUT)/%:%.S
 	$(LINK.S) $^ $(LDLIBS) -o $@
+endif

 .PHONY: run_tests all clean install emit_tests
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 105+ messages in thread
* [PATCH 10/14] rseq: selftests: Provide rseq library (v5)
  2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers
  2018-04-30 22:44 ` [PATCH 01/14] uapi headers: Provide types_32_64.h (v2) Mathieu Desnoyers
  2018-04-30 22:44 ` Mathieu Desnoyers
@ 2018-04-30 22:44 ` mathieu.desnoyers
  2018-04-30 22:44 ` [PATCH 04/14] arm: Wire up restartable sequences system call Mathieu Desnoyers
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

This rseq helper library provides a user-space API to the rseq()
system call.

The rseq fast-path exposes the instruction pointer addresses where the
rseq assembly blocks begin and end, as well as the associated abort
instruction pointer, in the __rseq_table section. This section lets
debuggers know where to place breakpoints when single-stepping
through assembly blocks which may be aborted at any point by the kernel.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Provide abort-ip signature: The abort-ip signature is located just
  before the abort-ip target. It is currently hardcoded, but a
  user-space application could use the __rseq_table to iterate on all
  abort-ip targets and use a random value as signature if needed in the
  future.
- Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
  sections need to issue rseq_prepare_unload() on each thread at least
  once before reclaim of struct rseq_cs.
- Use initial-exec TLS model, non-weak symbol: The initial-exec model is
  signal-safe, whereas the global-dynamic model is not. Remove the
  "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
  library will have ownership of that symbol, and there is no reason for
  an application or user library to try to define that symbol.
  The expected use is to link against librseq.so, which owns and provides
  that symbol.
- Set cpu_id to -2 on register error
- Add rseq_len syscall parameter, rseq_cs version
- Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
  hard time decoding the instruction stream after a bad instruction. Use
  a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
- Exercise parametrized test variants in a shell script.
- Restartable sequences selftests: Remove use of event counter.
- Use cpu_id_start field: With the cpu_id_start field, the C
  preparation phase of the fast-path does not need to compare cpu_id < 0
  anymore.
- Signal-safe registration and refcounting: Allow libraries using
  librseq.so to register it from signal handlers.
- Use OVERRIDE_TARGETS in makefile.
- Use "m" constraints for rseq_cs field.

Changes since v2:
- Update based on Thomas Gleixner's comments.

Changes since v3:
- Generate param_test_skip_fastpath and param_test_benchmark with
  -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath
  to run_param_test.sh.

Changes since v4:
- Fold arm: workaround gcc asm size guess,
- Namespace barrier() -> rseq_barrier() in library header,
- Take into account coding style feedback from Peter Zijlstra,
- Split rseq selftests into logical commits.
---
 tools/testing/selftests/rseq/rseq-arm.h  |  715 +++++++++++++++++++
 tools/testing/selftests/rseq/rseq-ppc.h  |  671 ++++++++++++++++++
 tools/testing/selftests/rseq/rseq-skip.h |   65 ++
 tools/testing/selftests/rseq/rseq-x86.h  | 1132 ++++++++++++++++++++++++++++++
 tools/testing/selftests/rseq/rseq.c      |  117 +++
 tools/testing/selftests/rseq/rseq.h      |  147 ++++
 6 files changed, 2847 insertions(+)
 create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
 create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
 create mode 100644 tools/testing/selftests/rseq/rseq-skip.h
 create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
 create mode 100644 tools/testing/selftests/rseq/rseq.c
 create mode 100644 tools/testing/selftests/rseq/rseq.h

diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
new file mode 100644
index 000000000000..3b055f9aeaab
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -0,0 +1,715 @@
+/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
+/*
+ * rseq-arm.h
+ *
+ * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("dmb" ::: "memory", "cc")
+#define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" ::: "memory", "cc")
+#define rseq_smp_wmb()	__asm__ __volatile__ ("dmb" ::: "memory", "cc")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+#ifdef RSEQ_SKIP_FASTPATH
+#include "rseq-skip.h"
+#else /* !RSEQ_SKIP_FASTPATH */
+
+#define __RSEQ_ASM_DEFINE_TABLE(version, flags,	start_ip,		\
+				post_commit_offset, abort_ip)		\
+		".pushsection __rseq_table, \"aw\"\n\t"			\
+		".balign 32\n\t"					\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_TABLE(start_ip, post_commit_ip, abort_ip)	\
+	__RSEQ_ASM_DEFINE_TABLE(0x0, 0x0, start_ip,			\
+				(post_commit_ip - start_ip), abort_ip)
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		RSEQ_INJECT_ASM(1)					\
+		"adr r0, " __rseq_str(cs_label) "\n\t"			\
+		"str r0, %[" __rseq_str(rseq_cs) "]\n\t"		\
+		__rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t"	\
+		"cmp %[" __rseq_str(cpu_id) "], r0\n\t"		\
+		"bne " __rseq_str(label) "\n\t"
+
+#define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown,		\
+				abort_label, version, flags,		\
+				start_ip, post_commit_offset, abort_ip)	\
+		__rseq_str(table_label) ":\n\t"				\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(abort_label) "]\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, abort_label, \
+			      start_ip, post_commit_ip, abort_ip)	\
+	__RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown,		\
+				abort_label, 0x0, 0x0, start_ip,	\
+				(post_commit_ip - start_ip), abort_ip)
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label)		\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
+
+#define rseq_workaround_gcc_asm_size_guess()	__asm__ __volatile__("")
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	rseq_workaround_gcc_asm_size_guess();
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */
+		/* Start rseq by storing table entry pointer into rseq_cs. */
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne %l[cmpfail]\n\t"
+		RSEQ_INJECT_ASM(4)
+#ifdef RSEQ_COMPARE_TWICE
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1])
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne %l[error2]\n\t"
+#endif
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 5f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f)
+		"5:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]		"r" (cpu),
+		  [current_cpu_id]	"m" (__rseq_abi.cpu_id),
+		  [rseq_cs]		"m" (__rseq_abi.rseq_cs),
+		  [v]			"m" (*v),
+		  [expect]		"r" (expect),
+		  [newv]		"r" (newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+#ifdef RSEQ_COMPARE_TWICE
+		  , error1, error2
+#endif
+	);
+	rseq_workaround_gcc_asm_size_guess();
+	return 0;
+abort:
+	rseq_workaround_gcc_asm_size_guess();
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
+	return 1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+	rseq_bug("cpu_id comparison failed");
+error2:
+	rseq_bug("expected value comparison failed");
+#endif
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+			       off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+
rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[error2]\n\t" +#endif + "str r0, %[load]\n\t" + "add r0, %[voffp]\n\t" + "ldr r0, [r0]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "Ir" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + "ldr r0, %[v]\n\t" + "add r0, %[count]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [count] "Ir" (count) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[error3]\n\t" +#endif + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str 
%[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] 
"m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" 
(rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h new file mode 100644 index 000000000000..52630c9f42be --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-ppc.h @@ -0,0 +1,671 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-ppc.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + * (C) Copyright 2016-2018 - Boqun Feng <boqun.feng@gmail.com> + */ + +#define RSEQ_SIG 0x53053053 + +#define rseq_smp_mb() __asm__ __volatile__ ("sync" ::: "memory", "cc") +#define rseq_smp_lwsync() __asm__ __volatile__ ("lwsync" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_smp_lwsync() +#define rseq_smp_wmb() rseq_smp_lwsync() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_lwsync(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_lwsync() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_lwsync(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * The __rseq_table section can be used by debuggers to better handle + * single-stepping through the restartable critical sections. 
+ */ + +#ifdef __PPC64__ + +#define STORE_WORD "std " +#define LOAD_WORD "ld " +#define LOADX_WORD "ldx " +#define CMP_WORD "cmpd " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t" \ + "rldicr %%r17, %%r17, 32, 31\n\t" \ + "oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "std %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#else /* #ifdef __PPC64__ */ + +#define STORE_WORD "stw " +#define LOAD_WORD "lwz " +#define LOADX_WORD "lwzx " +#define CMP_WORD "cmpw " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + /* 32-bit only supported on BE */ \ + ".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t" \ + "addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#endif /* #ifdef __PPC64__ */ + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), 
abort_ip) + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t" \ + "cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + "b %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +/* + * RSEQ_ASM_OPs: asm operations for rseq + * RSEQ_ASM_OP_R_*: has hard-coded registers in it + * RSEQ_ASM_OP_* (else): doesn't have hard-coded registers (except cr7) + */ +#define RSEQ_ASM_OP_CMPEQ(var, expect, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_CMPNE(var, expectnot, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t" \ + "beq- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_STORE(value, var) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" + +/* Load @var to r17 */ +#define RSEQ_ASM_OP_R_LOAD(var) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Store r17 to @var */ +#define RSEQ_ASM_OP_R_STORE(var) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Add @count to r17 */ +#define RSEQ_ASM_OP_R_ADD(count) \ + "add %%r17, %[" __rseq_str(count) "], %%r17\n\t" + +/* Load (r17 + voffp) to r17 */ +#define RSEQ_ASM_OP_R_LOADX(voffp) \ + LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t" + +/* TODO: implement a faster memcpy. 
*/ +#define RSEQ_ASM_OP_R_MEMCPY() \ + "cmpdi %%r19, 0\n\t" \ + "beq 333f\n\t" \ + "addi %%r20, %%r20, -1\n\t" \ + "addi %%r21, %%r21, -1\n\t" \ + "222:\n\t" \ + "lbzu %%r18, 1(%%r20)\n\t" \ + "stbu %%r18, 1(%%r21)\n\t" \ + "addi %%r19, %%r19, -1\n\t" \ + "cmpdi %%r19, 0\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + +#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); 
+error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[error2]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* store it in @load */ + RSEQ_ASM_OP_R_STORE(load) + /* dereference voffp(v) */ + RSEQ_ASM_OP_R_LOADX(voffp) + /* final store the value at voffp(v) */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "b" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* 
Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* add @count to it */ + RSEQ_ASM_OP_R_ADD(count) + /* final store */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "r" (count) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[cmpfail]) + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[error3]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs.
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs.
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#undef STORE_WORD +#undef LOAD_WORD +#undef LOADX_WORD +#undef CMP_WORD + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-skip.h b/tools/testing/selftests/rseq/rseq-skip.h new file mode 100644 index 000000000000..72750b5905a9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-skip.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-skip.h + * + * (C) Copyright 2017-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + return 
-1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h new file mode 100644 index 000000000000..089410a314e9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-x86.h @@ -0,0 +1,1132 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-x86.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#include <stdint.h> + +#define RSEQ_SIG 0x53053053 + +#ifdef __x86_64__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%rsp)" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_barrier() +#define rseq_smp_wmb() rseq_barrier() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = 
RSEQ_READ_ONCE(*p); \ + rseq_barrier(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_barrier(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t" \ + "movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>(%rip). 
*/\ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. 
+ */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movq %%rbx, %[load]\n\t" + "addq %[voffp], %%rbx\n\t" + "movq (%%rbx), %%rbx\n\t" + /* final store */ + "movq %%rbx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "er" (voffp), + [load] "m" (*load) + : "memory", "cc", "rax", "rbx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addq %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "er" (count) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movq %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2, newv, cpu); +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[error3]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint64_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movq %[src], %[rseq_scratch0]\n\t" + "movq %[dst], %[rseq_scratch1]\n\t" + "movq %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "cmpq %[v], %[expect]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + 
return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src, len, + newv, cpu); +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#elif __i386__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_rmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_wmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_mb(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_mb(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * Use eax as scratch register and take memory operands as input to + * lessen register pressure. Especially needed when compiling in O0. 
+ */ +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>. */ \ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. + */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movl %%ebx, %[load]\n\t" + "addl %[voffp], %%ebx\n\t" + "movl (%%ebx), %%ebx\n\t" + /* final store */ + "movl %%ebx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "ir" (voffp), + [load] "m" (*load) + : "memory", "cc", "eax", "ebx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addl %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "ir" (count) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %%eax\n\t" + "movl %%eax, %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "m" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif + +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[error3]\n\t" +#endif + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "m" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + 
: abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl 
%[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#endif diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c new file mode 100644 index 000000000000..4847e97ed049 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: LGPL-2.1 +/* + * rseq.c + * + * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; only + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * Lesser General Public License for more details. + */ + +#define _GNU_SOURCE +#include <errno.h> +#include <sched.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <syscall.h> +#include <assert.h> +#include <signal.h> + +#include "rseq.h" + +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) + +__attribute__((tls_model("initial-exec"))) __thread +volatile struct rseq __rseq_abi = { + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, +}; + +static __attribute__((tls_model("initial-exec"))) __thread +volatile int refcount; + +static void signal_off_save(sigset_t *oldset) +{ + sigset_t set; + int ret; + + sigfillset(&set); + ret = pthread_sigmask(SIG_BLOCK, &set, oldset); + if (ret) + abort(); +} + +static void signal_restore(sigset_t oldset) +{ + int ret; + + ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL); + if (ret) + abort(); +} + +static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len, + int flags, uint32_t sig) +{ + return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig); +} + +int rseq_register_current_thread(void) +{ + int rc, ret = 0; + sigset_t oldset; + + signal_off_save(&oldset); + if (refcount++) + goto end; + rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG); + if (!rc) { + assert(rseq_current_cpu_raw() >= 0); + goto end; + } + if (errno != EBUSY) + __rseq_abi.cpu_id = -2; + ret = -1; + refcount--; +end: + signal_restore(oldset); + return ret; +} + +int rseq_unregister_current_thread(void) +{ + int rc, ret = 0; + sigset_t oldset; + + signal_off_save(&oldset); + if (--refcount) + goto end; + rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), + RSEQ_FLAG_UNREGISTER, RSEQ_SIG); + if (!rc) + goto end; + ret = -1; +end: + signal_restore(oldset); + return ret; +} + +int32_t rseq_fallback_current_cpu(void) +{ + int32_t cpu; + + cpu = sched_getcpu(); + if (cpu < 0) { + perror("sched_getcpu()"); + abort(); + } + return cpu; +} diff --git a/tools/testing/selftests/rseq/rseq.h 
b/tools/testing/selftests/rseq/rseq.h new file mode 100644 index 000000000000..0a808575cbc4 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq.h @@ -0,0 +1,147 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#ifndef RSEQ_H +#define RSEQ_H + +#include <stdint.h> +#include <stdbool.h> +#include <pthread.h> +#include <signal.h> +#include <sched.h> +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <sched.h> +#include <linux/rseq.h> + +/* + * Empty code injection macros, override when testing. + * It is important to consider that the ASM injection macros need to be + * fully reentrant (e.g. do not modify the stack). + */ +#ifndef RSEQ_INJECT_ASM +#define RSEQ_INJECT_ASM(n) +#endif + +#ifndef RSEQ_INJECT_C +#define RSEQ_INJECT_C(n) +#endif + +#ifndef RSEQ_INJECT_INPUT +#define RSEQ_INJECT_INPUT +#endif + +#ifndef RSEQ_INJECT_CLOBBER +#define RSEQ_INJECT_CLOBBER +#endif + +#ifndef RSEQ_INJECT_FAILED +#define RSEQ_INJECT_FAILED +#endif + +extern __thread volatile struct rseq __rseq_abi; + +#define rseq_likely(x) __builtin_expect(!!(x), 1) +#define rseq_unlikely(x) __builtin_expect(!!(x), 0) +#define rseq_barrier() __asm__ __volatile__("" : : : "memory") + +#define RSEQ_ACCESS_ONCE(x) (*(__volatile__ __typeof__(x) *)&(x)) +#define RSEQ_WRITE_ONCE(x, v) __extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); }) +#define RSEQ_READ_ONCE(x) RSEQ_ACCESS_ONCE(x) + +#define __rseq_str_1(x) #x +#define __rseq_str(x) __rseq_str_1(x) + +#define rseq_log(fmt, args...) \ + fprintf(stderr, fmt "(in %s() at " __FILE__ ":" __rseq_str(__LINE__)"\n", \ + ## args, __func__) + +#define rseq_bug(fmt, args...) 
\ + do { \ + rseq_log(fmt, ##args); \ + abort(); \ + } while (0) + +#if defined(__x86_64__) || defined(__i386__) +#include <rseq-x86.h> +#elif defined(__ARMEL__) +#include <rseq-arm.h> +#elif defined(__PPC__) +#include <rseq-ppc.h> +#else +#error unsupported target +#endif + +/* + * Register rseq for the current thread. This needs to be called once + * by any thread which uses restartable sequences, before they start + * using restartable sequences, to ensure restartable sequences + * succeed. A restartable sequence executed from a non-registered + * thread will always fail. + */ +int rseq_register_current_thread(void); + +/* + * Unregister rseq for current thread. + */ +int rseq_unregister_current_thread(void); + +/* + * Restartable sequence fallback for reading the current CPU number. + */ +int32_t rseq_fallback_current_cpu(void); + +/* + * Values returned can be either the current CPU number, -1 (rseq is + * uninitialized), or -2 (rseq initialization has failed). + */ +static inline int32_t rseq_current_cpu_raw(void) +{ + return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id); +} + +/* + * Returns a possible CPU number, which is typically the current CPU. + * The returned CPU number can be used to prepare for an rseq critical + * section, which will confirm whether the cpu number is indeed the + * current one, and whether rseq is initialized. + * + * The CPU number returned by rseq_cpu_start should always be validated + * by passing it to a rseq asm sequence, or by comparing it to the + * return value of rseq_current_cpu_raw() if the rseq asm sequence + * does not need to be invoked. 
+ */ +static inline uint32_t rseq_cpu_start(void) +{ + return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start); +} + +static inline uint32_t rseq_current_cpu(void) +{ + int32_t cpu; + + cpu = rseq_current_cpu_raw(); + if (rseq_unlikely(cpu < 0)) + cpu = rseq_fallback_current_cpu(); + return cpu; +} + +/* + * rseq_prepare_unload() should be invoked by each thread using rseq_finish*() + * at least once between their last rseq_finish*() and library unload of the + * library defining the rseq critical section (struct rseq_cs). This also + * applies to use of rseq in code generated by JIT: rseq_prepare_unload() + * should be invoked at least once by each thread using rseq_finish*() before + * reclaim of the memory holding the struct rseq_cs. + */ +static inline void rseq_prepare_unload(void) +{ + __rseq_abi.rseq_cs = 0; +} + +#endif /* RSEQ_H_ */ -- 2.11.0
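For readers following the registration logic in rseq.c, the nested register/unregister refcounting can be sketched in isolation as below. This is a minimal sketch under stated assumptions, not the code from the patch: stub_sys_register() and stub_sys_unregister() are hypothetical stand-ins for the rseq system call, kernel_calls is an illustrative counter, the sketch_* names are made up for this example, and the signal masking done by the real helpers is left out.

```c
#include <assert.h>

/* Per-thread nesting depth, mirroring the TLS refcount in rseq.c. */
static __thread int refcount;
/* Illustrative only: counts how often the stubbed "system call" runs. */
static __thread int kernel_calls;

/* Hypothetical stand-ins for sys_rseq(); they always succeed here. */
static int stub_sys_register(void)
{
	kernel_calls++;
	return 0;
}

static int stub_sys_unregister(void)
{
	kernel_calls++;
	return 0;
}

/* Only the first (outermost) registration reaches the kernel. */
int sketch_register_current_thread(void)
{
	if (refcount++)
		return 0;	/* already registered by an outer user */
	if (!stub_sys_register())
		return 0;
	refcount--;		/* registration failed: undo the increment */
	return -1;
}

/* Only the last (outermost) unregistration reaches the kernel. */
int sketch_unregister_current_thread(void)
{
	/* Sanity check added for this sketch; not present in the patch. */
	assert(refcount > 0);
	if (--refcount)
		return 0;	/* other users on this thread remain */
	return stub_sys_unregister() ? -1 : 0;
}
```

With this shape, a library and an application running on the same thread can each call register and unregister in pairs without knowing about each other: only the outermost pair results in a system call.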
* [PATCH 10/14] rseq: selftests: Provide rseq library (v5) @ 2018-04-30 22:44 ` mathieu.desnoyers 0 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Shuah Khan This rseq helper library provides a user-space API to the rseq() system call. The rseq fast-path exposes the instruction pointer addresses where the rseq assembly blocks begin and end, as well as the associated abort instruction pointer, in the __rseq_table section. This section lets debuggers know where to place breakpoints when single-stepping through assembly blocks which may be aborted at any point by the kernel. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Shuah Khan <shuahkh@osg.samsung.com> CC: Russell King <linux@arm.linux.org.uk> CC: Catalin Marinas <catalin.marinas@arm.com> CC: Will Deacon <will.deacon@arm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Paul Turner <pjt@google.com> CC: Andrew Hunter <ahh@google.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Andi Kleen <andi@firstfloor.org> CC: Dave Watson <davejwatson@fb.com> CC: Chris Lameter <cl@linux.com> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Ben Maurer <bmaurer@fb.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: "Paul E.
McKenney" <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Boqun Feng <boqun.feng@gmail.com> CC: linux-kselftest@vger.kernel.org CC: linux-api@vger.kernel.org --- Changes since v1: - Provide abort-ip signature: The abort-ip signature is located just before the abort-ip target. It is currently hardcoded, but a user-space application could use the __rseq_table to iterate on all abort-ip targets and use a random value as signature if needed in the future. - Add rseq_prepare_unload(): Libraries and JIT code using rseq critical sections need to issue rseq_prepare_unload() on each thread at least once before reclaim of struct rseq_cs. - Use initial-exec TLS model, non-weak symbol: The initial-exec model is signal-safe, whereas the global-dynamic model is not. Remove the "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so library will have ownership of that symbol, and there is no reason for an application or user library to try to define that symbol. The expected use is to link against librseq.so, which owns and provides that symbol. - Set cpu_id to -2 on register error - Add rseq_len syscall parameter, rseq_cs version - Ensure disassembler-friendly signature: x86 32/64 disassemblers have a hard time decoding the instruction stream after a bad instruction. Use a nopl instruction to encode the signature. Suggested by Andy Lutomirski. - Exercise parametrized test variants in a shell script. - Restartable sequences selftests: Remove use of event counter. - Use cpu_id_start field: With the cpu_id_start field, the C preparation phase of the fast-path does not need to compare cpu_id < 0 anymore. - Signal-safe registration and refcounting: Allow libraries using librseq.so to register it from signal handlers. - Use OVERRIDE_TARGETS in makefile. - Use "m" constraints for rseq_cs field.
Changes since v2: - Update based on Thomas Gleixner's comments. Changes since v3: - Generate param_test_skip_fastpath and param_test_benchmark with -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath to run_param_test.sh. Changes since v4: - Fold arm: workaround gcc asm size guess, - Namespace barrier() -> rseq_barrier() in library header, - Take into account coding style feedback from Peter Zijlstra, - Split rseq selftests into logical commits. --- tools/testing/selftests/rseq/rseq-arm.h | 715 +++++++++++++++++++ tools/testing/selftests/rseq/rseq-ppc.h | 671 ++++++++++++++++++ tools/testing/selftests/rseq/rseq-skip.h | 65 ++ tools/testing/selftests/rseq/rseq-x86.h | 1132 ++++++++++++++++++++++++++++++ tools/testing/selftests/rseq/rseq.c | 117 +++ tools/testing/selftests/rseq/rseq.h | 147 ++++ 6 files changed, 2847 insertions(+) create mode 100644 tools/testing/selftests/rseq/rseq-arm.h create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h create mode 100644 tools/testing/selftests/rseq/rseq-skip.h create mode 100644 tools/testing/selftests/rseq/rseq-x86.h create mode 100644 tools/testing/selftests/rseq/rseq.c create mode 100644 tools/testing/selftests/rseq/rseq.h diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h new file mode 100644 index 000000000000..3b055f9aeaab --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-arm.h @@ -0,0 +1,715 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-arm.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#define RSEQ_SIG 0x53053053 + +#define rseq_smp_mb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") +#define rseq_smp_rmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") +#define rseq_smp_wmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_mb(); \ + ____p1; \ +}) + +#define 
rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_mb(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +#define __RSEQ_ASM_DEFINE_TABLE(version, flags, start_ip, \ + post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "adr r0, " __rseq_str(cs_label) "\n\t" \ + "str r0, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t" \ + "cmp %[" __rseq_str(cpu_id) "], r0\n\t" \ + "bne " __rseq_str(label) "\n\t" + +#define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ + abort_label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + __rseq_str(table_label) ":\n\t" \ + ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".word " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "b %l[" __rseq_str(abort_label) "]\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, abort_label, \ + start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ + abort_label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + 
__rseq_str(label) ":\n\t" \ + teardown \ + "b %l[" __rseq_str(cmpfail_label) "]\n\t" + +#define rseq_workaround_gcc_asm_size_guess() __asm__ __volatile__("") + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 
2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[error2]\n\t" +#endif + "str r0, %[load]\n\t" + "add r0, %[voffp]\n\t" + "ldr r0, [r0]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "Ir" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + "ldr r0, %[v]\n\t" + "add r0, %[count]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [count] "Ir" (count) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[error3]\n\t" +#endif + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str 
%[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] 
"m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" 
(rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h new file mode 100644 index 000000000000..52630c9f42be --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-ppc.h @@ -0,0 +1,671 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-ppc.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + * (C) Copyright 2016-2018 - Boqun Feng <boqun.feng@gmail.com> + */ + +#define RSEQ_SIG 0x53053053 + +#define rseq_smp_mb() __asm__ __volatile__ ("sync" ::: "memory", "cc") +#define rseq_smp_lwsync() __asm__ __volatile__ ("lwsync" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_smp_lwsync() +#define rseq_smp_wmb() rseq_smp_lwsync() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_lwsync(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_lwsync() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_lwsync(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * The __rseq_table section can be used by debuggers to better handle + * single-stepping through the restartable critical sections. 
+ */ + +#ifdef __PPC64__ + +#define STORE_WORD "std " +#define LOAD_WORD "ld " +#define LOADX_WORD "ldx " +#define CMP_WORD "cmpd " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t" \ + "rldicr %%r17, %%r17, 32, 31\n\t" \ + "oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "std %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#else /* #ifdef __PPC64__ */ + +#define STORE_WORD "stw " +#define LOAD_WORD "lwz " +#define LOADX_WORD "lwzx " +#define CMP_WORD "cmpw " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + /* 32-bit only supported on BE */ \ + ".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t" \ + "addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#endif /* #ifdef __PPC64__ */ + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), 
abort_ip) + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t" \ + "cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + "b %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +/* + * RSEQ_ASM_OPs: asm operations for rseq + * RSEQ_ASM_OP_R_*: has hard-coded registers in it + * RSEQ_ASM_OP_* (else): doesn't have hard-coded registers (except cr7) + */ +#define RSEQ_ASM_OP_CMPEQ(var, expect, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_CMPNE(var, expectnot, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t" \ + "beq- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_STORE(value, var) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" + +/* Load @var to r17 */ +#define RSEQ_ASM_OP_R_LOAD(var) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Store r17 to @var */ +#define RSEQ_ASM_OP_R_STORE(var) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Add @count to r17 */ +#define RSEQ_ASM_OP_R_ADD(count) \ + "add %%r17, %[" __rseq_str(count) "], %%r17\n\t" + +/* Load (r17 + voffp) to r17 */ +#define RSEQ_ASM_OP_R_LOADX(voffp) \ + LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t" + +/* TODO: implement a faster memcpy. 
*/ +#define RSEQ_ASM_OP_R_MEMCPY() \ + "cmpdi %%r19, 0\n\t" \ + "beq 333f\n\t" \ + "addi %%r20, %%r20, -1\n\t" \ + "addi %%r21, %%r21, -1\n\t" \ + "222:\n\t" \ + "lbzu %%r18, 1(%%r20)\n\t" \ + "stbu %%r18, 1(%%r21)\n\t" \ + "addi %%r19, %%r19, -1\n\t" \ + "cmpdi %%r19, 0\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + +#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); 
+error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[error2]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* store it in @load */ + RSEQ_ASM_OP_R_STORE(load) + /* dereference voffp(v) */ + RSEQ_ASM_OP_R_LOADX(voffp) + /* final store the value at voffp(v) */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "b" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* 
Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* add @count to it */ + RSEQ_ASM_OP_R_ADD(count) + /* final store */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "r" (count) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[cmpfail]) + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[error3]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#undef STORE_WORD +#undef LOAD_WORD +#undef LOADX_WORD +#undef CMP_WORD + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-skip.h b/tools/testing/selftests/rseq/rseq-skip.h new file mode 100644 index 000000000000..72750b5905a9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-skip.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-skip.h + * + * (C) Copyright 2017-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + return 
-1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h new file mode 100644 index 000000000000..089410a314e9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-x86.h @@ -0,0 +1,1132 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-x86.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#include <stdint.h> + +#define RSEQ_SIG 0x53053053 + +#ifdef __x86_64__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%rsp)" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_barrier() +#define rseq_smp_wmb() rseq_barrier() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = 
RSEQ_READ_ONCE(*p); \ + rseq_barrier(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_barrier(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t" \ + "movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>(%rip). 
*/\ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. 
+ */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movq %%rbx, %[load]\n\t" + "addq %[voffp], %%rbx\n\t" + "movq (%%rbx), %%rbx\n\t" + /* final store */ + "movq %%rbx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "er" (voffp), + [load] "m" (*load) + : "memory", "cc", "rax", "rbx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addq %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "er" (count) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movq %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2, newv, cpu); +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[error3]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint64_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movq %[src], %[rseq_scratch0]\n\t" + "movq %[dst], %[rseq_scratch1]\n\t" + "movq %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "cmpq %[v], %[expect]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + 
return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src, len, + newv, cpu); +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#elif __i386__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_rmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_wmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_mb(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_mb(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * Use eax as scratch register and take memory operands as input to + * lessen register pressure. Especially needed when compiling in O0. 
+ */ +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>. */ \ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. + */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movl %%ebx, %[load]\n\t" + "addl %[voffp], %%ebx\n\t" + "movl (%%ebx), %%ebx\n\t" + /* final store */ + "movl %%ebx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "ir" (voffp), + [load] "m" (*load) + : "memory", "cc", "eax", "ebx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addl %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "ir" (count) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %%eax\n\t" + "movl %%eax, %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "m" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif + +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[error3]\n\t" +#endif + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "m" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + 
: abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl 
%[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#endif diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c new file mode 100644 index 000000000000..4847e97ed049 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: LGPL-2.1 +/* + * rseq.c + * + * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; only + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * Lesser General Public License for more details. + */ + +#define _GNU_SOURCE +#include <errno.h> +#include <sched.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <syscall.h> +#include <assert.h> +#include <signal.h> + +#include "rseq.h" + +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) + +__attribute__((tls_model("initial-exec"))) __thread +volatile struct rseq __rseq_abi = { + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, +}; + +static __attribute__((tls_model("initial-exec"))) __thread +volatile int refcount; + +static void signal_off_save(sigset_t *oldset) +{ + sigset_t set; + int ret; + + sigfillset(&set); + ret = pthread_sigmask(SIG_BLOCK, &set, oldset); + if (ret) + abort(); +} + +static void signal_restore(sigset_t oldset) +{ + int ret; + + ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL); + if (ret) + abort(); +} + +static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len, + int flags, uint32_t sig) +{ + return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig); +} + +int rseq_register_current_thread(void) +{ + int rc, ret = 0; + sigset_t oldset; + + signal_off_save(&oldset); + if (refcount++) + goto end; + rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG); + if (!rc) { + assert(rseq_current_cpu_raw() >= 0); + goto end; + } + if (errno != EBUSY) + __rseq_abi.cpu_id = -2; + ret = -1; + refcount--; +end: + signal_restore(oldset); + return ret; +} + +int rseq_unregister_current_thread(void) +{ + int rc, ret = 0; + sigset_t oldset; + + signal_off_save(&oldset); + if (--refcount) + goto end; + rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), + RSEQ_FLAG_UNREGISTER, RSEQ_SIG); + if (!rc) + goto end; + ret = -1; +end: + signal_restore(oldset); + return ret; +} + +int32_t rseq_fallback_current_cpu(void) +{ + int32_t cpu; + + cpu = sched_getcpu(); + if (cpu < 0) { + perror("sched_getcpu()"); + abort(); + } + return cpu; +} diff --git a/tools/testing/selftests/rseq/rseq.h 
b/tools/testing/selftests/rseq/rseq.h new file mode 100644 index 000000000000..0a808575cbc4 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq.h @@ -0,0 +1,147 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com> + */ + +#ifndef RSEQ_H +#define RSEQ_H + +#include <stdint.h> +#include <stdbool.h> +#include <pthread.h> +#include <signal.h> +#include <sched.h> +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <sched.h> +#include <linux/rseq.h> + +/* + * Empty code injection macros, override when testing. + * It is important to consider that the ASM injection macros need to be + * fully reentrant (e.g. do not modify the stack). + */ +#ifndef RSEQ_INJECT_ASM +#define RSEQ_INJECT_ASM(n) +#endif + +#ifndef RSEQ_INJECT_C +#define RSEQ_INJECT_C(n) +#endif + +#ifndef RSEQ_INJECT_INPUT +#define RSEQ_INJECT_INPUT +#endif + +#ifndef RSEQ_INJECT_CLOBBER +#define RSEQ_INJECT_CLOBBER +#endif + +#ifndef RSEQ_INJECT_FAILED +#define RSEQ_INJECT_FAILED +#endif + +extern __thread volatile struct rseq __rseq_abi; + +#define rseq_likely(x) __builtin_expect(!!(x), 1) +#define rseq_unlikely(x) __builtin_expect(!!(x), 0) +#define rseq_barrier() __asm__ __volatile__("" : : : "memory") + +#define RSEQ_ACCESS_ONCE(x) (*(__volatile__ __typeof__(x) *)&(x)) +#define RSEQ_WRITE_ONCE(x, v) __extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); }) +#define RSEQ_READ_ONCE(x) RSEQ_ACCESS_ONCE(x) + +#define __rseq_str_1(x) #x +#define __rseq_str(x) __rseq_str_1(x) + +#define rseq_log(fmt, args...) \ + fprintf(stderr, fmt "(in %s() at " __FILE__ ":" __rseq_str(__LINE__)"\n", \ + ## args, __func__) + +#define rseq_bug(fmt, args...) 
\ + do { \ + rseq_log(fmt, ##args); \ + abort(); \ + } while (0) + +#if defined(__x86_64__) || defined(__i386__) +#include <rseq-x86.h> +#elif defined(__ARMEL__) +#include <rseq-arm.h> +#elif defined(__PPC__) +#include <rseq-ppc.h> +#else +#error unsupported target +#endif + +/* + * Register rseq for the current thread. This needs to be called once + * by any thread which uses restartable sequences, before they start + * using restartable sequences, to ensure restartable sequences + * succeed. A restartable sequence executed from a non-registered + * thread will always fail. + */ +int rseq_register_current_thread(void); + +/* + * Unregister rseq for current thread. + */ +int rseq_unregister_current_thread(void); + +/* + * Restartable sequence fallback for reading the current CPU number. + */ +int32_t rseq_fallback_current_cpu(void); + +/* + * Values returned can be either the current CPU number, -1 (rseq is + * uninitialized), or -2 (rseq initialization has failed). + */ +static inline int32_t rseq_current_cpu_raw(void) +{ + return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id); +} + +/* + * Returns a possible CPU number, which is typically the current CPU. + * The returned CPU number can be used to prepare for an rseq critical + * section, which will confirm whether the cpu number is indeed the + * current one, and whether rseq is initialized. + * + * The CPU number returned by rseq_cpu_start should always be validated + * by passing it to a rseq asm sequence, or by comparing it to the + * return value of rseq_current_cpu_raw() if the rseq asm sequence + * does not need to be invoked. 
+ */ +static inline uint32_t rseq_cpu_start(void) +{ + return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start); +} + +static inline uint32_t rseq_current_cpu(void) +{ + int32_t cpu; + + cpu = rseq_current_cpu_raw(); + if (rseq_unlikely(cpu < 0)) + cpu = rseq_fallback_current_cpu(); + return cpu; +} + +/* + * rseq_prepare_unload() should be invoked by each thread using rseq_finish*() + * at least once between their last rseq_finish*() and library unload of the + * library defining the rseq critical section (struct rseq_cs). This also + * applies to use of rseq in code generated by JIT: rseq_prepare_unload() + * should be invoked at least once by each thread using rseq_finish*() before + * reclaim of the memory holding the struct rseq_cs. + */ +static inline void rseq_prepare_unload(void) +{ + __rseq_abi.rseq_cs = 0; +} + +#endif /* RSEQ_H_ */ -- 2.11.0 ^ permalink raw reply related [flat|nested] 105+ messages in thread
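The return convention shared by the fast-path helpers in the patch above (0 when the final store commits, -1 on abort, 1 when the expected-value comparison fails) can be illustrated with a small portable C model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical and not part of the patch, the CPU number is a plain variable standing in for `__rseq_abi.cpu_id`, and nothing here is restartable or preemption-safe, since the kernel is not involved.

```c
/*
 * Portable C model of the rseq_cmpeqv_storev() return convention.
 * ASSUMPTION: all model_* names are hypothetical illustrations, not
 * part of the patch. This model is NOT restartable and NOT atomic
 * with respect to preemption; it only mirrors the control flow.
 */
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Stand-in for __rseq_abi.cpu_id / cpu_id_start (hypothetical). */
static int32_t model_current_cpu;

/*
 * Returns 0 when the store commits, -1 on "abort" (the cpu argument
 * is not the current CPU), 1 when *v does not match expect.
 */
static int model_cmpeqv_storev(intptr_t *v, intptr_t expect,
			       intptr_t newv, int cpu)
{
	if (cpu != model_current_cpu)
		return -1;	/* abort path */
	if (*v != expect)
		return 1;	/* cmpfail path */
	*v = newv;		/* final store */
	return 0;
}

struct model_percpu_count {
	intptr_t count[MODEL_NR_CPUS];
};

/*
 * Increment the counter slot of the current CPU, retrying until the
 * compare-and-store commits. Returns the CPU number that was used.
 */
static int model_this_cpu_inc(struct model_percpu_count *c)
{
	for (;;) {
		/* The real library would call rseq_cpu_start() here. */
		int cpu = model_current_cpu;
		intptr_t old;

		assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
		old = c->count[cpu];
		if (model_cmpeqv_storev(&c->count[cpu], old, old + 1, cpu) == 0)
			return cpu;
		/* Abort or comparison failure: reload and retry. */
	}
}
```

With the real library, a retry loop of this shape is the pattern that single-stepping in a debugger can turn into a never-ending loop, which is why the cover letter points debuggers at the __rseq_table section.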
* [PATCH 10/14] rseq: selftests: Provide rseq library (v5) @ 2018-04-30 22:44 ` mathieu.desnoyers 0 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)

This rseq helper library provides a user-space API to the rseq()
system call.

The rseq fast-path exposes the instruction pointer addresses where the
rseq assembly blocks begin and end, as well as the associated abort
instruction pointer, in the __rseq_table section. This section lets
debuggers know where to place breakpoints when single-stepping through
assembly blocks which may be aborted at any point by the kernel.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
CC: Shuah Khan <shuahkh at osg.samsung.com>
CC: Russell King <linux at arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas at arm.com>
CC: Will Deacon <will.deacon at arm.com>
CC: Thomas Gleixner <tglx at linutronix.de>
CC: Paul Turner <pjt at google.com>
CC: Andrew Hunter <ahh at google.com>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Andy Lutomirski <luto at amacapital.net>
CC: Andi Kleen <andi at firstfloor.org>
CC: Dave Watson <davejwatson at fb.com>
CC: Chris Lameter <cl at linux.com>
CC: Ingo Molnar <mingo at redhat.com>
CC: "H. Peter Anvin" <hpa at zytor.com>
CC: Ben Maurer <bmaurer at fb.com>
CC: Steven Rostedt <rostedt at goodmis.org>
CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
CC: Josh Triplett <josh at joshtriplett.org>
CC: Linus Torvalds <torvalds at linux-foundation.org>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Boqun Feng <boqun.feng at gmail.com>
CC: linux-kselftest at vger.kernel.org
CC: linux-api at vger.kernel.org
---
Changes since v1:
- Provide abort-ip signature: The abort-ip signature is located just
  before the abort-ip target. It is currently hardcoded, but a
  user-space application could use the __rseq_table to iterate on all
  abort-ip targets and use a random value as signature if needed in the
  future.
- Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
  sections need to issue rseq_prepare_unload() on each thread at least
  once before reclaim of struct rseq_cs.
- Use initial-exec TLS model, non-weak symbol: The initial-exec model is
  signal-safe, whereas the global-dynamic model is not. Remove the
  "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so
  library will have ownership of that symbol, and there is no reason for
  an application or user library to try to define that symbol. The
  expected use is to link against librseq.so, which owns and provides
  that symbol.
- Set cpu_id to -2 on register error
- Add rseq_len syscall parameter, rseq_cs version
- Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
  hard time decoding the instruction stream after a bad instruction. Use
  a nopl instruction to encode the signature. Suggested by Andy
  Lutomirski.
- Exercise parametrized test variants in a shell script.
- Restartable sequences selftests: Remove use of event counter.
- Use cpu_id_start field: With the cpu_id_start field, the C
  preparation phase of the fast-path does not need to compare
  cpu_id < 0 anymore.
- Signal-safe registration and refcounting: Allow libraries using
  librseq.so to register it from signal handlers.
- Use OVERRIDE_TARGETS in makefile.
- Use "m" constraints for rseq_cs field.

Changes since v2:
- Update based on Thomas Gleixner's comments.

Changes since v3:
- Generate param_test_skip_fastpath and param_test_benchmark with
  -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add
  param_test_fastpath to run_param_test.sh.

Changes since v4:
- Fold arm: workaround gcc asm size guess,
- Namespace barrier() -> rseq_barrier() in library header,
- Take into account coding style feedback from Peter Zijlstra,
- Split rseq selftests into logical commits.
--- tools/testing/selftests/rseq/rseq-arm.h | 715 +++++++++++++++++++ tools/testing/selftests/rseq/rseq-ppc.h | 671 ++++++++++++++++++ tools/testing/selftests/rseq/rseq-skip.h | 65 ++ tools/testing/selftests/rseq/rseq-x86.h | 1132 ++++++++++++++++++++++++++++++ tools/testing/selftests/rseq/rseq.c | 117 +++ tools/testing/selftests/rseq/rseq.h | 147 ++++ 6 files changed, 2847 insertions(+) create mode 100644 tools/testing/selftests/rseq/rseq-arm.h create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h create mode 100644 tools/testing/selftests/rseq/rseq-skip.h create mode 100644 tools/testing/selftests/rseq/rseq-x86.h create mode 100644 tools/testing/selftests/rseq/rseq.c create mode 100644 tools/testing/selftests/rseq/rseq.h diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h new file mode 100644 index 000000000000..3b055f9aeaab --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-arm.h @@ -0,0 +1,715 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-arm.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + */ + +#define RSEQ_SIG 0x53053053 + +#define rseq_smp_mb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") +#define rseq_smp_rmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") +#define rseq_smp_wmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_mb(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_mb(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +#define __RSEQ_ASM_DEFINE_TABLE(version, flags, start_ip, \ + post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + 
".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "adr r0, " __rseq_str(cs_label) "\n\t" \ + "str r0, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t" \ + "cmp %[" __rseq_str(cpu_id) "], r0\n\t" \ + "bne " __rseq_str(label) "\n\t" + +#define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ + abort_label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + __rseq_str(table_label) ":\n\t" \ + ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".word " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "b %l[" __rseq_str(abort_label) "]\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, abort_label, \ + start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ + abort_label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + __rseq_str(label) ":\n\t" \ + teardown \ + "b %l[" __rseq_str(cmpfail_label) "]\n\t" + +#define rseq_workaround_gcc_asm_size_guess() __asm__ __volatile__("") + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table 
entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[error2]\n\t" +#endif + "str r0, %[load]\n\t" + "add r0, %[voffp]\n\t" + "ldr r0, [r0]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "Ir" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + "ldr r0, %[v]\n\t" + "add r0, %[count]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [count] "Ir" (count) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[error3]\n\t" +#endif + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str 
%[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] 
"m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" 
(rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h new file mode 100644 index 000000000000..52630c9f42be --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-ppc.h @@ -0,0 +1,671 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-ppc.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + * (C) Copyright 2016-2018 - Boqun Feng <boqun.feng at gmail.com> + */ + +#define RSEQ_SIG 0x53053053 + +#define rseq_smp_mb() __asm__ __volatile__ ("sync" ::: "memory", "cc") +#define rseq_smp_lwsync() __asm__ __volatile__ ("lwsync" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_smp_lwsync() +#define rseq_smp_wmb() rseq_smp_lwsync() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_lwsync(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_lwsync() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_lwsync(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * The __rseq_table section can be used by debuggers to better handle + * single-stepping through the restartable critical sections. 
+ */ + +#ifdef __PPC64__ + +#define STORE_WORD "std " +#define LOAD_WORD "ld " +#define LOADX_WORD "ldx " +#define CMP_WORD "cmpd " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t" \ + "rldicr %%r17, %%r17, 32, 31\n\t" \ + "oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "std %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#else /* #ifdef __PPC64__ */ + +#define STORE_WORD "stw " +#define LOAD_WORD "lwz " +#define LOADX_WORD "lwzx " +#define CMP_WORD "cmpw " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + /* 32-bit only supported on BE */ \ + ".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t" \ + "addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#endif /* #ifdef __PPC64__ */ + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), 
abort_ip) + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t" \ + "cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + "b %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +/* + * RSEQ_ASM_OPs: asm operations for rseq + * RSEQ_ASM_OP_R_*: has hard-coded registers in it + * RSEQ_ASM_OP_* (else): doesn't have hard-coded registers (except cr7) + */ +#define RSEQ_ASM_OP_CMPEQ(var, expect, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_CMPNE(var, expectnot, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t" \ + "beq- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_STORE(value, var) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" + +/* Load @var to r17 */ +#define RSEQ_ASM_OP_R_LOAD(var) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Store r17 to @var */ +#define RSEQ_ASM_OP_R_STORE(var) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Add @count to r17 */ +#define RSEQ_ASM_OP_R_ADD(count) \ + "add %%r17, %[" __rseq_str(count) "], %%r17\n\t" + +/* Load (r17 + voffp) to r17 */ +#define RSEQ_ASM_OP_R_LOADX(voffp) \ + LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t" + +/* TODO: implement a faster memcpy. 
*/ +#define RSEQ_ASM_OP_R_MEMCPY() \ + "cmpdi %%r19, 0\n\t" \ + "beq 333f\n\t" \ + "addi %%r20, %%r20, -1\n\t" \ + "addi %%r21, %%r21, -1\n\t" \ + "222:\n\t" \ + "lbzu %%r18, 1(%%r20)\n\t" \ + "stbu %%r18, 1(%%r21)\n\t" \ + "addi %%r19, %%r19, -1\n\t" \ + "cmpdi %%r19, 0\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + +#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); 
+error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[error2]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* store it in @load */ + RSEQ_ASM_OP_R_STORE(load) + /* dereference voffp(v) */ + RSEQ_ASM_OP_R_LOADX(voffp) + /* final store the value at voffp(v) */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "b" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* 
Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* add @count to it */ + RSEQ_ASM_OP_R_ADD(count) + /* final store */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "r" (count) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[cmpfail]) + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[error3]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#undef STORE_WORD +#undef LOAD_WORD +#undef LOADX_WORD +#undef CMP_WORD + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-skip.h b/tools/testing/selftests/rseq/rseq-skip.h new file mode 100644 index 000000000000..72750b5905a9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-skip.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-skip.h + * + * (C) Copyright 2017-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + */ + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + 
return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h new file mode 100644 index 000000000000..089410a314e9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-x86.h @@ -0,0 +1,1132 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-x86.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + */ + +#include <stdint.h> + +#define RSEQ_SIG 0x53053053 + +#ifdef __x86_64__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%rsp)" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_barrier() +#define rseq_smp_wmb() rseq_barrier() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = 
RSEQ_READ_ONCE(*p); \ + rseq_barrier(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_barrier(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t" \ + "movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>(%rip). 
*/\ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. 
+ */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movq %%rbx, %[load]\n\t" + "addq %[voffp], %%rbx\n\t" + "movq (%%rbx), %%rbx\n\t" + /* final store */ + "movq %%rbx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "er" (voffp), + [load] "m" (*load) + : "memory", "cc", "rax", "rbx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addq %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "er" (count) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movq %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2, newv, cpu); +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[error3]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint64_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movq %[src], %[rseq_scratch0]\n\t" + "movq %[dst], %[rseq_scratch1]\n\t" + "movq %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "cmpq %[v], %[expect]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + 
return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src, len, + newv, cpu); +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#elif __i386__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_rmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_wmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_mb(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_mb(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * Use eax as scratch register and take memory operands as input to + * lessen register pressure. Especially needed when compiling in O0. 
+ */ +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>. */ \ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. + */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movl %%ebx, %[load]\n\t" + "addl %[voffp], %%ebx\n\t" + "movl (%%ebx), %%ebx\n\t" + /* final store */ + "movl %%ebx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "ir" (voffp), + [load] "m" (*load) + : "memory", "cc", "eax", "ebx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addl %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "ir" (count) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %%eax\n\t" + "movl %%eax, %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "m" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif + +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[error3]\n\t" +#endif + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "m" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + 
: abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl 
%[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#endif diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c new file mode 100644 index 000000000000..4847e97ed049 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: LGPL-2.1 +/* + * rseq.c + * + * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; only + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * Lesser General Public License for more details. + */ + +#define _GNU_SOURCE +#include <errno.h> +#include <sched.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <syscall.h> +#include <assert.h> +#include <signal.h> + +#include "rseq.h" + +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) + +__attribute__((tls_model("initial-exec"))) __thread +volatile struct rseq __rseq_abi = { + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, +}; + +static __attribute__((tls_model("initial-exec"))) __thread +volatile int refcount; + +static void signal_off_save(sigset_t *oldset) +{ + sigset_t set; + int ret; + + sigfillset(&set); + ret = pthread_sigmask(SIG_BLOCK, &set, oldset); + if (ret) + abort(); +} + +static void signal_restore(sigset_t oldset) +{ + int ret; + + ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL); + if (ret) + abort(); +} + +static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len, + int flags, uint32_t sig) +{ + return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig); +} + +int rseq_register_current_thread(void) +{ + int rc, ret = 0; + sigset_t oldset; + + signal_off_save(&oldset); + if (refcount++) + goto end; + rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG); + if (!rc) { + assert(rseq_current_cpu_raw() >= 0); + goto end; + } + if (errno != EBUSY) + __rseq_abi.cpu_id = -2; + ret = -1; + refcount--; +end: + signal_restore(oldset); + return ret; +} + +int rseq_unregister_current_thread(void) +{ + int rc, ret = 0; + sigset_t oldset; + + signal_off_save(&oldset); + if (--refcount) + goto end; + rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), + RSEQ_FLAG_UNREGISTER, RSEQ_SIG); + if (!rc) + goto end; + ret = -1; +end: + signal_restore(oldset); + return ret; +} + +int32_t rseq_fallback_current_cpu(void) +{ + int32_t cpu; + + cpu = sched_getcpu(); + if (cpu < 0) { + perror("sched_getcpu()"); + abort(); + } + return cpu; +} diff --git a/tools/testing/selftests/rseq/rseq.h 
b/tools/testing/selftests/rseq/rseq.h new file mode 100644 index 000000000000..0a808575cbc4 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq.h @@ -0,0 +1,147 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + */ + +#ifndef RSEQ_H +#define RSEQ_H + +#include <stdint.h> +#include <stdbool.h> +#include <pthread.h> +#include <signal.h> +#include <sched.h> +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <sched.h> +#include <linux/rseq.h> + +/* + * Empty code injection macros, override when testing. + * It is important to consider that the ASM injection macros need to be + * fully reentrant (e.g. do not modify the stack). + */ +#ifndef RSEQ_INJECT_ASM +#define RSEQ_INJECT_ASM(n) +#endif + +#ifndef RSEQ_INJECT_C +#define RSEQ_INJECT_C(n) +#endif + +#ifndef RSEQ_INJECT_INPUT +#define RSEQ_INJECT_INPUT +#endif + +#ifndef RSEQ_INJECT_CLOBBER +#define RSEQ_INJECT_CLOBBER +#endif + +#ifndef RSEQ_INJECT_FAILED +#define RSEQ_INJECT_FAILED +#endif + +extern __thread volatile struct rseq __rseq_abi; + +#define rseq_likely(x) __builtin_expect(!!(x), 1) +#define rseq_unlikely(x) __builtin_expect(!!(x), 0) +#define rseq_barrier() __asm__ __volatile__("" : : : "memory") + +#define RSEQ_ACCESS_ONCE(x) (*(__volatile__ __typeof__(x) *)&(x)) +#define RSEQ_WRITE_ONCE(x, v) __extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); }) +#define RSEQ_READ_ONCE(x) RSEQ_ACCESS_ONCE(x) + +#define __rseq_str_1(x) #x +#define __rseq_str(x) __rseq_str_1(x) + +#define rseq_log(fmt, args...) \ + fprintf(stderr, fmt "(in %s() at " __FILE__ ":" __rseq_str(__LINE__)"\n", \ + ## args, __func__) + +#define rseq_bug(fmt, args...) 
\ + do { \ + rseq_log(fmt, ##args); \ + abort(); \ + } while (0) + +#if defined(__x86_64__) || defined(__i386__) +#include <rseq-x86.h> +#elif defined(__ARMEL__) +#include <rseq-arm.h> +#elif defined(__PPC__) +#include <rseq-ppc.h> +#else +#error unsupported target +#endif + +/* + * Register rseq for the current thread. This needs to be called once + * by any thread which uses restartable sequences, before they start + * using restartable sequences, to ensure restartable sequences + * succeed. A restartable sequence executed from a non-registered + * thread will always fail. + */ +int rseq_register_current_thread(void); + +/* + * Unregister rseq for current thread. + */ +int rseq_unregister_current_thread(void); + +/* + * Restartable sequence fallback for reading the current CPU number. + */ +int32_t rseq_fallback_current_cpu(void); + +/* + * Values returned can be either the current CPU number, -1 (rseq is + * uninitialized), or -2 (rseq initialization has failed). + */ +static inline int32_t rseq_current_cpu_raw(void) +{ + return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id); +} + +/* + * Returns a possible CPU number, which is typically the current CPU. + * The returned CPU number can be used to prepare for an rseq critical + * section, which will confirm whether the cpu number is indeed the + * current one, and whether rseq is initialized. + * + * The CPU number returned by rseq_cpu_start should always be validated + * by passing it to a rseq asm sequence, or by comparing it to the + * return value of rseq_current_cpu_raw() if the rseq asm sequence + * does not need to be invoked. 
 + */ +static inline uint32_t rseq_cpu_start(void) +{ + return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start); +} + +static inline uint32_t rseq_current_cpu(void) +{ + int32_t cpu; + + cpu = rseq_current_cpu_raw(); + if (rseq_unlikely(cpu < 0)) + cpu = rseq_fallback_current_cpu(); + return cpu; +} + +/* + * rseq_prepare_unload() should be invoked by each thread using rseq_finish*() + * at least once between their last rseq_finish*() and library unload of the + * library defining the rseq critical section (struct rseq_cs). This also + * applies to use of rseq in code generated by JIT: rseq_prepare_unload() + * should be invoked at least once by each thread using rseq_finish*() before + * reclaim of the memory holding the struct rseq_cs. + */ +static inline void rseq_prepare_unload(void) +{ + __rseq_abi.rseq_cs = 0; +} + +#endif /* RSEQ_H_ */ -- 2.11.0
* [PATCH 10/14] rseq: selftests: Provide rseq library (v5) @ 2018-04-30 22:44 ` mathieu.desnoyers 0 siblings, 0 replies; 105+ messages in thread From: mathieu.desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)

This rseq helper library provides a user-space API to the rseq() system call.

The rseq fast-path exposes the instruction pointer addresses where the rseq assembly blocks begin and end, as well as the associated abort instruction pointer, in the __rseq_table section. This section lets debuggers know where to place breakpoints when single-stepping through assembly blocks which may be aborted at any point by the kernel.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
CC: Shuah Khan <shuahkh at osg.samsung.com>
CC: Russell King <linux at arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas at arm.com>
CC: Will Deacon <will.deacon at arm.com>
CC: Thomas Gleixner <tglx at linutronix.de>
CC: Paul Turner <pjt at google.com>
CC: Andrew Hunter <ahh at google.com>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Andy Lutomirski <luto at amacapital.net>
CC: Andi Kleen <andi at firstfloor.org>
CC: Dave Watson <davejwatson at fb.com>
CC: Chris Lameter <cl at linux.com>
CC: Ingo Molnar <mingo at redhat.com>
CC: "H. Peter Anvin" <hpa at zytor.com>
CC: Ben Maurer <bmaurer at fb.com>
CC: Steven Rostedt <rostedt at goodmis.org>
CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
CC: Josh Triplett <josh at joshtriplett.org>
CC: Linus Torvalds <torvalds at linux-foundation.org>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Boqun Feng <boqun.feng at gmail.com>
CC: linux-kselftest at vger.kernel.org
CC: linux-api at vger.kernel.org
---
Changes since v1:
- Provide abort-ip signature: The abort-ip signature is located just before the abort-ip target. It is currently hardcoded, but a user-space application could use the __rseq_table to iterate on all abort-ip targets and use a random value as signature if needed in the future.
- Add rseq_prepare_unload(): Libraries and JIT code using rseq critical sections need to issue rseq_prepare_unload() on each thread at least once before reclaim of struct rseq_cs.
- Use initial-exec TLS model, non-weak symbol: The initial-exec model is signal-safe, whereas the global-dynamic model is not. Remove the "weak" symbol attribute from the __rseq_abi in rseq.c. The rseq.so library will have ownership of that symbol, and there is no reason for an application or user library to try to define that symbol. The expected use is to link against librseq.so, which owns and provides that symbol.
- Set cpu_id to -2 on register error
- Add rseq_len syscall parameter, rseq_cs version
- Ensure disassembler-friendly signature: x86 32/64 disassemblers have a hard time decoding the instruction stream after a bad instruction. Use a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
- Exercise parametrized test variants in a shell script.
- Restartable sequences selftests: Remove use of event counter.
- Use cpu_id_start field: With the cpu_id_start field, the C preparation phase of the fast-path does not need to compare cpu_id < 0 anymore.
- Signal-safe registration and refcounting: Allow libraries using librseq.so to register it from signal handlers.
- Use OVERRIDE_TARGETS in makefile.
- Use "m" constraints for rseq_cs field.

Changes since v2:
- Update based on Thomas Gleixner's comments.

Changes since v3:
- Generate param_test_skip_fastpath and param_test_benchmark with -DSKIP_FASTPATH and -DBENCHMARK (respectively). Add param_test_fastpath to run_param_test.sh.

Changes since v4:
- Fold arm: workaround gcc asm size guess,
- Namespace barrier() -> rseq_barrier() in library header,
- Take into account coding style feedback from Peter Zijlstra,
- Split rseq selftests into logical commits.
--- tools/testing/selftests/rseq/rseq-arm.h | 715 +++++++++++++++++++ tools/testing/selftests/rseq/rseq-ppc.h | 671 ++++++++++++++++++ tools/testing/selftests/rseq/rseq-skip.h | 65 ++ tools/testing/selftests/rseq/rseq-x86.h | 1132 ++++++++++++++++++++++++++++++ tools/testing/selftests/rseq/rseq.c | 117 +++ tools/testing/selftests/rseq/rseq.h | 147 ++++ 6 files changed, 2847 insertions(+) create mode 100644 tools/testing/selftests/rseq/rseq-arm.h create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h create mode 100644 tools/testing/selftests/rseq/rseq-skip.h create mode 100644 tools/testing/selftests/rseq/rseq-x86.h create mode 100644 tools/testing/selftests/rseq/rseq.c create mode 100644 tools/testing/selftests/rseq/rseq.h diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h new file mode 100644 index 000000000000..3b055f9aeaab --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-arm.h @@ -0,0 +1,715 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-arm.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + */ + +#define RSEQ_SIG 0x53053053 + +#define rseq_smp_mb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") +#define rseq_smp_rmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") +#define rseq_smp_wmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc") + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_mb(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_mb(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +#define __RSEQ_ASM_DEFINE_TABLE(version, flags, start_ip, \ + post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + 
".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "adr r0, " __rseq_str(cs_label) "\n\t" \ + "str r0, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t" \ + "cmp %[" __rseq_str(cpu_id) "], r0\n\t" \ + "bne " __rseq_str(label) "\n\t" + +#define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ + abort_label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + __rseq_str(table_label) ":\n\t" \ + ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".word " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "b %l[" __rseq_str(abort_label) "]\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, abort_label, \ + start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ + abort_label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + __rseq_str(label) ":\n\t" \ + teardown \ + "b %l[" __rseq_str(cmpfail_label) "]\n\t" + +#define rseq_workaround_gcc_asm_size_guess() __asm__ __volatile__("") + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table 
entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expectnot], r0\n\t" + "beq %l[error2]\n\t" +#endif + "str r0, %[load]\n\t" + "add r0, %[voffp]\n\t" + "ldr r0, [r0]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "Ir" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + "ldr r0, %[v]\n\t" + "add r0, %[count]\n\t" + /* final store */ + "str r0, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [count] "Ir" (count) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" +#endif + /* try store */ + "str %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne %l[error2]\n\t" + "ldr r0, %[v2]\n\t" + "cmp %[expect2], r0\n\t" + "bne %l[error3]\n\t" +#endif + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + "b 5f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, "", abort, 1b, 2b, 4f) + "5:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str 
%[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] 
"m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + rseq_workaround_gcc_asm_size_guess(); + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(1f, 2f, 4f) /* start, commit, abort */ + "str %[src], %[rseq_scratch0]\n\t" + "str %[dst], %[rseq_scratch1]\n\t" + "str %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "ldr r0, %[v]\n\t" + "cmp %[expect], r0\n\t" + "bne 7f\n\t" +#endif + /* try memcpy */ + "cmp %[len], #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "ldrb %%r0, [%[src]]\n\t" \ + "strb %%r0, [%[dst]]\n\t" \ + "adds %[src], #1\n\t" \ + "adds %[dst], #1\n\t" \ + "subs %[len], #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "dmb\n\t" /* full mb provides store-release */ + /* final store */ + "str %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t" + "b 8f\n\t" + RSEQ_ASM_DEFINE_ABORT(3, 4, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + abort, 1b, 2b, 4f) + RSEQ_ASM_DEFINE_CMPFAIL(5, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + /* teardown */ + "ldr %[len], %[rseq_scratch2]\n\t" + "ldr %[dst], %[rseq_scratch1]\n\t" + "ldr %[src], %[rseq_scratch0]\n\t", + error2) +#endif + "8:\n\t" + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" 
(rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + RSEQ_INJECT_INPUT + : "r0", "memory", "cc" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + rseq_workaround_gcc_asm_size_guess(); + return 0; +abort: + rseq_workaround_gcc_asm_size_guess(); + RSEQ_INJECT_FAILED + return -1; +cmpfail: + rseq_workaround_gcc_asm_size_guess(); + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("cpu_id comparison failed"); +error2: + rseq_workaround_gcc_asm_size_guess(); + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h new file mode 100644 index 000000000000..52630c9f42be --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-ppc.h @@ -0,0 +1,671 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-ppc.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + * (C) Copyright 2016-2018 - Boqun Feng <boqun.feng at gmail.com> + */ + +#define RSEQ_SIG 0x53053053 + +#define rseq_smp_mb() __asm__ __volatile__ ("sync" ::: "memory", "cc") +#define rseq_smp_lwsync() __asm__ __volatile__ ("lwsync" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_smp_lwsync() +#define rseq_smp_wmb() rseq_smp_lwsync() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_lwsync(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_lwsync() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_lwsync(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * The __rseq_table section can be used by debuggers to better handle + * single-stepping through the restartable critical sections. 
+ */ + +#ifdef __PPC64__ + +#define STORE_WORD "std " +#define LOAD_WORD "ld " +#define LOADX_WORD "ldx " +#define CMP_WORD "cmpd " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t" \ + "rldicr %%r17, %%r17, 32, 31\n\t" \ + "oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t" \ + "ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "std %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#else /* #ifdef __PPC64__ */ + +#define STORE_WORD "stw " +#define LOAD_WORD "lwz " +#define LOADX_WORD "lwzx " +#define CMP_WORD "cmpw " + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + /* 32-bit only supported on BE */ \ + ".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t" \ + "addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t" \ + "stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#endif /* #ifdef __PPC64__ */ + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), 
abort_ip) + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t" \ + "cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + "b %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +/* + * RSEQ_ASM_OPs: asm operations for rseq + * RSEQ_ASM_OP_R_*: has hard-coded registers in it + * RSEQ_ASM_OP_* (else): doesn't have hard-coded registers (except cr7) + */ +#define RSEQ_ASM_OP_CMPEQ(var, expect, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t" \ + "bne- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_CMPNE(var, expectnot, label) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t" \ + "beq- cr7, " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_OP_STORE(value, var) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" + +/* Load @var to r17 */ +#define RSEQ_ASM_OP_R_LOAD(var) \ + LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Store r17 to @var */ +#define RSEQ_ASM_OP_R_STORE(var) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" + +/* Add @count to r17 */ +#define RSEQ_ASM_OP_R_ADD(count) \ + "add %%r17, %[" __rseq_str(count) "], %%r17\n\t" + +/* Load (r17 + voffp) to r17 */ +#define RSEQ_ASM_OP_R_LOADX(voffp) \ + LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t" + +/* TODO: implement a faster memcpy. 
*/ +#define RSEQ_ASM_OP_R_MEMCPY() \ + "cmpdi %%r19, 0\n\t" \ + "beq 333f\n\t" \ + "addi %%r20, %%r20, -1\n\t" \ + "addi %%r21, %%r21, -1\n\t" \ + "222:\n\t" \ + "lbzu %%r18, 1(%%r20)\n\t" \ + "stbu %%r18, 1(%%r21)\n\t" \ + "addi %%r19, %%r19, -1\n\t" \ + "cmpdi %%r19, 0\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" \ + +#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label) \ + STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label) \ + STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t" \ + __rseq_str(post_commit_label) ":\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); 
+error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v not equal to @expectnot */ + RSEQ_ASM_OP_CMPNE(v, expectnot, %l[error2]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* store it in @load */ + RSEQ_ASM_OP_R_STORE(load) + /* dereference voffp(v) */ + RSEQ_ASM_OP_R_LOADX(voffp) + /* final store the value at voffp(v) */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "b" (voffp), + [load] "m" (*load) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* 
Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* load the value of @v */ + RSEQ_ASM_OP_R_LOAD(v) + /* add @count to it */ + RSEQ_ASM_OP_R_ADD(count) + /* final store */ + RSEQ_ASM_OP_R_FINAL_STORE(v, 2) + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "r" (count) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try store */ + RSEQ_ASM_OP_STORE(newv2, v2) + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[cmpfail]) + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) + /* cmp @v2 equal to @expect2 */ + RSEQ_ASM_OP_CMPEQ(v2, expect2, %l[error3]) +#endif + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* setup for memcpy */ + "mr %%r19, %[len]\n\t" + "mr %%r20, %[src]\n\t" + "mr %%r21, %[dst]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[cmpfail]) + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + /* cmp cpuid */ + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + /* cmp @v equal to @expect */ + RSEQ_ASM_OP_CMPEQ(v, expect, %l[error2]) +#endif + /* try memcpy */ + RSEQ_ASM_OP_R_MEMCPY() + RSEQ_INJECT_ASM(5) + /* for 'release' */ + "lwsync\n\t" + /* final store */ + RSEQ_ASM_OP_FINAL_STORE(newv, v, 2) + RSEQ_INJECT_ASM(6) + /* teardown */ + RSEQ_ASM_DEFINE_ABORT(4, abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len) + RSEQ_INJECT_INPUT + : "memory", "cc", "r17", "r18", "r19", "r20", "r21" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#undef STORE_WORD +#undef LOAD_WORD +#undef LOADX_WORD +#undef CMP_WORD + +#endif /* !RSEQ_SKIP_FASTPATH */ diff --git a/tools/testing/selftests/rseq/rseq-skip.h b/tools/testing/selftests/rseq/rseq-skip.h new file mode 100644 index 000000000000..72750b5905a9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-skip.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-skip.h + * + * (C) Copyright 2017-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + */ + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + 
return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return -1; +} diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h new file mode 100644 index 000000000000..089410a314e9 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq-x86.h @@ -0,0 +1,1132 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * rseq-x86.h + * + * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + */ + +#include <stdint.h> + +#define RSEQ_SIG 0x53053053 + +#ifdef __x86_64__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%rsp)" ::: "memory", "cc") +#define rseq_smp_rmb() rseq_barrier() +#define rseq_smp_wmb() rseq_barrier() + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = 
RSEQ_READ_ONCE(*p); \ + rseq_barrier(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_barrier(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t" \ + "movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>(%rip). 
*/\ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. 
+ */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movq %[v], %%rbx\n\t" + "cmpq %%rbx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movq %%rbx, %[load]\n\t" + "addq %[voffp], %%rbx\n\t" + "movq (%%rbx), %%rbx\n\t" + /* final store */ + "movq %%rbx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "er" (voffp), + [load] "m" (*load) + : "memory", "cc", "rax", "rbx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addq %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "er" (count) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movq %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2, newv, cpu); +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpq %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpq %[v2], %[expect2]\n\t" + "jnz %l[error3]\n\t" +#endif + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint64_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movq %[src], %[rseq_scratch0]\n\t" + "movq %[dst], %[rseq_scratch1]\n\t" + "movq %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpq %[v], %[expect]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "cmpq %[v], %[expect]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + /* final store */ + "movq %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movq %[rseq_scratch2], %[len]\n\t" + "movq %[rseq_scratch1], %[dst]\n\t" + "movq %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "rax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + 
return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* x86-64 is TSO. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src, len, + newv, cpu); +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#elif __i386__ + +#define rseq_smp_mb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_rmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") +#define rseq_smp_wmb() \ + __asm__ __volatile__ ("lock; addl $0,-128(%%esp)" ::: "memory", "cc") + +#define rseq_smp_load_acquire(p) \ +__extension__ ({ \ + __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \ + rseq_smp_mb(); \ + ____p1; \ +}) + +#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb() + +#define rseq_smp_store_release(p, v) \ +do { \ + rseq_smp_mb(); \ + RSEQ_WRITE_ONCE(*p, v); \ +} while (0) + +#ifdef RSEQ_SKIP_FASTPATH +#include "rseq-skip.h" +#else /* !RSEQ_SKIP_FASTPATH */ + +/* + * Use eax as scratch register and take memory operands as input to + * lessen register pressure. Especially needed when compiling in O0. 
+ */ +#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \ + start_ip, post_commit_offset, abort_ip) \ + ".pushsection __rseq_table, \"aw\"\n\t" \ + ".balign 32\n\t" \ + __rseq_str(label) ":\n\t" \ + ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ + ".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \ + __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \ + (post_commit_ip - start_ip), abort_ip) + +#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \ + RSEQ_INJECT_ASM(1) \ + "movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t" \ + __rseq_str(label) ":\n\t" + +#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \ + RSEQ_INJECT_ASM(2) \ + "cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \ + "jnz " __rseq_str(label) "\n\t" + +#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + /* Disassembler-friendly signature: nopl <sig>. */ \ + ".byte 0x0f, 0x1f, 0x05\n\t" \ + ".long " __rseq_str(RSEQ_SIG) "\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(abort_label) "]\n\t" \ + ".popsection\n\t" + +#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ + ".pushsection __rseq_failure, \"ax\"\n\t" \ + __rseq_str(label) ":\n\t" \ + teardown \ + "jmp %l[" __rseq_str(cmpfail_label) "]\n\t" \ + ".popsection\n\t" + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* + * Compare @v against @expectnot. When it does _not_ match, load @v + * into @load, and store the content of *@v + voffp into @v. + */ +static inline __attribute__((always_inline)) +int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot, + off_t voffp, intptr_t *load, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[v], %%ebx\n\t" + "cmpl %%ebx, %[expectnot]\n\t" + "je %l[error2]\n\t" +#endif + "movl %%ebx, %[load]\n\t" + "addl %[voffp], %%ebx\n\t" + "movl (%%ebx), %%ebx\n\t" + /* final store */ + "movl %%ebx, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(5) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expectnot] "r" (expectnot), + [voffp] "ir" (voffp), + [load] "m" (*load) + : "memory", "cc", "eax", "ebx" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_addv(intptr_t *v, intptr_t count, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) +#endif + /* final store */ + "addl %[count], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(4) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [count] "ir" (count) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort +#ifdef RSEQ_COMPARE_TWICE + , error1 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %%eax\n\t" + "movl %%eax, %[v2]\n\t" + RSEQ_INJECT_ASM(5) + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "m" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t newv2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "movl %[expect], %%eax\n\t" + "cmpl %[v], %%eax\n\t" + "jnz %l[error2]\n\t" +#endif + /* try store */ + "movl %[newv2], %[v2]\n\t" + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + /* final store */ + "movl %[newv], %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* try store input */ + [v2] "m" (*v2), + [newv2] "r" (newv2), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "r" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif + +} + +static inline __attribute__((always_inline)) +int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect, + intptr_t *v2, intptr_t expect2, + intptr_t newv, int cpu) +{ + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "cmpl %[v], %[expect]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(4) + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[cmpfail]\n\t" + RSEQ_INJECT_ASM(5) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1]) + "cmpl %[v], %[expect]\n\t" + "jnz %l[error2]\n\t" + "cmpl %[expect2], %[v2]\n\t" + "jnz %l[error3]\n\t" +#endif + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + RSEQ_ASM_DEFINE_ABORT(4, "", abort) + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* cmp2 input */ + [v2] "m" (*v2), + [expect2] "r" (expect2), + /* final store input */ + [v] "m" (*v), + [expect] "r" (expect), + [newv] "m" (newv) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2, error3 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("1st expected value comparison failed"); +error3: + rseq_bug("2nd expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. 
*/ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + 
: abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +/* TODO: implement a faster memcpy. */ +static inline __attribute__((always_inline)) +int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect, + void *dst, void *src, size_t len, + intptr_t newv, int cpu) +{ + uint32_t rseq_scratch[3]; + + RSEQ_INJECT_C(9) + + __asm__ __volatile__ goto ( + RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */ + "movl %[src], %[rseq_scratch0]\n\t" + "movl %[dst], %[rseq_scratch1]\n\t" + "movl %[len], %[rseq_scratch2]\n\t" + /* Start rseq by storing table entry pointer into rseq_cs. */ + RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs) + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f) + RSEQ_INJECT_ASM(3) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 5f\n\t" + RSEQ_INJECT_ASM(4) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f) + "movl %[expect], %%eax\n\t" + "cmpl %%eax, %[v]\n\t" + "jnz 7f\n\t" +#endif + /* try memcpy */ + "test %[len], %[len]\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "movb (%[src]), %%al\n\t" \ + "movb %%al, (%[dst])\n\t" \ + "inc %[src]\n\t" \ + "inc %[dst]\n\t" \ + "dec %[len]\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" \ + RSEQ_INJECT_ASM(5) + "lock; addl $0,-128(%%esp)\n\t" + "movl %[newv], %%eax\n\t" + /* final store */ + "movl %%eax, %[v]\n\t" + "2:\n\t" + RSEQ_INJECT_ASM(6) + /* teardown */ + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t" + RSEQ_ASM_DEFINE_ABORT(4, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + abort) + RSEQ_ASM_DEFINE_CMPFAIL(5, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl 
%[rseq_scratch0], %[src]\n\t", + cmpfail) +#ifdef RSEQ_COMPARE_TWICE + RSEQ_ASM_DEFINE_CMPFAIL(6, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error1) + RSEQ_ASM_DEFINE_CMPFAIL(7, + "movl %[rseq_scratch2], %[len]\n\t" + "movl %[rseq_scratch1], %[dst]\n\t" + "movl %[rseq_scratch0], %[src]\n\t", + error2) +#endif + : /* gcc asm goto does not allow outputs */ + : [cpu_id] "r" (cpu), + [current_cpu_id] "m" (__rseq_abi.cpu_id), + [rseq_cs] "m" (__rseq_abi.rseq_cs), + /* final store input */ + [v] "m" (*v), + [expect] "m" (expect), + [newv] "m" (newv), + /* try memcpy input */ + [dst] "r" (dst), + [src] "r" (src), + [len] "r" (len), + [rseq_scratch0] "m" (rseq_scratch[0]), + [rseq_scratch1] "m" (rseq_scratch[1]), + [rseq_scratch2] "m" (rseq_scratch[2]) + : "memory", "cc", "eax" + RSEQ_INJECT_CLOBBER + : abort, cmpfail +#ifdef RSEQ_COMPARE_TWICE + , error1, error2 +#endif + ); + return 0; +abort: + RSEQ_INJECT_FAILED + return -1; +cmpfail: + return 1; +#ifdef RSEQ_COMPARE_TWICE +error1: + rseq_bug("cpu_id comparison failed"); +error2: + rseq_bug("expected value comparison failed"); +#endif +} + +#endif /* !RSEQ_SKIP_FASTPATH */ + +#endif diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c new file mode 100644 index 000000000000..4847e97ed049 --- /dev/null +++ b/tools/testing/selftests/rseq/rseq.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: LGPL-2.1 +/* + * rseq.c + * + * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers at efficios.com> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; only + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "rseq.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+__attribute__((tls_model("initial-exec"))) __thread
+volatile struct rseq __rseq_abi = {
+	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+static __attribute__((tls_model("initial-exec"))) __thread
+volatile int refcount;
+
+static void signal_off_save(sigset_t *oldset)
+{
+	sigset_t set;
+	int ret;
+
+	sigfillset(&set);
+	ret = pthread_sigmask(SIG_BLOCK, &set, oldset);
+	if (ret)
+		abort();
+}
+
+static void signal_restore(sigset_t oldset)
+{
+	int ret;
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+	if (ret)
+		abort();
+}
+
+static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
+		int flags, uint32_t sig)
+{
+	return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+int rseq_register_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (refcount++)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
+	if (!rc) {
+		assert(rseq_current_cpu_raw() >= 0);
+		goto end;
+	}
+	if (errno != EBUSY)
+		__rseq_abi.cpu_id = -2;
+	ret = -1;
+	refcount--;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int rseq_unregister_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (--refcount)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
+		      RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+	if (!rc)
+		goto end;
+	ret = -1;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int32_t rseq_fallback_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
new file mode 100644
index 000000000000..0a808575cbc4
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
+/*
+ * rseq.h
+ *
+ * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
+ */
+
+#ifndef RSEQ_H
+#define RSEQ_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sched.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <linux/rseq.h>
+
+/*
+ * Empty code injection macros, override when testing.
+ * It is important to consider that the ASM injection macros need to be
+ * fully reentrant (e.g. do not modify the stack).
+ */
+#ifndef RSEQ_INJECT_ASM
+#define RSEQ_INJECT_ASM(n)
+#endif
+
+#ifndef RSEQ_INJECT_C
+#define RSEQ_INJECT_C(n)
+#endif
+
+#ifndef RSEQ_INJECT_INPUT
+#define RSEQ_INJECT_INPUT
+#endif
+
+#ifndef RSEQ_INJECT_CLOBBER
+#define RSEQ_INJECT_CLOBBER
+#endif
+
+#ifndef RSEQ_INJECT_FAILED
+#define RSEQ_INJECT_FAILED
+#endif
+
+extern __thread volatile struct rseq __rseq_abi;
+
+#define rseq_likely(x)		__builtin_expect(!!(x), 1)
+#define rseq_unlikely(x)	__builtin_expect(!!(x), 0)
+#define rseq_barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define RSEQ_ACCESS_ONCE(x)	(*(__volatile__ __typeof__(x) *)&(x))
+#define RSEQ_WRITE_ONCE(x, v)	__extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); })
+#define RSEQ_READ_ONCE(x)	RSEQ_ACCESS_ONCE(x)
+
+#define __rseq_str_1(x)	#x
+#define __rseq_str(x)		__rseq_str_1(x)
+
+#define rseq_log(fmt, args...)						       \
+	fprintf(stderr, fmt "(in %s() at " __FILE__ ":" __rseq_str(__LINE__)"\n", \
+		## args, __func__)
+
+#define rseq_bug(fmt, args...)		\
+	do {				\
+		rseq_log(fmt, ##args);	\
+		abort();		\
+	} while (0)
+
+#if defined(__x86_64__) || defined(__i386__)
+#include <rseq-x86.h>
+#elif defined(__ARMEL__)
+#include <rseq-arm.h>
+#elif defined(__PPC__)
+#include <rseq-ppc.h>
+#else
+#error unsupported target
+#endif
+
+/*
+ * Register rseq for the current thread. This needs to be called once
+ * by any thread which uses restartable sequences, before they start
+ * using restartable sequences, to ensure restartable sequences
+ * succeed. A restartable sequence executed from a non-registered
+ * thread will always fail.
+ */
+int rseq_register_current_thread(void);
+
+/*
+ * Unregister rseq for current thread.
+ */
+int rseq_unregister_current_thread(void);
+
+/*
+ * Restartable sequence fallback for reading the current CPU number.
+ */
+int32_t rseq_fallback_current_cpu(void);
+
+/*
+ * Values returned can be either the current CPU number, -1 (rseq is
+ * uninitialized), or -2 (rseq initialization has failed).
+ */
+static inline int32_t rseq_current_cpu_raw(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id);
+}
+
+/*
+ * Returns a possible CPU number, which is typically the current CPU.
+ * The returned CPU number can be used to prepare for an rseq critical
+ * section, which will confirm whether the cpu number is indeed the
+ * current one, and whether rseq is initialized.
+ *
+ * The CPU number returned by rseq_cpu_start should always be validated
+ * by passing it to a rseq asm sequence, or by comparing it to the
+ * return value of rseq_current_cpu_raw() if the rseq asm sequence
+ * does not need to be invoked.
+ */
+static inline uint32_t rseq_cpu_start(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start);
+}
+
+static inline uint32_t rseq_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = rseq_current_cpu_raw();
+	if (rseq_unlikely(cpu < 0))
+		cpu = rseq_fallback_current_cpu();
+	return cpu;
+}
+
+/*
+ * rseq_prepare_unload() should be invoked by each thread using rseq_finish*()
+ * at least once between their last rseq_finish*() and library unload of the
+ * library defining the rseq critical section (struct rseq_cs). This also
+ * applies to use of rseq in code generated by JIT: rseq_prepare_unload()
+ * should be invoked at least once by each thread using rseq_finish*() before
+ * reclaim of the memory holding the struct rseq_cs.
+ */
+static inline void rseq_prepare_unload(void)
+{
+	__rseq_abi.rseq_cs = 0;
+}
+
+#endif /* RSEQ_H */
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
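
A note on the return convention used by the fast-path helpers in the patch above: rseq_cmpeqv_storev() and its variants return 0 when the final store was committed, -1 on abort (including a CPU number mismatch) and 1 when the expected-value comparison fails. The sketch below only models that contract in plain C so the three outcomes can be seen side by side. It is not restartable and not atomic, and the names model_cmpeqv_storev(), model_this_cpu_trylock() and the explicit current_cpu parameter (standing in for __rseq_abi.cpu_id) are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain-C model of the rseq_cmpeqv_storev() return contract.
 * NOT restartable and NOT atomic: current_cpu is an explicit,
 * hypothetical parameter standing in for __rseq_abi.cpu_id.
 */
int model_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
			int cpu, int current_cpu)
{
	if (cpu != current_cpu)
		return -1;	/* abort: not running on the expected CPU */
	if (*v != expect)
		return 1;	/* cmpfail: *v did not match expect */
	*v = newv;		/* final store (the commit point) */
	return 0;
}

/*
 * Hypothetical per-CPU trylock built on the model: take locks[cpu]
 * only if it is currently free (0) and we are still on that CPU.
 */
int model_this_cpu_trylock(intptr_t *locks, int cpu, int current_cpu)
{
	return model_cmpeqv_storev(&locks[cpu], 0, 1, cpu, current_cpu);
}
```

A caller would typically re-read the CPU number and retry on -1, and treat 1 as "the lock is held, or the value changed", which is the split between the abort and cmpfail labels in the asm goto blocks.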
* [PATCH 11/14] rseq: selftests: Provide basic test
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC
To: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Shuah Khan, linux-kselftest

"basic_test" only asserts that RSEQ works moderately correctly, e.g.
that the CPU ID pointer works.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 tools/testing/selftests/rseq/basic_test.c | 56 +++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100644 tools/testing/selftests/rseq/basic_test.c

diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
new file mode 100644
index 000000000000..d8efbfb89193
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_test.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Basic test coverage for critical regions and rseq_current_cpu().
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+
+#include "rseq.h"
+
+void test_cpu_pointer(void)
+{
+	cpu_set_t affinity, test_affinity;
+	int i;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	CPU_ZERO(&test_affinity);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (CPU_ISSET(i, &affinity)) {
+			CPU_SET(i, &test_affinity);
+			sched_setaffinity(0, sizeof(test_affinity),
+					&test_affinity);
+			assert(sched_getcpu() == i);
+			assert(rseq_current_cpu() == i);
+			assert(rseq_current_cpu_raw() == i);
+			assert(rseq_cpu_start() == i);
+			CPU_CLR(i, &test_affinity);
+		}
+	}
+	sched_setaffinity(0, sizeof(affinity), &affinity);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	printf("testing current cpu\n");
+	test_cpu_pointer();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	return 0;
+
+init_thread_error:
+	return -1;
+}
-- 
2.11.0
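
The test in this patch relies on pinning the thread to one CPU at a time and then checking that every way of reading the CPU number agrees. A minimal sketch of just the pinning loop, with the return values of the affinity calls checked, might look like the following. It does not use rseq at all (only sched_getcpu() is compared), and verify_pinned_cpus() is a hypothetical helper name, not something the patch provides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Pin the calling thread to each CPU of its affinity mask in turn and
 * check that sched_getcpu() agrees. Returns the number of CPUs
 * verified, or -1 on any failure. Hypothetical helper, not part of
 * the patch.
 */
int verify_pinned_cpus(void)
{
	cpu_set_t affinity, one_cpu;
	int i, verified = 0, ret = 0;

	if (sched_getaffinity(0, sizeof(affinity), &affinity))
		return -1;
	for (i = 0; i < CPU_SETSIZE; i++) {
		if (!CPU_ISSET(i, &affinity))
			continue;
		/* Build a mask containing only CPU i. */
		CPU_ZERO(&one_cpu);
		CPU_SET(i, &one_cpu);
		if (sched_setaffinity(0, sizeof(one_cpu), &one_cpu) ||
		    sched_getcpu() != i) {
			ret = -1;
			break;
		}
		verified++;
	}
	/* Always try to restore the original mask. */
	if (sched_setaffinity(0, sizeof(affinity), &affinity))
		ret = -1;
	return ret ? ret : verified;
}
```

In the selftest itself, the extra assertions on rseq_current_cpu(), rseq_current_cpu_raw() and rseq_cpu_start() would sit where the sched_getcpu() comparison is made here.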
* [PATCH 11/14] rseq: selftests: Provide basic test
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC

"basic_test" only asserts that RSEQ works moderately correctly. E.g.
that the CPUID pointer works.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 tools/testing/selftests/rseq/basic_test.c | 56 +++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100644 tools/testing/selftests/rseq/basic_test.c

diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
new file mode 100644
index 000000000000..d8efbfb89193
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_test.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Basic test coverage for critical regions and rseq_current_cpu().
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+
+#include "rseq.h"
+
+void test_cpu_pointer(void)
+{
+	cpu_set_t affinity, test_affinity;
+	int i;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	CPU_ZERO(&test_affinity);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (CPU_ISSET(i, &affinity)) {
+			CPU_SET(i, &test_affinity);
+			sched_setaffinity(0, sizeof(test_affinity),
+					&test_affinity);
+			assert(sched_getcpu() == i);
+			assert(rseq_current_cpu() == i);
+			assert(rseq_current_cpu_raw() == i);
+			assert(rseq_cpu_start() == i);
+			CPU_CLR(i, &test_affinity);
+		}
+	}
+	sched_setaffinity(0, sizeof(affinity), &affinity);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	printf("testing current cpu\n");
+	test_cpu_pointer();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	return 0;
+
+init_thread_error:
+	return -1;
+}
--
2.11.0
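For readers without the rseq selftest library at hand, the pin-and-verify loop of test_cpu_pointer() can be sketched with glibc alone. This is a minimal sketch and not part of the patch: check_cpu_pinning() is a hypothetical helper name, and it verifies only sched_getcpu(), whereas the selftest also asserts rseq_current_cpu(), rseq_current_cpu_raw() and rseq_cpu_start(). It assumes a Linux system where the thread has been migrated by the time sched_setaffinity() returns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to each allowed CPU in
 * turn and check that sched_getcpu() reports that CPU.
 * Returns the number of CPUs on which the check failed, or -1 if the
 * initial affinity mask could not be read.
 */
int check_cpu_pinning(void)
{
	cpu_set_t saved, one;
	int i, failures = 0;

	if (sched_getaffinity(0, sizeof(saved), &saved))
		return -1;
	for (i = 0; i < CPU_SETSIZE; i++) {
		if (!CPU_ISSET(i, &saved))
			continue;
		CPU_ZERO(&one);
		CPU_SET(i, &one);
		/* Skip CPUs we cannot be pinned to (e.g. taken offline). */
		if (sched_setaffinity(0, sizeof(one), &one))
			continue;
		if (sched_getcpu() != i)
			failures++;
	}
	/* Restore the original affinity mask, as the selftest does. */
	sched_setaffinity(0, sizeof(saved), &saved);
	return failures;
}
```

Unlike the selftest, the sketch counts mismatches instead of asserting, so a caller can decide how to report them.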
* [PATCH 12/14] rseq: selftests: Provide basic percpu ops test (v2)
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC
To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen,
	Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds,
	Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes,
	Mathieu Desnoyers, Shuah Khan, linux-kselftest

"basic_percpu_ops_test" is a slightly more "realistic" variant,
implementing a few simple per-cpu operations and testing their
correctness.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Use only rseq, remove use of cpu_opv system call.
---
 .../testing/selftests/rseq/basic_percpu_ops_test.c | 313 +++++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c

diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
new file mode 100644
index 000000000000..96ef27905879
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "rseq.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+	int reps;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
+int rseq_this_cpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+					 0, 1, cpu);
+		if (rseq_likely(!ret))
+			break;
+		/* Retry if comparison fails or rseq aborts. */
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_test_data *data = arg;
+	int i, cpu;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	for (i = 0; i < data->reps; i++) {
+		cpu = rseq_this_cpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+	}
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = 200;
+	int i;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+
+	memset(&data, 0, sizeof(data));
+	data.reps = 5000;
+
+	for (i = 0; i < num_threads; i++)
+		pthread_create(&test_threads[i], NULL,
+			       test_percpu_spinlock_thread, &data);
+
+	for (i = 0; i < num_threads; i++)
+		pthread_join(test_threads[i], NULL);
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)data.reps * num_threads);
+}
+
+void this_cpu_list_push(struct percpu_list *list,
+			struct percpu_list_node *node,
+			int *_cpu)
+{
+	int cpu;
+
+	for (;;) {
+		intptr_t *targetptr, newval, expect;
+		int ret;
+
+		cpu = rseq_cpu_start();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)RSEQ_READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (rseq_likely(!ret))
+			break;
+		/* Retry if comparison fails or rseq aborts. */
+	}
+	if (_cpu)
+		*_cpu = cpu;
+}
+
+/*
+ * Unlike a traditional lock-less linked list; the availability of a
+ * rseq primitive allows us to implement pop without concerns over
+ * ABA-type races.
+ */
+struct percpu_list_node *this_cpu_list_pop(struct percpu_list *list,
+					   int *_cpu)
+{
+	for (;;) {
+		struct percpu_list_node *head;
+		intptr_t *targetptr, expectnot, *load;
+		off_t offset;
+		int ret, cpu;
+
+		cpu = rseq_cpu_start();
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		expectnot = (intptr_t)NULL;
+		offset = offsetof(struct percpu_list_node, next);
+		load = (intptr_t *)&head;
+		ret = rseq_cmpnev_storeoffp_load(targetptr, expectnot,
+						 offset, load, cpu);
+		if (rseq_likely(!ret)) {
+			if (_cpu)
+				*_cpu = cpu;
+			return head;
+		}
+		if (ret > 0)
+			return NULL;
+		/* Retry if rseq aborts. */
+	}
+}
+
+/*
+ * __percpu_list_pop is not safe against concurrent accesses. Should
+ * only be used on lists that are not concurrently modified.
+ */
+struct percpu_list_node *__percpu_list_pop(struct percpu_list *list, int cpu)
+{
+	struct percpu_list_node *node;
+
+	node = list->c[cpu].head;
+	if (!node)
+		return NULL;
+	list->c[cpu].head = node->next;
+	return node;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	int i;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	for (i = 0; i < 100000; i++) {
+		struct percpu_list_node *node;
+
+		node = this_cpu_list_pop(list, NULL);
+		sched_yield();  /* encourage shuffling */
+		if (node)
+			this_cpu_list_push(list, node, NULL);
+	}
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	int i, j;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[200];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < 200; i++)
+		pthread_create(&test_threads[i], NULL,
+			       test_percpu_list_thread, &list);
+
+	for (i = 0; i < 200; i++)
+		pthread_join(test_threads[i], NULL);
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		while ((node = __percpu_list_pop(&list, i))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	printf("spinlock\n");
+	test_percpu_spinlock();
+	printf("percpu_list\n");
+	test_percpu_list();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	return 0;
+
+error:
+	return -1;
+}
+
--
2.11.0
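The sharded counter exercised by test_percpu_spinlock() can be approximated without rseq. The sketch below is an illustration under stated assumptions, not the patch's method: it swaps rseq_cmpeqv_storev() for a C11 atomic compare-and-exchange, so it stays correct even if the thread migrates between sched_getcpu() and the lock attempt, at the cost of an atomic instruction on the fast path that rseq is meant to avoid. The names slot_lock(), slot_unlock(), worker() and run_sharded_counter() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64

/* One lock and one counter per possible CPU, padded against false sharing. */
struct slot {
	atomic_intptr_t lock;	/* 0 = free, 1 = held */
	intptr_t count;		/* protected by lock */
} __attribute__((aligned(128)));

static struct slot slots[CPU_SETSIZE];

/* Lock the slot of the CPU we appear to run on; returns the slot index. */
static int slot_lock(void)
{
	for (;;) {
		int cpu = sched_getcpu();
		intptr_t expected = 0;

		if (cpu < 0 || cpu >= CPU_SETSIZE)
			cpu = 0;	/* fall back to slot 0 if the CPU is unknown */
		/* Acquire ordering pairs with the release store in slot_unlock(). */
		if (atomic_compare_exchange_weak_explicit(&slots[cpu].lock,
				&expected, 1,
				memory_order_acquire, memory_order_relaxed))
			return cpu;
		sched_yield();	/* slot held, or spurious failure: retry */
	}
}

static void slot_unlock(int cpu)
{
	atomic_store_explicit(&slots[cpu].lock, 0, memory_order_release);
}

static void *worker(void *arg)
{
	int reps = *(int *)arg;
	int i;

	for (i = 0; i < reps; i++) {
		int cpu = slot_lock();

		slots[cpu].count++;
		slot_unlock(cpu);
	}
	return NULL;
}

/* Runs nthreads workers (capped at MAX_THREADS); returns the summed count. */
uint64_t run_sharded_counter(int nthreads, int reps)
{
	pthread_t threads[MAX_THREADS];
	uint64_t sum = 0;
	int i, started = 0;

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	for (i = 0; i < CPU_SETSIZE; i++) {
		atomic_store(&slots[i].lock, 0);
		slots[i].count = 0;
	}
	for (i = 0; i < nthreads; i++)
		if (!pthread_create(&threads[started], NULL, worker, &reps))
			started++;
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	for (i = 0; i < CPU_SETSIZE; i++)
		sum += (uint64_t)slots[i].count;
	return sum;
}
```

As in the selftest, the final sum should equal the number of threads times the repetitions per thread, regardless of how the scheduler spread the threads over CPUs.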
* [PATCH 12/14] rseq: selftests: Provide basic percpu ops test (v2) @ 2018-04-30 22:44 ` mathieu.desnoyers 0 siblings, 0 replies; 105+ messages in thread From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw) "basic_percpu_ops_test" is a slightly more "realistic" variant, implementing a few simple per-cpu operations and testing their correctness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com> CC: Shuah Khan <shuahkh at osg.samsung.com> CC: Russell King <linux at arm.linux.org.uk> CC: Catalin Marinas <catalin.marinas at arm.com> CC: Will Deacon <will.deacon at arm.com> CC: Thomas Gleixner <tglx at linutronix.de> CC: Paul Turner <pjt at google.com> CC: Andrew Hunter <ahh at google.com> CC: Peter Zijlstra <peterz at infradead.org> CC: Andy Lutomirski <luto at amacapital.net> CC: Andi Kleen <andi at firstfloor.org> CC: Dave Watson <davejwatson at fb.com> CC: Chris Lameter <cl at linux.com> CC: Ingo Molnar <mingo at redhat.com> CC: "H. Peter Anvin" <hpa at zytor.com> CC: Ben Maurer <bmaurer at fb.com> CC: Steven Rostedt <rostedt at goodmis.org> CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com> CC: Josh Triplett <josh at joshtriplett.org> CC: Linus Torvalds <torvalds at linux-foundation.org> CC: Andrew Morton <akpm at linux-foundation.org> CC: Boqun Feng <boqun.feng at gmail.com> CC: linux-kselftest at vger.kernel.org CC: linux-api at vger.kernel.org --- Changes since v1: - Use only rseq, remove use of cpu_opv system call. 
--- .../testing/selftests/rseq/basic_percpu_ops_test.c | 313 +++++++++++++++++++++ 1 file changed, 313 insertions(+) create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c new file mode 100644 index 000000000000..96ef27905879 --- /dev/null +++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c @@ -0,0 +1,313 @@ +// SPDX-License-Identifier: LGPL-2.1 +#define _GNU_SOURCE +#include <assert.h> +#include <pthread.h> +#include <sched.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stddef.h> + +#include "rseq.h" + +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) + +struct percpu_lock_entry { + intptr_t v; +} __attribute__((aligned(128))); + +struct percpu_lock { + struct percpu_lock_entry c[CPU_SETSIZE]; +}; + +struct test_data_entry { + intptr_t count; +} __attribute__((aligned(128))); + +struct spinlock_test_data { + struct percpu_lock lock; + struct test_data_entry c[CPU_SETSIZE]; + int reps; +}; + +struct percpu_list_node { + intptr_t data; + struct percpu_list_node *next; +}; + +struct percpu_list_entry { + struct percpu_list_node *head; +} __attribute__((aligned(128))); + +struct percpu_list { + struct percpu_list_entry c[CPU_SETSIZE]; +}; + +/* A simple percpu spinlock. Returns the cpu lock was acquired on. */ +int rseq_this_cpu_lock(struct percpu_lock *lock) +{ + int cpu; + + for (;;) { + int ret; + + cpu = rseq_cpu_start(); + ret = rseq_cmpeqv_storev(&lock->c[cpu].v, + 0, 1, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + /* + * Acquire semantic when taking lock after control dependency. + * Matches rseq_smp_store_release(). 
+ */ + rseq_smp_acquire__after_ctrl_dep(); + return cpu; +} + +void rseq_percpu_unlock(struct percpu_lock *lock, int cpu) +{ + assert(lock->c[cpu].v == 1); + /* + * Release lock, with release semantic. Matches + * rseq_smp_acquire__after_ctrl_dep(). + */ + rseq_smp_store_release(&lock->c[cpu].v, 0); +} + +void *test_percpu_spinlock_thread(void *arg) +{ + struct spinlock_test_data *data = arg; + int i, cpu; + + if (rseq_register_current_thread()) { + fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n", + errno, strerror(errno)); + abort(); + } + for (i = 0; i < data->reps; i++) { + cpu = rseq_this_cpu_lock(&data->lock); + data->c[cpu].count++; + rseq_percpu_unlock(&data->lock, cpu); + } + if (rseq_unregister_current_thread()) { + fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n", + errno, strerror(errno)); + abort(); + } + + return NULL; +} + +/* + * A simple test which implements a sharded counter using a per-cpu + * lock. Obviously real applications might prefer to simply use a + * per-cpu increment; however, this is reasonable for a test and the + * lock can be extended to synchronize more complicated operations. + */ +void test_percpu_spinlock(void) +{ + const int num_threads = 200; + int i; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct spinlock_test_data data; + + memset(&data, 0, sizeof(data)); + data.reps = 5000; + + for (i = 0; i < num_threads; i++) + pthread_create(&test_threads[i], NULL, + test_percpu_spinlock_thread, &data); + + for (i = 0; i < num_threads; i++) + pthread_join(test_threads[i], NULL); + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)data.reps * num_threads); +} + +void this_cpu_list_push(struct percpu_list *list, + struct percpu_list_node *node, + int *_cpu) +{ + int cpu; + + for (;;) { + intptr_t *targetptr, newval, expect; + int ret; + + cpu = rseq_cpu_start(); + /* Load list->c[cpu].head with single-copy atomicity. 
*/ + expect = (intptr_t)RSEQ_READ_ONCE(list->c[cpu].head); + newval = (intptr_t)node; + targetptr = (intptr_t *)&list->c[cpu].head; + node->next = (struct percpu_list_node *)expect; + ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; +} + +/* + * Unlike a traditional lock-less linked list; the availability of a + * rseq primitive allows us to implement pop without concerns over + * ABA-type races. + */ +struct percpu_list_node *this_cpu_list_pop(struct percpu_list *list, + int *_cpu) +{ + for (;;) { + struct percpu_list_node *head; + intptr_t *targetptr, expectnot, *load; + off_t offset; + int ret, cpu; + + cpu = rseq_cpu_start(); + targetptr = (intptr_t *)&list->c[cpu].head; + expectnot = (intptr_t)NULL; + offset = offsetof(struct percpu_list_node, next); + load = (intptr_t *)&head; + ret = rseq_cmpnev_storeoffp_load(targetptr, expectnot, + offset, load, cpu); + if (rseq_likely(!ret)) { + if (_cpu) + *_cpu = cpu; + return head; + } + if (ret > 0) + return NULL; + /* Retry if rseq aborts. */ + } +} + +/* + * __percpu_list_pop is not safe against concurrent accesses. Should + * only be used on lists that are not concurrently modified. + */ +struct percpu_list_node *__percpu_list_pop(struct percpu_list *list, int cpu) +{ + struct percpu_list_node *node; + + node = list->c[cpu].head; + if (!node) + return NULL; + list->c[cpu].head = node->next; + return node; +} + +void *test_percpu_list_thread(void *arg) +{ + int i; + struct percpu_list *list = (struct percpu_list *)arg; + + if (rseq_register_current_thread()) { + fprintf(stderr, "Error: rseq_register_current_thread(...) 
failed(%d): %s\n", + errno, strerror(errno)); + abort(); + } + + for (i = 0; i < 100000; i++) { + struct percpu_list_node *node; + + node = this_cpu_list_pop(list, NULL); + sched_yield(); /* encourage shuffling */ + if (node) + this_cpu_list_push(list, node, NULL); + } + + if (rseq_unregister_current_thread()) { + fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n", + errno, strerror(errno)); + abort(); + } + + return NULL; +} + +/* Simultaneous modification to a per-cpu linked list from many threads. */ +void test_percpu_list(void) +{ + int i, j; + uint64_t sum = 0, expected_sum = 0; + struct percpu_list list; + pthread_t test_threads[200]; + cpu_set_t allowed_cpus; + + memset(&list, 0, sizeof(list)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + for (j = 1; j <= 100; j++) { + struct percpu_list_node *node; + + expected_sum += j; + + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + node->next = list.c[i].head; + list.c[i].head = node; + } + } + + for (i = 0; i < 200; i++) + pthread_create(&test_threads[i], NULL, + test_percpu_list_thread, &list); + + for (i = 0; i < 200; i++) + pthread_join(test_threads[i], NULL); + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_list_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_list_pop(&list, i))) { + sum += node->data; + free(node); + } + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). + */ + assert(sum == expected_sum); +} + +int main(int argc, char **argv) +{ + if (rseq_register_current_thread()) { + fprintf(stderr, "Error: rseq_register_current_thread(...) 
failed(%d): %s\n", + errno, strerror(errno)); + goto error; + } + printf("spinlock\n"); + test_percpu_spinlock(); + printf("percpu_list\n"); + test_percpu_list(); + if (rseq_unregister_current_thread()) { + fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n", + errno, strerror(errno)); + goto error; + } + return 0; + +error: + return -1; +} + -- 2.11.0
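As a rough illustration of the invariant that test_percpu_list() above asserts, the single-threaded sketch below seeds each per-slot list with nodes carrying 1..N, drains them with an unsynchronized pop, and sums the payloads. It does not use rseq at all. NR_SLOTS, seed_lists(), plain_pop() and drain_sum() are hypothetical names introduced only for this sketch, with NR_SLOTS standing in for the CPUs in the affinity mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the allowed CPUs */

struct percpu_list_node {
	intptr_t data;
	struct percpu_list_node *next;
};

struct percpu_list {
	struct percpu_list_node *head[NR_SLOTS];
};

/* Push nodes holding 1..per_slot onto every slot; return the expected sum. */
static uint64_t seed_lists(struct percpu_list *list, int per_slot)
{
	uint64_t expected = 0;
	int i, j;

	for (i = 0; i < NR_SLOTS; i++) {
		list->head[i] = NULL;
		for (j = 1; j <= per_slot; j++) {
			struct percpu_list_node *node = malloc(sizeof(*node));

			assert(node);
			node->data = j;
			node->next = list->head[i];
			list->head[i] = node;
			expected += j;
		}
	}
	return expected;
}

/* Unsynchronized pop: only valid once no other thread touches the list. */
static struct percpu_list_node *plain_pop(struct percpu_list *list, int slot)
{
	struct percpu_list_node *node = list->head[slot];

	if (!node)
		return NULL;
	list->head[slot] = node->next;
	return node;
}

/* Drain every slot, free the nodes, and return the sum of their payloads. */
static uint64_t drain_sum(struct percpu_list *list)
{
	struct percpu_list_node *node;
	uint64_t sum = 0;
	int i;

	for (i = 0; i < NR_SLOTS; i++) {
		while ((node = plain_pop(list, i))) {
			sum += node->data;
			free(node);
		}
	}
	return sum;
}
```

In the real test, nodes may migrate between CPUs while the worker threads pop and push, but the total is conserved, which is presumably why the test compares sums rather than per-CPU contents.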
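The per-cpu spinlock in the patch above relies on rseq_cmpeqv_storev() to flip the lock word from 0 to 1 on the current CPU, and on a store-release to drop it. As a hedged, portable sketch of the same lock/increment/unlock shape, the block below swaps in a C11 atomic compare-and-exchange for the rseq primitive and picks slots round-robin rather than through rseq_cpu_start(). struct slot, slot_lock(), slot_unlock(), sharded_count() and NR_SLOTS are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical stand-in for CPU_SETSIZE */

struct slot {
	atomic_intptr_t lock;	/* 0 = free, 1 = held */
	intptr_t count;
};

/* Spin until the slot's lock flips from 0 to 1 (acquire ordering). */
static void slot_lock(struct slot *s)
{
	intptr_t expected;

	do {
		expected = 0;
	} while (!atomic_compare_exchange_weak_explicit(&s->lock, &expected, 1,
			memory_order_acquire, memory_order_relaxed));
}

/* Release the lock; pairs with the acquire in slot_lock(). */
static void slot_unlock(struct slot *s)
{
	assert(atomic_load_explicit(&s->lock, memory_order_relaxed) == 1);
	atomic_store_explicit(&s->lock, 0, memory_order_release);
}

/* Spread reps increments round-robin over the slots, then sum them. */
static uint64_t sharded_count(struct slot *slots, long reps)
{
	uint64_t sum = 0;
	long i;

	for (i = 0; i < reps; i++) {
		struct slot *s = &slots[i % NR_SLOTS];

		slot_lock(s);
		s->count++;
		slot_unlock(s);
	}
	for (i = 0; i < NR_SLOTS; i++)
		sum += slots[i].count;
	return sum;
}
```

The atomic compare-and-exchange costs a locked instruction on every acquisition; avoiding that cost on the fast path is the motivation for the rseq-based lock in the selftest.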
* [PATCH 13/14] rseq: selftests: Provide parametrized tests (v2)
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC
To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Shuah Khan, linux-kselftest

"param_test" is a parametrizable restartable sequences test. See the "--help" output for usage.

"param_test_benchmark" is the same as "param_test", but it removes testing book-keeping code to allow accurate benchmarks.

"param_test_compare_twice" is the same as "param_test", but it performs each comparison within rseq critical section twice, thus validating invariants. If any of the second comparisons fails, an error message is printed and the test aborts.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H.
Peter Anvin" <hpa@zytor.com> CC: Ben Maurer <bmaurer@fb.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Boqun Feng <boqun.feng@gmail.com> CC: linux-kselftest@vger.kernel.org CC: linux-api@vger.kernel.org --- Changes since v1: - Use only rseq, remove use of cpu_opv. --- tools/testing/selftests/rseq/param_test.c | 1260 +++++++++++++++++++++++++++++ 1 file changed, 1260 insertions(+) create mode 100644 tools/testing/selftests/rseq/param_test.c diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c new file mode 100644 index 000000000000..6a9f602a8718 --- /dev/null +++ b/tools/testing/selftests/rseq/param_test.c @@ -0,0 +1,1260 @@ +// SPDX-License-Identifier: LGPL-2.1 +#define _GNU_SOURCE +#include <assert.h> +#include <pthread.h> +#include <sched.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <syscall.h> +#include <unistd.h> +#include <poll.h> +#include <sys/types.h> +#include <signal.h> +#include <errno.h> +#include <stddef.h> + +static inline pid_t gettid(void) +{ + return syscall(__NR_gettid); +} + +#define NR_INJECT 9 +static int loop_cnt[NR_INJECT + 1]; + +static int loop_cnt_1 asm("asm_loop_cnt_1") __attribute__((used)); +static int loop_cnt_2 asm("asm_loop_cnt_2") __attribute__((used)); +static int loop_cnt_3 asm("asm_loop_cnt_3") __attribute__((used)); +static int loop_cnt_4 asm("asm_loop_cnt_4") __attribute__((used)); +static int loop_cnt_5 asm("asm_loop_cnt_5") __attribute__((used)); +static int loop_cnt_6 asm("asm_loop_cnt_6") __attribute__((used)); + +static int opt_modulo, verbose; + +static int opt_yield, opt_signal, opt_sleep, + opt_disable_rseq, opt_threads = 200, + opt_disable_mod = 0, opt_test = 's', opt_mb = 0; + +#ifndef RSEQ_SKIP_FASTPATH +static long long opt_reps = 
5000; +#else +static long long opt_reps = 100; +#endif + +static __thread __attribute__((tls_model("initial-exec"))) +unsigned int signals_delivered; + +#ifndef BENCHMARK + +static __thread __attribute__((tls_model("initial-exec"), unused)) +unsigned int yield_mod_cnt, nr_abort; + +#define printf_verbose(fmt, ...) \ + do { \ + if (verbose) \ + printf(fmt, ## __VA_ARGS__); \ + } while (0) + +#if defined(__x86_64__) || defined(__i386__) + +#define INJECT_ASM_REG "eax" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#ifdef __i386__ + +#define RSEQ_INJECT_ASM(n) \ + "mov asm_loop_cnt_" #n ", %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#elif defined(__x86_64__) + +#define RSEQ_INJECT_ASM(n) \ + "lea asm_loop_cnt_" #n "(%%rip), %%rax\n\t" \ + "mov (%%rax), %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#else +#error "Unsupported architecture" +#endif + +#elif defined(__ARMEL__) + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + , [loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r4" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmp " INJECT_ASM_REG ", #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subs " INJECT_ASM_REG ", #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" + +#elif __PPC__ + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + ,
[loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r18" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmpwi %%" INJECT_ASM_REG ", 0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" +#else +#error unsupported target +#endif + +#define RSEQ_INJECT_FAILED \ + nr_abort++; + +#define RSEQ_INJECT_C(n) \ +{ \ + int loc_i, loc_nr_loops = loop_cnt[n]; \ + \ + for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \ + rseq_barrier(); \ + } \ + if (loc_nr_loops == -1 && opt_modulo) { \ + if (yield_mod_cnt == opt_modulo - 1) { \ + if (opt_sleep > 0) \ + poll(NULL, 0, opt_sleep); \ + if (opt_yield) \ + sched_yield(); \ + if (opt_signal) \ + raise(SIGUSR1); \ + yield_mod_cnt = 0; \ + } else { \ + yield_mod_cnt++; \ + } \ + } \ +} + +#else + +#define printf_verbose(fmt, ...) + +#endif /* BENCHMARK */ + +#include "rseq.h" + +struct percpu_lock_entry { + intptr_t v; +} __attribute__((aligned(128))); + +struct percpu_lock { + struct percpu_lock_entry c[CPU_SETSIZE]; +}; + +struct test_data_entry { + intptr_t count; +} __attribute__((aligned(128))); + +struct spinlock_test_data { + struct percpu_lock lock; + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct spinlock_thread_test_data { + struct spinlock_test_data *data; + long long reps; + int reg; +}; + +struct inc_test_data { + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct inc_thread_test_data { + struct inc_test_data *data; + long long reps; + int reg; +}; + +struct percpu_list_node { + intptr_t data; + struct percpu_list_node *next; +}; + +struct percpu_list_entry { + struct percpu_list_node *head; +} __attribute__((aligned(128))); + +struct percpu_list { + struct percpu_list_entry c[CPU_SETSIZE]; +}; + +#define BUFFER_ITEM_PER_CPU 100 + +struct percpu_buffer_node { + intptr_t data; +}; + +struct percpu_buffer_entry { + intptr_t offset; + 
intptr_t buflen; + struct percpu_buffer_node **array; +} __attribute__((aligned(128))); + +struct percpu_buffer { + struct percpu_buffer_entry c[CPU_SETSIZE]; +}; + +#define MEMCPY_BUFFER_ITEM_PER_CPU 100 + +struct percpu_memcpy_buffer_node { + intptr_t data1; + uint64_t data2; +}; + +struct percpu_memcpy_buffer_entry { + intptr_t offset; + intptr_t buflen; + struct percpu_memcpy_buffer_node *array; +} __attribute__((aligned(128))); + +struct percpu_memcpy_buffer { + struct percpu_memcpy_buffer_entry c[CPU_SETSIZE]; +}; + +/* A simple percpu spinlock. Grabs lock on current cpu. */ +static int rseq_this_cpu_lock(struct percpu_lock *lock) +{ + int cpu; + + for (;;) { + int ret; + + cpu = rseq_cpu_start(); + ret = rseq_cmpeqv_storev(&lock->c[cpu].v, + 0, 1, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + /* + * Acquire semantic when taking lock after control dependency. + * Matches rseq_smp_store_release(). + */ + rseq_smp_acquire__after_ctrl_dep(); + return cpu; +} + +static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu) +{ + assert(lock->c[cpu].v == 1); + /* + * Release lock, with release semantic. Matches + * rseq_smp_acquire__after_ctrl_dep(). 
+ */ + rseq_smp_store_release(&lock->c[cpu].v, 0); +} + +void *test_percpu_spinlock_thread(void *arg) +{ + struct spinlock_thread_test_data *thread_data = arg; + struct spinlock_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int cpu = rseq_cpu_start(); + + cpu = rseq_this_cpu_lock(&data->lock); + data->c[cpu].count++; + rseq_percpu_unlock(&data->lock, cpu); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +/* + * A simple test which implements a sharded counter using a per-cpu + * lock. Obviously real applications might prefer to simply use a + * per-cpu increment; however, this is reasonable for a test and the + * lock can be extended to synchronize more complicated operations. 
+ */ +void test_percpu_spinlock(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct spinlock_test_data data; + struct spinlock_thread_test_data thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_spinlock_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void *test_percpu_inc_thread(void *arg) +{ + struct inc_thread_test_data *thread_data = arg; + struct inc_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int ret; + + do { + int cpu; + + cpu = rseq_cpu_start(); + ret = rseq_addv(&data->c[cpu].count, 1, cpu); + } while (rseq_unlikely(ret)); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +void test_percpu_inc(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct inc_test_data data; + struct inc_thread_test_data 
thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_inc_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void this_cpu_list_push(struct percpu_list *list, + struct percpu_list_node *node, + int *_cpu) +{ + int cpu; + + for (;;) { + intptr_t *targetptr, newval, expect; + int ret; + + cpu = rseq_cpu_start(); + /* Load list->c[cpu].head with single-copy atomicity. */ + expect = (intptr_t)RSEQ_READ_ONCE(list->c[cpu].head); + newval = (intptr_t)node; + targetptr = (intptr_t *)&list->c[cpu].head; + node->next = (struct percpu_list_node *)expect; + ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; +} + +/* + * Unlike a traditional lock-less linked list; the availability of a + * rseq primitive allows us to implement pop without concerns over + * ABA-type races. 
+ */ +struct percpu_list_node *this_cpu_list_pop(struct percpu_list *list, + int *_cpu) +{ + struct percpu_list_node *node = NULL; + int cpu; + + for (;;) { + struct percpu_list_node *head; + intptr_t *targetptr, expectnot, *load; + off_t offset; + int ret; + + cpu = rseq_cpu_start(); + targetptr = (intptr_t *)&list->c[cpu].head; + expectnot = (intptr_t)NULL; + offset = offsetof(struct percpu_list_node, next); + load = (intptr_t *)&head; + ret = rseq_cmpnev_storeoffp_load(targetptr, expectnot, + offset, load, cpu); + if (rseq_likely(!ret)) { + node = head; + break; + } + if (ret > 0) + break; + /* Retry if rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return node; +} + +/* + * __percpu_list_pop is not safe against concurrent accesses. Should + * only be used on lists that are not concurrently modified. + */ +struct percpu_list_node *__percpu_list_pop(struct percpu_list *list, int cpu) +{ + struct percpu_list_node *node; + + node = list->c[cpu].head; + if (!node) + return NULL; + list->c[cpu].head = node->next; + return node; +} + +void *test_percpu_list_thread(void *arg) +{ + long long i, reps; + struct percpu_list *list = (struct percpu_list *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_list_node *node; + + node = this_cpu_list_pop(list, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) + this_cpu_list_push(list, node, NULL); + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu linked list from many threads. 
*/ +void test_percpu_list(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_list list; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&list, 0, sizeof(list)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + for (j = 1; j <= 100; j++) { + struct percpu_list_node *node; + + expected_sum += j; + + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + node->next = list.c[i].head; + list.c[i].head = node; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_list_thread, &list); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_list_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_list_pop(&list, i))) { + sum += node->data; + free(node); + } + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). 
+ */ + assert(sum == expected_sum); +} + +bool this_cpu_buffer_push(struct percpu_buffer *buffer, + struct percpu_buffer_node *node, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_spec, newval_spec; + intptr_t *targetptr_final, newval_final; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == buffer->c[cpu].buflen) + break; + newval_spec = (intptr_t)node; + targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset]; + newval_final = offset + 1; + targetptr_final = &buffer->c[cpu].offset; + if (opt_mb) + ret = rseq_cmpeqv_trystorev_storev_release( + targetptr_final, offset, targetptr_spec, + newval_spec, newval_final, cpu); + else + ret = rseq_cmpeqv_trystorev_storev(targetptr_final, + offset, targetptr_spec, newval_spec, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +struct percpu_buffer_node *this_cpu_buffer_pop(struct percpu_buffer *buffer, + int *_cpu) +{ + struct percpu_buffer_node *head; + int cpu; + + for (;;) { + intptr_t *targetptr, newval; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. */ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == 0) { + head = NULL; + break; + } + head = RSEQ_READ_ONCE(buffer->c[cpu].array[offset - 1]); + newval = offset - 1; + targetptr = (intptr_t *)&buffer->c[cpu].offset; + ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset, + (intptr_t *)&buffer->c[cpu].array[offset - 1], + (intptr_t)head, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return head; +} + +/* + * __percpu_buffer_pop is not safe against concurrent accesses. Should + * only be used on buffers that are not concurrently modified. 
+ */ +struct percpu_buffer_node *__percpu_buffer_pop(struct percpu_buffer *buffer, + int cpu) +{ + struct percpu_buffer_node *head; + intptr_t offset; + + offset = buffer->c[cpu].offset; + if (offset == 0) + return NULL; + head = buffer->c[cpu].array[offset - 1]; + buffer->c[cpu].offset = offset - 1; + return head; +} + +void *test_percpu_buffer_thread(void *arg) +{ + long long i, reps; + struct percpu_buffer *buffer = (struct percpu_buffer *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_buffer_node *node; + + node = this_cpu_buffer_pop(buffer, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) { + if (!this_cpu_buffer_push(buffer, node, NULL)) { + /* Should increase buffer size. */ + abort(); + } + } + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu buffer from many threads. */ +void test_percpu_buffer(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_buffer buffer; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&buffer, 0, sizeof(buffer)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + /* Worst-case is every item in same CPU.
*/ + buffer.c[i].array = + malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE * + BUFFER_ITEM_PER_CPU); + assert(buffer.c[i].array); + buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU; + for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) { + struct percpu_buffer_node *node; + + expected_sum += j; + + /* + * We could theoretically put the word-sized + * "data" directly in the buffer. However, we + * want to model objects that would not fit + * within a single word, so allocate an object + * for each node. + */ + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + buffer.c[i].array[j - 1] = node; + buffer.c[i].offset++; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_buffer_thread, &buffer); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_buffer_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_buffer_pop(&buffer, i))) { + sum += node->data; + free(node); + } + free(buffer.c[i].array); + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). + */ + assert(sum == expected_sum); +} + +bool this_cpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node item, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_final, newval_final, offset; + char *destptr, *srcptr; + size_t copylen; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. 
*/ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == buffer->c[cpu].buflen) + break; + destptr = (char *)&buffer->c[cpu].array[offset]; + srcptr = (char *)&item; + /* copylen must be <= 4kB. */ + copylen = sizeof(item); + newval_final = offset + 1; + targetptr_final = &buffer->c[cpu].offset; + if (opt_mb) + ret = rseq_cmpeqv_trymemcpy_storev_release( + targetptr_final, offset, + destptr, srcptr, copylen, + newval_final, cpu); + else + ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final, + offset, destptr, srcptr, copylen, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +bool this_cpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node *item, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_final, newval_final, offset; + char *destptr, *srcptr; + size_t copylen; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. */ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == 0) + break; + destptr = (char *)item; + srcptr = (char *)&buffer->c[cpu].array[offset - 1]; + /* copylen must be <= 4kB. */ + copylen = sizeof(*item); + newval_final = offset - 1; + targetptr_final = &buffer->c[cpu].offset; + ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final, + offset, destptr, srcptr, copylen, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +/* + * __percpu_memcpy_buffer_pop is not safe against concurrent accesses. Should + * only be used on buffers that are not concurrently modified. 
+ */ +bool __percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node *item, + int cpu) +{ + intptr_t offset; + + offset = buffer->c[cpu].offset; + if (offset == 0) + return false; + memcpy(item, &buffer->c[cpu].array[offset - 1], sizeof(*item)); + buffer->c[cpu].offset = offset - 1; + return true; +} + +void *test_percpu_memcpy_buffer_thread(void *arg) +{ + long long i, reps; + struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_memcpy_buffer_node item; + bool result; + + result = this_cpu_memcpy_buffer_pop(buffer, &item, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (result) { + if (!this_cpu_memcpy_buffer_push(buffer, item, NULL)) { + /* Should increase buffer size. */ + abort(); + } + } + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu buffer from many threads. */ +void test_percpu_memcpy_buffer(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_memcpy_buffer buffer; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&buffer, 0, sizeof(buffer)); + + /* Generate buffer entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + /* Worst-case is every item in same CPU.
*/ + buffer.c[i].array = + malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE * + MEMCPY_BUFFER_ITEM_PER_CPU); + assert(buffer.c[i].array); + buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU; + for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) { + expected_sum += 2 * j + 1; + + /* + * Items model objects that do not fit within + * a single word: each one holds two fields + * and is stored by value in the array, so no + * per-node allocation is needed. + */ + buffer.c[i].array[j - 1].data1 = j; + buffer.c[i].array[j - 1].data2 = j + 1; + buffer.c[i].offset++; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_memcpy_buffer_thread, + &buffer); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_memcpy_buffer_node item; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while (__percpu_memcpy_buffer_pop(&buffer, &item, i)) { + sum += item.data1; + sum += item.data2; + } + free(buffer.c[i].array); + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running).
+ */ + assert(sum == expected_sum); +} + +static void test_signal_interrupt_handler(int signo) +{ + signals_delivered++; +} + +static int set_signal_handler(void) +{ + int ret = 0; + struct sigaction sa; + sigset_t sigset; + + ret = sigemptyset(&sigset); + if (ret < 0) { + perror("sigemptyset"); + return ret; + } + + sa.sa_handler = test_signal_interrupt_handler; + sa.sa_mask = sigset; + sa.sa_flags = 0; + ret = sigaction(SIGUSR1, &sa, NULL); + if (ret < 0) { + perror("sigaction"); + return ret; + } + + printf_verbose("Signal handler set for SIGUSR1\n"); + + return ret; +} + +static void show_usage(int argc, char **argv) +{ + printf("Usage : %s <OPTIONS>\n", + argv[0]); + printf("OPTIONS:\n"); + printf(" [-1 loops] Number of loops for delay injection 1\n"); + printf(" [-2 loops] Number of loops for delay injection 2\n"); + printf(" [-3 loops] Number of loops for delay injection 3\n"); + printf(" [-4 loops] Number of loops for delay injection 4\n"); + printf(" [-5 loops] Number of loops for delay injection 5\n"); + printf(" [-6 loops] Number of loops for delay injection 6\n"); + printf(" [-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n"); + printf(" [-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n"); + printf(" [-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n"); + printf(" [-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n"); + printf(" [-y] Yield\n"); + printf(" [-k] Kill thread with signal\n"); + printf(" [-s S] S: =0: disabled (default), >0: sleep time (ms)\n"); + printf(" [-t N] Number of threads (default 200)\n"); + printf(" [-r N] Number of repetitions per thread (default 5000)\n"); + printf(" [-d] Disable rseq system call (no initialization)\n"); + printf(" [-D M] Disable rseq for each M threads\n"); + printf(" [-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n"); + printf(" [-M] Push into buffer and memcpy buffer with memory barriers.\n"); + printf(" [-v] 
Verbose output.\n"); + printf(" [-h] Show this help.\n"); + printf("\n"); +} + +int main(int argc, char **argv) +{ + int i; + + for (i = 1; i < argc; i++) { + if (argv[i][0] != '-') + continue; + switch (argv[i][1]) { + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]); + i++; + break; + case 'm': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_modulo = atol(argv[i + 1]); + if (opt_modulo < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 's': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_sleep = atol(argv[i + 1]); + if (opt_sleep < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'y': + opt_yield = 1; + break; + case 'k': + opt_signal = 1; + break; + case 'd': + opt_disable_rseq = 1; + break; + case 'D': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_disable_mod = atol(argv[i + 1]); + if (opt_disable_mod < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 't': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_threads = atol(argv[i + 1]); + if (opt_threads < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'r': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_reps = atoll(argv[i + 1]); + if (opt_reps < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'h': + show_usage(argc, argv); + goto end; + case 'T': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_test = *argv[i + 1]; + switch (opt_test) { + case 's': + case 'l': + case 'i': + case 'b': + case 'm': + break; + default: + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'v': + verbose = 1; + break; + case 'M': + opt_mb = 1; + break; + default: + show_usage(argc, argv); + goto 
error; + } + } + + loop_cnt_1 = loop_cnt[1]; + loop_cnt_2 = loop_cnt[2]; + loop_cnt_3 = loop_cnt[3]; + loop_cnt_4 = loop_cnt[4]; + loop_cnt_5 = loop_cnt[5]; + loop_cnt_6 = loop_cnt[6]; + + if (set_signal_handler()) + goto error; + + if (!opt_disable_rseq && rseq_register_current_thread()) + goto error; + switch (opt_test) { + case 's': + printf_verbose("spinlock\n"); + test_percpu_spinlock(); + break; + case 'l': + printf_verbose("linked list\n"); + test_percpu_list(); + break; + case 'b': + printf_verbose("buffer\n"); + test_percpu_buffer(); + break; + case 'm': + printf_verbose("memcpy buffer\n"); + test_percpu_memcpy_buffer(); + break; + case 'i': + printf_verbose("counter increment\n"); + test_percpu_inc(); + break; + } + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); +end: + return 0; + +error: + return -1; +} -- 2.11.0
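For readers following the buffer tests: the final assert(sum == expected_sum) relies on the fact that threads only move items between per-cpu buffers and never create or drop one. What follows is a minimal single-threaded sketch of that invariant, not code from the patch. It uses no rseq at all, and every name in it (MODEL_NR_CPUS, MODEL_ITEM_PER_CPU, struct model_buffer, model_push, model_pop, model_sum_is_conserved) is hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes for this sketch; the selftest uses CPU_SETSIZE and 100. */
#define MODEL_NR_CPUS		4
#define MODEL_ITEM_PER_CPU	100

struct model_entry {
	intptr_t offset;	/* number of items currently stored */
	intptr_t buflen;	/* capacity of array */
	intptr_t *array;
};

struct model_buffer {
	struct model_entry c[MODEL_NR_CPUS];
};

/* Plain, non-concurrent push: fails when the per-cpu array is full. */
static bool model_push(struct model_buffer *b, int cpu, intptr_t v)
{
	struct model_entry *e = &b->c[cpu];

	if (e->offset == e->buflen)
		return false;
	e->array[e->offset++] = v;
	return true;
}

/* Plain, non-concurrent pop: fails when the per-cpu array is empty. */
static bool model_pop(struct model_buffer *b, int cpu, intptr_t *v)
{
	struct model_entry *e = &b->c[cpu];

	if (e->offset == 0)
		return false;
	*v = e->array[--e->offset];
	return true;
}

/*
 * Fill every cpu with 1..MODEL_ITEM_PER_CPU, move one item from one cpu
 * to another "reps" times, then drain everything. Returns true when the
 * drained sum matches the sum of what was inserted.
 */
static bool model_sum_is_conserved(long reps)
{
	struct model_buffer b;
	uint64_t sum = 0, expected_sum = 0;
	intptr_t v;
	long r;
	int i, j;

	for (i = 0; i < MODEL_NR_CPUS; i++) {
		/* Worst case is every item ending up on the same cpu. */
		b.c[i].buflen = MODEL_NR_CPUS * MODEL_ITEM_PER_CPU;
		b.c[i].offset = 0;
		b.c[i].array = malloc(sizeof(*b.c[i].array) *
				      (size_t)b.c[i].buflen);
		if (!b.c[i].array)
			abort();
		for (j = 1; j <= MODEL_ITEM_PER_CPU; j++) {
			expected_sum += j;
			(void)model_push(&b, i, j);
		}
	}

	for (r = 0; r < reps; r++) {
		int from = (int)(r % MODEL_NR_CPUS);
		int to = (int)((r * 7 + 3) % MODEL_NR_CPUS);

		/* A pop from an empty cpu is simply skipped. */
		if (model_pop(&b, from, &v) && !model_push(&b, to, v))
			abort();	/* capacity covers the worst case */
	}

	for (i = 0; i < MODEL_NR_CPUS; i++) {
		while (model_pop(&b, i, &v))
			sum += (uint64_t)v;
		free(b.c[i].array);
	}
	return sum == expected_sum;
}
```

Calling model_sum_is_conserved() with any non-negative repetition count should return true. The real test differs in the part that matters: push and pop run concurrently from many threads, and the rseq critical sections are what keep each per-cpu update atomic with respect to preemption and migration.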
* [PATCH 13/14] rseq: selftests: Provide parametrized tests (v2)
@ 2018-04-30 22:44 ` mathieu.desnoyers
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC
To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Shuah Khan

"param_test" is a parametrizable restartable sequences test. See the "-h" output for usage.

"param_test_benchmark" is the same as "param_test", but it removes the testing bookkeeping code to allow accurate benchmarks.

"param_test_compare_twice" is the same as "param_test", but it performs each comparison within the rseq critical section twice, thus validating invariants. If any of the second comparisons fails, an error message is printed and the test aborts.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E.
McKenney" <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Boqun Feng <boqun.feng@gmail.com> CC: linux-kselftest@vger.kernel.org CC: linux-api@vger.kernel.org --- Changes since v1: - Use only rseq, remove use of cpu_opv. --- tools/testing/selftests/rseq/param_test.c | 1260 +++++++++++++++++++++++++++++ 1 file changed, 1260 insertions(+) create mode 100644 tools/testing/selftests/rseq/param_test.c diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c new file mode 100644 index 000000000000..6a9f602a8718 --- /dev/null +++ b/tools/testing/selftests/rseq/param_test.c @@ -0,0 +1,1260 @@ +// SPDX-License-Identifier: LGPL-2.1 +#define _GNU_SOURCE +#include <assert.h> +#include <pthread.h> +#include <sched.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <syscall.h> +#include <unistd.h> +#include <poll.h> +#include <sys/types.h> +#include <signal.h> +#include <errno.h> +#include <stddef.h> + +static inline pid_t gettid(void) +{ + return syscall(__NR_gettid); +} + +#define NR_INJECT 9 +static int loop_cnt[NR_INJECT + 1]; + +static int loop_cnt_1 asm("asm_loop_cnt_1") __attribute__((used)); +static int loop_cnt_2 asm("asm_loop_cnt_2") __attribute__((used)); +static int loop_cnt_3 asm("asm_loop_cnt_3") __attribute__((used)); +static int loop_cnt_4 asm("asm_loop_cnt_4") __attribute__((used)); +static int loop_cnt_5 asm("asm_loop_cnt_5") __attribute__((used)); +static int loop_cnt_6 asm("asm_loop_cnt_6") __attribute__((used)); + +static int opt_modulo, verbose; + +static int opt_yield, opt_signal, opt_sleep, + opt_disable_rseq, opt_threads = 200, + opt_disable_mod = 0, opt_test = 's', opt_mb = 0; + +#ifndef RSEQ_SKIP_FASTPATH +static long long opt_reps = 5000; +#else +static long long opt_reps = 100; +#endif + +static __thread 
__attribute__((tls_model("initial-exec"))) +unsigned int signals_delivered; + +#ifndef BENCHMARK + +static __thread __attribute__((tls_model("initial-exec"), unused)) +unsigned int yield_mod_cnt, nr_abort; + +#define printf_verbose(fmt, ...) \ + do { \ + if (verbose) \ + printf(fmt, ## __VA_ARGS__); \ + } while (0) + +#if defined(__x86_64__) || defined(__i386__) + +#define INJECT_ASM_REG "eax" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#ifdef __i386__ + +#define RSEQ_INJECT_ASM(n) \ + "mov asm_loop_cnt_" #n ", %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#elif defined(__x86_64__) + +#define RSEQ_INJECT_ASM(n) \ + "lea asm_loop_cnt_" #n "(%%rip), %%" INJECT_ASM_REG "\n\t" \ + "mov (%%" INJECT_ASM_REG "), %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#else +#error "Unsupported architecture" +#endif + +#elif defined(__ARMEL__) + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + , [loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r4" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmp " INJECT_ASM_REG ", #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subs " INJECT_ASM_REG ", #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" + +#elif __PPC__ + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + , [loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r18" + +#define RSEQ_INJECT_CLOBBER 
\ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmpwi %%" INJECT_ASM_REG ", 0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" +#else +#error unsupported target +#endif + +#define RSEQ_INJECT_FAILED \ + nr_abort++; + +#define RSEQ_INJECT_C(n) \ +{ \ + int loc_i, loc_nr_loops = loop_cnt[n]; \ + \ + for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \ + rseq_barrier(); \ + } \ + if (loc_nr_loops == -1 && opt_modulo) { \ + if (yield_mod_cnt == opt_modulo - 1) { \ + if (opt_sleep > 0) \ + poll(NULL, 0, opt_sleep); \ + if (opt_yield) \ + sched_yield(); \ + if (opt_signal) \ + raise(SIGUSR1); \ + yield_mod_cnt = 0; \ + } else { \ + yield_mod_cnt++; \ + } \ + } \ +} + +#else + +#define printf_verbose(fmt, ...) + +#endif /* BENCHMARK */ + +#include "rseq.h" + +struct percpu_lock_entry { + intptr_t v; +} __attribute__((aligned(128))); + +struct percpu_lock { + struct percpu_lock_entry c[CPU_SETSIZE]; +}; + +struct test_data_entry { + intptr_t count; +} __attribute__((aligned(128))); + +struct spinlock_test_data { + struct percpu_lock lock; + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct spinlock_thread_test_data { + struct spinlock_test_data *data; + long long reps; + int reg; +}; + +struct inc_test_data { + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct inc_thread_test_data { + struct inc_test_data *data; + long long reps; + int reg; +}; + +struct percpu_list_node { + intptr_t data; + struct percpu_list_node *next; +}; + +struct percpu_list_entry { + struct percpu_list_node *head; +} __attribute__((aligned(128))); + +struct percpu_list { + struct percpu_list_entry c[CPU_SETSIZE]; +}; + +#define BUFFER_ITEM_PER_CPU 100 + +struct percpu_buffer_node { + intptr_t data; +}; + +struct percpu_buffer_entry { + intptr_t offset; + intptr_t buflen; + struct percpu_buffer_node **array; +} __attribute__((aligned(128))); + 
+struct percpu_buffer { + struct percpu_buffer_entry c[CPU_SETSIZE]; +}; + +#define MEMCPY_BUFFER_ITEM_PER_CPU 100 + +struct percpu_memcpy_buffer_node { + intptr_t data1; + uint64_t data2; +}; + +struct percpu_memcpy_buffer_entry { + intptr_t offset; + intptr_t buflen; + struct percpu_memcpy_buffer_node *array; +} __attribute__((aligned(128))); + +struct percpu_memcpy_buffer { + struct percpu_memcpy_buffer_entry c[CPU_SETSIZE]; +}; + +/* A simple percpu spinlock. Grabs lock on current cpu. */ +static int rseq_this_cpu_lock(struct percpu_lock *lock) +{ + int cpu; + + for (;;) { + int ret; + + cpu = rseq_cpu_start(); + ret = rseq_cmpeqv_storev(&lock->c[cpu].v, + 0, 1, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + /* + * Acquire semantic when taking lock after control dependency. + * Matches rseq_smp_store_release(). + */ + rseq_smp_acquire__after_ctrl_dep(); + return cpu; +} + +static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu) +{ + assert(lock->c[cpu].v == 1); + /* + * Release lock, with release semantic. Matches + * rseq_smp_acquire__after_ctrl_dep(). 
+ */ + rseq_smp_store_release(&lock->c[cpu].v, 0); +} + +void *test_percpu_spinlock_thread(void *arg) +{ + struct spinlock_thread_test_data *thread_data = arg; + struct spinlock_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int cpu = rseq_cpu_start(); + + cpu = rseq_this_cpu_lock(&data->lock); + data->c[cpu].count++; + rseq_percpu_unlock(&data->lock, cpu); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +/* + * A simple test which implements a sharded counter using a per-cpu + * lock. Obviously real applications might prefer to simply use a + * per-cpu increment; however, this is reasonable for a test and the + * lock can be extended to synchronize more complicated operations. 
+ */ +void test_percpu_spinlock(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct spinlock_test_data data; + struct spinlock_thread_test_data thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_spinlock_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void *test_percpu_inc_thread(void *arg) +{ + struct inc_thread_test_data *thread_data = arg; + struct inc_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int ret; + + do { + int cpu; + + cpu = rseq_cpu_start(); + ret = rseq_addv(&data->c[cpu].count, 1, cpu); + } while (rseq_unlikely(ret)); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +void test_percpu_inc(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct inc_test_data data; + struct inc_thread_test_data 
thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_inc_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void this_cpu_list_push(struct percpu_list *list, + struct percpu_list_node *node, + int *_cpu) +{ + int cpu; + + for (;;) { + intptr_t *targetptr, newval, expect; + int ret; + + cpu = rseq_cpu_start(); + /* Load list->c[cpu].head with single-copy atomicity. */ + expect = (intptr_t)RSEQ_READ_ONCE(list->c[cpu].head); + newval = (intptr_t)node; + targetptr = (intptr_t *)&list->c[cpu].head; + node->next = (struct percpu_list_node *)expect; + ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; +} + +/* + * Unlike a traditional lock-less linked list; the availability of a + * rseq primitive allows us to implement pop without concerns over + * ABA-type races. 
+ */ +struct percpu_list_node *this_cpu_list_pop(struct percpu_list *list, + int *_cpu) +{ + struct percpu_list_node *node = NULL; + int cpu; + + for (;;) { + struct percpu_list_node *head; + intptr_t *targetptr, expectnot, *load; + off_t offset; + int ret; + + cpu = rseq_cpu_start(); + targetptr = (intptr_t *)&list->c[cpu].head; + expectnot = (intptr_t)NULL; + offset = offsetof(struct percpu_list_node, next); + load = (intptr_t *)&head; + ret = rseq_cmpnev_storeoffp_load(targetptr, expectnot, + offset, load, cpu); + if (rseq_likely(!ret)) { + node = head; + break; + } + if (ret > 0) + break; + /* Retry if rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return node; +} + +/* + * __percpu_list_pop is not safe against concurrent accesses. Should + * only be used on lists that are not concurrently modified. + */ +struct percpu_list_node *__percpu_list_pop(struct percpu_list *list, int cpu) +{ + struct percpu_list_node *node; + + node = list->c[cpu].head; + if (!node) + return NULL; + list->c[cpu].head = node->next; + return node; +} + +void *test_percpu_list_thread(void *arg) +{ + long long i, reps; + struct percpu_list *list = (struct percpu_list *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_list_node *node; + + node = this_cpu_list_pop(list, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) + this_cpu_list_push(list, node, NULL); + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu linked list from many threads. 
*/ +void test_percpu_list(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_list list; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&list, 0, sizeof(list)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + for (j = 1; j <= 100; j++) { + struct percpu_list_node *node; + + expected_sum += j; + + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + node->next = list.c[i].head; + list.c[i].head = node; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_list_thread, &list); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_list_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_list_pop(&list, i))) { + sum += node->data; + free(node); + } + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). 
+ */ + assert(sum == expected_sum); +} + +bool this_cpu_buffer_push(struct percpu_buffer *buffer, + struct percpu_buffer_node *node, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_spec, newval_spec; + intptr_t *targetptr_final, newval_final; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == buffer->c[cpu].buflen) + break; + newval_spec = (intptr_t)node; + targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset]; + newval_final = offset + 1; + targetptr_final = &buffer->c[cpu].offset; + if (opt_mb) + ret = rseq_cmpeqv_trystorev_storev_release( + targetptr_final, offset, targetptr_spec, + newval_spec, newval_final, cpu); + else + ret = rseq_cmpeqv_trystorev_storev(targetptr_final, + offset, targetptr_spec, newval_spec, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +struct percpu_buffer_node *this_cpu_buffer_pop(struct percpu_buffer *buffer, + int *_cpu) +{ + struct percpu_buffer_node *head; + int cpu; + + for (;;) { + intptr_t *targetptr, newval; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. */ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == 0) { + head = NULL; + break; + } + head = RSEQ_READ_ONCE(buffer->c[cpu].array[offset - 1]); + newval = offset - 1; + targetptr = (intptr_t *)&buffer->c[cpu].offset; + ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset, + (intptr_t *)&buffer->c[cpu].array[offset - 1], + (intptr_t)head, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return head; +} + +/* + * __percpu_buffer_pop is not safe against concurrent accesses. Should + * only be used on buffers that are not concurrently modified. 
+ */ +struct percpu_buffer_node *__percpu_buffer_pop(struct percpu_buffer *buffer, + int cpu) +{ + struct percpu_buffer_node *head; + intptr_t offset; + + offset = buffer->c[cpu].offset; + if (offset == 0) + return NULL; + head = buffer->c[cpu].array[offset - 1]; + buffer->c[cpu].offset = offset - 1; + return head; +} + +void *test_percpu_buffer_thread(void *arg) +{ + long long i, reps; + struct percpu_buffer *buffer = (struct percpu_buffer *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_buffer_node *node; + + node = this_cpu_buffer_pop(buffer, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) { + if (!this_cpu_buffer_push(buffer, node, NULL)) { + /* Should increase buffer size. */ + abort(); + } + } + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu buffer from many threads. */ +void test_percpu_buffer(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_buffer buffer; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&buffer, 0, sizeof(buffer)); + + /* Generate buffer entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + /* Worst-case is every item in same CPU.
*/ + buffer.c[i].array = + malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE * + BUFFER_ITEM_PER_CPU); + assert(buffer.c[i].array); + buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU; + for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) { + struct percpu_buffer_node *node; + + expected_sum += j; + + /* + * We could theoretically put the word-sized + * "data" directly in the buffer. However, we + * want to model objects that would not fit + * within a single word, so allocate an object + * for each node. + */ + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + buffer.c[i].array[j - 1] = node; + buffer.c[i].offset++; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_buffer_thread, &buffer); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_buffer_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_buffer_pop(&buffer, i))) { + sum += node->data; + free(node); + } + free(buffer.c[i].array); + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). + */ + assert(sum == expected_sum); +} + +bool this_cpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node item, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_final, newval_final, offset; + char *destptr, *srcptr; + size_t copylen; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. 
*/ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == buffer->c[cpu].buflen) + break; + destptr = (char *)&buffer->c[cpu].array[offset]; + srcptr = (char *)&item; + /* copylen must be <= 4kB. */ + copylen = sizeof(item); + newval_final = offset + 1; + targetptr_final = &buffer->c[cpu].offset; + if (opt_mb) + ret = rseq_cmpeqv_trymemcpy_storev_release( + targetptr_final, offset, + destptr, srcptr, copylen, + newval_final, cpu); + else + ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final, + offset, destptr, srcptr, copylen, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +bool this_cpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node *item, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_final, newval_final, offset; + char *destptr, *srcptr; + size_t copylen; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. */ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == 0) + break; + destptr = (char *)item; + srcptr = (char *)&buffer->c[cpu].array[offset - 1]; + /* copylen must be <= 4kB. */ + copylen = sizeof(*item); + newval_final = offset - 1; + targetptr_final = &buffer->c[cpu].offset; + ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final, + offset, destptr, srcptr, copylen, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +/* + * __percpu_memcpy_buffer_pop is not safe against concurrent accesses. Should + * only be used on buffers that are not concurrently modified. 
+ */ +bool __percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node *item, + int cpu) +{ + intptr_t offset; + + offset = buffer->c[cpu].offset; + if (offset == 0) + return false; + memcpy(item, &buffer->c[cpu].array[offset - 1], sizeof(*item)); + buffer->c[cpu].offset = offset - 1; + return true; +} + +void *test_percpu_memcpy_buffer_thread(void *arg) +{ + long long i, reps; + struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_memcpy_buffer_node item; + bool result; + + result = this_cpu_memcpy_buffer_pop(buffer, &item, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (result) { + if (!this_cpu_memcpy_buffer_push(buffer, item, NULL)) { + /* Should increase buffer size. */ + abort(); + } + } + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu buffer from many threads. */ +void test_percpu_memcpy_buffer(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_memcpy_buffer buffer; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&buffer, 0, sizeof(buffer)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + /* Worse-case is every item in same CPU. 
*/ + buffer.c[i].array = + malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE * + MEMCPY_BUFFER_ITEM_PER_CPU); + assert(buffer.c[i].array); + buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU; + for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) { + expected_sum += 2 * j + 1; + + /* + * We could theoretically put the word-sized + * "data" directly in the buffer. However, we + * want to model objects that would not fit + * within a single word, so allocate an object + * for each node. + */ + buffer.c[i].array[j - 1].data1 = j; + buffer.c[i].array[j - 1].data2 = j + 1; + buffer.c[i].offset++; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_memcpy_buffer_thread, + &buffer); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_memcpy_buffer_node item; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while (__percpu_memcpy_buffer_pop(&buffer, &item, i)) { + sum += item.data1; + sum += item.data2; + } + free(buffer.c[i].array); + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). 
+ */ + assert(sum == expected_sum); +} + +static void test_signal_interrupt_handler(int signo) +{ + signals_delivered++; +} + +static int set_signal_handler(void) +{ + int ret = 0; + struct sigaction sa; + sigset_t sigset; + + ret = sigemptyset(&sigset); + if (ret < 0) { + perror("sigemptyset"); + return ret; + } + + sa.sa_handler = test_signal_interrupt_handler; + sa.sa_mask = sigset; + sa.sa_flags = 0; + ret = sigaction(SIGUSR1, &sa, NULL); + if (ret < 0) { + perror("sigaction"); + return ret; + } + + printf_verbose("Signal handler set for SIGUSR1\n"); + + return ret; +} + +static void show_usage(int argc, char **argv) +{ + printf("Usage : %s <OPTIONS>\n", + argv[0]); + printf("OPTIONS:\n"); + printf(" [-1 loops] Number of loops for delay injection 1\n"); + printf(" [-2 loops] Number of loops for delay injection 2\n"); + printf(" [-3 loops] Number of loops for delay injection 3\n"); + printf(" [-4 loops] Number of loops for delay injection 4\n"); + printf(" [-5 loops] Number of loops for delay injection 5\n"); + printf(" [-6 loops] Number of loops for delay injection 6\n"); + printf(" [-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n"); + printf(" [-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n"); + printf(" [-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n"); + printf(" [-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n"); + printf(" [-y] Yield\n"); + printf(" [-k] Kill thread with signal\n"); + printf(" [-s S] S: =0: disabled (default), >0: sleep time (ms)\n"); + printf(" [-t N] Number of threads (default 200)\n"); + printf(" [-r N] Number of repetitions per thread (default 5000)\n"); + printf(" [-d] Disable rseq system call (no initialization)\n"); + printf(" [-D M] Disable rseq for each M threads\n"); + printf(" [-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n"); + printf(" [-M] Push into buffer and memcpy buffer with memory barriers.\n"); + printf(" [-v] 
Verbose output.\n"); + printf(" [-h] Show this help.\n"); + printf("\n"); +} + +int main(int argc, char **argv) +{ + int i; + + for (i = 1; i < argc; i++) { + if (argv[i][0] != '-') + continue; + switch (argv[i][1]) { + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]); + i++; + break; + case 'm': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_modulo = atol(argv[i + 1]); + if (opt_modulo < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 's': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_sleep = atol(argv[i + 1]); + if (opt_sleep < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'y': + opt_yield = 1; + break; + case 'k': + opt_signal = 1; + break; + case 'd': + opt_disable_rseq = 1; + break; + case 'D': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_disable_mod = atol(argv[i + 1]); + if (opt_disable_mod < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 't': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_threads = atol(argv[i + 1]); + if (opt_threads < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'r': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_reps = atoll(argv[i + 1]); + if (opt_reps < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'h': + show_usage(argc, argv); + goto end; + case 'T': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_test = *argv[i + 1]; + switch (opt_test) { + case 's': + case 'l': + case 'i': + case 'b': + case 'm': + break; + default: + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'v': + verbose = 1; + break; + case 'M': + opt_mb = 1; + break; + default: + show_usage(argc, argv); + goto 
error; + } + } + + loop_cnt_1 = loop_cnt[1]; + loop_cnt_2 = loop_cnt[2]; + loop_cnt_3 = loop_cnt[3]; + loop_cnt_4 = loop_cnt[4]; + loop_cnt_5 = loop_cnt[5]; + loop_cnt_6 = loop_cnt[6]; + + if (set_signal_handler()) + goto error; + + if (!opt_disable_rseq && rseq_register_current_thread()) + goto error; + switch (opt_test) { + case 's': + printf_verbose("spinlock\n"); + test_percpu_spinlock(); + break; + case 'l': + printf_verbose("linked list\n"); + test_percpu_list(); + break; + case 'b': + printf_verbose("buffer\n"); + test_percpu_buffer(); + break; + case 'm': + printf_verbose("memcpy buffer\n"); + test_percpu_memcpy_buffer(); + break; + case 'i': + printf_verbose("counter increment\n"); + test_percpu_inc(); + break; + } + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); +end: + return 0; + +error: + return -1; +} -- 2.11.0
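For readers following the buffer tests in the patch above: the closing assert(sum == expected_sum) relies on every successful pop being matched by a successful push, so items may move between CPUs but are never lost or duplicated. The following is a minimal single-threaded sketch of that invariant, assuming a plain array stack in place of the rseq fast path. The sketch_* functions and SKETCH_* constants are illustrative names and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_SLOTS 4	/* assumption: stands in for the usable CPUs */
#define SKETCH_ITEMS 100	/* items initially placed in each slot */

struct sketch_stack {
	intptr_t offset;	/* number of items currently stored */
	intptr_t buflen;	/* capacity of array[] */
	intptr_t array[SKETCH_NR_SLOTS * SKETCH_ITEMS];
};

/* Non-concurrent pop, same shape as __percpu_memcpy_buffer_pop(). */
static int sketch_pop(struct sketch_stack *s, intptr_t *item)
{
	assert(s != NULL && item != NULL);
	if (s->offset == 0)
		return 0;
	*item = s->array[s->offset - 1];
	s->offset--;
	return 1;
}

/* Non-concurrent push; fails only when the slot is full. */
static int sketch_push(struct sketch_stack *s, intptr_t item)
{
	assert(s != NULL);
	if (s->offset == s->buflen)
		return 0;
	s->array[s->offset] = item;
	s->offset++;
	return 1;
}

/*
 * Fill every slot with 1..SKETCH_ITEMS, move items between slots, then
 * drain everything. Returns 1 if the drained sum matches the expected sum.
 */
static int sketch_sum_is_conserved(unsigned int rounds)
{
	static struct sketch_stack slots[SKETCH_NR_SLOTS];
	uint64_t sum = 0, expected_sum = 0;
	intptr_t item;
	unsigned int i, r;

	for (i = 0; i < SKETCH_NR_SLOTS; i++) {
		slots[i].offset = 0;
		/* Worst case: every item ends up in the same slot. */
		slots[i].buflen = SKETCH_NR_SLOTS * SKETCH_ITEMS;
		for (item = 1; item <= SKETCH_ITEMS; item++) {
			expected_sum += (uint64_t)item;
			if (!sketch_push(&slots[i], item))
				return 0;
		}
	}

	/* Pop from one slot and push into the next, as a migrating thread would. */
	for (r = 0; r < rounds; r++) {
		unsigned int from = r % SKETCH_NR_SLOTS;
		unsigned int to = (r + 1) % SKETCH_NR_SLOTS;

		if (!sketch_pop(&slots[from], &item))
			continue;	/* slot was empty, nothing to move */
		if (!sketch_push(&slots[to], item))
			return 0;	/* worst-case sizing should prevent this */
	}

	for (i = 0; i < SKETCH_NR_SLOTS; i++)
		while (sketch_pop(&slots[i], &item))
			sum += (uint64_t)item;

	return sum == expected_sum;
}
```

Calling sketch_sum_is_conserved() with any number of rounds should return 1, because each slot is sized for the worst case, as in the patch. The sketch says nothing about concurrency, which is the part the rseq critical sections handle in the real test.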
* [PATCH 13/14] rseq: selftests: Provide parametrized tests (v2)
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC

"param_test" is a parametrizable restartable sequences test. See
the "--help" output for usage.

"param_test_benchmark" is the same as "param_test", but it removes
testing book-keeping code to allow accurate benchmarks.

"param_test_compare_twice" is the same as "param_test", but it performs
each comparison within rseq critical section twice, thus validating
invariants. If any of the second comparisons fails, an error message
is printed and the test aborts.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
CC: Shuah Khan <shuahkh at osg.samsung.com>
CC: Russell King <linux at arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas at arm.com>
CC: Will Deacon <will.deacon at arm.com>
CC: Thomas Gleixner <tglx at linutronix.de>
CC: Paul Turner <pjt at google.com>
CC: Andrew Hunter <ahh at google.com>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Andy Lutomirski <luto at amacapital.net>
CC: Andi Kleen <andi at firstfloor.org>
CC: Dave Watson <davejwatson at fb.com>
CC: Chris Lameter <cl at linux.com>
CC: Ingo Molnar <mingo at redhat.com>
CC: "H. Peter Anvin" <hpa at zytor.com>
CC: Ben Maurer <bmaurer at fb.com>
CC: Steven Rostedt <rostedt at goodmis.org>
CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
CC: Josh Triplett <josh at joshtriplett.org>
CC: Linus Torvalds <torvalds at linux-foundation.org>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Boqun Feng <boqun.feng at gmail.com>
CC: linux-kselftest at vger.kernel.org
CC: linux-api at vger.kernel.org
---
Changes since v1:
- Use only rseq, remove use of cpu_opv.
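To make the "param_test_compare_twice" description above more concrete, here is a minimal sketch in plain C of what repeating a comparison before the final store looks like. It is not the rseq library code: the real checks sit inside the inline-assembly critical sections, and there is no restart-on-preemption here. The switch SKETCH_COMPARE_TWICE and the function sketch_cmpeqv_storev() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_COMPARE_TWICE 1	/* hypothetical build-time switch */

/*
 * Store newv into *v if *v == expect.
 * Returns 0 on success, 1 if the first comparison fails (caller may retry).
 * With SKETCH_COMPARE_TWICE set, the comparison is repeated right before
 * the store; a mismatch there means an invariant was broken, so the
 * sketch prints an error and aborts, as the commit message describes.
 */
static int sketch_cmpeqv_storev(volatile intptr_t *v, intptr_t expect,
				intptr_t newv)
{
	assert(v != NULL);

	if (*v != expect)
		return 1;	/* comparison failed */
#if SKETCH_COMPARE_TWICE
	/* Second comparison: the value must not have changed in between. */
	if (*v != expect) {
		fprintf(stderr, "sketch: second comparison failed\n");
		abort();
	}
#endif
	*v = newv;	/* final store, the commit point */
	return 0;
}
```

In a single-threaded run the second comparison can never fail, which is the point: inside a correctly working rseq critical section it should never fail either, so a failure indicates that the critical section was not restarted when it should have been.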
--- tools/testing/selftests/rseq/param_test.c | 1260 +++++++++++++++++++++++++++++ 1 file changed, 1260 insertions(+) create mode 100644 tools/testing/selftests/rseq/param_test.c diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c new file mode 100644 index 000000000000..6a9f602a8718 --- /dev/null +++ b/tools/testing/selftests/rseq/param_test.c @@ -0,0 +1,1260 @@ +// SPDX-License-Identifier: LGPL-2.1 +#define _GNU_SOURCE +#include <assert.h> +#include <pthread.h> +#include <sched.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <syscall.h> +#include <unistd.h> +#include <poll.h> +#include <sys/types.h> +#include <signal.h> +#include <errno.h> +#include <stddef.h> + +static inline pid_t gettid(void) +{ + return syscall(__NR_gettid); +} + +#define NR_INJECT 9 +static int loop_cnt[NR_INJECT + 1]; + +static int loop_cnt_1 asm("asm_loop_cnt_1") __attribute__((used)); +static int loop_cnt_2 asm("asm_loop_cnt_2") __attribute__((used)); +static int loop_cnt_3 asm("asm_loop_cnt_3") __attribute__((used)); +static int loop_cnt_4 asm("asm_loop_cnt_4") __attribute__((used)); +static int loop_cnt_5 asm("asm_loop_cnt_5") __attribute__((used)); +static int loop_cnt_6 asm("asm_loop_cnt_6") __attribute__((used)); + +static int opt_modulo, verbose; + +static int opt_yield, opt_signal, opt_sleep, + opt_disable_rseq, opt_threads = 200, + opt_disable_mod = 0, opt_test = 's', opt_mb = 0; + +#ifndef RSEQ_SKIP_FASTPATH +static long long opt_reps = 5000; +#else +static long long opt_reps = 100; +#endif + +static __thread __attribute__((tls_model("initial-exec"))) +unsigned int signals_delivered; + +#ifndef BENCHMARK + +static __thread __attribute__((tls_model("initial-exec"), unused)) +unsigned int yield_mod_cnt, nr_abort; + +#define printf_verbose(fmt, ...) 
\ + do { \ + if (verbose) \ + printf(fmt, ## __VA_ARGS__); \ + } while (0) + +#if defined(__x86_64__) || defined(__i386__) + +#define INJECT_ASM_REG "eax" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#ifdef __i386__ + +#define RSEQ_INJECT_ASM(n) \ + "mov asm_loop_cnt_" #n ", %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#elif defined(__x86_64__) + +#define RSEQ_INJECT_ASM(n) \ + "lea asm_loop_cnt_" #n "(%%rip), %%" INJECT_ASM_REG "\n\t" \ + "mov (%%" INJECT_ASM_REG "), %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#else +#error "Unsupported architecture" +#endif + +#elif defined(__ARMEL__) + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + , [loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r4" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmp " INJECT_ASM_REG ", #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subs " INJECT_ASM_REG ", #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" + +#elif __PPC__ + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + , [loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r18" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmpwi %%" INJECT_ASM_REG ", 0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subic. 
%%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" +#else +#error unsupported target +#endif + +#define RSEQ_INJECT_FAILED \ + nr_abort++; + +#define RSEQ_INJECT_C(n) \ +{ \ + int loc_i, loc_nr_loops = loop_cnt[n]; \ + \ + for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \ + rseq_barrier(); \ + } \ + if (loc_nr_loops == -1 && opt_modulo) { \ + if (yield_mod_cnt == opt_modulo - 1) { \ + if (opt_sleep > 0) \ + poll(NULL, 0, opt_sleep); \ + if (opt_yield) \ + sched_yield(); \ + if (opt_signal) \ + raise(SIGUSR1); \ + yield_mod_cnt = 0; \ + } else { \ + yield_mod_cnt++; \ + } \ + } \ +} + +#else + +#define printf_verbose(fmt, ...) + +#endif /* BENCHMARK */ + +#include "rseq.h" + +struct percpu_lock_entry { + intptr_t v; +} __attribute__((aligned(128))); + +struct percpu_lock { + struct percpu_lock_entry c[CPU_SETSIZE]; +}; + +struct test_data_entry { + intptr_t count; +} __attribute__((aligned(128))); + +struct spinlock_test_data { + struct percpu_lock lock; + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct spinlock_thread_test_data { + struct spinlock_test_data *data; + long long reps; + int reg; +}; + +struct inc_test_data { + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct inc_thread_test_data { + struct inc_test_data *data; + long long reps; + int reg; +}; + +struct percpu_list_node { + intptr_t data; + struct percpu_list_node *next; +}; + +struct percpu_list_entry { + struct percpu_list_node *head; +} __attribute__((aligned(128))); + +struct percpu_list { + struct percpu_list_entry c[CPU_SETSIZE]; +}; + +#define BUFFER_ITEM_PER_CPU 100 + +struct percpu_buffer_node { + intptr_t data; +}; + +struct percpu_buffer_entry { + intptr_t offset; + intptr_t buflen; + struct percpu_buffer_node **array; +} __attribute__((aligned(128))); + +struct percpu_buffer { + struct percpu_buffer_entry c[CPU_SETSIZE]; +}; + +#define MEMCPY_BUFFER_ITEM_PER_CPU 100 + +struct percpu_memcpy_buffer_node { + intptr_t data1; + uint64_t data2; +}; + 
+struct percpu_memcpy_buffer_entry { + intptr_t offset; + intptr_t buflen; + struct percpu_memcpy_buffer_node *array; +} __attribute__((aligned(128))); + +struct percpu_memcpy_buffer { + struct percpu_memcpy_buffer_entry c[CPU_SETSIZE]; +}; + +/* A simple percpu spinlock. Grabs lock on current cpu. */ +static int rseq_this_cpu_lock(struct percpu_lock *lock) +{ + int cpu; + + for (;;) { + int ret; + + cpu = rseq_cpu_start(); + ret = rseq_cmpeqv_storev(&lock->c[cpu].v, + 0, 1, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + /* + * Acquire semantic when taking lock after control dependency. + * Matches rseq_smp_store_release(). + */ + rseq_smp_acquire__after_ctrl_dep(); + return cpu; +} + +static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu) +{ + assert(lock->c[cpu].v == 1); + /* + * Release lock, with release semantic. Matches + * rseq_smp_acquire__after_ctrl_dep(). + */ + rseq_smp_store_release(&lock->c[cpu].v, 0); +} + +void *test_percpu_spinlock_thread(void *arg) +{ + struct spinlock_thread_test_data *thread_data = arg; + struct spinlock_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int cpu = rseq_cpu_start(); + + cpu = rseq_this_cpu_lock(&data->lock); + data->c[cpu].count++; + rseq_percpu_unlock(&data->lock, cpu); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +/* + * A simple test which implements a sharded counter using a per-cpu + * lock. 
Obviously real applications might prefer to simply use a + * per-cpu increment; however, this is reasonable for a test and the + * lock can be extended to synchronize more complicated operations. + */ +void test_percpu_spinlock(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct spinlock_test_data data; + struct spinlock_thread_test_data thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_spinlock_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void *test_percpu_inc_thread(void *arg) +{ + struct inc_thread_test_data *thread_data = arg; + struct inc_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int ret; + + do { + int cpu; + + cpu = rseq_cpu_start(); + ret = rseq_addv(&data->c[cpu].count, 1, cpu); + } while (rseq_unlikely(ret)); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +void 
test_percpu_inc(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct inc_test_data data; + struct inc_thread_test_data thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_inc_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void this_cpu_list_push(struct percpu_list *list, + struct percpu_list_node *node, + int *_cpu) +{ + int cpu; + + for (;;) { + intptr_t *targetptr, newval, expect; + int ret; + + cpu = rseq_cpu_start(); + /* Load list->c[cpu].head with single-copy atomicity. */ + expect = (intptr_t)RSEQ_READ_ONCE(list->c[cpu].head); + newval = (intptr_t)node; + targetptr = (intptr_t *)&list->c[cpu].head; + node->next = (struct percpu_list_node *)expect; + ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; +} + +/* + * Unlike a traditional lock-less linked list; the availability of a + * rseq primitive allows us to implement pop without concerns over + * ABA-type races. 
+ */ +struct percpu_list_node *this_cpu_list_pop(struct percpu_list *list, + int *_cpu) +{ + struct percpu_list_node *node = NULL; + int cpu; + + for (;;) { + struct percpu_list_node *head; + intptr_t *targetptr, expectnot, *load; + off_t offset; + int ret; + + cpu = rseq_cpu_start(); + targetptr = (intptr_t *)&list->c[cpu].head; + expectnot = (intptr_t)NULL; + offset = offsetof(struct percpu_list_node, next); + load = (intptr_t *)&head; + ret = rseq_cmpnev_storeoffp_load(targetptr, expectnot, + offset, load, cpu); + if (rseq_likely(!ret)) { + node = head; + break; + } + if (ret > 0) + break; + /* Retry if rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return node; +} + +/* + * __percpu_list_pop is not safe against concurrent accesses. Should + * only be used on lists that are not concurrently modified. + */ +struct percpu_list_node *__percpu_list_pop(struct percpu_list *list, int cpu) +{ + struct percpu_list_node *node; + + node = list->c[cpu].head; + if (!node) + return NULL; + list->c[cpu].head = node->next; + return node; +} + +void *test_percpu_list_thread(void *arg) +{ + long long i, reps; + struct percpu_list *list = (struct percpu_list *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_list_node *node; + + node = this_cpu_list_pop(list, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) + this_cpu_list_push(list, node, NULL); + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu linked list from many threads. 
*/ +void test_percpu_list(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_list list; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&list, 0, sizeof(list)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + for (j = 1; j <= 100; j++) { + struct percpu_list_node *node; + + expected_sum += j; + + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + node->next = list.c[i].head; + list.c[i].head = node; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_list_thread, &list); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_list_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_list_pop(&list, i))) { + sum += node->data; + free(node); + } + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). 
+ */ + assert(sum == expected_sum); +} + +bool this_cpu_buffer_push(struct percpu_buffer *buffer, + struct percpu_buffer_node *node, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_spec, newval_spec; + intptr_t *targetptr_final, newval_final; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == buffer->c[cpu].buflen) + break; + newval_spec = (intptr_t)node; + targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset]; + newval_final = offset + 1; + targetptr_final = &buffer->c[cpu].offset; + if (opt_mb) + ret = rseq_cmpeqv_trystorev_storev_release( + targetptr_final, offset, targetptr_spec, + newval_spec, newval_final, cpu); + else + ret = rseq_cmpeqv_trystorev_storev(targetptr_final, + offset, targetptr_spec, newval_spec, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +struct percpu_buffer_node *this_cpu_buffer_pop(struct percpu_buffer *buffer, + int *_cpu) +{ + struct percpu_buffer_node *head; + int cpu; + + for (;;) { + intptr_t *targetptr, newval; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. */ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == 0) { + head = NULL; + break; + } + head = RSEQ_READ_ONCE(buffer->c[cpu].array[offset - 1]); + newval = offset - 1; + targetptr = (intptr_t *)&buffer->c[cpu].offset; + ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset, + (intptr_t *)&buffer->c[cpu].array[offset - 1], + (intptr_t)head, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return head; +} + +/* + * __percpu_buffer_pop is not safe against concurrent accesses. Should + * only be used on buffers that are not concurrently modified. 
+ */ +struct percpu_buffer_node *__percpu_buffer_pop(struct percpu_buffer *buffer, + int cpu) +{ + struct percpu_buffer_node *head; + intptr_t offset; + + offset = buffer->c[cpu].offset; + if (offset == 0) + return NULL; + head = buffer->c[cpu].array[offset - 1]; + buffer->c[cpu].offset = offset - 1; + return head; +} + +void *test_percpu_buffer_thread(void *arg) +{ + long long i, reps; + struct percpu_buffer *buffer = (struct percpu_buffer *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_buffer_node *node; + + node = this_cpu_buffer_pop(buffer, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) { + if (!this_cpu_buffer_push(buffer, node, NULL)) { + /* Should increase buffer size. */ + abort(); + } + } + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu buffer from many threads. */ +void test_percpu_buffer(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_buffer buffer; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&buffer, 0, sizeof(buffer)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + /* Worse-case is every item in same CPU. 
*/ + buffer.c[i].array = + malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE * + BUFFER_ITEM_PER_CPU); + assert(buffer.c[i].array); + buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU; + for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) { + struct percpu_buffer_node *node; + + expected_sum += j; + + /* + * We could theoretically put the word-sized + * "data" directly in the buffer. However, we + * want to model objects that would not fit + * within a single word, so allocate an object + * for each node. + */ + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + buffer.c[i].array[j - 1] = node; + buffer.c[i].offset++; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_buffer_thread, &buffer); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_buffer_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_buffer_pop(&buffer, i))) { + sum += node->data; + free(node); + } + free(buffer.c[i].array); + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). + */ + assert(sum == expected_sum); +} + +bool this_cpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node item, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_final, newval_final, offset; + char *destptr, *srcptr; + size_t copylen; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. 
*/ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == buffer->c[cpu].buflen) + break; + destptr = (char *)&buffer->c[cpu].array[offset]; + srcptr = (char *)&item; + /* copylen must be <= 4kB. */ + copylen = sizeof(item); + newval_final = offset + 1; + targetptr_final = &buffer->c[cpu].offset; + if (opt_mb) + ret = rseq_cmpeqv_trymemcpy_storev_release( + targetptr_final, offset, + destptr, srcptr, copylen, + newval_final, cpu); + else + ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final, + offset, destptr, srcptr, copylen, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +bool this_cpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node *item, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_final, newval_final, offset; + char *destptr, *srcptr; + size_t copylen; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. */ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == 0) + break; + destptr = (char *)item; + srcptr = (char *)&buffer->c[cpu].array[offset - 1]; + /* copylen must be <= 4kB. */ + copylen = sizeof(*item); + newval_final = offset - 1; + targetptr_final = &buffer->c[cpu].offset; + ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final, + offset, destptr, srcptr, copylen, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +/* + * __percpu_memcpy_buffer_pop is not safe against concurrent accesses. Should + * only be used on buffers that are not concurrently modified. 
+ */ +bool __percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node *item, + int cpu) +{ + intptr_t offset; + + offset = buffer->c[cpu].offset; + if (offset == 0) + return false; + memcpy(item, &buffer->c[cpu].array[offset - 1], sizeof(*item)); + buffer->c[cpu].offset = offset - 1; + return true; +} + +void *test_percpu_memcpy_buffer_thread(void *arg) +{ + long long i, reps; + struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_memcpy_buffer_node item; + bool result; + + result = this_cpu_memcpy_buffer_pop(buffer, &item, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (result) { + if (!this_cpu_memcpy_buffer_push(buffer, item, NULL)) { + /* Should increase buffer size. */ + abort(); + } + } + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu buffer from many threads. */ +void test_percpu_memcpy_buffer(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_memcpy_buffer buffer; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&buffer, 0, sizeof(buffer)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + /* Worse-case is every item in same CPU. 
*/ + buffer.c[i].array = + malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE * + MEMCPY_BUFFER_ITEM_PER_CPU); + assert(buffer.c[i].array); + buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU; + for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) { + expected_sum += 2 * j + 1; + + /* + * We could theoretically put the word-sized + * "data" directly in the buffer. However, we + * want to model objects that would not fit + * within a single word, so allocate an object + * for each node. + */ + buffer.c[i].array[j - 1].data1 = j; + buffer.c[i].array[j - 1].data2 = j + 1; + buffer.c[i].offset++; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_memcpy_buffer_thread, + &buffer); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_memcpy_buffer_node item; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while (__percpu_memcpy_buffer_pop(&buffer, &item, i)) { + sum += item.data1; + sum += item.data2; + } + free(buffer.c[i].array); + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). 
+ */ + assert(sum == expected_sum); +} + +static void test_signal_interrupt_handler(int signo) +{ + signals_delivered++; +} + +static int set_signal_handler(void) +{ + int ret = 0; + struct sigaction sa; + sigset_t sigset; + + ret = sigemptyset(&sigset); + if (ret < 0) { + perror("sigemptyset"); + return ret; + } + + sa.sa_handler = test_signal_interrupt_handler; + sa.sa_mask = sigset; + sa.sa_flags = 0; + ret = sigaction(SIGUSR1, &sa, NULL); + if (ret < 0) { + perror("sigaction"); + return ret; + } + + printf_verbose("Signal handler set for SIGUSR1\n"); + + return ret; +} + +static void show_usage(int argc, char **argv) +{ + printf("Usage : %s <OPTIONS>\n", + argv[0]); + printf("OPTIONS:\n"); + printf(" [-1 loops] Number of loops for delay injection 1\n"); + printf(" [-2 loops] Number of loops for delay injection 2\n"); + printf(" [-3 loops] Number of loops for delay injection 3\n"); + printf(" [-4 loops] Number of loops for delay injection 4\n"); + printf(" [-5 loops] Number of loops for delay injection 5\n"); + printf(" [-6 loops] Number of loops for delay injection 6\n"); + printf(" [-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n"); + printf(" [-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n"); + printf(" [-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n"); + printf(" [-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n"); + printf(" [-y] Yield\n"); + printf(" [-k] Kill thread with signal\n"); + printf(" [-s S] S: =0: disabled (default), >0: sleep time (ms)\n"); + printf(" [-t N] Number of threads (default 200)\n"); + printf(" [-r N] Number of repetitions per thread (default 5000)\n"); + printf(" [-d] Disable rseq system call (no initialization)\n"); + printf(" [-D M] Disable rseq for each M threads\n"); + printf(" [-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n"); + printf(" [-M] Push into buffer and memcpy buffer with memory barriers.\n"); + printf(" [-v] 
Verbose output.\n"); + printf(" [-h] Show this help.\n"); + printf("\n"); +} + +int main(int argc, char **argv) +{ + int i; + + for (i = 1; i < argc; i++) { + if (argv[i][0] != '-') + continue; + switch (argv[i][1]) { + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]); + i++; + break; + case 'm': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_modulo = atol(argv[i + 1]); + if (opt_modulo < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 's': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_sleep = atol(argv[i + 1]); + if (opt_sleep < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'y': + opt_yield = 1; + break; + case 'k': + opt_signal = 1; + break; + case 'd': + opt_disable_rseq = 1; + break; + case 'D': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_disable_mod = atol(argv[i + 1]); + if (opt_disable_mod < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 't': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_threads = atol(argv[i + 1]); + if (opt_threads < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'r': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_reps = atoll(argv[i + 1]); + if (opt_reps < 0) { + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'h': + show_usage(argc, argv); + goto end; + case 'T': + if (argc < i + 2) { + show_usage(argc, argv); + goto error; + } + opt_test = *argv[i + 1]; + switch (opt_test) { + case 's': + case 'l': + case 'i': + case 'b': + case 'm': + break; + default: + show_usage(argc, argv); + goto error; + } + i++; + break; + case 'v': + verbose = 1; + break; + case 'M': + opt_mb = 1; + break; + default: + show_usage(argc, argv); + goto 
error; + } + } + + loop_cnt_1 = loop_cnt[1]; + loop_cnt_2 = loop_cnt[2]; + loop_cnt_3 = loop_cnt[3]; + loop_cnt_4 = loop_cnt[4]; + loop_cnt_5 = loop_cnt[5]; + loop_cnt_6 = loop_cnt[6]; + + if (set_signal_handler()) + goto error; + + if (!opt_disable_rseq && rseq_register_current_thread()) + goto error; + switch (opt_test) { + case 's': + printf_verbose("spinlock\n"); + test_percpu_spinlock(); + break; + case 'l': + printf_verbose("linked list\n"); + test_percpu_list(); + break; + case 'b': + printf_verbose("buffer\n"); + test_percpu_buffer(); + break; + case 'm': + printf_verbose("memcpy buffer\n"); + test_percpu_memcpy_buffer(); + break; + case 'i': + printf_verbose("counter increment\n"); + test_percpu_inc(); + break; + } + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); +end: + return 0; + +error: + return -1; +} -- 2.11.0
* [PATCH 13/14] rseq: selftests: Provide parametrized tests (v2)
@ 2018-04-30 22:44 ` mathieu.desnoyers
  0 siblings, 0 replies; 105+ messages in thread
From: mathieu.desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)

"param_test" is a parametrizable restartable sequences test. See
the "--help" output for usage.

"param_test_benchmark" is the same as "param_test", but it removes
testing book-keeping code to allow accurate benchmarks.

"param_test_compare_twice" is the same as "param_test", but it
performs each comparison within rseq critical section twice, thus
validating invariants. If any of the second comparisons fails, an
error message is printed and the test aborts.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
CC: Shuah Khan <shuahkh at osg.samsung.com>
CC: Russell King <linux at arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas at arm.com>
CC: Will Deacon <will.deacon at arm.com>
CC: Thomas Gleixner <tglx at linutronix.de>
CC: Paul Turner <pjt at google.com>
CC: Andrew Hunter <ahh at google.com>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Andy Lutomirski <luto at amacapital.net>
CC: Andi Kleen <andi at firstfloor.org>
CC: Dave Watson <davejwatson at fb.com>
CC: Chris Lameter <cl at linux.com>
CC: Ingo Molnar <mingo at redhat.com>
CC: "H. Peter Anvin" <hpa at zytor.com>
CC: Ben Maurer <bmaurer at fb.com>
CC: Steven Rostedt <rostedt at goodmis.org>
CC: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
CC: Josh Triplett <josh at joshtriplett.org>
CC: Linus Torvalds <torvalds at linux-foundation.org>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Boqun Feng <boqun.feng at gmail.com>
CC: linux-kselftest at vger.kernel.org
CC: linux-api at vger.kernel.org
---
Changes since v1:
- Use only rseq, remove use of cpu_opv.
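For readers unfamiliar with how one source file can yield the three
test variants described above, here is a minimal sketch, not taken
from the patch. It assumes the variants are selected by preprocessor
switches at build time. BENCHMARK is the macro param_test.c tests
with #ifndef; COMPARE_TWICE_SKETCH, cmpeqv_storev_sketch() and
nr_compare are hypothetical names made up for this illustration. The
real comparisons run inside rseq critical sections written in inline
assembly, whereas this plain-C model is neither restartable nor
atomic.

```c
/*
 * Illustrative sketch only, not part of the patch. It models a
 * compare-then-store step whose book-keeping and second comparison
 * are selected at build time. All names except BENCHMARK are
 * hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

#ifndef BENCHMARK
/* Book-keeping that a benchmark build compiles out. */
static unsigned int nr_compare;
#define COUNT_COMPARE()	(nr_compare++)
#else
#define COUNT_COMPARE()	((void)0)
#endif

/*
 * Store newv into *v if *v equals expect.
 * Returns 0 on success, 1 if the first comparison fails.
 */
static int cmpeqv_storev_sketch(long *v, long expect, long newv)
{
	COUNT_COMPARE();
	if (*v != expect)
		return 1;
#ifdef COMPARE_TWICE_SKETCH
	/*
	 * Second comparison: nothing may have changed *v since the
	 * first one. A failure here means the invariant was broken.
	 */
	COUNT_COMPARE();
	if (*v != expect) {
		fprintf(stderr, "error: second comparison failed\n");
		abort();
	}
#endif
	*v = newv;
	return 0;
}
```

Called on a word holding 0, cmpeqv_storev_sketch(&word, 0, 1) stores
1 and returns 0; a second call that still expects 0 returns 1 and
leaves the word unchanged. Under these assumptions, building with
-DBENCHMARK drops the counter, and building with
-DCOMPARE_TWICE_SKETCH adds the second check.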
--- tools/testing/selftests/rseq/param_test.c | 1260 +++++++++++++++++++++++++++++ 1 file changed, 1260 insertions(+) create mode 100644 tools/testing/selftests/rseq/param_test.c diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c new file mode 100644 index 000000000000..6a9f602a8718 --- /dev/null +++ b/tools/testing/selftests/rseq/param_test.c @@ -0,0 +1,1260 @@ +// SPDX-License-Identifier: LGPL-2.1 +#define _GNU_SOURCE +#include <assert.h> +#include <pthread.h> +#include <sched.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <syscall.h> +#include <unistd.h> +#include <poll.h> +#include <sys/types.h> +#include <signal.h> +#include <errno.h> +#include <stddef.h> + +static inline pid_t gettid(void) +{ + return syscall(__NR_gettid); +} + +#define NR_INJECT 9 +static int loop_cnt[NR_INJECT + 1]; + +static int loop_cnt_1 asm("asm_loop_cnt_1") __attribute__((used)); +static int loop_cnt_2 asm("asm_loop_cnt_2") __attribute__((used)); +static int loop_cnt_3 asm("asm_loop_cnt_3") __attribute__((used)); +static int loop_cnt_4 asm("asm_loop_cnt_4") __attribute__((used)); +static int loop_cnt_5 asm("asm_loop_cnt_5") __attribute__((used)); +static int loop_cnt_6 asm("asm_loop_cnt_6") __attribute__((used)); + +static int opt_modulo, verbose; + +static int opt_yield, opt_signal, opt_sleep, + opt_disable_rseq, opt_threads = 200, + opt_disable_mod = 0, opt_test = 's', opt_mb = 0; + +#ifndef RSEQ_SKIP_FASTPATH +static long long opt_reps = 5000; +#else +static long long opt_reps = 100; +#endif + +static __thread __attribute__((tls_model("initial-exec"))) +unsigned int signals_delivered; + +#ifndef BENCHMARK + +static __thread __attribute__((tls_model("initial-exec"), unused)) +unsigned int yield_mod_cnt, nr_abort; + +#define printf_verbose(fmt, ...) 
\ + do { \ + if (verbose) \ + printf(fmt, ## __VA_ARGS__); \ + } while (0) + +#if defined(__x86_64__) || defined(__i386__) + +#define INJECT_ASM_REG "eax" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#ifdef __i386__ + +#define RSEQ_INJECT_ASM(n) \ + "mov asm_loop_cnt_" #n ", %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#elif defined(__x86_64__) + +#define RSEQ_INJECT_ASM(n) \ + "lea asm_loop_cnt_" #n "(%%rip), %%" INJECT_ASM_REG "\n\t" \ + "mov (%%" INJECT_ASM_REG "), %%" INJECT_ASM_REG "\n\t" \ + "test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \ + "jz 333f\n\t" \ + "222:\n\t" \ + "dec %%" INJECT_ASM_REG "\n\t" \ + "jnz 222b\n\t" \ + "333:\n\t" + +#else +#error "Unsupported architecture" +#endif + +#elif defined(__ARMEL__) + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + , [loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r4" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmp " INJECT_ASM_REG ", #0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subs " INJECT_ASM_REG ", #1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" + +#elif __PPC__ + +#define RSEQ_INJECT_INPUT \ + , [loop_cnt_1]"m"(loop_cnt[1]) \ + , [loop_cnt_2]"m"(loop_cnt[2]) \ + , [loop_cnt_3]"m"(loop_cnt[3]) \ + , [loop_cnt_4]"m"(loop_cnt[4]) \ + , [loop_cnt_5]"m"(loop_cnt[5]) \ + , [loop_cnt_6]"m"(loop_cnt[6]) + +#define INJECT_ASM_REG "r18" + +#define RSEQ_INJECT_CLOBBER \ + , INJECT_ASM_REG + +#define RSEQ_INJECT_ASM(n) \ + "lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \ + "cmpwi %%" INJECT_ASM_REG ", 0\n\t" \ + "beq 333f\n\t" \ + "222:\n\t" \ + "subic. 
%%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \ + "bne 222b\n\t" \ + "333:\n\t" +#else +#error unsupported target +#endif + +#define RSEQ_INJECT_FAILED \ + nr_abort++; + +#define RSEQ_INJECT_C(n) \ +{ \ + int loc_i, loc_nr_loops = loop_cnt[n]; \ + \ + for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \ + rseq_barrier(); \ + } \ + if (loc_nr_loops == -1 && opt_modulo) { \ + if (yield_mod_cnt == opt_modulo - 1) { \ + if (opt_sleep > 0) \ + poll(NULL, 0, opt_sleep); \ + if (opt_yield) \ + sched_yield(); \ + if (opt_signal) \ + raise(SIGUSR1); \ + yield_mod_cnt = 0; \ + } else { \ + yield_mod_cnt++; \ + } \ + } \ +} + +#else + +#define printf_verbose(fmt, ...) + +#endif /* BENCHMARK */ + +#include "rseq.h" + +struct percpu_lock_entry { + intptr_t v; +} __attribute__((aligned(128))); + +struct percpu_lock { + struct percpu_lock_entry c[CPU_SETSIZE]; +}; + +struct test_data_entry { + intptr_t count; +} __attribute__((aligned(128))); + +struct spinlock_test_data { + struct percpu_lock lock; + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct spinlock_thread_test_data { + struct spinlock_test_data *data; + long long reps; + int reg; +}; + +struct inc_test_data { + struct test_data_entry c[CPU_SETSIZE]; +}; + +struct inc_thread_test_data { + struct inc_test_data *data; + long long reps; + int reg; +}; + +struct percpu_list_node { + intptr_t data; + struct percpu_list_node *next; +}; + +struct percpu_list_entry { + struct percpu_list_node *head; +} __attribute__((aligned(128))); + +struct percpu_list { + struct percpu_list_entry c[CPU_SETSIZE]; +}; + +#define BUFFER_ITEM_PER_CPU 100 + +struct percpu_buffer_node { + intptr_t data; +}; + +struct percpu_buffer_entry { + intptr_t offset; + intptr_t buflen; + struct percpu_buffer_node **array; +} __attribute__((aligned(128))); + +struct percpu_buffer { + struct percpu_buffer_entry c[CPU_SETSIZE]; +}; + +#define MEMCPY_BUFFER_ITEM_PER_CPU 100 + +struct percpu_memcpy_buffer_node { + intptr_t data1; + uint64_t data2; +}; + 
+struct percpu_memcpy_buffer_entry { + intptr_t offset; + intptr_t buflen; + struct percpu_memcpy_buffer_node *array; +} __attribute__((aligned(128))); + +struct percpu_memcpy_buffer { + struct percpu_memcpy_buffer_entry c[CPU_SETSIZE]; +}; + +/* A simple percpu spinlock. Grabs lock on current cpu. */ +static int rseq_this_cpu_lock(struct percpu_lock *lock) +{ + int cpu; + + for (;;) { + int ret; + + cpu = rseq_cpu_start(); + ret = rseq_cmpeqv_storev(&lock->c[cpu].v, + 0, 1, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + /* + * Acquire semantic when taking lock after control dependency. + * Matches rseq_smp_store_release(). + */ + rseq_smp_acquire__after_ctrl_dep(); + return cpu; +} + +static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu) +{ + assert(lock->c[cpu].v == 1); + /* + * Release lock, with release semantic. Matches + * rseq_smp_acquire__after_ctrl_dep(). + */ + rseq_smp_store_release(&lock->c[cpu].v, 0); +} + +void *test_percpu_spinlock_thread(void *arg) +{ + struct spinlock_thread_test_data *thread_data = arg; + struct spinlock_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int cpu = rseq_cpu_start(); + + cpu = rseq_this_cpu_lock(&data->lock); + data->c[cpu].count++; + rseq_percpu_unlock(&data->lock, cpu); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +/* + * A simple test which implements a sharded counter using a per-cpu + * lock. 
Obviously real applications might prefer to simply use a + * per-cpu increment; however, this is reasonable for a test and the + * lock can be extended to synchronize more complicated operations. + */ +void test_percpu_spinlock(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct spinlock_test_data data; + struct spinlock_thread_test_data thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_spinlock_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void *test_percpu_inc_thread(void *arg) +{ + struct inc_thread_test_data *thread_data = arg; + struct inc_test_data *data = thread_data->data; + long long i, reps; + + if (!opt_disable_rseq && thread_data->reg && + rseq_register_current_thread()) + abort(); + reps = thread_data->reps; + for (i = 0; i < reps; i++) { + int ret; + + do { + int cpu; + + cpu = rseq_cpu_start(); + ret = rseq_addv(&data->c[cpu].count, 1, cpu); + } while (rseq_unlikely(ret)); +#ifndef BENCHMARK + if (i != 0 && !(i % (reps / 10))) + printf_verbose("tid %d: count %lld\n", (int) gettid(), i); +#endif + } + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && thread_data->reg && + rseq_unregister_current_thread()) + abort(); + return NULL; +} + +void 
test_percpu_inc(void) +{ + const int num_threads = opt_threads; + int i, ret; + uint64_t sum; + pthread_t test_threads[num_threads]; + struct inc_test_data data; + struct inc_thread_test_data thread_data[num_threads]; + + memset(&data, 0, sizeof(data)); + for (i = 0; i < num_threads; i++) { + thread_data[i].reps = opt_reps; + if (opt_disable_mod <= 0 || (i % opt_disable_mod)) + thread_data[i].reg = 1; + else + thread_data[i].reg = 0; + thread_data[i].data = &data; + ret = pthread_create(&test_threads[i], NULL, + test_percpu_inc_thread, + &thread_data[i]); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + sum = 0; + for (i = 0; i < CPU_SETSIZE; i++) + sum += data.c[i].count; + + assert(sum == (uint64_t)opt_reps * num_threads); +} + +void this_cpu_list_push(struct percpu_list *list, + struct percpu_list_node *node, + int *_cpu) +{ + int cpu; + + for (;;) { + intptr_t *targetptr, newval, expect; + int ret; + + cpu = rseq_cpu_start(); + /* Load list->c[cpu].head with single-copy atomicity. */ + expect = (intptr_t)RSEQ_READ_ONCE(list->c[cpu].head); + newval = (intptr_t)node; + targetptr = (intptr_t *)&list->c[cpu].head; + node->next = (struct percpu_list_node *)expect; + ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; +} + +/* + * Unlike a traditional lock-less linked list; the availability of a + * rseq primitive allows us to implement pop without concerns over + * ABA-type races. 
+ */ +struct percpu_list_node *this_cpu_list_pop(struct percpu_list *list, + int *_cpu) +{ + struct percpu_list_node *node = NULL; + int cpu; + + for (;;) { + struct percpu_list_node *head; + intptr_t *targetptr, expectnot, *load; + off_t offset; + int ret; + + cpu = rseq_cpu_start(); + targetptr = (intptr_t *)&list->c[cpu].head; + expectnot = (intptr_t)NULL; + offset = offsetof(struct percpu_list_node, next); + load = (intptr_t *)&head; + ret = rseq_cmpnev_storeoffp_load(targetptr, expectnot, + offset, load, cpu); + if (rseq_likely(!ret)) { + node = head; + break; + } + if (ret > 0) + break; + /* Retry if rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return node; +} + +/* + * __percpu_list_pop is not safe against concurrent accesses. Should + * only be used on lists that are not concurrently modified. + */ +struct percpu_list_node *__percpu_list_pop(struct percpu_list *list, int cpu) +{ + struct percpu_list_node *node; + + node = list->c[cpu].head; + if (!node) + return NULL; + list->c[cpu].head = node->next; + return node; +} + +void *test_percpu_list_thread(void *arg) +{ + long long i, reps; + struct percpu_list *list = (struct percpu_list *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_list_node *node; + + node = this_cpu_list_pop(list, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) + this_cpu_list_push(list, node, NULL); + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu linked list from many threads. 
*/ +void test_percpu_list(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_list list; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&list, 0, sizeof(list)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + for (j = 1; j <= 100; j++) { + struct percpu_list_node *node; + + expected_sum += j; + + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + node->next = list.c[i].head; + list.c[i].head = node; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_list_thread, &list); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_list_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_list_pop(&list, i))) { + sum += node->data; + free(node); + } + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). 
+ */ + assert(sum == expected_sum); +} + +bool this_cpu_buffer_push(struct percpu_buffer *buffer, + struct percpu_buffer_node *node, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_spec, newval_spec; + intptr_t *targetptr_final, newval_final; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == buffer->c[cpu].buflen) + break; + newval_spec = (intptr_t)node; + targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset]; + newval_final = offset + 1; + targetptr_final = &buffer->c[cpu].offset; + if (opt_mb) + ret = rseq_cmpeqv_trystorev_storev_release( + targetptr_final, offset, targetptr_spec, + newval_spec, newval_final, cpu); + else + ret = rseq_cmpeqv_trystorev_storev(targetptr_final, + offset, targetptr_spec, newval_spec, + newval_final, cpu); + if (rseq_likely(!ret)) { + result = true; + break; + } + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return result; +} + +struct percpu_buffer_node *this_cpu_buffer_pop(struct percpu_buffer *buffer, + int *_cpu) +{ + struct percpu_buffer_node *head; + int cpu; + + for (;;) { + intptr_t *targetptr, newval; + intptr_t offset; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. */ + offset = RSEQ_READ_ONCE(buffer->c[cpu].offset); + if (offset == 0) { + head = NULL; + break; + } + head = RSEQ_READ_ONCE(buffer->c[cpu].array[offset - 1]); + newval = offset - 1; + targetptr = (intptr_t *)&buffer->c[cpu].offset; + ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset, + (intptr_t *)&buffer->c[cpu].array[offset - 1], + (intptr_t)head, newval, cpu); + if (rseq_likely(!ret)) + break; + /* Retry if comparison fails or rseq aborts. */ + } + if (_cpu) + *_cpu = cpu; + return head; +} + +/* + * __percpu_buffer_pop is not safe against concurrent accesses. Should + * only be used on buffers that are not concurrently modified. 
+ */ +struct percpu_buffer_node *__percpu_buffer_pop(struct percpu_buffer *buffer, + int cpu) +{ + struct percpu_buffer_node *head; + intptr_t offset; + + offset = buffer->c[cpu].offset; + if (offset == 0) + return NULL; + head = buffer->c[cpu].array[offset - 1]; + buffer->c[cpu].offset = offset - 1; + return head; +} + +void *test_percpu_buffer_thread(void *arg) +{ + long long i, reps; + struct percpu_buffer *buffer = (struct percpu_buffer *)arg; + + if (!opt_disable_rseq && rseq_register_current_thread()) + abort(); + + reps = opt_reps; + for (i = 0; i < reps; i++) { + struct percpu_buffer_node *node; + + node = this_cpu_buffer_pop(buffer, NULL); + if (opt_yield) + sched_yield(); /* encourage shuffling */ + if (node) { + if (!this_cpu_buffer_push(buffer, node, NULL)) { + /* Should increase buffer size. */ + abort(); + } + } + } + + printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", + (int) gettid(), nr_abort, signals_delivered); + if (!opt_disable_rseq && rseq_unregister_current_thread()) + abort(); + + return NULL; +} + +/* Simultaneous modification to a per-cpu buffer from many threads. */ +void test_percpu_buffer(void) +{ + const int num_threads = opt_threads; + int i, j, ret; + uint64_t sum = 0, expected_sum = 0; + struct percpu_buffer buffer; + pthread_t test_threads[num_threads]; + cpu_set_t allowed_cpus; + + memset(&buffer, 0, sizeof(buffer)); + + /* Generate list entries for every usable cpu. */ + sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus); + for (i = 0; i < CPU_SETSIZE; i++) { + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + /* Worse-case is every item in same CPU. 
*/ + buffer.c[i].array = + malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE * + BUFFER_ITEM_PER_CPU); + assert(buffer.c[i].array); + buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU; + for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) { + struct percpu_buffer_node *node; + + expected_sum += j; + + /* + * We could theoretically put the word-sized + * "data" directly in the buffer. However, we + * want to model objects that would not fit + * within a single word, so allocate an object + * for each node. + */ + node = malloc(sizeof(*node)); + assert(node); + node->data = j; + buffer.c[i].array[j - 1] = node; + buffer.c[i].offset++; + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_create(&test_threads[i], NULL, + test_percpu_buffer_thread, &buffer); + if (ret) { + errno = ret; + perror("pthread_create"); + abort(); + } + } + + for (i = 0; i < num_threads; i++) { + ret = pthread_join(test_threads[i], NULL); + if (ret) { + errno = ret; + perror("pthread_join"); + abort(); + } + } + + for (i = 0; i < CPU_SETSIZE; i++) { + struct percpu_buffer_node *node; + + if (!CPU_ISSET(i, &allowed_cpus)) + continue; + + while ((node = __percpu_buffer_pop(&buffer, i))) { + sum += node->data; + free(node); + } + free(buffer.c[i].array); + } + + /* + * All entries should now be accounted for (unless some external + * actor is interfering with our allowed affinity while this + * test is running). + */ + assert(sum == expected_sum); +} + +bool this_cpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer, + struct percpu_memcpy_buffer_node item, + int *_cpu) +{ + bool result = false; + int cpu; + + for (;;) { + intptr_t *targetptr_final, newval_final, offset; + char *destptr, *srcptr; + size_t copylen; + int ret; + + cpu = rseq_cpu_start(); + /* Load offset with single-copy atomicity. 
*/
+		offset = RSEQ_READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			break;
+		destptr = (char *)&buffer->c[cpu].array[offset];
+		srcptr = (char *)&item;
+		/* copylen must be <= 4kB. */
+		copylen = sizeof(item);
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		if (opt_mb)
+			ret = rseq_cmpeqv_trymemcpy_storev_release(
+				targetptr_final, offset,
+				destptr, srcptr, copylen,
+				newval_final, cpu);
+		else
+			ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		if (rseq_likely(!ret)) {
+			result = true;
+			break;
+		}
+		/* Retry if comparison fails or rseq aborts. */
+	}
+	if (_cpu)
+		*_cpu = cpu;
+	return result;
+}
+
+bool this_cpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
+				struct percpu_memcpy_buffer_node *item,
+				int *_cpu)
+{
+	bool result = false;
+	int cpu;
+
+	for (;;) {
+		intptr_t *targetptr_final, newval_final, offset;
+		char *destptr, *srcptr;
+		size_t copylen;
+		int ret;
+
+		cpu = rseq_cpu_start();
+		/* Load offset with single-copy atomicity. */
+		offset = RSEQ_READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			break;
+		destptr = (char *)item;
+		srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+		/* copylen must be <= 4kB. */
+		copylen = sizeof(*item);
+		newval_final = offset - 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+		if (rseq_likely(!ret)) {
+			result = true;
+			break;
+		}
+		/* Retry if comparison fails or rseq aborts. */
+	}
+	if (_cpu)
+		*_cpu = cpu;
+	return result;
+}
+
+/*
+ * __percpu_memcpy_buffer_pop is not safe against concurrent accesses. Should
+ * only be used on buffers that are not concurrently modified.
+ */
+bool __percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
+				struct percpu_memcpy_buffer_node *item,
+				int cpu)
+{
+	intptr_t offset;
+
+	offset = buffer->c[cpu].offset;
+	if (offset == 0)
+		return false;
+	memcpy(item, &buffer->c[cpu].array[offset - 1], sizeof(*item));
+	buffer->c[cpu].offset = offset - 1;
+	return true;
+}
+
+void *test_percpu_memcpy_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_memcpy_buffer_node item;
+		bool result;
+
+		result = this_cpu_memcpy_buffer_pop(buffer, &item, NULL);
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (result) {
+			if (!this_cpu_memcpy_buffer_push(buffer, item, NULL)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		       (int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads. */
+void test_percpu_memcpy_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_memcpy_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst-case is every item in same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE *
+			       MEMCPY_BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) {
+			expected_sum += 2 * j + 1;
+
+			/*
+			 * We could theoretically put the word-sized
+			 * "data" directly in the buffer. However, we
+			 * want to model objects that would not fit
+			 * within a single word, so allocate an object
+			 * for each node.
+			 */
+			buffer.c[i].array[j - 1].data1 = j;
+			buffer.c[i].array[j - 1].data2 = j + 1;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+				     test_percpu_memcpy_buffer_thread,
+				     &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		struct percpu_memcpy_buffer_node item;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		while (__percpu_memcpy_buffer_pop(&buffer, &item, i)) {
+			sum += item.data1;
+			sum += item.data2;
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+static void test_signal_interrupt_handler(int signo)
+{
+	signals_delivered++;
+}
+
+static int set_signal_handler(void)
+{
+	int ret = 0;
+	struct sigaction sa;
+	sigset_t sigset;
+
+	ret = sigemptyset(&sigset);
+	if (ret < 0) {
+		perror("sigemptyset");
+		return ret;
+	}
+
+	sa.sa_handler = test_signal_interrupt_handler;
+	sa.sa_mask = sigset;
+	sa.sa_flags = 0;
+	ret = sigaction(SIGUSR1, &sa, NULL);
+	if (ret < 0) {
+		perror("sigaction");
+		return ret;
+	}
+
+	printf_verbose("Signal handler set for SIGUSR1\n");
+
+	return ret;
+}
+
+static void show_usage(int argc, char **argv)
+{
+	printf("Usage : %s <OPTIONS>\n",
+		argv[0]);
+	printf("OPTIONS:\n");
+	printf(" [-1 loops] Number of loops for delay injection 1\n");
+	printf(" [-2 loops] Number of loops for delay injection 2\n");
+	printf(" [-3 loops] Number of loops for delay injection 3\n");
+	printf(" [-4 loops] Number of loops for delay injection 4\n");
+	printf(" [-5 loops] Number of loops for delay injection 5\n");
+	printf(" [-6 loops] Number of loops for delay injection 6\n");
+	printf(" [-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n");
+	printf(" [-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n");
+	printf(" [-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n");
+	printf(" [-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n");
+	printf(" [-y] Yield\n");
+	printf(" [-k] Kill thread with signal\n");
+	printf(" [-s S] S: =0: disabled (default), >0: sleep time (ms)\n");
+	printf(" [-t N] Number of threads (default 200)\n");
+	printf(" [-r N] Number of repetitions per thread (default 5000)\n");
+	printf(" [-d] Disable rseq system call (no initialization)\n");
+	printf(" [-D M] Disable rseq for each M threads\n");
+	printf(" [-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n");
+	printf(" [-M] Push into buffer and memcpy buffer with memory barriers.\n");
+	printf(" [-v] Verbose output.\n");
+	printf(" [-h] Show this help.\n");
+	printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+
+	for (i = 1; i < argc; i++) {
+		if (argv[i][0] != '-')
+			continue;
+		switch (argv[i][1]) {
+		case '1':
+		case '2':
+		case '3':
+		case '4':
+		case '5':
+		case '6':
+		case '7':
+		case '8':
+		case '9':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]);
+			i++;
+			break;
+		case 'm':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_modulo = atol(argv[i + 1]);
+			if (opt_modulo < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 's':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_sleep = atol(argv[i + 1]);
+			if (opt_sleep < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'y':
+			opt_yield = 1;
+			break;
+		case 'k':
+			opt_signal = 1;
+			break;
+		case 'd':
+			opt_disable_rseq = 1;
+			break;
+		case 'D':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_disable_mod = atol(argv[i + 1]);
+			if (opt_disable_mod < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 't':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_threads = atol(argv[i + 1]);
+			if (opt_threads < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'r':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_reps = atoll(argv[i + 1]);
+			if (opt_reps < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'h':
+			show_usage(argc, argv);
+			goto end;
+		case 'T':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_test = *argv[i + 1];
+			switch (opt_test) {
+			case 's':
+			case 'l':
+			case 'i':
+			case 'b':
+			case 'm':
+				break;
+			default:
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'v':
+			verbose = 1;
+			break;
+		case 'M':
+			opt_mb = 1;
+			break;
+		default:
+			show_usage(argc, argv);
+			goto error;
+		}
+	}
+
+	loop_cnt_1 = loop_cnt[1];
+	loop_cnt_2 = loop_cnt[2];
+	loop_cnt_3 = loop_cnt[3];
+	loop_cnt_4 = loop_cnt[4];
+	loop_cnt_5 = loop_cnt[5];
+	loop_cnt_6 = loop_cnt[6];
+
+	if (set_signal_handler())
+		goto error;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		goto error;
+	switch (opt_test) {
+	case 's':
+		printf_verbose("spinlock\n");
+		test_percpu_spinlock();
+		break;
+	case 'l':
+		printf_verbose("linked list\n");
+		test_percpu_list();
+		break;
+	case 'b':
+		printf_verbose("buffer\n");
+		test_percpu_buffer();
+		break;
+	case 'm':
+		printf_verbose("memcpy buffer\n");
+		test_percpu_memcpy_buffer();
+		break;
+	case 'i':
+		printf_verbose("counter increment\n");
+		test_percpu_inc();
+		break;
+	}
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+end:
+	return 0;
+
+error:
+	return -1;
+}
--
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related [flat|nested] 105+ messages in thread
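The memcpy-buffer test above relies on a simple invariant: every item that is popped is pushed back, so the sum of all `data1` and `data2` fields after the run must equal the sum computed at fill time (`2 * j + 1` per item). As a rough illustration only, the following single-threaded sketch reproduces that bookkeeping without rseq, in the spirit of `__percpu_memcpy_buffer_pop`. All `sketch_*` names and the struct layouts are hypothetical and are not part of the patch; the real test performs the pop and push inside rseq critical sections on per-CPU buffers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for this sketch; not taken from the patch. */
struct sketch_node {
	intptr_t data1;
	uint64_t data2;
};

struct sketch_buffer {
	intptr_t offset;	/* number of items currently stored */
	intptr_t buflen;	/* capacity, in items */
	struct sketch_node *array;
};

/* Copy an item into the next free slot; fail if the buffer is full. */
static bool sketch_push(struct sketch_buffer *b, struct sketch_node item)
{
	if (b->offset == b->buflen)
		return false;
	memcpy(&b->array[b->offset], &item, sizeof(item));
	b->offset++;
	return true;
}

/* Copy the most recently pushed item out; fail if the buffer is empty. */
static bool sketch_pop(struct sketch_buffer *b, struct sketch_node *item)
{
	if (b->offset == 0)
		return false;
	memcpy(item, &b->array[b->offset - 1], sizeof(*item));
	b->offset--;
	return true;
}

/*
 * Fill a buffer with nr_items nodes (data1 = j, data2 = j + 1), pop and
 * re-push nr_items times, then drain the buffer and return the sum of all
 * fields. The result should equal the sum of (2 * j + 1) for j = 1..nr_items.
 */
static uint64_t sketch_roundtrip_sum(int nr_items)
{
	struct sketch_buffer b = { 0, 0, NULL };
	struct sketch_node item;
	uint64_t sum = 0;
	int i, j;

	if (nr_items <= 0)
		return 0;
	b.array = malloc(sizeof(*b.array) * (size_t)nr_items);
	if (!b.array)
		abort();
	b.buflen = nr_items;

	for (j = 1; j <= nr_items; j++) {
		item.data1 = j;
		item.data2 = (uint64_t)j + 1;
		if (!sketch_push(&b, item))
			abort();
	}
	/* Mimic one worker: pop an item, then push the same item back. */
	for (i = 0; i < nr_items; i++) {
		if (sketch_pop(&b, &item) && !sketch_push(&b, item))
			abort();
	}
	/* Drain and accumulate, as the test does after joining its threads. */
	while (sketch_pop(&b, &item))
		sum += (uint64_t)item.data1 + item.data2;
	free(b.array);
	return sum;
}
```

Calling `sketch_roundtrip_sum(3)` from a small `main()` should give 15 (3 + 5 + 7). The sketch deliberately omits concurrency, which is exactly the part that rseq is meant to make safe.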
* [PATCH 14/14] rseq: selftests: Provide Makefile, scripts, gitignore (v2)
  2018-04-30 22:44 [RFC PATCH for 4.18 00/14] Restartable Sequences Mathieu Desnoyers
  2018-04-30 22:44 ` [PATCH 01/14] uapi headers: Provide types_32_64.h (v2) Mathieu Desnoyers
  2018-04-30 22:44 ` Mathieu Desnoyers
@ 2018-04-30 22:44 ` mathieu.desnoyers
  2018-04-30 22:44 ` [PATCH 04/14] arm: Wire up restartable sequences system call Mathieu Desnoyers
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 105+ messages in thread
From: Mathieu Desnoyers @ 2018-04-30 22:44 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

A run_param_test.sh script runs many variants of the parametrizable
tests.

Wire up the rseq Makefile, add directory entry into MAINTAINERS file.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Shuah Khan <shuahkh@osg.samsung.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Use only rseq, remove use of cpu_opv.
---
 MAINTAINERS                                    |   1 +
 tools/testing/selftests/Makefile               |   1 +
 tools/testing/selftests/rseq/.gitignore        |   6 ++
 tools/testing/selftests/rseq/Makefile          |  30 ++++++
 tools/testing/selftests/rseq/run_param_test.sh | 121 +++++++++++++++++++++++++
 5 files changed, 159 insertions(+)
 create mode 100644 tools/testing/selftests/rseq/.gitignore
 create mode 100644 tools/testing/selftests/rseq/Makefile
 create mode 100755 tools/testing/selftests/rseq/run_param_test.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 4d61ce154dfc..5e8968b3ccae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11991,6 +11991,7 @@ S:	Supported
 F:	kernel/rseq.c
 F:	include/uapi/linux/rseq.h
 F:	include/trace/events/rseq.h
+F:	tools/testing/selftests/rseq/
 
 RFKILL
 M:	Johannes Berg <johannes@sipsolutions.net>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 32aafa92074c..593fb44c9cd4 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -28,6 +28,7 @@ TARGETS += powerpc
 TARGETS += proc
 TARGETS += pstore
 TARGETS += ptrace
+TARGETS += rseq
 TARGETS += seccomp
 TARGETS += sigaltstack
 TARGETS += size
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
new file mode 100644
index 000000000000..cc610da7e369
--- /dev/null
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -0,0 +1,6 @@
+basic_percpu_ops_test
+basic_test
+basic_rseq_op_test
+param_test
+param_test_benchmark
+param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
new file mode 100644
index 000000000000..c30c52e1d0d2
--- /dev/null
+++ b/tools/testing/selftests/rseq/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: GPL-2.0+ OR MIT
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+LDLIBS += -lpthread
+
+# Own dependencies because we only want to build against 1st prerequisite, but
+# still track changes to header files and depend on shared object.
+OVERRIDE_TARGETS = 1
+
+TEST_GEN_PROGS = basic_test basic_percpu_ops_test param_test \
+		param_test_benchmark param_test_compare_twice
+
+TEST_GEN_PROGS_EXTENDED = librseq.so
+
+TEST_PROGS = run_param_test.sh
+
+include ../lib.mk
+
+$(OUTPUT)/librseq.so: rseq.c rseq.h rseq-*.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+$(OUTPUT)/%: %.c $(TEST_GEN_PROGS_EXTENDED) rseq.h rseq-*.h
+	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -o $@
+
+$(OUTPUT)/param_test_benchmark: param_test.c $(TEST_GEN_PROGS_EXTENDED) \
+					rseq.h rseq-*.h
+	$(CC) $(CFLAGS) -DBENCHMARK $< $(LDLIBS) -lrseq -o $@
+
+$(OUTPUT)/param_test_compare_twice: param_test.c $(TEST_GEN_PROGS_EXTENDED) \
+					rseq.h rseq-*.h
+	$(CC) $(CFLAGS) -DRSEQ_COMPARE_TWICE $< $(LDLIBS) -lrseq -o $@
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
new file mode 100755
index 000000000000..3acd6d75ff9f
--- /dev/null
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -0,0 +1,121 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+ or MIT
+
+EXTRA_ARGS=${@}
+
+OLDIFS="$IFS"
+IFS=$'\n'
+TEST_LIST=(
+	"-T s"
+	"-T l"
+	"-T b"
+	"-T b -M"
+	"-T m"
+	"-T m -M"
+	"-T i"
+)
+
+TEST_NAME=(
+	"spinlock"
+	"list"
+	"buffer"
+	"buffer with barrier"
+	"memcpy"
+	"memcpy with barrier"
+	"increment"
+)
+IFS="$OLDIFS"
+
+REPS=1000
+SLOW_REPS=100
+
+function do_tests()
+{
+	local i=0
+	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
+		echo "Running test ${TEST_NAME[$i]}"
+		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		echo "Running compare-twice test ${TEST_NAME[$i]}"
+		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		let "i++"
+	done
+}
+
+echo "Default parameters"
+do_tests
+
+echo "Loop injection: 10000 loops"
+
+OLDIFS="$IFS"
+IFS=$'\n'
+INJECT_LIST=(
+	"1"
+	"2"
+	"3"
+	"4"
+	"5"
+	"6"
+	"7"
+	"8"
+	"9"
+)
+IFS="$OLDIFS"
+
+NR_LOOPS=10000
+
+i=0
+while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+	echo "Injecting at <${INJECT_LIST[$i]}>"
+	do_tests -${INJECT_LIST[i]} ${NR_LOOPS}
+	let "i++"
+done
+NR_LOOPS=
+
+function inject_blocking()
+{
+	OLDIFS="$IFS"
+	IFS=$'\n'
+	INJECT_LIST=(
+		"7"
+		"8"
+		"9"
+	)
+	IFS="$OLDIFS"
+
+	NR_LOOPS=-1
+
+	i=0
+	while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+		echo "Injecting at <${INJECT_LIST[$i]}>"
+		do_tests -${INJECT_LIST[i]} -1 ${@}
+		let "i++"
+	done
+	NR_LOOPS=
+}
+
+echo "Yield injection (25%)"
+inject_blocking -m 4 -y
+
+echo "Yield injection (50%)"
+inject_blocking -m 2 -y
+
+echo "Yield injection (100%)"
+inject_blocking -m 1 -y
+
+echo "Kill injection (25%)"
+inject_blocking -m 4 -k
+
+echo "Kill injection (50%)"
+inject_blocking -m 2 -k
+
+echo "Kill injection (100%)"
+inject_blocking -m 1 -k
+
+echo "Sleep injection (1ms, 25%)"
+inject_blocking -m 4 -s 1
+
+echo "Sleep injection (1ms, 50%)"
+inject_blocking -m 2 -s 1
+
+echo "Sleep injection (1ms, 100%)"
+inject_blocking -m 1 -s 1
--
2.11.0

^ permalink raw reply related [flat|nested] 105+ messages in thread
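The script above labels `-m 4`, `-m 2` and `-m 1` as 25%, 50% and 100% injection. That matches a counter which fires once every N passes through an injection point. The sketch below shows that arithmetic in isolation. It is an assumption about how the modulo counter behaves, since the injection code itself does not appear in this part of the thread, and the `sketch_*` names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical state for one injection point; not taken from the patch. */
struct sketch_injector {
	int modulo;	/* value that would be passed via -m */
	int count;	/* passes seen since the last injection */
};

/* Return true once every `modulo` calls; never when modulo is 0 or less. */
static bool sketch_should_inject(struct sketch_injector *inj)
{
	if (inj->modulo <= 0)
		return false;	/* 0 means disabled, like the -m default */
	if (inj->count == inj->modulo - 1) {
		inj->count = 0;
		return true;
	}
	inj->count++;
	return false;
}

/* Count how many of `passes` consecutive passes would trigger an injection. */
static int sketch_count_injections(int modulo, int passes)
{
	struct sketch_injector inj = { modulo, 0 };
	int i, hits = 0;

	for (i = 0; i < passes; i++)
		if (sketch_should_inject(&inj))
			hits++;
	return hits;
}
```

Under this assumption, 100 passes with a modulo of 4 yield 25 injections, which is where the 25% label in the script output would come from.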

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Daniel Colascione @ 2018-05-02 3:53 UTC
To: mathieu.desnoyers
Cc: Peter Zijlstra, paulmck, boqun.feng, luto, davejwatson, linux-kernel, linux-api, pjt, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, mtk.manpages, Joel Fernandes

Hi Mathieu: this work looks very cool. See inline.

On Mon, Apr 30, 2018 at 3:48 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Hi,

> Here is an updated RFC round of the Restartable Sequences patchset
> based on kernel 4.17-rc3. Based on feedback from Linus, I'm introducing
> only the rseq system call, keeping the rest for later.

> This already enables speeding up the Facebook jemalloc and arm64 PMC
> read from user-space use-cases, as well as speedup of use-cases relying
> on getting the current cpu number from user-space. We'll have to wait
> until a more complete solution is introduced before the LTTng-UST
> tracer can replace its ring buffer atomic instructions with rseq
> though. But let's proceed one step at a time.

I like the general theme of the kernel using its "superpowers" (in this case, knowledge of preemption) to help userspace do a better job without userspace code needing to enter the kernel to benefit. The per-CPU data structures this patch enables help in a lot of use cases, but I think there's another use case that you might not have considered, one that can benefit from an extension to your proposed API.

Consider mutexes: in the kernel, for mutual exclusion, we can use a spinlock, which in the kernel ends up being simpler and (in a lot of scenarios) more efficient than a mutex: a core that takes a spinlock promises to keep the lock held for only a very short time, and it disables interrupts to delay asynchronous work that might unexpectedly lengthen the critical section. A different core that wants to grab that spinlock can just spin on the lock word, confident that its spin will be short because any core owning the lock is guaranteed to release it very quickly. (Long spins would be very bad for power.) The overall result is a lock that's much lighter than a mutex. (A spinlock can also be used in places we can't sleep, but this ability isn't relevant to the discussion below.)

Userspace doesn't have a good equivalent to a lightweight spinlock. While you can build a spinlock in userspace, the result ends up having serious problems because of preemption: first, a thread owning such a lock can be preempted in its critical section, lengthening the critical section arbitrarily. Second, a thread spinning on a lock will keep spinning even when the owning thread isn't scheduled anywhere.

Userspace can just implement a mutex as a try-acquire and a FUTEX_WAIT on failure. This approach works fine when there's no contention, but a system call is a pretty heavy operation. Why pay for a system call on occasional light congestion with a short critical section? Can we do better?

The usual approach to "better" is an "adaptive mutex". Such a thing, when it attempts to acquire a lock another thread owns, spins for some number of iterations, then falls back to futex. I guess that's a little better than immediately jumping to futex, but it's not optimal. We can still spin when the lock owner isn't scheduled, and the spin count is usually some guess (either specified manually or estimated statistically) that's not guaranteed to produce decent results. Even if we do pick a good spin count, we run a very good chance of under- or over-spinning on any given lock-acquire. We always want to sleep when spinning would be pointless.

One important case of the spin-while-not-scheduled problem is operation on a uniprocessor system: on such a system, only a single task can be scheduled at a time, making all spins maximally pointless. The usual approach to avoiding wasted spins for adaptive mutexes is for the adaptive mutex library to ask upon initialization "How many cores are in this system?", and if the answer comes back as "1", disable spinning. This approach is inadequate: CPU affinity can change at arbitrary times, and CPU affinity can produce the same spin pessimization that a uniprocessor system does.

I think a small enhancement to rseq would let us build a perfect userspace mutex, one that spins on lock-acquire only when the lock owner is running and that sleeps otherwise, freeing userspace from both specifying ad-hoc spin counts and from trying to detect situations in which spinning is generally pointless.

It'd work like this: in the per-thread rseq data structure, we'd include a description of a futex operation that the kernel would perform (in the context of the preempted thread) upon preemption, immediately before schedule(). If the futex operation itself sleeps, that's no problem: we will have still accomplished our goal of running some other thread instead of the preempted thread.

Suppose we make a userspace mutex implemented with a lock word having three bits: acquired, sleep_mode, and wait_pending, with the rest of the word not being relevant at the moment.

We'd implement lock-acquire the usual way, CASing the acquired bit into the lock and deeming the lock taken if we're successful. Except that before attempting the CAS, we'd configure the current thread's rseq area to bitwise-or the sleep_mode bit into the lock word if the current thread is descheduled.

Now, imagine that we're a different thread that wants to take the lock while the first thread owns it. We'll attempt a CAS as before. The CAS will fail. That's fine --- we'll spin by retrying the CAS.

Here's where we differ from a conventional adaptive mutex. A normal adaptive mutex will execute a fixed maximum number of CAS attempts, then FUTEX_WAIT. We, instead, keep spinning until we either grab the lock or we notice the sleep_mode bit set in the lock word. (On every CAS attempt, we update our local cached copy of the lock word.) When we do notice the sleep_mode bit set, we'll fall back to the usual sleeping strategy: CAS the wait_pending bit into the lock word and FUTEX_WAIT.

Back in the owning thread, when we release the lock, we'll CAS again to reset the acquired bit and (if set) sleep_mode bit, and if we see wait_pending, FUTEX_WAKE any waiters. At this point, we can disable the rseq registration. (If we're preempted after the unlock but before the rseq disarm, we'll spuriously set sleep_mode, but that's fine, since we'll reset it on next lock-acquire.)

This scheme gives us optimal spinning behavior. We spin on lock-acquire only as long as the owning thread is actually running. We don't spin at all on uniprocessor machines or in situations where we've set up affinity to create the moral equivalent of a uniprocessor system. We correctly fall back to sleeping when the owner itself schedules (which indicates that the critical section is likely to last a while). And we don't need to choose some arbitrary constant or use some estimation function to guess how many times we want to spin. We can stop spinning as soon as we know it'll be unproductive.

In practice, I think you'd want to impose a maximum spin count anyway to guard against 1) unexpected non-scheduling critical section lengthening via bugs, and 2) the possibility that the futex-on-schedule operation sleeps before setting sleep_mode.

If you don't think the futex-on-schedule thing is a good idea, there are other ways to accomplish the same basic task. For example, you could add an is_running field to struct rseq, and the kernel would take care of making this field true only when the task owning the struct rseq is, in fact, running. A sufficiently clever runtime system could stash the owning thread ID in the lockword and provide a way to find a thread's struct rseq given its thread ID. On lock contention, instead of switching to FUTEX_WAIT when we notice sleep_mode set in the lock word, we'd switch to FUTEX_WAIT when we notice is_running in the owning thread's struct rseq become false. This approach is probably simpler, but makes each spin a bit heavier due to the need to fetch two separate memory locations (the lockword and the is_running field).

Anyway, I'm sure there are other variations on the general theme of the rseq mechanism helping to optimize mutex acquisition.

What do you think?
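
[Editorial illustration] The three-bit lock word protocol described in the message above can be sketched in C roughly as follows. This is a minimal, single-threaded model under stated assumptions, not code from the patchset: every identifier (`lock_try_acquire`, `lock_release`, `simulate_owner_preempted`, the bit names as C constants) is hypothetical, the kernel-side "OR sleep_mode on preemption" hook does not exist and is simulated by an ordinary function, and the places where a real implementation would call FUTEX_WAIT or FUTEX_WAKE are only marked by return values. The `max_spins` bound reflects the suggestion in the message to keep a maximum spin count as a safety net.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the lock word, named after the description in the message. */
enum {
    ACQUIRED     = 1u << 0, /* some thread owns the lock */
    SLEEP_MODE   = 1u << 1, /* owner was descheduled: stop spinning */
    WAIT_PENDING = 1u << 2, /* at least one waiter needs a wake-up */
};

typedef enum { GOT_LOCK, MUST_SLEEP, SPUN_OUT } acquire_result;

/*
 * HYPOTHETICAL stand-in for the proposed kernel behaviour: when the lock
 * owner is preempted, OR the sleep_mode bit into the lock word. No such
 * kernel interface exists; this function only simulates its effect.
 */
static void simulate_owner_preempted(_Atomic uint32_t *word)
{
    atomic_fetch_or(word, SLEEP_MODE);
}

/* Spin until the lock is taken, the owner is seen descheduled, or the
 * safety bound max_spins is exhausted. */
static acquire_result lock_try_acquire(_Atomic uint32_t *word, unsigned max_spins)
{
    for (unsigned spin = 0; spin < max_spins; spin++) {
        uint32_t seen = atomic_load(word);

        if (!(seen & ACQUIRED)) {
            /* Free: take it, clearing any stale sleep_mode bit. */
            uint32_t want = (seen & ~(uint32_t)SLEEP_MODE) | ACQUIRED;
            if (atomic_compare_exchange_strong(word, &seen, want))
                return GOT_LOCK;
        } else if (seen & SLEEP_MODE) {
            /* Owner is not running: advertise a waiter, then sleep. */
            uint32_t want = seen | WAIT_PENDING;
            if (atomic_compare_exchange_strong(word, &seen, want))
                return MUST_SLEEP; /* real code would FUTEX_WAIT here */
        }
        /* Otherwise the owner is still running: spin and re-read. */
    }
    return SPUN_OUT;
}

/* Clear all three bits; returns true if waiters need a FUTEX_WAKE. */
static bool lock_release(_Atomic uint32_t *word)
{
    uint32_t old = atomic_exchange(word, 0);
    assert(old & ACQUIRED); /* releasing an unowned lock is a bug */
    return (old & WAIT_PENDING) != 0;
}
```

In this model a contender keeps spinning while only the acquired bit is set, and switches to the sleeping path as soon as it observes sleep_mode, which is the behaviour the message argues for. The sketch clears wait_pending on release for simplicity; the message only specifies resetting acquired and sleep_mode.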

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Peter Zijlstra @ 2018-05-02 8:43 UTC
To: Daniel Colascione
Cc: mathieu.desnoyers, paulmck, boqun.feng, luto, davejwatson, linux-kernel, linux-api, pjt, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, mtk.manpages, Joel Fernandes

On Wed, May 02, 2018 at 03:53:47AM +0000, Daniel Colascione wrote:

> The usual approach to "better" is an "adaptive mutex". Such a thing, when
> it attempts to acquire a lock another thread owns, spins for some number of
> iterations, then falls back to futex. I guess that's a little better than
> immediately jumping to futex, but it's not optimal. We can still spin when
> the lock owner isn't scheduled, and the spin count is usually some guess
> (either specified manually or estimated statistically) that's not
> guaranteed to produce decent results. Even if we do pick a good spin count,
> we run a very good chance of under- or over-spinning on any given
> lock-acquire. We always want to sleep when spinning would be pointless.

Look for the FUTEX_LOCK patches from Waiman.

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Mathieu Desnoyers @ 2018-05-02 16:03 UTC
To: Daniel Colascione
Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes

----- On May 1, 2018, at 11:53 PM, Daniel Colascione dancol@google.com wrote:

[...]
>
> I think a small enhancement to rseq would let us build a perfect userspace
> mutex, one that spins on lock-acquire only when the lock owner is running
> and that sleeps otherwise, freeing userspace from both specifying ad-hoc
> spin counts and from trying to detect situations in which spinning is
> generally pointless.
>
> It'd work like this: in the per-thread rseq data structure, we'd include a
> description of a futex operation for the kernel would perform (in the
> context of the preempted thread) upon preemption, immediately before
> schedule(). If the futex operation itself sleeps, that's no problem: we
> will have still accomplished our goal of running some other thread instead
> of the preempted thread.

Hi Daniel,

I agree that the problem you are aiming to solve is important. Let's see what prevents the proposed rseq implementation from doing what you envision.

The main issue here is touching userspace immediately before schedule(). At that specific point, it's not possible to take a page fault. In the proposed rseq implementation, we get away with it by raising a task struct flag, and using it in a return to userspace notifier (where we can actually take a fault), where we touch the userspace TLS area.

If we can find a way to solve this limitation, then the rest of your design makes sense to me.

Thanks!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
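
[Editorial illustration] The "raise a flag at preemption, touch user memory only on return to user space" pattern described in the reply above can be modelled in plain C as below. This is a user-space toy under stated assumptions, not kernel code: `struct model_task`, `model_preempt` and `model_return_to_user` are hypothetical names, the `user_area` pointer merely stands in for the registered user-space TLS area, and nothing here reflects the actual rseq implementation beyond the ordering of the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_task {
    bool notify_resume;     /* models the task struct flag */
    uint32_t *user_area;    /* models the registered user-space TLS area */
    uint32_t preempt_count; /* bookkeeping kept in "kernel" memory only */
};

/*
 * Models the preemption path: a page fault is not allowed here, so the
 * function only updates task-private state and raises the flag.
 */
static void model_preempt(struct model_task *t)
{
    t->preempt_count++;
    t->notify_resume = true;
}

/*
 * Models the return-to-user-space notifier: faults are allowed at this
 * point, so the deferred write to the user area happens here.
 * Returns true if a deferred update was performed.
 */
static bool model_return_to_user(struct model_task *t)
{
    if (!t->notify_resume)
        return false;
    t->notify_resume = false;
    *t->user_area = t->preempt_count;
    return true;
}
```

In this model the user area is left untouched between `model_preempt` and `model_return_to_user`, which appears to be the limitation raised in the reply: the write only lands when the preempted task resumes, whereas the proposed mutex would need it to land before other threads start spinning.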

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Daniel Colascione @ 2018-05-02 16:07 UTC
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages, Joel Fernandes

On Wed, May 2, 2018 at 9:03 AM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On May 1, 2018, at 11:53 PM, Daniel Colascione dancol@google.com wrote:
> [...]
> >
> > I think a small enhancement to rseq would let us build a perfect userspace
> > mutex, one that spins on lock-acquire only when the lock owner is running
> > and that sleeps otherwise, freeing userspace from both specifying ad-hoc
> > spin counts and from trying to detect situations in which spinning is
> > generally pointless.
> >
> > It'd work like this: in the per-thread rseq data structure, we'd include a
> > description of a futex operation for the kernel would perform (in the
> > context of the preempted thread) upon preemption, immediately before
> > schedule(). If the futex operation itself sleeps, that's no problem: we
> > will have still accomplished our goal of running some other thread instead
> > of the preempted thread.

> Hi Daniel,

> I agree that the problem you are aiming to solve is important. Let's see
> what prevents the proposed rseq implementation from doing what you envision.

> The main issue here is touching userspace immediately before schedule().
> At that specific point, it's not possible to take a page fault. In the proposed
> rseq implementation, we get away with it by raising a task struct flag, and using
> it in a return to userspace notifier (where we can actually take a fault), where
> we touch the userspace TLS area.

> If we can find a way to solve this limitation, then the rest of your design
> makes sense to me.

Thanks for taking a look!

Why couldn't we take a page fault just before schedule? The reason we can't take a page fault in atomic context is that doing so might call schedule. Here, we're about to call schedule _anyway_, so what harm does it do to call something that might call schedule? If we schedule via that call, we can skip the manual schedule we were going to perform.

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Steven Rostedt @ 2018-05-02 16:42 UTC
To: Daniel Colascione
Cc: Mathieu Desnoyers, Peter Zijlstra, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages, Joel Fernandes

On Wed, 02 May 2018 16:07:48 +0000 Daniel Colascione <dancol@google.com> wrote:

> Why couldn't we take a page fault just before schedule? The reason we can't
> take a page fault in atomic context is that doing so might call schedule.
> Here, we're about to call schedule _anyway_, so what harm does it do to
> call something that might call schedule? If we schedule via that call, we
> can skip the manual schedule we were going to perform.

Another issue is slowing down something that is considered a fast path.

-- Steve

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Daniel Colascione @ 2018-05-02 16:55 UTC
To: rostedt
Cc: Mathieu Desnoyers, Peter Zijlstra, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages, Joel Fernandes

On Wed, May 2, 2018 at 9:42 AM Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 02 May 2018 16:07:48 +0000
> Daniel Colascione <dancol@google.com> wrote:

> > Why couldn't we take a page fault just before schedule? The reason we can't
> > take a page fault in atomic context is that doing so might call schedule.
> > Here, we're about to call schedule _anyway_, so what harm does it do to
> > call something that might call schedule? If we schedule via that call, we
> > can skip the manual schedule we were going to perform.

> Another issue is slowing down something that is considered a fast path.

There are two questions: 1) does this feature slow down schedule when you're not using it? and 2) is schedule unacceptably slow when you are using this feature?

The answer to #1 is no; rseq already tests current->rseq during task switch (via rseq_set_notify_resume), so adding a single further branch (which we'd only test when we follow the current->rseq path anyway) isn't a problem.

Regarding #2: yes, a futex operation will increase path length for that one task switch, but in the no-page-fault case, only by a tiny amount. We can run benchmarks of course, but I don't see any reason to suspect that the proposal would make task switching unacceptably slow. If we *do* take a page fault, we won't have done much additional work overall, since *somebody* is going to take that page fault anyway when the lock is released, and the latency of the task switch shouldn't increase, since the futex code will very quickly realize that it needs to sleep and call schedule anyway.

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Mathieu Desnoyers @ 2018-05-03 16:12 UTC
To: Daniel Colascione
Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes

----- On May 2, 2018, at 12:07 PM, Daniel Colascione dancol@google.com wrote:

> On Wed, May 2, 2018 at 9:03 AM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
>> ----- On May 1, 2018, at 11:53 PM, Daniel Colascione dancol@google.com wrote:
>> [...]
>> >
>> > I think a small enhancement to rseq would let us build a perfect userspace
>> > mutex, one that spins on lock-acquire only when the lock owner is running
>> > and that sleeps otherwise, freeing userspace from both specifying ad-hoc
>> > spin counts and from trying to detect situations in which spinning is
>> > generally pointless.
>> >
>> > It'd work like this: in the per-thread rseq data structure, we'd include a
>> > description of a futex operation for the kernel would perform (in the
>> > context of the preempted thread) upon preemption, immediately before
>> > schedule(). If the futex operation itself sleeps, that's no problem: we
>> > will have still accomplished our goal of running some other thread instead
>> > of the preempted thread.
>
>> Hi Daniel,
>
>> I agree that the problem you are aiming to solve is important. Let's see
>> what prevents the proposed rseq implementation from doing what you envision.
>
>> The main issue here is touching userspace immediately before schedule().
>> At that specific point, it's not possible to take a page fault. In the proposed
>> rseq implementation, we get away with it by raising a task struct flag, and using
>> it in a return to userspace notifier (where we can actually take a fault), where
>> we touch the userspace TLS area.
>
>> If we can find a way to solve this limitation, then the rest of your design
>> makes sense to me.
>
> Thanks for taking a look!
>
> Why couldn't we take a page fault just before schedule? The reason we can't
> take a page fault in atomic context is that doing so might call schedule.
> Here, we're about to call schedule _anyway_, so what harm does it do to
> call something that might call schedule? If we schedule via that call, we
> can skip the manual schedule we were going to perform.

By the way, if we eventually find a way to enhance user-space mutexes in the fashion you describe here, it would belong to another TLS area, and would be registered by another system call than rseq. I proposed a more generic "TLS area registration" system call a few years ago, but Linus told me he wanted a system call that was specific to rseq. If we need to implement other use-cases in a TLS area shared between kernel and user-space in a similar fashion, the plan is to do it in a distinct system call.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Daniel Colascione @ 2018-05-03 16:22 UTC
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages, Joel Fernandes

On Thu, May 3, 2018 at 9:12 AM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> By the way, if we eventually find a way to enhance user-space mutexes in the
> fashion you describe here, it would belong to another TLS area, and would
> be registered by another system call than rseq. I proposed a more generic
> "TLS area registration" system call a few years ago, but Linus told me he
> wanted a system call that was specific to rseq. If we need to implement
> other use-cases in a TLS area shared between kernel and user-space in a
> similar fashion, the plan is to do it in a distinct system call.

If we proliferate TLS areas, we'd have to register each one upon thread creation, adding to the overall thread creation path. There's already a provision for versioning the TLS area. What's the benefit of splitting the registration over multiple system calls?

* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
From: Mathieu Desnoyers @ 2018-05-03 18:04 UTC
To: Daniel Colascione
Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes

----- On May 3, 2018, at 12:22 PM, Daniel Colascione dancol@google.com wrote:

> On Thu, May 3, 2018 at 9:12 AM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> By the way, if we eventually find a way to enhance user-space mutexes in the
>> fashion you describe here, it would belong to another TLS area, and would
>> be registered by another system call than rseq. I proposed a more generic
>> "TLS area registration" system call a few years ago, but Linus told me he
>> wanted a system call that was specific to rseq. If we need to implement
>> other use-cases in a TLS area shared between kernel and user-space in a
>> similar fashion, the plan is to do it in a distinct system call.
>
> If we proliferate TLS areas; we'd have to register each one upon thread
> creation, adding to the overall thread creation path. There's already a
> provision for versioning the TLS area. What's the benefit of splitting the
> registration over multiple system calls?

See the original discussion thread at https://lkml.org/lkml/2016/4/7/502

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-03 16:12 ` Mathieu Desnoyers (?) (?) @ 2018-05-03 16:48 ` Joel Fernandes 2018-05-03 17:18 ` Daniel Colascione -1 siblings, 1 reply; 105+ messages in thread From: Joel Fernandes @ 2018-05-03 16:48 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Daniel Colascione, Peter Zijlstra, Paul McKenney, Boqun Feng, Andy Lutomirski, davejwatson, LKML, linux-api, Paul Turner, Andrew Morton, linux, Thomas Gleixner, Ingo Molnar, hpa, Andrew Hunter, andi, cl, bmaurer, Steven Rostedt, Josh Triplett, torvalds, Catalin Marinas, Will Deacon, mtk.manpages On Thu, May 3, 2018 at 9:12 AM Mathieu Desnoyers < mathieu.desnoyers@efficios.com> wrote: > ----- On May 2, 2018, at 12:07 PM, Daniel Colascione dancol@google.com wrote: > > On Wed, May 2, 2018 at 9:03 AM Mathieu Desnoyers < > > mathieu.desnoyers@efficios.com> wrote: > > > >> ----- On May 1, 2018, at 11:53 PM, Daniel Colascione dancol@google.com > > wrote: > >> [...] > >> > > >> > I think a small enhancement to rseq would let us build a perfect > > userspace > >> > mutex, one that spins on lock-acquire only when the lock owner is > > running > >> > and that sleeps otherwise, freeing userspace from both specifying ad-hoc > >> > spin counts and from trying to detect situations in which spinning is > >> > generally pointless. > >> > > >> > It'd work like this: in the per-thread rseq data structure, we'd > > include a > >> > description of a futex operation for the kernel would perform (in the > >> > context of the preempted thread) upon preemption, immediately before > >> > schedule(). If the futex operation itself sleeps, that's no problem: we > >> > will have still accomplished our goal of running some other thread > > instead > >> > of the preempted thread. > > > >> Hi Daniel, > > > >> I agree that the problem you are aiming to solve is important. Let's see > >> what prevents the proposed rseq implementation from doing what you > > envision. 
> > > >> The main issue here is touching userspace immediately before schedule(). > >> At that specific point, it's not possible to take a page fault. In the > > proposed > >> rseq implementation, we get away with it by raising a task struct flag, > > and using > >> it in a return to userspace notifier (where we can actually take a > > fault), where > >> we touch the userspace TLS area. > > > >> If we can find a way to solve this limitation, then the rest of your > > design > >> makes sense to me. > > > > Thanks for taking a look! > > > > Why couldn't we take a page fault just before schedule? The reason we can't > > take a page fault in atomic context is that doing so might call schedule. > > Here, we're about to call schedule _anyway_, so what harm does it do to > > call something that might call schedule? If we schedule via that call, we > > can skip the manual schedule we were going to perform. > By the way, if we eventually find a way to enhance user-space mutexes in the > fashion you describe here, it would belong to another TLS area, and would > be registered by another system call than rseq. I proposed a more generic Right. Also I still don't see any good reason why optimistic spinning in the kernel with FUTEX_LOCK, as Peter described, can't be used instead of using the rseq implementation and spinning in userspace, for such a case. I don't really fully buy that we need to design this interface assuming any privilege transition level time. If privilege level transitions are slow, we're going to have bad performance anyway. Unless there's some data to show that we have to optimistically spin in userspace rather than in the kernel because it's better to do so, we should really stick to using FUTEX_LOCK and reuse all the work that went into that area for Android and otherwise (and work with Waiman and others on improving that if there are any problems with it).
I am excited, though, about the synchronization designs other than lock implementations that rseq can help with. thanks! - Joel
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-03 16:48 ` Joel Fernandes @ 2018-05-03 17:18 ` Daniel Colascione 2018-05-03 17:46 ` Joel Fernandes 0 siblings, 1 reply; 105+ messages in thread From: Daniel Colascione @ 2018-05-03 17:18 UTC (permalink / raw) To: Joel Fernandes Cc: Mathieu Desnoyers, Peter Zijlstra, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages On Thu, May 3, 2018 at 9:48 AM Joel Fernandes <joelaf@google.com> wrote: > > > can skip the manual schedule we were going to perform. > > By the way, if we eventually find a way to enhance user-space mutexes in > the > > fashion you describe here, it would belong to another TLS area, and would > > be registered by another system call than rseq. I proposed a more generic > Right. Also I still don't see any good reason why optimistic spinning in > the kernel with FUTEX_LOCK, as Peter described, can't be used instead of > using the rseq implementation and spinning in userspace, for such a case. I > don't really fully buy that we need to design this interface assuming any > privilege transition level time. > If privilege level transitions are slow, > we're going to have bad performance anyway. That's not the case. There's a large class of program that does useful work while seldom entering the kernel: just ask the user-space network stack people. It's not wise to design interfaces around system calls being cheap. Even if system calls are currently cheap enough on some architectures some of the time, there's no guarantee that they'll stay that way, especially relative to straight-line user-mode execution. A pure user-space approach, on the other hand, involves no work in the kernel, and doing nothing is always the optimal strategy. 
Besides, there are environments where system calls end up being more expensive than you might think: consider strace or rr. If the kernel needs to get involved on some path, it's best that its involvement be as light as possible. > we should really stick to using FUTEX_LOCK and > reuse all the work that went into that area for Android and otherwise (and > work with Waiman and others on improving that if there are any problems > with it). FUTEX_LOCK is a return to the bad old days when systems gave you a fixed list of synchronization primitives and if you wanted something else, tough. That the latest version of the FUTEX_LOCK patch includes a separate FUTEX_LOCK_SHARED mode is concerning. The functionality the kernel provides to userspace should be more general-purpose and allow more experimentation without changes in the kernel. I see no reason to force userspace into 1) reserving 30 bits of its lockword for a TID and 2) adopting the kernel's idea of spin time heuristics and lock stealing when the same basic functionality can be provided in a generic way while reserving only one bit. That this mechanism happens to be more efficient as well is a bonus. "Mechanism not policy" is still a good design principle. ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-03 17:18 ` Daniel Colascione @ 2018-05-03 17:46 ` Joel Fernandes 0 siblings, 0 replies; 105+ messages in thread From: Joel Fernandes @ 2018-05-03 17:46 UTC (permalink / raw) To: Daniel Colascione Cc: Mathieu Desnoyers, Peter Zijlstra, Paul McKenney, Boqun Feng, Andy Lutomirski, davejwatson, LKML, linux-api, Paul Turner, Andrew Morton, linux, Thomas Gleixner, Ingo Molnar, hpa, Andrew Hunter, andi, cl, bmaurer, Steven Rostedt, Josh Triplett, torvalds, Catalin Marinas, Will Deacon, mtk.manpages, longman Hi Daniel, Nice to have this healthy discussion about pros/cons. Adding Waiman to the discussion as well. Curious to hear what Waiman and Peter think about all this. Some more comments inline. On Thu, May 3, 2018 at 10:19 AM Daniel Colascione <dancol@google.com> wrote: > On Thu, May 3, 2018 at 9:48 AM Joel Fernandes <joelaf@google.com> wrote: > > > > can skip the manual schedule we were going to perform. > > > By the way, if we eventually find a way to enhance user-space mutexes in > > the > > > fashion you describe here, it would belong to another TLS area, and > would > > > be registered by another system call than rseq. I proposed a more > generic > > Right. Also I still don't see any good reason why optimistic spinning in > > the kernel with FUTEX_LOCK, as Peter described, can't be used instead of > > using the rseq implementation and spinning in userspace, for such a case. > I > > don't really fully buy that we need to design this interface assuming any > > privilege transition level time. > > If privilege level transitions are slow, > > we're going to have bad performance anyway. > That's not the case. There's a large class of program that does useful work > while seldom entering the kernel: just ask the user-space network stack > people. Yes, I am aware of that. I was just saying in general, a system such as an Android embedded system, not an HPC based system does make a lot of system calls. 
I am not arguing that doing more things in userspace is good or bad here. I am just asking why we would do something else for no good reason (see below) when work has already been done in this area. > It's not wise to design interfaces around system calls being cheap. Even if > system calls are currently cheap enough on some architectures some of the > time, there's no guarantee that they'll stay that way, especially relative > to straight-line user-mode execution. A pure user-space approach, on the > other hand, involves no work in the kernel, and doing nothing is always the > optimal strategy. Besides, there are environments where system calls end up > being more expensive than you might think: consider strace or rr. If the > kernel needs to get involved on some path, it's best that its involvement > be as light as possible. Of course, but I think we shouldn't do premature optimization here without real data on typical Android devices about the cost of system call entry/exit vs spin time. I am not against a userspace lock based on rseq if there is data and good reason for it, but I'd want to see that before investing significant time in reinventing the wheel. > > we should really stick to using FUTEX_LOCK and > > reuse all the work that went into that area for Android and otherwise (and > > work with Waiman and others on improving that if there are any problems > > with it). > FUTEX_LOCK is a return to the bad old days when systems gave you a fixed > list of synchronization primitives and if you wanted something else, tough. I am not saying we should fix the set of sync primitives made available to userspace, or anything like that. I am talking about your/our use case and whether another sync primitive interface is needed. For example, having another syscall to register a TLS area is a new interface, vs using the existing futex interface. Linus is also against adding new syscalls unnecessarily. > That the latest version of the FUTEX_LOCK patch includes a separate > FUTEX_LOCK_SHARED mode is concerning. 
The functionality the kernel provides Why? That's just for reader-locks. What's the concern there? I know you had something in mind about efficient userspace rw locks but I am curious either way what you have in mind. > to userspace should be more general-purpose and allow more experimentation > without changes in the kernel. I see no reason to force userspace into 1) > reserving 30 bits of its lockword for a TID and 2) adopting the kernel's Based on our offline chat, this is only for 32-bit systems though, right? Also based on Peter's idea of putting the recursion counter outside, there shouldn't be a space issue? > idea of spin time heuristics and lock stealing when the same basic > functionality can be provided in a generic way while reserving only one > bit. That this mechanism happens to be more efficient as well is a bonus. And also probably easy to get wrong. Heuristics are hard and it would be good to work with the community on getting the best approach for that and improving existing code. Also about "generic way", that's even more reason in my view to do it in the kernel. > "Mechanism not policy" is still a good design principle. Again, I am not advocating forcing of interfaces or anything, but I'm against reinventing the wheel and am all for spending time on improving existing things. thanks! - Joel
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-03 17:46 ` Joel Fernandes @ 2018-05-04 22:17 ` Ben Maurer -1 siblings, 0 replies; 105+ messages in thread From: Ben Maurer @ 2018-05-04 22:17 UTC (permalink / raw) To: Joel Fernandes, Daniel Colascione Cc: Mathieu Desnoyers, Peter Zijlstra, Paul McKenney, Boqun Feng, Andy Lutomirski, Dave Watson, LKML, linux-api, Paul Turner, Andrew Morton, linux, Thomas Gleixner, Ingo Molnar, hpa, Andrew Hunter, andi, cl, Steven Rostedt, Josh Triplett, torvalds, Catalin Marinas, Will Deacon, mtk.manpages, longman Hey - I think the ideas Daniel brings up here are interesting -- specifically the notion that a thread could set a "pre-sleep wish" to signal it's sleeping. As this conversation shows I think there's a fair bit of depth to that. For example, the FUTEX_LOCK is an alternative approach. Another idea might be using the "currently running cpu" area of rseq to tell if the process in question was sleeping (assuming that the kernel would be modified to set this to -1 every time a process was unscheduled). The idea discussed here seems orthogonal to the core thesis of rseq. I'm wondering if we can focus on getting rseq in, maybe with an eye toward making sure this use case could be handled long term. My sense is that this is possible. We could use the flags setting in the per-thread rseq area, or maybe extend the meaning of the structure rseq_cs points to so that it carries information about how to signal the sleeping of the current process. It seems to me this would be a natural way to add the functionality Daniel talks about if desired in the future. Daniel - do you think there's anything we should do in the patch set today that would make it easier to implement your idea in the future without expanding the scope of the patch today? I.e., is there anything else we need to do to lay the framework for your idea? I'd really love to see us get this patch in. 
There's a whole host of primitives this unlocks (more efficient RCU, better malloc implementations, fast reader-writer locks). I'm sure we'll have more ideas about APIs to provide once we've explored these use cases, but to me this patch is the MVP we need to ship to get that feedback. It's a solid framework that we can build upon, e.g. with the "opv" syscall or the idea in this thread if user feedback shows those things are necessary. -b
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-02 3:53 ` [RFC PATCH for 4.18 00/14] Restartable Sequences Daniel Colascione 2018-05-02 8:43 ` Peter Zijlstra 2018-05-02 16:03 ` Mathieu Desnoyers @ 2018-05-02 17:22 ` Peter Zijlstra 2018-05-02 18:27 ` Daniel Colascione 2 siblings, 1 reply; 105+ messages in thread From: Peter Zijlstra @ 2018-05-02 17:22 UTC (permalink / raw) To: Daniel Colascione Cc: mathieu.desnoyers, paulmck, boqun.feng, luto, davejwatson, linux-kernel, linux-api, pjt, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, mtk.manpages, Joel Fernandes On Wed, May 02, 2018 at 03:53:47AM +0000, Daniel Colascione wrote: > Suppose we make a userspace mutex implemented with a lock word having three > bits: acquired, sleep_mode, and wait_pending, with the rest of the word not > being relevant at the moment. So ideally we'd kill FUTEX_WAIT/FUTEX_WAKE for mutexes entirely, and go with FUTEX_LOCK/FUTEX_UNLOCK that have the same semantics as the existing FUTEX_LOCK_PI/FUTEX_UNLOCK_PI, namely, the word contains the owner TID. As brought up the last time we talked about spin loops, why do we care if the spin loop is in userspace or not? Aside from the whole PTI thing, the syscall cost was around 150 cycles or so, while a LOCK CMPXCHG is around 20 cycles. So ~7 spins get you the cost of entry.
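The break-even estimate above is easy to reproduce. The following is a minimal sketch, assuming the rough figures quoted in the message (about 150 cycles for a pre-KPTI syscall entry, about 20 cycles for a LOCK CMPXCHG). `spins_to_break_even` is a hypothetical helper name used only for illustration, not something from the patch set.

```c
#include <assert.h>

/*
 * Number of failed LOCK CMPXCHG attempts whose cumulative cost reaches
 * the cost of one system call entry. Integer division rounds down.
 * Hypothetical helper; the cycle counts passed in are rough estimates.
 */
unsigned int spins_to_break_even(unsigned int syscall_cycles,
                                 unsigned int cmpxchg_cycles)
{
    assert(cmpxchg_cycles != 0);
    return syscall_cycles / cmpxchg_cycles;
}
```

Calling `spins_to_break_even(150, 20)` yields 7, which matches the "~7 spins" figure. The estimate covers syscall entry only and says nothing about the cost of returning to userspace.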
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-02 17:22 ` Peter Zijlstra @ 2018-05-02 18:27 ` Daniel Colascione 2018-05-02 20:22 ` Peter Zijlstra 0 siblings, 1 reply; 105+ messages in thread From: Daniel Colascione @ 2018-05-02 18:27 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages, Joel Fernandes On Wed, May 2, 2018 at 10:22 AM Peter Zijlstra <peterz@infradead.org> wrote: >> On Wed, May 02, 2018 at 03:53:47AM +0000, Daniel Colascione wrote: > > Suppose we make a userspace mutex implemented with a lock word having three > > bits: acquired, sleep_mode, and wait_pending, with the rest of the word not > > being relevant at the moment. > So ideally we'd kill FUTEX_WAIT/FUTEX_WAKE for mutexes entirely, and go > with FUTEX_LOCK/FUTEX_UNLOCK that have the same semantics as the > existing FUTEX_LOCK_PI/FUTEX_UNLOCK_PI, namely, the word contains the > owner TID. That doesn't work if you want to use the rest of the word for something else, like a recursion count. With FUTEX_WAIT and FUTEX_WAKE, you can make a lock with two bits. > As brought up in the last time we talked about spin loops, why do we > care if the spin loop is in userspace or not? Aside from the whole PTI > thing, the syscall cost was around 150 cycle or so, while a LOCK CMPXCHG > is around 20 cycles. So ~7 spins gets you the cost of entry. That's pre-KPTI, isn't it? ^ permalink raw reply [flat|nested] 105+ messages in thread
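To make the two-bit claim concrete, here is a minimal sketch of a mutex that reserves only two bits of its lock word and sleeps with FUTEX_WAIT/FUTEX_WAKE. It assumes Linux and C11 atomics. The names `futex_op`, `two_bit_lock`, `two_bit_unlock`, `LOCKED` and `WAITERS` are hypothetical, and the bit layout is one possible choice rather than the design proposed in this thread.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define LOCKED  1u  /* bit 0: the lock is held */
#define WAITERS 2u  /* bit 1: some thread may be asleep on the word */

/* Thin wrapper around the futex system call (no timeout, no second word). */
long futex_op(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void two_bit_lock(_Atomic uint32_t *word)
{
    /* Fast path: set LOCKED; if it was clear before, we own the lock. */
    uint32_t old = atomic_fetch_or(word, LOCKED);

    while (old & LOCKED) {
        /* Slow path: announce a waiter and retry the acquire in one step. */
        old = atomic_fetch_or(word, LOCKED | WAITERS);
        if (!(old & LOCKED))
            break; /* the lock was free; we own it and WAITERS stays set */
        /* Sleep only if the word still holds the value we just produced. */
        futex_op(word, FUTEX_WAIT_PRIVATE, old | WAITERS);
    }
}

void two_bit_unlock(_Atomic uint32_t *word)
{
    /* Clear both bits, leaving the other 30 bits as they were. */
    uint32_t old = atomic_fetch_and(word, ~(LOCKED | WAITERS));

    if (old & WAITERS)
        futex_op(word, FUTEX_WAKE_PRIVATE, 1); /* wake one sleeper */
}
```

Because FUTEX_WAIT compares the whole 32-bit word, a concurrent change to the upper bits simply makes the wait return early and the loop retries. A thread that acquires the lock on the slow path leaves WAITERS set, which can cost one spurious wake but never loses one.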
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-02 18:27 ` Daniel Colascione @ 2018-05-02 20:22 ` Peter Zijlstra 2018-05-02 20:37 ` Daniel Colascione 2018-05-06 10:03 ` Thomas Gleixner 0 siblings, 2 replies; 105+ messages in thread From: Peter Zijlstra @ 2018-05-02 20:22 UTC (permalink / raw) To: Daniel Colascione Cc: Mathieu Desnoyers, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages, Joel Fernandes On Wed, May 02, 2018 at 06:27:22PM +0000, Daniel Colascione wrote: > On Wed, May 2, 2018 at 10:22 AM Peter Zijlstra <peterz@infradead.org> wrote: > >> On Wed, May 02, 2018 at 03:53:47AM +0000, Daniel Colascione wrote: > > > Suppose we make a userspace mutex implemented with a lock word having > three > > > bits: acquired, sleep_mode, and wait_pending, with the rest of the word > not > > > being relevant at the moment. > > > So ideally we'd kill FUTEX_WAIT/FUTEX_WAKE for mutexes entirely, and go > > with FUTEX_LOCK/FUTEX_UNLOCK that have the same semantics as the > > existing FUTEX_LOCK_PI/FUTEX_UNLOCK_PI, namely, the word contains the > > owner TID. > > That doesn't work if you want to use the rest of the word for something > else, like a recursion count. With FUTEX_WAIT and FUTEX_WAKE, you can make > a lock with two bits. Recursive locks are teh most horrible crap ever. And having the tid in the word allows things like kernel based optimistic spins and possibly PI related things. > > As brought up in the last time we talked about spin loops, why do we > > care if the spin loop is in userspace or not? Aside from the whole PTI > > thing, the syscall cost was around 150 cycle or so, while a LOCK CMPXCHG > > is around 20 cycles. So ~7 spins gets you the cost of entry. > > That's pre-KPTI, isn't it? Yes, and once the hardware gets sorted, we'll be there again. 
I don't think we should design interfaces for 'broken' hardware.
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences 2018-05-02 20:22 ` Peter Zijlstra @ 2018-05-02 20:37 ` Daniel Colascione 2018-05-03 1:15 ` Steven Rostedt 2018-05-03 8:49 ` Peter Zijlstra 2018-05-06 10:03 ` Thomas Gleixner 1 sibling, 2 replies; 105+ messages in thread From: Daniel Colascione @ 2018-05-02 20:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, Paul McKenney, boqun.feng, luto, davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh, torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages, Joel Fernandes On Wed, May 2, 2018 at 1:23 PM Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, May 02, 2018 at 06:27:22PM +0000, Daniel Colascione wrote: > > On Wed, May 2, 2018 at 10:22 AM Peter Zijlstra <peterz@infradead.org> wrote: > > >> On Wed, May 02, 2018 at 03:53:47AM +0000, Daniel Colascione wrote: > > > > Suppose we make a userspace mutex implemented with a lock word having > > three > > > > bits: acquired, sleep_mode, and wait_pending, with the rest of the word > > not > > > > being relevant at the moment. > > > > > So ideally we'd kill FUTEX_WAIT/FUTEX_WAKE for mutexes entirely, and go > > > with FUTEX_LOCK/FUTEX_UNLOCK that have the same semantics as the > > > existing FUTEX_LOCK_PI/FUTEX_UNLOCK_PI, namely, the word contains the > > > owner TID. > > > > That doesn't work if you want to use the rest of the word for something > > else, like a recursion count. With FUTEX_WAIT and FUTEX_WAKE, you can make > > a lock with two bits. > Recursive locks are teh most horrible crap ever. And having the tid in What happened to providing mechanism, not policy? You can't wish away recursive locking. It's baked into Java and the CLR, and it's enshrined in POSIX. It's not going away, and there's no reason not to support it efficiently. > the word allows things like kernel based optimistic spins and possibly > PI related things. Sure. 
A lot of people don't want PI though, or at least they want to opt into it. And we shouldn't require an entry into the kernel for what we can in principle do efficiently in userspace. > > > As brought up in the last time we talked about spin loops, why do we > > > care if the spin loop is in userspace or not? Aside from the whole PTI > > > thing, the syscall cost was around 150 cycle or so, while a LOCK CMPXCHG > > > is around 20 cycles. So ~7 spins gets you the cost of entry. > > > > That's pre-KPTI, isn't it? > Yes, and once the hardware gets sorted, we'll be there again. I don't > think we should design interfaces for 'broken' hardware. It would be a mistake to design interfaces under the assumption that everyone has fast permission level transitions. ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
  2018-05-02 20:37 ` Daniel Colascione
@ 2018-05-03  1:15   ` Steven Rostedt
  2018-05-03  8:49   ` Peter Zijlstra
  1 sibling, 0 replies; 105+ messages in thread
From: Steven Rostedt @ 2018-05-03  1:15 UTC
To: Daniel Colascione
Cc: Peter Zijlstra, Mathieu Desnoyers, Paul McKenney, boqun.feng, luto,
    davejwatson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
    linux, tglx, mingo, hpa, Andrew Hunter, andi, cl, bmaurer, josh,
    torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages,
    Joel Fernandes, Robert Haas

On Wed, 02 May 2018 20:37:13 +0000
Daniel Colascione <dancol@google.com> wrote:

> On Wed, May 2, 2018 at 1:23 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, May 02, 2018 at 06:27:22PM +0000, Daniel Colascione wrote:
> > > On Wed, May 2, 2018 at 10:22 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Wed, May 02, 2018 at 03:53:47AM +0000, Daniel Colascione wrote:
> > > > > Suppose we make a userspace mutex implemented with a lock word having three
> > > > > bits: acquired, sleep_mode, and wait_pending, with the rest of the word not
> > > > > being relevant at the moment.
> > > >
> > > > So ideally we'd kill FUTEX_WAIT/FUTEX_WAKE for mutexes entirely, and go
> > > > with FUTEX_LOCK/FUTEX_UNLOCK that have the same semantics as the
> > > > existing FUTEX_LOCK_PI/FUTEX_UNLOCK_PI, namely, the word contains the
> > > > owner TID.
> > >
> > > That doesn't work if you want to use the rest of the word for something
> > > else, like a recursion count. With FUTEX_WAIT and FUTEX_WAKE, you can make
> > > a lock with two bits.
> >
> > Recursive locks are teh most horrible crap ever. And having the tid in
>
> What happened to providing mechanism, not policy?
>
> You can't wish away recursive locking. It's baked into Java and the CLR,
> and it's enshrined in POSIX. It's not going away, and there's no reason not
> to support it efficiently.
>
> > the word allows things like kernel based optimistic spins and possibly
> > PI related things.
>
> Sure. A lot of people don't want PI though, or at least they want to opt
> into it. And we shouldn't require an entry into the kernel for what we can
> in principle do efficiently in userspace.
>
> > > > As brought up in the last time we talked about spin loops, why do we
> > > > care if the spin loop is in userspace or not? Aside from the whole PTI
> > > > thing, the syscall cost was around 150 cycle or so, while a LOCK CMPXCHG
> > > > is around 20 cycles. So ~7 spins gets you the cost of entry.

What about exit?

> > >
> > > That's pre-KPTI, isn't it?
>
> > Yes, and once the hardware gets sorted, we'll be there again. I don't
> > think we should design interfaces for 'broken' hardware.
>
> It would be a mistake to design interfaces under the assumption that
> everyone has fast permission level transitions.

Note, Robert Haas told me a few years ago at a plumbers conference that
postgresql implements their own user space spin locks because anything
that goes into the kernel has killed the performance. And they tried to
use futex but that still didn't beat out plain userspace locks.

-- Steve

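The break-even estimate quoted above (roughly 150 cycles for a syscall entry against roughly 20 for a LOCK CMPXCHG, so 150 / 20 rounds down to 7 attempts) can be read as a spin budget for a userspace lock. The following is only a sketch under that reading: the cycle figures are the ones quoted in the thread, not measurements, they cover entry only (exit cost is not counted), and the names `SPIN_BUDGET`, `spin_trylock_bounded` and `spin_unlock` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Figures quoted in the thread (pre-KPTI estimates, entry cost only). */
enum { SYSCALL_ENTRY_CYCLES = 150, LOCK_CMPXCHG_CYCLES = 20 };

/* Break-even number of compare-and-swap attempts before a syscall is cheaper. */
enum { SPIN_BUDGET = SYSCALL_ENTRY_CYCLES / LOCK_CMPXCHG_CYCLES };
static_assert(SPIN_BUDGET == 7, "150 / 20 rounds down to 7");

/*
 * Try to take the lock word (0 = free, 1 = held) with at most `budget`
 * compare-and-swap attempts. Returns true on success. On false, a real
 * lock would fall back to sleeping in the kernel; that path is omitted.
 */
static bool spin_trylock_bounded(atomic_int *word, int budget)
{
    for (int i = 0; i < budget; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong_explicit(word, &expected, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return true;
    }
    return false;
}

/* Release the lock word so the next compare-and-swap can succeed. */
static void spin_unlock(atomic_int *word)
{
    atomic_store_explicit(word, 0, memory_order_release);
}
```

With a higher entry cost (the KPTI case raised above) the same division simply yields a larger budget, which is one way to see why the two sides of the thread weigh userspace spinning differently.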
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
  2018-05-02 20:37 ` Daniel Colascione
  2018-05-03  1:15   ` Steven Rostedt
@ 2018-05-03  8:49   ` Peter Zijlstra
  1 sibling, 0 replies; 105+ messages in thread
From: Peter Zijlstra @ 2018-05-03  8:49 UTC
To: Daniel Colascione
Cc: Mathieu Desnoyers, Paul McKenney, boqun.feng, luto, davejwatson,
    linux-kernel, linux-api, Paul Turner, Andrew Morton, linux, tglx,
    mingo, hpa, Andrew Hunter, andi, cl, bmaurer, rostedt, josh,
    torvalds, catalin.marinas, will.deacon, Michael Kerrisk-manpages,
    Joel Fernandes

On Wed, May 02, 2018 at 08:37:13PM +0000, Daniel Colascione wrote:
> > Recursive locks are teh most horrible crap ever. And having the tid in
>
> What happened to providing mechanism, not policy?
>
> You can't wish away recursive locking. It's baked into Java and the CLR,
> and it's enshrined in POSIX. It's not going away, and there's no reason not
> to support it efficiently.

You can implement recursive locks just fine with a TID based word, just
keep the recursion counter external to the futex word. If owner==self,
increment etc..

> > the word allows things like kernel based optimistic spins and possibly
> > PI related things.
>
> Sure. A lot of people don't want PI though, or at least they want to opt
> into it. And we shouldn't require an entry into the kernel for what we can
> in principle do efficiently in userspace.

Any additional PI would certainly be opt-in, but the kernel based
spinning might make sense unconditionally.

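A minimal sketch of the scheme described above (owner TID in the lock word, recursion counter kept outside it), assuming a 32-bit word where 0 means unlocked and any other value is the owner's TID. The names `rec_lock`, `rec_trylock` and `rec_unlock` are hypothetical, the TID is passed in by the caller instead of being looked up, and the waiter and PI bits that a real futex word carries are left out. The contended path (spinning or a futex call) is omitted as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct rec_lock {
    _Atomic uint32_t owner; /* the "futex word": 0 = unlocked, else owner TID */
    uint32_t depth;         /* recursion count, external to the word, owner-only */
};

/*
 * Returns 1 if `self` now holds the lock (first or nested acquisition),
 * 0 if another thread holds it. A real lock would spin or sleep on 0.
 */
static int rec_trylock(struct rec_lock *l, uint32_t self)
{
    uint32_t expected = 0;

    /* If owner == self, only this thread can change the word: just count. */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == self) {
        l->depth++;
        return 1;
    }
    if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, self,
                                                memory_order_acquire,
                                                memory_order_relaxed)) {
        l->depth = 1;
        return 1;
    }
    return 0;
}

/* Drops one level of recursion; the word is cleared on the last unlock. */
static void rec_unlock(struct rec_lock *l, uint32_t self)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
    if (--l->depth == 0)
        atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

Because `depth` is only read or written by the thread whose TID is in `owner`, it needs no atomics, and the word itself stays free to carry nothing but the TID.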
* Re: [RFC PATCH for 4.18 00/14] Restartable Sequences
  2018-05-02 20:22 ` Peter Zijlstra
  2018-05-02 20:37   ` Daniel Colascione
@ 2018-05-06 10:03   ` Thomas Gleixner
  1 sibling, 0 replies; 105+ messages in thread
From: Thomas Gleixner @ 2018-05-06 10:03 UTC
To: Peter Zijlstra
Cc: Daniel Colascione, Mathieu Desnoyers, Paul McKenney, boqun.feng,
    luto, davejwatson, linux-kernel, linux-api, Paul Turner,
    Andrew Morton, linux, mingo, hpa, Andrew Hunter, andi, cl, bmaurer,
    rostedt, josh, torvalds, catalin.marinas, will.deacon,
    Michael Kerrisk-manpages, Joel Fernandes

On Wed, 2 May 2018, Peter Zijlstra wrote:
> On Wed, May 02, 2018 at 06:27:22PM +0000, Daniel Colascione wrote:
> > On Wed, May 2, 2018 at 10:22 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Wed, May 02, 2018 at 03:53:47AM +0000, Daniel Colascione wrote:
> > > > Suppose we make a userspace mutex implemented with a lock word having three
> > > > bits: acquired, sleep_mode, and wait_pending, with the rest of the word not
> > > > being relevant at the moment.
> > >
> > > So ideally we'd kill FUTEX_WAIT/FUTEX_WAKE for mutexes entirely, and go
> > > with FUTEX_LOCK/FUTEX_UNLOCK that have the same semantics as the
> > > existing FUTEX_LOCK_PI/FUTEX_UNLOCK_PI, namely, the word contains the
> > > owner TID.
> >
> > That doesn't work if you want to use the rest of the word for something
> > else, like a recursion count. With FUTEX_WAIT and FUTEX_WAKE, you can make
> > a lock with two bits.
>
> Recursive locks are teh most horrible crap ever. And having the tid in
> the word allows things like kernel based optimistic spins and possibly
> PI related things.

FWIW, robust futex have also the TID requirement.

Thanks,

	tglx
