From: Mathieu Desnoyers
To: Michael Kerrisk
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Andy Lutomirski,
	Dave Watson, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Andi Kleen,
	Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Mathieu Desnoyers
Subject: [PATCH man-pages] Add rseq manpage
Date: Wed, 19 Sep 2018 10:40:28 -0400
Message-Id: <20180919144028.10863-1-mathieu.desnoyers@efficios.com>

Signed-off-by: Mathieu Desnoyers
CC: "Paul E. McKenney"
CC: Peter Zijlstra
CC: Paul Turner
CC: Thomas Gleixner
CC: Andy Lutomirski
CC: Andi Kleen
CC: Dave Watson
CC: Chris Lameter
CC: Ingo Molnar
CC: "H. Peter Anvin"
CC: Ben Maurer
CC: Steven Rostedt
CC: Josh Triplett
CC: Linus Torvalds
CC: Andrew Morton
CC: Russell King
CC: Catalin Marinas
CC: Will Deacon
CC: Michael Kerrisk
CC: Boqun Feng
CC: linux-api@vger.kernel.org
---
 man2/rseq.2 | 291 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 291 insertions(+)
 create mode 100644 man2/rseq.2

diff --git a/man2/rseq.2 b/man2/rseq.2
new file mode 100644
index 000000000..a381963ba
--- /dev/null
+++ b/man2/rseq.2
@@ -0,0 +1,291 @@
+.\" Copyright 2015-2018 Mathieu Desnoyers
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH RSEQ 2 2018-09-19 "Linux" "Linux Programmer's Manual"
+.SH NAME
+rseq \- Restartable sequences and CPU number cache
+.SH SYNOPSIS
+.nf
+.B #include <linux/rseq.h>
+.sp
+.BI "int rseq(struct rseq * " rseq ", uint32_t " rseq_len ", int " flags ", uint32_t " sig ");"
+.sp
+.SH DESCRIPTION
+The
+.BR rseq ()
+ABI accelerates user-space operations on per-CPU data by defining a
+shared data structure ABI between each user-space thread and the kernel.
+
+It allows user-space to perform update operations on per-CPU data
+without requiring heavy-weight atomic operations.
+
+The term CPU used in this documentation refers to a hardware execution
+context.
+
+Restartable sequences are atomic with respect to preemption (making them
+atomic with respect to other threads running on the same CPU), as well
+as signal delivery (user-space execution contexts nested over the same
+thread). They either complete atomically with respect to preemption on
+the current CPU and signal delivery, or they are aborted.
+
+It is suited for update operations on per-CPU data.
+
+It can be used on data structures shared between threads within a
+process, and on data structures shared between threads across different
+processes.
+
+.PP
+Some examples of operations that can be accelerated or improved
+by this ABI:
+.IP \[bu] 2
+Memory allocator per-CPU free-lists,
+.IP \[bu] 2
+Querying the current CPU number,
+.IP \[bu] 2
+Incrementing per-CPU counters,
+.IP \[bu] 2
+Modifying data protected by per-CPU spinlocks,
+.IP \[bu] 2
+Inserting/removing elements in per-CPU linked-lists,
+.IP \[bu] 2
+Writing/reading per-CPU ring buffer content,
+.IP \[bu] 2
+Accurately reading performance monitoring unit counters
+with respect to thread migration.
+
+.PP
+Restartable sequences must not perform system calls. Doing so may result
+in termination of the process by a segmentation fault.
+
+.PP
+The
+.I rseq
+argument is a pointer to the thread-local rseq structure to be shared
+between kernel and user-space.
+
+.PP
+The layout of
+.B struct rseq
+is as follows:
+.TP
+.B Structure alignment
+This structure is aligned on a 32-byte boundary.
+.TP
+.B Structure size
+This structure is extensible. Its size is passed as a parameter to the
+rseq system call.
+.TP
+.B Fields
+
+.TP
+.in +4n
+.I cpu_id_start
+Optimistic cache of the CPU number on which the current thread is
+running. Its value is guaranteed to always be a possible CPU number,
+even when rseq is not initialized. The value it contains should always
+be confirmed by reading the cpu_id field.
+
+This field is an optimistic cache in the sense that it is always
+guaranteed to hold a valid CPU number in the range [ 0 ..
+nr_possible_cpus - 1 ]. It can therefore be loaded by user-space and
+used as an offset in per-CPU data structures without having to
+check whether its value is within the valid bounds compared to the
+number of possible CPUs in the system.
+
+For user-space applications executed on a kernel without rseq support,
+the cpu_id_start field stays initialized at 0, which is indeed a valid
+CPU number. It is therefore valid to use it as an offset in per-CPU data
+structures, and only validate whether it's actually the current CPU
+number by comparing it with the cpu_id field within the rseq critical
+section. If the kernel does not provide rseq support, that cpu_id field
+stays initialized at -1, so the comparison always fails, as intended.
+
+It is then up to user-space to use a fall-back mechanism, considering
+that rseq is not available.
+
+.in
+.TP
+.in +4n
+.I cpu_id
+Cache of the CPU number on which the current thread is running.
+-1 if uninitialized.
+.in
+.TP
+.in +4n
+.I rseq_cs
+The rseq_cs field is a pointer to a struct rseq_cs. It is NULL when no
+rseq assembly block critical section is active for the current thread.
+Setting it to point to a critical section descriptor (struct rseq_cs)
+marks the beginning of the critical section.
+.in
+.TP
+.in +4n
+.I flags
+Flags indicating the restart behavior for the current thread. This is
+mainly used for debugging purposes. Can be any of:
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+.in
+
+.PP
+The layout of
+.B struct rseq_cs
+version 0 is as follows:
+.TP
+.B Structure alignment
+This structure is aligned on a 32-byte boundary.
+.TP
+.B Structure size
+This structure has a fixed size of 32 bytes.
+.TP
+.B Fields
+
+.TP
+.in +4n
+.I version
+Version of this structure.
+.in
+.TP
+.in +4n
+.I flags
+Flags indicating the restart behavior of this structure. Can be
+a combination of:
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+.TP
+.in +4n
+.I start_ip
+Instruction pointer address of the first instruction of the sequence of
+consecutive assembly instructions.
+.in
+.TP
+.in +4n
+.I post_commit_offset
+Offset (from start_ip address) of the address after the last instruction
+of the sequence of consecutive assembly instructions.
+.in
+.TP
+.in +4n
+.I abort_ip
+Instruction pointer address to which execution is transferred if the
+sequence of consecutive assembly instructions is aborted.
+.in
+
+.PP
+The
+.I rseq_len
+argument is the size of the
+.I struct rseq
+to register.
+
+.PP
+The
+.I flags
+argument is 0 for registration, and
+.I RSEQ_FLAG_UNREGISTER
+for unregistration.
+
+.PP
+The
+.I sig
+argument is the 32-bit signature to be expected before the abort
+handler code.
+
+.PP
+A single library per process should keep the rseq structure in a
+thread-local storage variable.
+The
+.I cpu_id
+field should be initialized to -1, and the
+.I cpu_id_start
+field should be initialized to a possible CPU value (typically 0).
+
+.PP
+Each thread is responsible for registering and unregistering its rseq
+structure. No more than one rseq structure address can be registered
+per thread at a given time.
+
+.PP
+In a typical usage scenario, the thread registering the rseq
+structure will be performing loads and stores from/to that structure. It
+is however also allowed to read that structure from other threads.
+The rseq field updates performed by the kernel provide relaxed atomicity
+semantics, which guarantee that other threads performing relaxed atomic
+reads of the cpu number cache will always observe a consistent value.
+
+.SH RETURN VALUE
+A return value of 0 indicates success. On error, \-1 is returned, and
+.I errno
+is set appropriately.
+
+.SH ERRORS
+.TP
+.B EINVAL
+Either
+.I flags
+contains an invalid value, or
+.I rseq
+contains an address which is not appropriately aligned, or
+.I rseq_len
+contains a size that does not match the size received on registration.
+.TP
+.B ENOSYS
+The
+.BR rseq ()
+system call is not implemented by this kernel.
+.TP
+.B EFAULT
+.I rseq
+is an invalid address.
+.TP
+.B EBUSY
+Restartable sequence is already registered for this thread.
+.TP
+.B EPERM
+The
+.I sig
+argument on unregistration does not match the signature received
+on registration.
+
+.SH VERSIONS
+The
+.BR rseq ()
+system call was added in Linux 4.18.
+
+.SH CONFORMING TO
+.BR rseq ()
+is Linux-specific.
+
+.in
+.SH SEE ALSO
+.BR sched_getcpu (3) ,
+.BR membarrier (2)
--
2.11.0
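
For readers of this page, here is a minimal sketch of the "querying the
current CPU number" use case, based only on the cpu_id and cpu_id_start
semantics described above. It is not part of the patch. The struct
cpu_cache type, the CPU_CACHE_INIT macro, and the three helper names are
hypothetical, introduced only for illustration. Real code would use
struct rseq from the kernel headers and register it with rseq().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical local mirror of the two CPU-number fields described in
 * the man page.  This is an assumption for illustration only; it is
 * not the kernel's struct rseq and is never registered with rseq().
 */
struct cpu_cache {
	uint32_t cpu_id_start;	/* always a possible CPU number */
	uint32_t cpu_id;	/* (uint32_t)-1 while uninitialized */
};

/* Initial values the man page asks user-space to set. */
#define CPU_CACHE_INIT { .cpu_id_start = 0, .cpu_id = (uint32_t)-1 }

/* Relaxed atomic read of cpu_id; a negative result means "no value". */
static inline int32_t cached_cpu(struct cpu_cache *c)
{
	assert(c != NULL);
	return (int32_t)__atomic_load_n(&c->cpu_id, __ATOMIC_RELAXED);
}

/*
 * Relaxed atomic read of cpu_id_start.  Safe to use as an index into
 * per-CPU data, but only a hint until confirmed against cpu_id.
 */
static inline uint32_t cpu_index_hint(struct cpu_cache *c)
{
	assert(c != NULL);
	return __atomic_load_n(&c->cpu_id_start, __ATOMIC_RELAXED);
}

/* Cached CPU number when available, sched_getcpu(3) as the fall-back. */
static inline int current_cpu(struct cpu_cache *c)
{
	int32_t cpu = cached_cpu(c);

	return cpu >= 0 ? cpu : sched_getcpu();
}
```

With a cache initialized through CPU_CACHE_INIT and never registered,
cached_cpu() returns -1, so current_cpu() takes the sched_getcpu(3)
path, while cpu_index_hint() still returns the valid index 0. This
sketch only reads the cache. An actual per-CPU update would still have
to compare cpu_id_start with cpu_id inside an rseq critical section,
as the cpu_id_start description requires.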