From: Mathieu Desnoyers
Subject: Re: [RFC PATCH for 4.16 02/21] rseq: Introduce restartable sequences system call (v12)
Date: Fri, 15 Dec 2017 16:52:22 +0000 (UTC)
Message-ID: <729438855.35910.1513356742518.JavaMail.zimbra@efficios.com>
References: <20171214161403.30643-1-mathieu.desnoyers@efficios.com> <12046460.34426.1513275177081.JavaMail.zimbra@efficios.com> <1537392285.34532.1513279478488.JavaMail.zimbra@efficios.com> <20171214212023.GJ3326@worktop>
To: Chris Lameter
Cc: Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Andrew Hunter, Andi Kleen, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas
List-Id: linux-api@vger.kernel.org

----- On Dec 15, 2017, at 10:05 AM, Chris Lameter wrote:

> On Thu, 14 Dec 2017, Peter Zijlstra wrote:
>
>> > But my company has extensive user space code that maintains a lot of
>> > counters and does other tricks to get full performance out of the
>> > hardware. Such a mechanism would also be good from user space. Why keep
>> > the good stuff only inside the kernel?
>>
>> Mathieu's proposal is for userspace, _only_ userspace.
>
> But what we were talking about are instructions that work effectively in
> kernel space whose efficiency restartable sequences could bring to user
> space.

It can be worthwhile to recap my understanding of this thread so far:

AFAIU, Chris' proposal is to use the "gs" segment selector as an instruction
prefix on x86 rather than explicitly loading the CPU number and calculating
offsets.
This can turn sequences of rseq operations like this cmpxchg:

Registers:

  R1: return value
  R2: expected value
  R3: new value
  R4: cpu_id

rseq cmpxchg:

  load TLS::cpu_id_start into R4
  calculate offset of v
  fs:mov (store rseq descriptor address into TLS::rseq_cs)
  compare R4 against TLS::cpu_id
  jne abort
  mov (load *v into R1)
  compare R1 against R2
  jne cmpfail
  mov (store R3 into *v)

into:

  fs:mov (store rseq descriptor address into TLS::rseq_cs)
  gs:mov (load *(v+off) into R1)
  compare R1 against R2
  jne cmpfail
  gs:mov (store R3 into *(v+off))

My first concern with this approach is that the segment selector method lacks
flexibility with respect to the variety of memory allocation schemes
user-space has to deal with. In the kernel, this is achieved by ensuring that
the layout of all per-cpu data is segment-selector-prefix friendly.

Another aspect that worries me is that some applications already use the gs
segment selector for other purposes. Suddenly reserving the gs segment
selector for use by a library like glibc may lead to incompatibilities with
those applications.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
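To make the difference between the two cmpxchg sequences above concrete, here is a minimal plain-C model of their steps. It is a sketch under stated assumptions, not a restartable sequence: nothing here is registered with the kernel, nothing can be preempted or aborted, and a CPU migration is only simulated by the caller making TLS::cpu_id differ from TLS::cpu_id_start. All names (NR_CPUS, struct percpu_area, struct tls_model, cs_descriptor, cmpxchg_explicit, cmpxchg_segbase) are hypothetical, struct tls_model is not the kernel's struct rseq layout, and the seg_base pointer merely stands in for the gs segment base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU count for this model. */
#define NR_CPUS 4

/* Hypothetical per-CPU area. In the gs-prefix scheme each CPU's segment
   base would point at one of these, so a variable is named by its offset. */
struct percpu_area {
    long counter;
    long other;
};

static struct percpu_area areas[NR_CPUS];

/* Simplified stand-in for the TLS fields the sequences use.
   This is NOT the kernel's struct rseq layout. */
struct tls_model {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    const void *rseq_cs;
};

/* Placeholder for the rseq critical-section descriptor (hypothetical). */
static const int cs_descriptor;

enum { CMPXCHG_OK = 0, CMPXCHG_FAIL = 1, CMPXCHG_ABORT = -1 };

/* Variant 1: explicit cpu_id load and offset calculation. */
static int cmpxchg_explicit(struct tls_model *tls, size_t off,
                            long expected, long newval, long *ret)
{
    uint32_t cpu = tls->cpu_id_start;              /* load TLS::cpu_id_start into R4 */
    assert(cpu < NR_CPUS);
    long *v = (long *)((char *)&areas[cpu] + off); /* calculate offset of v */

    tls->rseq_cs = &cs_descriptor;   /* store descriptor address into TLS::rseq_cs */
    if (cpu != tls->cpu_id)          /* compare R4 against TLS::cpu_id */
        return CMPXCHG_ABORT;        /* jne abort */
    *ret = *v;                       /* load *v into R1 */
    if (*ret != expected)            /* compare R1 against R2 */
        return CMPXCHG_FAIL;         /* jne cmpfail */
    *v = newval;                     /* store R3 into *v */
    return CMPXCHG_OK;
}

/* Variant 2: models gs-relative addressing. seg_base stands in for the gs
   segment base, assumed to already point at the current CPU's area, so
   there is no cpu_id load, offset calculation, or cpu_id comparison. */
static int cmpxchg_segbase(struct tls_model *tls, char *seg_base, size_t off,
                           long expected, long newval, long *ret)
{
    long *v = (long *)(seg_base + off); /* the address gs:off would refer to */

    tls->rseq_cs = &cs_descriptor;      /* fs:mov store into TLS::rseq_cs */
    *ret = *v;                          /* gs:mov load into R1 */
    if (*ret != expected)               /* compare R1 against R2 */
        return CMPXCHG_FAIL;            /* jne cmpfail */
    *v = newval;                        /* gs:mov store of R3 */
    return CMPXCHG_OK;
}
```

In this model the segment-base variant needs neither the cpu_id_start load nor the cpu_id comparison, which is why the gs-prefixed sequence is shorter. It also shows the constraint behind the first concern: every variable must sit at a fixed offset (here offsetof(struct percpu_area, counter)) from a single per-CPU base.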