From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752628AbdJMRVR (ORCPT ); Fri, 13 Oct 2017 13:21:17 -0400
Received: from mail-it0-f46.google.com ([209.85.214.46]:51685 "EHLO
	mail-it0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751542AbdJMRVO (ORCPT ); Fri, 13 Oct 2017 13:21:14 -0400
X-Google-Smtp-Source: AOwi7QClkNAlCdr/bSWZ1k1hcWmFvUDVmxCTOxLyLrOxSao3XTzqeM30U9yWFUrkdguuU0NBAH9/in42xz34pOVXUY8=
MIME-Version: 1.0
In-Reply-To: <20171012230326.19984-10-mathieu.desnoyers@efficios.com>
References: <20171012230326.19984-1-mathieu.desnoyers@efficios.com>
	<20171012230326.19984-10-mathieu.desnoyers@efficios.com>
From: Andy Lutomirski
Date: Fri, 13 Oct 2017 10:20:52 -0700
Message-ID:
Subject: Re: [RFC PATCH for 4.15 09/14] Provide cpu_opv system call
To: Mathieu Desnoyers
Cc: "Paul E. McKenney" , Boqun Feng , Peter Zijlstra , Paul Turner ,
	Andrew Hunter , Dave Watson , Josh Triplett , Will Deacon ,
	"linux-kernel@vger.kernel.org" , Thomas Gleixner , Andi Kleen ,
	Chris Lameter , Ingo Molnar , "H. Peter Anvin" , Ben Maurer ,
	Steven Rostedt , Linus Torvalds , Andrew Morton , Russell King ,
	Catalin Marinas , Michael Kerrisk , Linux API
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 12, 2017 at 4:03 PM, Mathieu Desnoyers wrote:
> This new cpu_opv system call executes a vector of operations on behalf
> of user-space on a specific CPU with preemption disabled. It is inspired
> from readv() and writev() system calls which take a "struct iovec" array
> as argument.
>
> The operations available are: comparison, memcpy, add, or, and, xor,
> left shift, and right shift. The system call receives a CPU number from
> user-space as argument, which is the CPU on which those operations need
> to be performed. All preparation steps such as loading pointers, and
> applying offsets to arrays, need to be performed by user-space before
> invoking the system call. The "comparison" operation can be used to
> check that the data used in the preparation step did not change between
> preparation of system call inputs and operation execution within the
> preempt-off critical section.
>
> The reason why we require all pointer offsets to be calculated by
> user-space beforehand is because we need to use get_user_pages_fast() to
> first pin all pages touched by each operation. This takes care of
> faulting-in the pages. Then, preemption is disabled, and the operations
> are performed atomically with respect to other thread execution on that
> CPU, without generating any page fault.
>
> A maximum limit of 16 operations per cpu_opv syscall invocation is
> enforced, so user-space cannot generate a too long preempt-off critical
> section. Each operation is also limited a length of PAGE_SIZE bytes,
> meaning that an operation can touch a maximum of 4 pages (memcpy: 2
> pages for source, 2 pages for destination if addresses are not aligned
> on page boundaries).
>
> If the thread is not running on the requested CPU, a new
> push_task_to_cpu() is invoked to migrate the task to the requested CPU.
> If the requested CPU is not part of the cpus allowed mask of the thread,
> the system call fails with EINVAL. After the migration has been
> performed, preemption is disabled, and the current CPU number is checked
> again and compared to the requested CPU number. If it still differs, it
> means the scheduler migrated us away from that CPU. Return EAGAIN to
> user-space in that case, and let user-space retry (either requesting the
> same CPU number, or a different one, depending on the user-space
> algorithm constraints).

This series seems to get more complicated every time, and it's been so
long that I've mostly forgotten all the details.
I would have sworn we had a solution that got single-stepping right without any complicated work like this in the kernel and had at most a minor performance hit relative to the absolutely fastest solution. I'll try to dig it up.
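
For readers who want the quoted description in concrete form, here is a minimal user-space model of the operation vector. It is only a sketch under stated assumptions: every name in it (vec_op, vec_op_run, the VEC_OP_* constants) is hypothetical and is not the ABI proposed in the patch, only three of the listed operations are modelled, 4096 stands in for PAGE_SIZE, and the "index + 1" return value for a failed comparison is an illustrative choice. Page pinning, CPU selection, migration and the preempt-off section are not modelled at all; the code only shows the "validate up front, then run the whole vector or stop at the first failed comparison" shape.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: a plain user-space model of the
 * described semantics, not the proposed kernel interface. */
enum vec_op_kind { VEC_OP_COMPARE, VEC_OP_MEMCPY, VEC_OP_ADD };

struct vec_op {
	enum vec_op_kind kind;
	void *dst;        /* COMPARE: first operand; MEMCPY/ADD: target */
	const void *src;  /* COMPARE: second operand; MEMCPY: source */
	size_t len;       /* COMPARE/MEMCPY: number of bytes */
	int64_t count;    /* ADD: value added to the int64_t at dst */
};

#define VEC_OP_MAX     16    /* limit quoted in the patch description */
#define VEC_OP_MAX_LEN 4096  /* assumption: stand-in for PAGE_SIZE */

/*
 * Returns 0 if every operation ran, i + 1 if the comparison at index i
 * failed (later operations are skipped), or -EINVAL for a malformed vector.
 */
int vec_op_run(const struct vec_op *ops, size_t nr)
{
	size_t i;

	if (nr > VEC_OP_MAX)
		return -EINVAL;

	/* Pass 1: validate every operation before any side effect. */
	for (i = 0; i < nr; i++) {
		const struct vec_op *op = &ops[i];

		if (!op->dst)
			return -EINVAL;
		switch (op->kind) {
		case VEC_OP_COMPARE:
		case VEC_OP_MEMCPY:
			if (!op->src || op->len > VEC_OP_MAX_LEN)
				return -EINVAL;
			break;
		case VEC_OP_ADD:
			break;
		default:
			return -EINVAL;
		}
	}

	/* Pass 2: execute in order; a failed comparison aborts the rest. */
	for (i = 0; i < nr; i++) {
		const struct vec_op *op = &ops[i];
		int64_t v;

		switch (op->kind) {
		case VEC_OP_COMPARE:
			if (memcmp(op->dst, op->src, op->len) != 0)
				return (int)i + 1;
			break;
		case VEC_OP_MEMCPY:
			memcpy(op->dst, op->src, op->len);
			break;
		case VEC_OP_ADD:
			/* memcpy avoids assuming dst is suitably aligned. */
			memcpy(&v, op->dst, sizeof(v));
			v += op->count;
			memcpy(op->dst, &v, sizeof(v));
			break;
		}
	}
	return 0;
}
```

A caller would typically put a COMPARE first to confirm that the value it read while preparing the vector is still current, followed by the update, and would retry with fresh inputs when the comparison fails.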