From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751100AbeCZGZL (ORCPT ); Mon, 26 Mar 2018 02:25:11 -0400
Received: from isilmar-4.linta.de ([136.243.71.142]:40874 "EHLO
	isilmar-4.linta.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750851AbeCZGZJ (ORCPT );
	Mon, 26 Mar 2018 02:25:09 -0400
Date: Mon, 26 Mar 2018 08:24:49 +0200
From: Dominik Brodowski
To: Al Viro
Cc: Ingo Molnar , Linus Torvalds ,
	Linux Kernel Mailing List , Arnd Bergmann ,
	linux-arch , Ralf Baechle , James Hogan ,
	linux-mips , Benjamin Herrenschmidt ,
	Paul Mackerras , Michael Ellerman , ppc-dev ,
	Martin Schwidefsky , Heiko Carstens ,
	linux-s390 , "David S . Miller" ,
	sparclinux@vger.kernel.org, Ingo Molnar , Jiri Slaby ,
	the arch/x86 maintainers
Subject: Re: [RFC] new SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE wrappers
Message-ID: <20180326062449.GA27503@light.dominikbrodowski.net>
References: <20180318161056.5377-5-linux@dominikbrodowski.net>
	<20180318174014.GR30522@ZenIV.linux.org.uk>
	<20180318181848.GU30522@ZenIV.linux.org.uk>
	<20180319042300.GW30522@ZenIV.linux.org.uk>
	<20180319092920.tbh2xwkruegshzqe@gmail.com>
	<20180319232342.GX30522@ZenIV.linux.org.uk>
	<20180322001532.GA18399@ZenIV.linux.org.uk>
	<20180326004017.GA2211@ZenIV.linux.org.uk>
	<20180326034750.GN30522@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180326034750.GN30522@ZenIV.linux.org.uk>
User-Agent: Mutt/1.9.4 (2018-02-28)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 26, 2018 at 04:47:50AM +0100, Al Viro wrote:
> * mips n32 and x86 x32 can become an extra source of headache.
> That actually applies to any plans of passing struct pt_regs *. As it
> is, e.g. syscall 515 on amd64 is compat_sys_readv(). Dispatched via
> this:
>         /*
>          * NB: Native and x32 syscalls are dispatched from the same
>          * table. The only functional difference is the x32 bit in
>          * regs->orig_ax, which changes the behavior of some syscalls.
>          */
>         if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
>                 nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
>                 regs->ax = sys_call_table[nr](
>                         regs->di, regs->si, regs->dx,
>                         regs->r10, regs->r8, regs->r9);
>         }
> Now, syscall 145 via 32bit call is *also* compat_sys_readv(), dispatched
> via
>                 nr = array_index_nospec(nr, IA32_NR_syscalls);
>                 /*
>                  * It's possible that a 32-bit syscall implementation
>                  * takes a 64-bit parameter but nonetheless assumes that
>                  * the high bits are zero. Make sure we zero-extend all
>                  * of the args.
>                  */
>                 regs->ax = ia32_sys_call_table[nr](
>                         (unsigned int)regs->bx, (unsigned int)regs->cx,
>                         (unsigned int)regs->dx, (unsigned int)regs->si,
>                         (unsigned int)regs->di, (unsigned int)regs->bp);
> Right now it works - we call the same function, passing it arguments picked
> from different set of registers (di/si/dx in x32 case, bx/cx/dx in i386 one).
> But if we switch to passing struct pt_regs * and have the wrapper fetch
> regs->{bx,cx,dx}, we have a problem. It won't work for both entry points.
>
> IMO it's a good reason to have dispatcher(s) handle extraction from pt_regs
> and let the wrapper deal with the resulting 6 u64 or 6 u32, normalizing
> them and arranging them into arguments expected by syscall body.
>
> Linus, Dominik - how do you plan dealing with that fun? Regardless of the
> way we generate the glue, the issue remains. We can't get the same
> struct pt_regs *-taking function for both; we either need to produce
> a separate chunk of glue for each compat_sys_... involved (either making
> COMPAT_SYSCALL_DEFINE generate both, or having duplicate X32_SYSCALL_DEFINE
> for each of those COMPAT_SYSCALL_DEFINE - with identical body, at that)
> or we need to have the registers-to-slots mapping done in dispatcher...

Nice catch.
A similar thing is needed already for non-compat syscalls like sys_close(),
which takes pt_regs->bx on IA32_EMULATION and pt_regs->di on native x86-64.
Therefore, I propose to generate all the stubs we need within SYSCALL_DEFINEx()
and COMPAT_SYSCALL_DEFINEx() (actually, within the arch-provided version of
these macros). See

	https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP

for details on my current plans.

Thanks,
	Dominik
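Al's alternative can be sketched in user-space C. In that scheme, each dispatcher pulls its own ABI's argument registers into six slots, and a single per-syscall wrapper only narrows those slots to the types the body expects. This is a minimal illustration under stated assumptions, not kernel code. The names `struct fake_regs`, `do_sum3()`, `wrap_sum3()`, `dispatch_64()` and `dispatch_32()` are hypothetical. Only the two register orders (di, si, dx, r10, r8, r9 versus bx, cx, dx, si, di, bp) come from the quoted dispatch code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few struct pt_regs fields used here. */
struct fake_regs {
	uint64_t di, si, dx, r10, r8, r9; /* x86-64 and x32 argument registers */
	uint64_t bx, cx, bp;              /* remaining i386 argument registers */
	uint64_t ax;                      /* return value */
};

/* Hypothetical syscall body, written against already-normalized types. */
static long do_sum3(unsigned int a, unsigned int b, unsigned int c)
{
	return (long)a + (long)b + (long)c;
}

/*
 * Wrapper: receives six raw slots, whichever registers they came from,
 * and narrows the ones it needs to the types the body expects.
 */
static long wrap_sum3(uint64_t s0, uint64_t s1, uint64_t s2,
		      uint64_t s3, uint64_t s4, uint64_t s5)
{
	(void)s3; (void)s4; (void)s5; /* unused by a 3-argument syscall */
	return do_sum3((unsigned int)s0, (unsigned int)s1, (unsigned int)s2);
}

/* 64-bit/x32 dispatcher: argument order di, si, dx, r10, r8, r9. */
static void dispatch_64(struct fake_regs *regs)
{
	regs->ax = (uint64_t)wrap_sum3(regs->di, regs->si, regs->dx,
				       regs->r10, regs->r8, regs->r9);
}

/* i386 dispatcher: argument order bx, cx, dx, si, di, bp, zero-extended. */
static void dispatch_32(struct fake_regs *regs)
{
	regs->ax = (uint64_t)wrap_sum3((uint32_t)regs->bx, (uint32_t)regs->cx,
				       (uint32_t)regs->dx, (uint32_t)regs->si,
				       (uint32_t)regs->di, (uint32_t)regs->bp);
}
```

Both dispatchers reach the same wrap_sum3(), so one wrapper per syscall is enough. In this design, knowledge of which register holds which argument lives only in the dispatchers.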
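Dominik's proposal instead has the syscall-definition macro emit one pt_regs-taking stub per entry ABI. The sketch below is an illustration of that idea only, built from hypothetical pieces: a one-argument macro `DEFINE_SYSCALL1()`, a `struct fake_regs` type, and a close()-like `fake_close` syscall. It is not the macro from the syscalls-WIP branch. The native stub reads `di` and the IA32 stub reads `bx`, and both call the same body, as described above for sys_close().

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct pt_regs fields used here. */
struct fake_regs {
	uint64_t di, si, dx, r10, r8, r9; /* x86-64 argument registers */
	uint64_t bx, cx, bp;              /* remaining i386 argument registers */
	uint64_t ax;                      /* return value */
};

/*
 * Hypothetical one-argument definition macro. It emits the body's
 * prototype, one pt_regs-taking stub per entry ABI, and finally the
 * body's definition header, so the braces after the macro use become
 * the syscall body.
 */
#define DEFINE_SYSCALL1(name, type1, arg1)                              \
	static long do_sys_##name(type1 arg1);                          \
	/* native x86-64 entry: first argument is in di */              \
	static long x64_sys_##name(const struct fake_regs *regs)        \
	{                                                               \
		return do_sys_##name((type1)regs->di);                  \
	}                                                               \
	/* IA32 emulation entry: first argument is in bx, 32 bits */    \
	static long ia32_sys_##name(const struct fake_regs *regs)       \
	{                                                               \
		return do_sys_##name((type1)(uint32_t)regs->bx);        \
	}                                                               \
	static long do_sys_##name(type1 arg1)

/* Hypothetical close()-like syscall; it just echoes the fd it received. */
DEFINE_SYSCALL1(fake_close, unsigned int, fd)
{
	return (long)fd;
}
```

Here each dispatcher can pass the same pt_regs pointer to every table entry. The price is one generated stub per syscall and per ABI. A compat syscall reachable from both x32 and i386 would need the macro to emit both a di-reading and a bx-reading stub, as Al points out.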