From: Catalin Marinas <catalin.marinas@arm.com>
To: Yury Norov <ynorov@caviumnetworks.com>
Cc: David Miller <davem@davemloft.net>, arnd@arndb.de,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, libc-alpha@sourceware.org,
	schwidefsky@de.ibm.com, heiko.carstens@de.ibm.com, pinskia@gmail.com,
	broonie@kernel.org, joseph@codesourcery.com,
	christoph.muellner@theobroma-systems.com, bamvor.zhangjian@huawei.com,
	szabolcs.nagy@arm.com, klimov.linux@gmail.com, Nathan_Lynch@mentor.com,
	agraf@suse.de, Prasun.Kapoor@caviumnetworks.com, kilobyte@angband.pl,
	geert@linux-m68k.org, philipp.tomsich@theobroma-systems.com
Subject: Re: [PATCH 01/23] all: syscall wrappers: add documentation
Date: Thu, 26 May 2016 23:29:45 +0100
Message-ID: <20160526222943.GA16729@MBP.local>
In-Reply-To: <20160526204819.GA10274@yury-N73SV>

On Thu, May 26, 2016 at 11:48:19PM +0300, Yury Norov wrote:
> On Wed, May 25, 2016 at 02:28:21PM -0700, David Miller wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> > Date: Wed, 25 May 2016 23:01:06 +0200
> >
> > > On Wednesday, May 25, 2016 1:50:39 PM CEST David Miller wrote:
> > >> From: Arnd Bergmann <arnd@arndb.de>
> > >> Date: Wed, 25 May 2016 22:47:33 +0200
> > >>
> > >> > If we use the normal calling conventions, we could remove these overrides
> > >> > along with the respective special-case handling in glibc. None of them
> > >> > look particularly performance-sensitive, but I could be wrong there.
> > >>
> > >> You could set the lowest bit in the system call entry pointer to indicate
> > >> the upper-half clears should be elided.
> > >
> > > Right, but that would introduce an extra conditional branch in the syscall
> > > hotpath, and likely eliminate the gains from passing the loff_t arguments
> > > in a single register instead of a pair.
> >
> > Ok, then, how much are you really gaining from avoiding a 'shift' and
> > an 'or' to build the full 64-bit value? 3 cycles? Maybe 4?
>
> 4 cycles in kernel and ~same cost in glibc to create a pair.

It would take a single instruction per argument in the kernel to do
shift+or and maybe 1-2 more instructions to move the remaining arguments
in place (we do this for a few wrappers in arch/arm64/kernel/entry32.S).
The same applies to the glibc counterpart.

> And 8 'mov's that exist for every syscall, even yield().
>
> > And the executing the wrappers, those have a non-trivial cost too.
>
> The cost is pretty trivial though. See kernel/compat_wrapper.o:
> COMPAT_SYSCALL_WRAP2(creat, const char __user *, pathname, umode_t, mode);
> 0:	a9bf7bfd	stp	x29, x30, [sp,#-16]!
> 4:	910003fd	mov	x29, sp
> 8:	2a0003e0	mov	w0, w0
> c:	94000000	bl	0 <sys_creat>
> 10:	a8c17bfd	ldp	x29, x30, [sp],#16
> 14:	d65f03c0	ret

I would say the above could be more expensive than 8 movs (16 bytes to
write, read, a branch and a ret). You can also add the I-cache locality,
having wrappers for each syscall instead of a single place for zeroing
the upper half (where no other wrapper is necessary).

Can we trick the compiler into doing a tail call optimisation? This
could have simply been:

COMPAT_SYSCALL_WRAP2(creat, ...):
	mov	w0, w0
	b	<sys_creat>

> > Cost wise, this seems like it all cancels out in the end, but what
> > do I know?
>
> I think you know something, and I also think Heiko and other s390 guys
> know something as well. So I'd like to listen their arguments here.
>
> For me spark64 way is looking reasonable only because it's really simple
> and takes less coding. I'll try it on some branch and share here what happened.

The kernel code will definitely look simpler ;). It would be good to see
if there actually is any performance impact. Even with 16 more cycles on
syscall entry, would they be lost in the noise? You don't need a full
implementation, just some dummy mov x0, x0 on the entry path.

-- Catalin
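
For readers following the thread, the two operations being costed above can
be sketched in C. This is only an illustration under stated assumptions, not
code from the patch series: clear_upper_half(), join_pair(), native_handler()
and compat_wrapper() are hypothetical names, and native_handler() merely
stands in for a 64-bit handler such as sys_creat.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the effect of "mov w0, w0" on arm64. Writing a W
 * register zeroes bits 63:32 of the matching X register, so anything a
 * 32-bit caller left in the upper half is discarded.
 */
static inline uint64_t clear_upper_half(uint64_t reg)
{
	return (uint64_t)(uint32_t)reg;
}

/*
 * Hypothetical helper: rebuild one 64-bit value (for example an loff_t)
 * from two 32-bit halves passed in separate registers. This is the
 * 'shift' and 'or' discussed in the thread.
 */
static inline uint64_t join_pair(uint64_t lo_reg, uint64_t hi_reg)
{
	return ((uint64_t)(uint32_t)hi_reg << 32) | (uint32_t)lo_reg;
}

/*
 * Stand-in for a native 64-bit handler. It assumes its arguments already
 * have clean upper halves, which is the contract the wrapper provides.
 */
static long native_handler(uint64_t arg0, uint64_t arg1)
{
	assert((arg0 >> 32) == 0 && (arg1 >> 32) == 0);
	return (long)(arg0 + arg1);
}

/*
 * Stand-in for a per-syscall compat wrapper: sanitise each argument, then
 * call the native handler as the last action, which leaves the compiler
 * free to emit a tail call (a plain branch) where it chooses to.
 */
static long compat_wrapper(uint64_t arg0, uint64_t arg1)
{
	return native_handler(clear_upper_half(arg0), clear_upper_half(arg1));
}
```

Calling compat_wrapper() with garbage in the upper 32 bits of either argument
reaches native_handler() with only the low halves; join_pair() shows the
alternative convention, where a 64-bit argument arrives as two registers.
Whether a compiler actually turns the final call into a branch depends on the
optimisation level and target, so treat that part as an assumption.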