From: Catalin Marinas <catalin.marinas@arm.com>
To: Yury Norov <ynorov@caviumnetworks.com>
Cc: David Miller <davem@davemloft.net>,
	arnd@arndb.de, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-s390@vger.kernel.org,
	libc-alpha@sourceware.org, schwidefsky@de.ibm.com,
	heiko.carstens@de.ibm.com, pinskia@gmail.com, broonie@kernel.org,
	joseph@codesourcery.com,
	christoph.muellner@theobroma-systems.com,
	bamvor.zhangjian@huawei.com, szabolcs.nagy@arm.com,
	klimov.linux@gmail.com, Nathan_Lynch@mentor.com, agraf@suse.de,
	Prasun.Kapoor@caviumnetworks.com, kilobyte@angband.pl,
	geert@linux-m68k.org, philipp.tomsich@theobroma-systems.com
Subject: Re: [PATCH 01/23] all: syscall wrappers: add documentation
Date: Thu, 26 May 2016 23:29:45 +0100
Message-ID: <20160526222943.GA16729@MBP.local>
In-Reply-To: <20160526204819.GA10274@yury-N73SV>

On Thu, May 26, 2016 at 11:48:19PM +0300, Yury Norov wrote:
> On Wed, May 25, 2016 at 02:28:21PM -0700, David Miller wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> > Date: Wed, 25 May 2016 23:01:06 +0200
> > 
> > > On Wednesday, May 25, 2016 1:50:39 PM CEST David Miller wrote:
> > >> From: Arnd Bergmann <arnd@arndb.de>
> > >> Date: Wed, 25 May 2016 22:47:33 +0200
> > >> 
> > >> > If we use the normal calling conventions, we could remove these overrides
> > >> > along with the respective special-case handling in glibc. None of them
> > >> > look particularly performance-sensitive, but I could be wrong there.
> > >> 
> > >> You could set the lowest bit in the system call entry pointer to indicate
> > >> the upper-half clears should be elided.
> > > 
> > > Right, but that would introduce an extra conditional branch in the syscall
> > > hotpath, and likely eliminate the gains from passing the loff_t arguments
> > > in a single register instead of a pair.
> > 
> > Ok, then, how much are you really gaining from avoiding a 'shift' and
> > an 'or' to build the full 64-bit value?  3 cycles?  Maybe 4?
> 
> 4 cycles in kernel and ~same cost in glibc to create a pair.

It would take a single instruction per argument in the kernel to do
shift+or and maybe 1-2 more instructions to move the remaining arguments
into place (we do this for a few wrappers in arch/arm64/kernel/entry32.S),
plus the glibc counterpart.
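
To make the shift+or concrete, here is a minimal C sketch of joining a
register pair into one 64-bit offset, together with the split that the
libc side would do. The names join_loff and split_loff are made up for
illustration and are not taken from the kernel or glibc; whether the
join really compiles to a single instruction depends on the compiler
and on the inputs already being zero-extended.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: a 32-bit ABI passes a 64-bit
 * offset as two 32-bit halves, and the kernel side glues them together.
 */
static inline int64_t join_loff(uint32_t lo, uint32_t hi)
{
	/* Shift the high half up and OR in the low half. */
	return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* Hypothetical libc-side counterpart: split the offset into a pair. */
static inline void split_loff(int64_t off, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)off;                   /* low 32 bits */
	*hi = (uint32_t)((uint64_t)off >> 32); /* high 32 bits */
}
```

Calling split_loff() and then join_loff() on the two halves should give
back the original offset, including negative values.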

> And 8 'mov's that exist for every syscall, even yield().
> 
> > And then executing the wrappers, those have a non-trivial cost too.
> 
> The cost is pretty trivial though. See kernel/compat_wrapper.o:
> COMPAT_SYSCALL_WRAP2(creat, const char __user *, pathname, umode_t, mode);
> 0:   a9bf7bfd        stp     x29, x30, [sp,#-16]!
> 4:   910003fd        mov     x29, sp
> 8:   2a0003e0        mov     w0, w0
> c:   94000000        bl      0 <sys_creat>
> 10:  a8c17bfd        ldp     x29, x30, [sp],#16
> 14:  d65f03c0        ret

I would say the above could be more expensive than 8 movs (16 bytes to
write and read back, a branch and a ret). I-cache locality also suffers
from having a wrapper for each syscall instead of a single place for
zeroing the upper halves (where no other wrapper is necessary).

Can we trick the compiler into doing a tail call optimisation? This
could have simply been:

COMPAT_SYSCALL_WRAP2(creat, ...):
	mov	w0, w0
	b	<sys_creat>
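
In C terms, the wrapper amounts to something like the sketch below.
Both functions are hypothetical stand-ins (demo_sys_creat is not the
real sys_creat, and this is not what the COMPAT_SYSCALL_WRAP2 macro
expands to). The call sits in tail position, so a compiler is allowed
to emit a plain branch for it; whether it does depends on the compiler
and the build flags, which I have not checked here.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the native handler: returns 0 when the
 * upper half of its first argument arrived clean, -1 otherwise.
 */
static long demo_sys_creat(uint64_t pathname, uint64_t mode)
{
	(void)mode;
	return (pathname >> 32) == 0 ? 0 : -1;
}

/* Hypothetical wrapper: clear the upper half, then call the handler. */
long demo_compat_creat(uint64_t pathname, uint64_t mode)
{
	/* The cast truncates to 32 bits and zero-extends: "mov w0, w0". */
	return demo_sys_creat((uint32_t)pathname, mode);
}
```

Passing a value with garbage in bits 63:32 to demo_compat_creat() should
return 0, while passing it straight to demo_sys_creat() returns -1.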

> > Cost wise, this seems like it all cancels out in the end, but what
> > do I know?
> 
> I think you know something, and I also think Heiko and other s390 guys
> know something as well. So I'd like to listen to their arguments here.
> 
> For me the sparc64 way looks reasonable only because it's really simple
> and takes less coding. I'll try it on some branch and share here what happened.

The kernel code will definitely look simpler ;). It would be good to see
if there actually is any performance impact. Even with 16 more cycles on
syscall entry, would they be lost in the noise? You don't need a full
implementation, just some dummy mov x0, x0 on the entry path.
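
For the measuring side, a rough user-space sketch could look like the
following. It times a loop of getppid() calls, on the assumption that
the C library issues a real system call each time rather than caching
the result. The function name ns_per_syscall is made up, and the number
includes loop and libc overhead, so only the difference between a
patched and an unpatched kernel would be meaningful.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock nanoseconds per getppid(). */
double ns_per_syscall(long iters)
{
	struct timespec t0, t1;
	double ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		(void)getppid();	/* cheap call that enters the kernel */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Elapsed time in nanoseconds, then averaged over the loop. */
	ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	     (double)(t1.tv_nsec - t0.tv_nsec);
	return ns / (double)iters;
}
```

Running it with a few million iterations, several times, on both
kernels would show whether the extra instructions rise above the noise.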

-- 
Catalin
