* [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic
@ 2015-04-22 16:40 Denys Vlasenko
  2015-04-22 16:40 ` [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone Denys Vlasenko
  2015-04-22 16:53 ` [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic Andy Lutomirski
  0 siblings, 2 replies; 11+ messages in thread
From: Denys Vlasenko @ 2015-04-22 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

The reason for copying %r8 to %rcx is quite non-obvious.
Add a comment which explains why it is done.

Fix indentation and trailing whitespace while at it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
---
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
 arch/x86/ia32/ia32entry.S | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 2ca052e..8e72256 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -562,9 +562,17 @@ GLOBAL(\label)
 
 	ALIGN
 GLOBAL(stub32_clone)
-	leaq sys_clone(%rip),%rax
+	leaq	sys_clone(%rip), %rax
+	/*
+	 * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
+	 * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
+	 * Native 64-bit kernel's sys_clone() implements the latter.
+	 * We need to swap args here. But since tls_val is in fact ignored
+	 * by sys_clone(), we can get away with an assignment
+	 * (arg4 = arg5) instead of a full swap:
+	 */
 	mov	%r8, %rcx
-	jmp  ia32_ptregs_common	
+	jmp	ia32_ptregs_common
 
 	ALIGN
 ia32_ptregs_common:
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-22 16:40 [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic Denys Vlasenko
@ 2015-04-22 16:40 ` Denys Vlasenko
  2015-04-22 16:54   ` Andy Lutomirski
  2015-04-22 16:53 ` [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic Andy Lutomirski
  1 sibling, 1 reply; 11+ messages in thread
From: Denys Vlasenko @ 2015-04-22 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
it into a move.

Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
more expensive than MOV. But a cycle or two on an expensive syscall like
clone() is way below noise floor, and obfuscation of logic introduced
by this optimization is simply not worth it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 8e72256..0c302d0 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -567,11 +567,9 @@ GLOBAL(stub32_clone)
 	 * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
 	 * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
 	 * Native 64-bit kernel's sys_clone() implements the latter.
-	 * We need to swap args here. But since tls_val is in fact ignored
-	 * by sys_clone(), we can get away with an assignment
-	 * (arg4 = arg5) instead of a full swap:
+	 * We need to swap args here:
 	 */
-	mov	%r8, %rcx
+	xchg	%r8, %rcx
 	jmp	ia32_ptregs_common
 
 	ALIGN
-- 
1.8.1.4



* Re: [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic
  2015-04-22 16:40 [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic Denys Vlasenko
  2015-04-22 16:40 ` [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone Denys Vlasenko
@ 2015-04-22 16:53 ` Andy Lutomirski
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-04-22 16:53 UTC (permalink / raw)
  To: Denys Vlasenko, Josh Triplett
  Cc: Ingo Molnar, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Wed, Apr 22, 2015 at 9:40 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> The reason for copying %r8 to %rcx is quite non-obvious.
> Add a comment which explains why it is done.
>
> Fix indentation and trailing whitespace while at it.

Seems reasonable, but I think that Josh's clone patch is even better.

--Andy


* Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-22 16:40 ` [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone Denys Vlasenko
@ 2015-04-22 16:54   ` Andy Lutomirski
  2015-04-22 17:10     ` Josh Triplett
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2015-04-22 16:54 UTC (permalink / raw)
  To: Denys Vlasenko, Josh Triplett
  Cc: Ingo Molnar, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Wed, Apr 22, 2015 at 9:40 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
> it into a move.
>
> Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
> more expensive than MOV. But a cycle or two on an expensive syscall like
> clone() is way below noise floor, and obfuscation of logic introduced
> by this optimization is simply not worth it.

Ditto re: Josh's patch.

--Andy

>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  arch/x86/ia32/ia32entry.S | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 8e72256..0c302d0 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -567,11 +567,9 @@ GLOBAL(stub32_clone)
>          * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
>          * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
>          * Native 64-bit kernel's sys_clone() implements the latter.
> -        * We need to swap args here. But since tls_val is in fact ignored
> -        * by sys_clone(), we can get away with an assignment
> -        * (arg4 = arg5) instead of a full swap:
> +        * We need to swap args here:
>          */
> -       mov     %r8, %rcx
> +       xchg    %r8, %rcx
>         jmp     ia32_ptregs_common
>
>         ALIGN
> --
> 1.8.1.4
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-22 16:54   ` Andy Lutomirski
@ 2015-04-22 17:10     ` Josh Triplett
  2015-04-22 18:04       ` Denys Vlasenko
  2015-04-22 18:22       ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Josh Triplett @ 2015-04-22 17:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Ingo Molnar, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Wed, Apr 22, 2015 at 09:54:24AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 22, 2015 at 9:40 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
> > it into a move.
> >
> > Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
> > more expensive than MOV. But a cycle or two on an expensive syscall like
> > clone() is way below noise floor, and obfuscation of logic introduced
> > by this optimization is simply not worth it.
> 
> Ditto re: Josh's patch.

I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
this, but I'd like to see the final version of Denys' comment added on
top of it (with an update to the type and name of the tls argument to
match the changes to sys_clone).

Denys, would you consider submitting a patch adding your comment on top
of the two-patch series I just sent?

Thanks,
Josh Triplett

> --Andy
> 
> >
> > Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> > CC: Linus Torvalds <torvalds@linux-foundation.org>
> > CC: Steven Rostedt <rostedt@goodmis.org>
> > CC: Ingo Molnar <mingo@kernel.org>
> > CC: Borislav Petkov <bp@alien8.de>
> > CC: "H. Peter Anvin" <hpa@zytor.com>
> > CC: Andy Lutomirski <luto@amacapital.net>
> > CC: Oleg Nesterov <oleg@redhat.com>
> > CC: Frederic Weisbecker <fweisbec@gmail.com>
> > CC: Alexei Starovoitov <ast@plumgrid.com>
> > CC: Will Drewry <wad@chromium.org>
> > CC: Kees Cook <keescook@chromium.org>
> > CC: x86@kernel.org
> > CC: linux-kernel@vger.kernel.org
> > ---
> >  arch/x86/ia32/ia32entry.S | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> > index 8e72256..0c302d0 100644
> > --- a/arch/x86/ia32/ia32entry.S
> > +++ b/arch/x86/ia32/ia32entry.S
> > @@ -567,11 +567,9 @@ GLOBAL(stub32_clone)
> >          * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
> >          * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
> >          * Native 64-bit kernel's sys_clone() implements the latter.
> > -        * We need to swap args here. But since tls_val is in fact ignored
> > -        * by sys_clone(), we can get away with an assignment
> > -        * (arg4 = arg5) instead of a full swap:
> > +        * We need to swap args here:
> >          */
> > -       mov     %r8, %rcx
> > +       xchg    %r8, %rcx
> >         jmp     ia32_ptregs_common
> >
> >         ALIGN
> > --
> > 1.8.1.4
> >
> 
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC


* Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-22 17:10     ` Josh Triplett
@ 2015-04-22 18:04       ` Denys Vlasenko
  2015-04-22 18:22       ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Denys Vlasenko @ 2015-04-22 18:04 UTC (permalink / raw)
  To: Josh Triplett, Andy Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On 04/22/2015 07:10 PM, Josh Triplett wrote:
> On Wed, Apr 22, 2015 at 09:54:24AM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 22, 2015 at 9:40 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> Really swap arguments #4 and #5 in stub32_clone instead of "optimizing"
>>> it into a move.
>>>
>>> Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit
>>> more expensive than MOV. But a cycle or two on an expensive syscall like
>>> clone() is way below noise floor, and obfuscation of logic introduced
>>> by this optimization is simply not worth it.
>>
>> Ditto re: Josh's patch.
> 
> I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> this, but I'd like to see the final version of Denys' comment added on
> top of it (with an update to the type and name of the tls argument to
> match the changes to sys_clone).
> 
> Denys, would you consider submitting a patch adding your comment on top
> of the two-patch series I just sent?

Okay.



* Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-22 17:10     ` Josh Triplett
  2015-04-22 18:04       ` Denys Vlasenko
@ 2015-04-22 18:22       ` Linus Torvalds
  2015-04-22 20:12         ` Josh Triplett
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2015-04-22 18:22 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Andy Lutomirski, Denys Vlasenko, Ingo Molnar, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Wed, Apr 22, 2015 at 10:10 AM, Josh Triplett <josh@joshtriplett.org> wrote:
>
> I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> this

Ugh, I absolutely detest that patch.

Don't make random crazy function signatures that depend on some config
option. That's just evil. The patch is a mess of #ifdef's and should
be shot in the head and staked with a silver stake to make sure it
never re-appears.

Either:

 (a) make the change for every architecture

 (b) have side-by-side interfaces. With different names!

but not that disgusting "the calling conventions of these random
functions are different on different architectures and we use a config
flag to distinguish the cases".

           Linus


* Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-22 18:22       ` Linus Torvalds
@ 2015-04-22 20:12         ` Josh Triplett
  2015-04-23  6:24           ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Josh Triplett @ 2015-04-22 20:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Denys Vlasenko, Ingo Molnar, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Wed, Apr 22, 2015 at 11:22:02AM -0700, Linus Torvalds wrote:
> On Wed, Apr 22, 2015 at 10:10 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> >
> > I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> > this
> 
> Ugh, I absolutely detest that patch.
> 
> Don't make random crazy function signatures that depend on some config
> option. That's just evil. The patch is a mess of #ifdef's and should
> be shot in the head and staked with a silver stake to make sure it
> never re-appears.
> 
> Either:
> 
>  (a) make the change for every architecture
> 
>  (b) have side-by-side interfaces. With different names!

...that's exactly what I did.  They're called copy_thread and
copy_thread_tls; I very intentionally did not conditionally change the
signature of copy_thread, for exactly that reason.  Those functions are
implemented in architecture-specific code, so the config option just
specifies which of the two functions the architecture provides.

*sys_clone* has different function signatures based on config options,
but I didn't touch that other than fixing the type of the tls argument.
That's historical baggage that we can't throw away without breaking
userspace.

- Josh Triplett


* Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-22 20:12         ` Josh Triplett
@ 2015-04-23  6:24           ` Ingo Molnar
  2015-04-23  7:36             ` Josh Triplett
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2015-04-23  6:24 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner


* Josh Triplett <josh@joshtriplett.org> wrote:

> On Wed, Apr 22, 2015 at 11:22:02AM -0700, Linus Torvalds wrote:
> > On Wed, Apr 22, 2015 at 10:10 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> > >
> > > I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> > > this
> > 
> > Ugh, I absolutely detest that patch.
> > 
> > Don't make random crazy function signatures that depend on some config
> > option. That's just evil. The patch is a mess of #ifdef's and should
> > be shot in the head and staked with a silver stake to make sure it
> > never re-appears.
> > 
> > Either:
> > 
> >  (a) make the change for every architecture
> > 
> >  (b) have side-by-side interfaces. With different names!
> 
> ...that's exactly what I did.  They're called copy_thread and 
> copy_thread_tls; I very intentionally did not conditionally change 
> the signature of copy_thread, for exactly that reason.  Those 
> functions are implemented in architecture-specific code, so the 
> config option just specifies which of the two functions the 
> architecture provides.
> 
> *sys_clone* has different function signatures based on config 
> options, but I didn't touch that other than fixing the type of the 
> tls argument. That's historical baggage that we can't throw away 
> without breaking userspace.

So you want to add a new clone() parameter. I strongly suspect that 
you won't be the last person who wants to do this.

So why not leave the compatibility baggage alone and introduce a new 
clean clone syscall with a flexible, backwards and forwards compatible 
parameter ABI?

Something like:

  SYSCALL_DEFINE1(clone_params, struct clone_params __user *, params)

  struct clone_params {
	__u32 size;	/* User-space sets it to sizeof(struct clone_params) */

	__u32 tls_val;	/* Slightly out of order, for optimal structure packing */
	__u64 clone_flags;
	__u64 new_sp_addr;
	__u64 parent_tid_addr;
	__u64 child_tid_addr;
	__u64 tls;
  };

The only real cost of this approach is that this parameter structure 
has to be copied (once): should be fast as it fits into a single cache 
line and is a constant size copy. Also note how this parameter block 
can be passed down by const reference from that point on, instead of 
copying/shuffling 5-6 parameters into increasingly long function 
parameter lists. So it might in fact turn out to be slightly faster as 
well.

Note how easily extensible it is: a new field can be added by 
appending to the structure, and compatibility is achieved in the 
kernel without explicit versioning, by checking params->size:

  params->size == sizeof(*params):

     Good, kernel and userspace ABI version matches. This is a simple
     check and the most typical situation - it will be the fastest as
     well.


  params->size < sizeof(*params):

     Old binary calls into new kernel. Missing fields are set to 0.
     Binaries will be forward compatible without any versioning hacks.


  params->size > sizeof(*params):

     New user-space calls into old kernel. System call can still be
     serviced if the new field(s) are all zero. We return -ENOSYS if 
     any field is non-zero. (i.e. if new user-space tries to make use
     of a new syscall feature that this kernel has not implemented
     yet.) This way user-space can figure out whether a particular
     new parameter is supported by a kernel, without having to add new
     system calls or versioning.

Also note how 'foolproof' this kind of 'versioning' is: the parameter 
block is extended by appending to the end, cleanly extending the 
kernel internals to handle the new parameter - end of story. The 
zeroing logic on size mismatch will take care of the rest 
automatically, it's all ABI forwards and backwards compatible to the 
maximum possible extent.

It's relatively easy to use this same interface from 32-bit compat 
environments as well: they can use the very same parameter block, the 
kernel's compat layer possibly just needs to do a minor amount of 
pointer translation for the _addr fields, for example to truncate them 
down to 32 bits for security, by filling in the high words of pointers 
with 0. (This too can be optimized further if needed, by open coding 
the copy and zeroing.) This too is forwards and backwards ABI 
compatible to the maximum possible extent.

So introduce a single new syscall with the right ABI architecture and 
you can leave the old mistakes alone.

Thanks,

	Ingo


* Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone
  2015-04-23  6:24           ` Ingo Molnar
@ 2015-04-23  7:36             ` Josh Triplett
  0 siblings, 0 replies; 11+ messages in thread
From: Josh Triplett @ 2015-04-23  7:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner

On Thu, Apr 23, 2015 at 08:24:38AM +0200, Ingo Molnar wrote:
> * Josh Triplett <josh@joshtriplett.org> wrote:
> > On Wed, Apr 22, 2015 at 11:22:02AM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 22, 2015 at 10:10 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> > > >
> > > > I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing
> > > > this
> > > 
> > > Ugh, I absolutely detest that patch.
> > > 
> > > Don't make random crazy function signatures that depend on some config
> > > option. That's just evil. The patch is a mess of #ifdef's and should
> > > be shot in the head and staked with a silver stake to make sure it
> > > never re-appears.
> > > 
> > > Either:
> > > 
> > >  (a) make the change for every architecture
> > > 
> > >  (b) have side-by-side interfaces. With different names!
> > 
> > ...that's exactly what I did.  They're called copy_thread and 
> > copy_thread_tls; I very intentionally did not conditionally change 
> > the signature of copy_thread, for exactly that reason.  Those 
> > functions are implemented in architecture-specific code, so the 
> > config option just specifies which of the two functions the 
> > architecture provides.
> > 
> > *sys_clone* has different function signatures based on config 
> > options, but I didn't touch that other than fixing the type of the 
> > tls argument. That's historical baggage that we can't throw away 
> > without breaking userspace.
> 
> So you want to add a new clone() parameter. I strongly suspect that 
> you won't be the last person who wants to do this.
> 
> So why not leave the compatibility baggage alone and introduce a new 
> clean clone syscall with a flexible, backwards and forwards compatible 
> parameter ABI?

...that's also exactly what I did, in the clone4 patch series, which
uses exactly the arg-structure approach you're describing. :)

Take a look at the clone4 patch series (v2).  We'll be updating it to v3
to address some feedback, and we're hoping to aim for the 4.2 merge
window.

> Something like:
> 
>   SYSCALL_DEFINE1(clone_params, struct clone_params __user *, params)

clone4 passes the size outside the params structure, since otherwise
you'd need to copy the first few bytes to know how much more to copy.

clone4 also passes flags outside the params structure, to allow for a
potential future flag that might indicate a completely different
interpretation of the params structure.

But otherwise, yes, clone4 moves all the parameters into a structure.

(Differences between clone4_args and your clone_params structure: size
and flags passed outside, type of the tls parameter is "unsigned long"
since it's pointer-sized, and the structure includes the stack_size
argument needed on some architectures.)

> The only real cost of this approach is that this parameter structure 
> has to be copied (once): should be fast as it fits into a single cache 
> line and is a constant size copy. Also note how this parameter block 
> can be passed down by const reference from that point on, instead of 
> copying/shuffling 5-6 parameters into increasingly long function 
> parameter lists. So it might in fact turn out to be slightly faster as 
> well.

Agreed.

> Note how easily extensible it is: a new field can be added by 
> appending to the structure, and compatibility is achieved in the 
> kernel without explicit versioning, by checking params->size:
> 
>   params->size == sizeof(*params):
> 
>      Good, kernel and userspace ABI version matches. This is a simple
>      check and the most typical situation - it will be the fastest as
>      well.
> 
> 
>   params->size < sizeof(*params):
> 
>      Old binary calls into new kernel. Missing fields are set to 0.
>      Binaries will be forward compatible without any versioning hacks.

I combined these two cases by just copying the userspace struct over a
pre-zeroed params structure; then any fields not copied over will remain
zero.

>   params->size > sizeof(*params):
> 
>      New user-space calls into old kernel. System call can still be
>      serviced if the new field(s) are all zero. We return -ENOSYS if 
>      any field is non-zero. (i.e. if new user-space tries to make use
>      of a new syscall feature that this kernel has not implemented
>      yet.) This way user-space can figure out whether a particular
>      new parameter is supported by a kernel, without having to add new
>      system calls or versioning.

In theory clone4 could have had this special case for "userspace passed
extra parameters this kernel doesn't understand, but they're all zero",
but since this is a brand new ABI, it seems far easier for userspace to
simply pass only the size needed for its non-zero parameters, and then
the kernel can reject structures larger than it expects.  Only pass the
new version of the args structure if you need to pass non-zero values
for the new fields in that version.

Since clone4 doesn't even copy the structure if the size exceeds what it
understands, that also avoids additional special cases such as the user
passing in an obscenely large size.

> Also note how 'fool proof' this kind of 'versioning' is: the parameter 
> block is extended by appending to the end, cleanly extending the 
> kernel internals to handle the new parameter - end of story. The 
> zeroing logic on size mismatch will take care of the rest 
> automatically, it's all ABI forwards and backwards compatible to the 
> maximum possible extent.

Right.

> It's relatively easy to use this same interface from 32-bit compat 
> environments as well: they can use the very same parameter block, the 
> kernel's compat layer possibly just needs to do a minor amount of 
> pointer translation for the _addr fields, for example to truncate them 
> down to 32 bits for security, by filling in the high words of pointers 
> with 0. (This too can be optimized further if needed, by open coding 
> the copy and zeroing.) This too is forwards and backwards ABI 
> compatible to the maximum possible extent.

That's precisely what the compat version of clone4 does, using the
kernel's existing compat types and conversion functions.

> So introduce a single new syscall with the right ABI architecture and 
> you can leave the old mistakes alone.

Not quite.  Because copy_thread digs the tls argument out of the saved
syscall registers rather than via C parameter passing, with the current
implementation, no syscall can call down into copy_thread (by way of
copy_process) unless it has the tls parameter passed in the
architecture-specific register corresponding to the same syscall
argument sys_clone passes it in.  That's exactly why I submitted the
HAVE_COPY_THREAD_TLS series (the first two patches of the clone4
series): to eliminate that hack and allow for a replacement clone
syscall.

Would you consider reviewing the clone4 v2 patch series, which should
provide exactly the new, easily-versioned, cross-architecture interface
you're asking for?

- Josh Triplett


* [PATCH 1/2] x86/asm/entry/32: Explain stub32_clone logic
@ 2015-06-03 13:58 Denys Vlasenko
  0 siblings, 0 replies; 11+ messages in thread
From: Denys Vlasenko @ 2015-06-03 13:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

The reason for copying %r8 to %rcx is quite non-obvious.
Add a comment which explains why it is done.

Fix indentation and trailing whitespace while at it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---

This is a resend.

 arch/x86/ia32/ia32entry.S | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 2ca052e..8e72256 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -562,9 +562,17 @@ GLOBAL(\label)
 
 	ALIGN
 GLOBAL(stub32_clone)
-	leaq sys_clone(%rip),%rax
+	leaq	sys_clone(%rip), %rax
+	/*
+	 * 32-bit clone API is clone(..., int tls_val, int *child_tidptr).
+	 * 64-bit clone API is clone(..., int *child_tidptr, int tls_val).
+	 * Native 64-bit kernel's sys_clone() implements the latter.
+	 * We need to swap args here. But since tls_val is in fact ignored
+	 * by sys_clone(), we can get away with an assignment
+	 * (arg4 = arg5) instead of a full swap:
+	 */
 	mov	%r8, %rcx
-	jmp  ia32_ptregs_common	
+	jmp	ia32_ptregs_common
 
 	ALIGN
 ia32_ptregs_common:
-- 
1.8.1.4


