* [PATCH v3 1/2] fork: add clone3
@ 2019-06-04 16:09 Christian Brauner
From: Christian Brauner @ 2019-06-04 16:09 UTC
  To: viro, linux-kernel, torvalds, jannh
  Cc: keescook, fweimer, oleg, arnd, dhowells, Christian Brauner,
	Pavel Emelyanov, Andrew Morton, Adrian Reber, Andrei Vagin,
	linux-api

This adds the clone3 system call.

As mentioned several times already (cf. [7], [8]), here is the promised
patchset for clone3().

We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
free flag from clone().

Independently of the CLONE_PIDFD patchset, a time namespace was discussed
at the Linux Plumbers Conference last year and has since been sent out and
reviewed (cf. [5]). It is expected to go upstream in the not too distant
future. However, it relies on adding a CLONE_NEWTIME flag to clone(). The
only other good candidate, CLONE_DETACHED, cannot currently be recycled
because we have identified at least two large or widely used codebases
that pass this flag (cf. [2], [3], and [4]). Since CLONE_PIDFD grabbed the
last clone() flag, the time namespace is effectively blocked. clone3()
unblocks that patchset. More generally, clone3() is extensible and allows
new features to be implemented.

The idea is to keep clone3() very simple and close to the original clone(),
specifically, to keep supporting old clone()-based workloads.
We know there have been various creative proposals for what a new process
creation syscall or even API should look like, some going so far as to
argue that the traditional fork()+exec() split should be abandoned in favor
of an in-kernel version of spawn(). Independent of whether or not we
personally think spawn() is a good idea, this patchset has nothing to do
with it and does not want to.
One stance we take is that there's no really good alternative to
clone()+exec() and we need and want to support this model going forward,
independent of spawn().
The following requirements guided clone3():
- bump the number of available flags
- move arguments that are currently passed as separate arguments
  in clone() into a dedicated struct clone_args
  - choose a struct layout that is easy to handle on 32 and on 64 bit
  - choose a struct layout that is extensible
  - give new flags that currently need to abuse another flag's dedicated
    return argument in clone() their own dedicated return argument
    (e.g. CLONE_PIDFD)
  - use a separate kernel internal struct kernel_clone_args that is
    properly typed according to current kernel conventions in fork.c and is
    different from the uapi struct clone_args
- port _do_fork() to use kernel_clone_args so that all process creation
  syscalls such as fork(), vfork(), clone(), and clone3() behave identically
  (Arnd suggested that we can probably also port do_fork() itself in a
   separate patchset.)
- ease of transition for userspace from clone() to clone3()
  This very much means that we do *not* remove functionality that userspace
  currently relies on, as doing so is a good way of creating a syscall
  that won't be adopted.
- do not try to be clever or complex: keep clone3() as dumb as possible

In accordance with Linus's suggestions (cf. [11]), clone3() has the following
signature:

/* uapi */
struct clone_args {
        __aligned_u64 flags;
        __aligned_u64 pidfd;
        __aligned_u64 child_tid;
        __aligned_u64 parent_tid;
        __aligned_u64 exit_signal;
        __aligned_u64 stack;
        __aligned_u64 stack_size;
        __aligned_u64 tls;
};

/* kernel internal */
struct kernel_clone_args {
        u64 flags;
        int __user *pidfd;
        int __user *child_tid;
        int __user *parent_tid;
        int exit_signal;
        unsigned long stack;
        unsigned long stack_size;
        unsigned long tls;
};

long sys_clone3(struct clone_args __user *uargs, size_t size)
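
Because every member of struct clone_args is an __aligned_u64, the layout
is identical for 32-bit and 64-bit callers, and user pointers travel as u64
values. A minimal userspace sketch of that property follows. The names
struct clone_args_mirror, ALIGNED_U64, ptr_to_u64() and u64_to_ptr() are
hypothetical; ALIGNED_U64 is assumed to match the __aligned_u64 definition
in linux/types.h (a 64-bit integer with 8-byte alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __aligned_u64. Forcing 8-byte alignment
 * matters on i386, where a plain uint64_t struct member is only
 * 4-byte aligned. */
#define ALIGNED_U64 uint64_t __attribute__((aligned(8)))

/* Hypothetical userspace mirror of the uapi struct clone_args. */
struct clone_args_mirror {
	ALIGNED_U64 flags;
	ALIGNED_U64 pidfd;
	ALIGNED_U64 child_tid;
	ALIGNED_U64 parent_tid;
	ALIGNED_U64 exit_signal;
	ALIGNED_U64 stack;
	ALIGNED_U64 stack_size;
	ALIGNED_U64 tls;
};

/* Eight 8-byte fields and no padding: 64 bytes on every ABI. */
static_assert(sizeof(struct clone_args_mirror) == 64, "unexpected size");
static_assert(offsetof(struct clone_args_mirror, tls) == 56,
	      "unexpected offset");

/* Userspace widens pointers to u64 before storing them... */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* ...and the kernel narrows them back, as u64_to_user_ptr() does. */
static inline void *u64_to_ptr(uint64_t v)
{
	return (void *)(uintptr_t)v;
}
```

The kernel side then converts these fields once into the properly typed
struct kernel_clone_args, so the rest of fork.c never handles raw u64
pointers.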

clone3() cleanly supports all of the flags supported by clone() and thus
all legacy workloads.
The advantage of sticking close to the old clone() is the low cost for
userspace to switch to this new API. Quite a lot of userspace APIs (e.g.
pthreads) are based on the clone() syscall. Because clone3() supports all
of the old workloads and opens up the ability to add new features,
switching to it should be more appealing for userspace. In essence, glibc
can just write a simple wrapper to switch from clone() to
clone3().
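
To illustrate how thin such a wrapper could be, here is a sketch that maps
the arguments of a legacy clone() call onto struct clone_args the same way
SYSCALL_DEFINE5(clone) builds its kernel_clone_args in this patch. The
names struct clone_args_v0 and legacy_clone_to_clone_args() are
hypothetical, the K_-prefixed constants are local copies of the values in
include/uapi/linux/sched.h, and the sketch does not deal with
CLONE_DETACHED, which clone3() rejects with EINVAL. In this version of the
series, stack is handed to copy_thread_tls() unchanged, so it carries the
same value as clone()'s newsp.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copies of the uapi values (include/uapi/linux/sched.h). */
#define K_CSIGNAL		0x000000ffULL
#define K_CLONE_PIDFD		0x00001000ULL
#define K_CLONE_PARENT_SETTID	0x00100000ULL

#define ALIGNED_U64 uint64_t __attribute__((aligned(8)))

/* Hypothetical userspace mirror of the uapi struct clone_args. */
struct clone_args_v0 {
	ALIGNED_U64 flags;
	ALIGNED_U64 pidfd;
	ALIGNED_U64 child_tid;
	ALIGNED_U64 parent_tid;
	ALIGNED_U64 exit_signal;
	ALIGNED_U64 stack;
	ALIGNED_U64 stack_size;
	ALIGNED_U64 tls;
};

static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/*
 * Hypothetical wrapper helper: fill in a clone_args from legacy clone()
 * arguments, mirroring SYSCALL_DEFINE5(clone) in this patch.
 * Returns 0 on success or -EINVAL.
 */
static int legacy_clone_to_clone_args(unsigned long flags,
				      unsigned long newsp,
				      int *parent_tid, int *child_tid,
				      unsigned long tls,
				      struct clone_args_v0 *out)
{
	/* Legacy clone() returns the pidfd through parent_tid, so the two
	 * flags cannot be combined. */
	if ((flags & K_CLONE_PIDFD) && (flags & K_CLONE_PARENT_SETTID))
		return -EINVAL;

	*out = (struct clone_args_v0){
		/* The exit signal moves out of the flag word... */
		.flags		= flags & ~K_CSIGNAL,
		.pidfd		= (flags & K_CLONE_PIDFD) ? ptr_to_u64(parent_tid) : 0,
		.child_tid	= ptr_to_u64(child_tid),
		.parent_tid	= ptr_to_u64(parent_tid),
		/* ...and into its own field. */
		.exit_signal	= flags & K_CSIGNAL,
		.stack		= newsp,
		.tls		= tls,
		/* stack_size stays 0, as in the legacy clone() path. */
	};
	return 0;
}
```

For example, the legacy flag word CLONE_VM | SIGCHLD becomes flags ==
CLONE_VM and exit_signal == SIGCHLD.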

There has been some interest in this patchset already. We have received a
patch from the CRIU corner for clone3() that would allow setting the PID/TID
of a restored process without going through /proc/sys/kernel/ns_last_pid,
eliminating a race.

/* User visible differences to legacy clone() */
- CLONE_DETACHED will cause EINVAL with clone3()
- CSIGNAL is deprecated
  It is superseded by a dedicated "exit_signal" argument in struct
  clone_args, freeing up space for additional flags.
  This is based on a suggestion from Andrei and Linus (cf. [9] and [10]).
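
These rules, together with the remaining checks in clone3_args_valid()
from the patch, can be modeled in userspace as a pure function. This is
an illustrative sketch only: the K_-prefixed constants are local copies of
the uapi values and clone3_flags_ok() is a hypothetical name. exit_signal
is an int, as in struct kernel_clone_args.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the uapi values (include/uapi/linux/sched.h). */
#define K_CSIGNAL		0x000000ffULL
#define K_CLONE_PARENT		0x00008000ULL
#define K_CLONE_THREAD		0x00010000ULL
#define K_CLONE_DETACHED	0x00400000ULL
/* CLONE_LEGACY_FLAGS from the patch: all bits used by clone(). */
#define K_CLONE_LEGACY_FLAGS	0xffffffffULL

/* Hypothetical mirror of clone3_args_valid(). */
static bool clone3_flags_ok(uint64_t flags, int exit_signal)
{
	/* No flags above the 32 legacy bits are defined yet. */
	if (flags & ~K_CLONE_LEGACY_FLAGS)
		return false;
	/* CLONE_DETACHED and the CSIGNAL bits are kept free for reuse. */
	if (flags & (K_CLONE_DETACHED | K_CSIGNAL))
		return false;
	/* A caller-supplied exit signal would be ignored for threads and
	 * for CLONE_PARENT children, so it is rejected instead. */
	if ((flags & (K_CLONE_THREAD | K_CLONE_PARENT)) && exit_signal)
		return false;
	return true;
}
```

So a caller that used to pass CLONE_VM | SIGCHLD as the flag word must now
pass flags == CLONE_VM and exit_signal == SIGCHLD separately.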

/* References */
[1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
[2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
[3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
[4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
[5]: https://lore.kernel.org/lkml/20190425161416.26600-1-dima@arista.com/
[6]: https://lore.kernel.org/lkml/20190425161416.26600-2-dima@arista.com/
[7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/
[8]: https://lore.kernel.org/lkml/20190524102756.qjsjxukuq2f4t6bo@brauner.io/
[9]: https://lore.kernel.org/lkml/20190529222414.GA6492@gmail.com/
[10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/
[11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <adrian@lisas.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
---
v1:
- Linus Torvalds <torvalds@linux-foundation.org>:
  - redesign based on Linus proposal
  - switch from arg-based to revision-based naming scheme: s/clone6/clone3/
- Arnd Bergmann <arnd@arndb.de>:
  - use a single copy_from_user() instead of multiple get_user() calls
    since the latter have a constant overhead on some architectures
  - a range of other tweaks and suggestions
v2:
- Linus Torvalds <torvalds@linux-foundation.org>,
  Andrei Vagin <avagin@gmail.com>:
  - replace CSIGNAL flag with dedicated exit_signal argument in struct
    clone_args
- Christian Brauner <christian@brauner.io>:
  - improve naming for some struct clone_args members
v3:
- Arnd Bergmann <arnd@arndb.de>:
  - replace memset with constructor for clarity and better object code
  - call flag verification function clone3_flags_valid() on
    kernel_clone_args instead of clone_args
  - remove __ARCH_WANT_SYS_CLONE ifdef around sys_clone3()
- Christian Brauner <christian@brauner.io>:
  - replace clone3_flags_valid() with clone3_args_valid() and call in
    clone3() directly rather than in copy_clone_args_from_user()
    This cleanly separates copying the args from userspace from the
    verification whether those args are sane.
- David Howells <dhowells@redhat.com>:
  - align new struct member assignments with tabs
  - replace CLONE_MAX with a non-uapi-exported CLONE_LEGACY_FLAGS and
    define it as 0xffffffffULL for clarity
  - make copy_clone_args_from_user() noinline
  - avoid assigning to local variables from struct kernel_clone_args
    members in cases where it makes sense
---
 arch/x86/ia32/sys_ia32.c   |  12 ++-
 include/linux/sched/task.h |  17 +++-
 include/linux/syscalls.h   |   4 +
 include/uapi/linux/sched.h |  16 +++
 kernel/fork.c              | 201 ++++++++++++++++++++++++++++---------
 5 files changed, 199 insertions(+), 51 deletions(-)

diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index a43212036257..64a6c952091e 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -237,6 +237,14 @@ COMPAT_SYSCALL_DEFINE5(x86_clone, unsigned long, clone_flags,
 		       unsigned long, newsp, int __user *, parent_tidptr,
 		       unsigned long, tls_val, int __user *, child_tidptr)
 {
-	return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr,
-			tls_val);
+	struct kernel_clone_args args = {
+		.flags		= (clone_flags & ~CSIGNAL),
+		.child_tid	= child_tidptr,
+		.parent_tid	= parent_tidptr,
+		.exit_signal	= (clone_flags & CSIGNAL),
+		.stack		= newsp,
+		.tls		= tls_val,
+	};
+
+	return _do_fork(&args);
 }
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index f1227f2c38a4..109a0df5af39 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -8,11 +8,26 @@
  */
 
 #include <linux/sched.h>
+#include <linux/uaccess.h>
 
 struct task_struct;
 struct rusage;
 union thread_union;
 
+/* All the bits taken by the old clone syscall. */
+#define CLONE_LEGACY_FLAGS 0xffffffffULL
+
+struct kernel_clone_args {
+	u64 flags;
+	int __user *pidfd;
+	int __user *child_tid;
+	int __user *parent_tid;
+	int exit_signal;
+	unsigned long stack;
+	unsigned long stack_size;
+	unsigned long tls;
+};
+
 /*
  * This serializes "schedule()" and also protects
  * the run-queue from deletions/modifications (but
@@ -73,7 +88,7 @@ extern void do_group_exit(int);
 extern void exit_files(struct task_struct *);
 extern void exit_itimers(struct signal_struct *);
 
-extern long _do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *, unsigned long);
+extern long _do_fork(struct kernel_clone_args *kargs);
 extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
 struct task_struct *fork_idle(int);
 struct mm_struct *copy_init_mm(void);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..60a81f374ca3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -70,6 +70,7 @@ struct sigaltstack;
 struct rseq;
 union bpf_attr;
 struct io_uring_params;
+struct clone_args;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -852,6 +853,9 @@ asmlinkage long sys_clone(unsigned long, unsigned long, int __user *,
 	       int __user *, unsigned long);
 #endif
 #endif
+
+asmlinkage long sys_clone3(struct clone_args __user *uargs, size_t size);
+
 asmlinkage long sys_execve(const char __user *filename,
 		const char __user *const __user *argv,
 		const char __user *const __user *envp);
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index ed4ee170bee2..f5331dbdcaa2 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -2,6 +2,8 @@
 #ifndef _UAPI_LINUX_SCHED_H
 #define _UAPI_LINUX_SCHED_H
 
+#include <linux/types.h>
+
 /*
  * cloning flags:
  */
@@ -31,6 +33,20 @@
 #define CLONE_NEWNET		0x40000000	/* New network namespace */
 #define CLONE_IO		0x80000000	/* Clone io context */
 
+/*
+ * Arguments for the clone3 syscall
+ */
+struct clone_args {
+	__aligned_u64 flags;
+	__aligned_u64 pidfd;
+	__aligned_u64 child_tid;
+	__aligned_u64 parent_tid;
+	__aligned_u64 exit_signal;
+	__aligned_u64 stack;
+	__aligned_u64 stack_size;
+	__aligned_u64 tls;
+};
+
 /*
  * Scheduling policies
  */
diff --git a/kernel/fork.c b/kernel/fork.c
index b4cba953040a..08ff131f26b4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1760,19 +1760,15 @@ static __always_inline void delayed_free_task(struct task_struct *tsk)
  * flags). The actual kick-off is left to the caller.
  */
 static __latent_entropy struct task_struct *copy_process(
-					unsigned long clone_flags,
-					unsigned long stack_start,
-					unsigned long stack_size,
-					int __user *parent_tidptr,
-					int __user *child_tidptr,
 					struct pid *pid,
 					int trace,
-					unsigned long tls,
-					int node)
+					int node,
+					struct kernel_clone_args *args)
 {
 	int pidfd = -1, retval;
 	struct task_struct *p;
 	struct multiprocess_signals delayed;
+	u64 clone_flags = args->flags;
 
 	/*
 	 * Don't allow sharing the root directory with processes in a different
@@ -1821,27 +1817,12 @@ static __latent_entropy struct task_struct *copy_process(
 	}
 
 	if (clone_flags & CLONE_PIDFD) {
-		int reserved;
-
 		/*
-		 * - CLONE_PARENT_SETTID is useless for pidfds and also
-		 *   parent_tidptr is used to return pidfds.
 		 * - CLONE_DETACHED is blocked so that we can potentially
 		 *   reuse it later for CLONE_PIDFD.
 		 * - CLONE_THREAD is blocked until someone really needs it.
 		 */
-		if (clone_flags &
-		    (CLONE_DETACHED | CLONE_PARENT_SETTID | CLONE_THREAD))
-			return ERR_PTR(-EINVAL);
-
-		/*
-		 * Verify that parent_tidptr is sane so we can potentially
-		 * reuse it later.
-		 */
-		if (get_user(reserved, parent_tidptr))
-			return ERR_PTR(-EFAULT);
-
-		if (reserved != 0)
+		if (clone_flags & (CLONE_DETACHED | CLONE_THREAD))
 			return ERR_PTR(-EINVAL);
 	}
 
@@ -1874,11 +1855,11 @@ static __latent_entropy struct task_struct *copy_process(
 	 * p->set_child_tid which is (ab)used as a kthread's data pointer for
 	 * kernel threads (PF_KTHREAD).
 	 */
-	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
+	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? args->child_tid : NULL;
 	/*
 	 * Clear TID on mm_release()?
 	 */
-	p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr : NULL;
+	p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? args->child_tid : NULL;
 
 	ftrace_graph_init_task(p);
 
@@ -2037,7 +2018,8 @@ static __latent_entropy struct task_struct *copy_process(
 	retval = copy_io(clone_flags, p);
 	if (retval)
 		goto bad_fork_cleanup_namespaces;
-	retval = copy_thread_tls(clone_flags, stack_start, stack_size, p, tls);
+	retval = copy_thread_tls(clone_flags, args->stack, args->stack_size, p,
+				 args->tls);
 	if (retval)
 		goto bad_fork_cleanup_io;
 
@@ -2062,7 +2044,7 @@ static __latent_entropy struct task_struct *copy_process(
 			goto bad_fork_free_pid;
 
 		pidfd = retval;
-		retval = put_user(pidfd, parent_tidptr);
+		retval = put_user(pidfd, args->pidfd);
 		if (retval)
 			goto bad_fork_put_pidfd;
 	}
@@ -2105,7 +2087,7 @@ static __latent_entropy struct task_struct *copy_process(
 		if (clone_flags & CLONE_PARENT)
 			p->exit_signal = current->group_leader->exit_signal;
 		else
-			p->exit_signal = (clone_flags & CSIGNAL);
+			p->exit_signal = args->exit_signal;
 		p->group_leader = p;
 		p->tgid = p->pid;
 	}
@@ -2313,8 +2295,11 @@ static inline void init_idle_pids(struct task_struct *idle)
 struct task_struct *fork_idle(int cpu)
 {
 	struct task_struct *task;
-	task = copy_process(CLONE_VM, 0, 0, NULL, NULL, &init_struct_pid, 0, 0,
-			    cpu_to_node(cpu));
+	struct kernel_clone_args args = {
+		.flags = CLONE_VM,
+	};
+
+	task = copy_process(&init_struct_pid, 0, cpu_to_node(cpu), &args);
 	if (!IS_ERR(task)) {
 		init_idle_pids(task);
 		init_idle(task, cpu);
@@ -2334,13 +2319,9 @@ struct mm_struct *copy_init_mm(void)
  * It copies the process, and if successful kick-starts
  * it and waits for it to finish using the VM if required.
  */
-long _do_fork(unsigned long clone_flags,
-	      unsigned long stack_start,
-	      unsigned long stack_size,
-	      int __user *parent_tidptr,
-	      int __user *child_tidptr,
-	      unsigned long tls)
+long _do_fork(struct kernel_clone_args *args)
 {
+	u64 clone_flags = args->flags;
 	struct completion vfork;
 	struct pid *pid;
 	struct task_struct *p;
@@ -2356,7 +2337,7 @@ long _do_fork(unsigned long clone_flags,
 	if (!(clone_flags & CLONE_UNTRACED)) {
 		if (clone_flags & CLONE_VFORK)
 			trace = PTRACE_EVENT_VFORK;
-		else if ((clone_flags & CSIGNAL) != SIGCHLD)
+		else if (args->exit_signal != SIGCHLD)
 			trace = PTRACE_EVENT_CLONE;
 		else
 			trace = PTRACE_EVENT_FORK;
@@ -2365,8 +2346,7 @@ long _do_fork(unsigned long clone_flags,
 			trace = 0;
 	}
 
-	p = copy_process(clone_flags, stack_start, stack_size, parent_tidptr,
-			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
+	p = copy_process(NULL, trace, NUMA_NO_NODE, args);
 	add_latent_entropy();
 
 	if (IS_ERR(p))
@@ -2382,7 +2362,7 @@ long _do_fork(unsigned long clone_flags,
 	nr = pid_vnr(pid);
 
 	if (clone_flags & CLONE_PARENT_SETTID)
-		put_user(nr, parent_tidptr);
+		put_user(nr, args->parent_tid);
 
 	if (clone_flags & CLONE_VFORK) {
 		p->vfork_done = &vfork;
@@ -2414,8 +2394,16 @@ long do_fork(unsigned long clone_flags,
 	      int __user *parent_tidptr,
 	      int __user *child_tidptr)
 {
-	return _do_fork(clone_flags, stack_start, stack_size,
-			parent_tidptr, child_tidptr, 0);
+	struct kernel_clone_args args = {
+		.flags		= (clone_flags & ~CSIGNAL),
+		.child_tid	= child_tidptr,
+		.parent_tid	= parent_tidptr,
+		.exit_signal	= (clone_flags & CSIGNAL),
+		.stack		= stack_start,
+		.stack_size	= stack_size,
+	};
+
+	return _do_fork(&args);
 }
 #endif
 
@@ -2424,15 +2412,25 @@ long do_fork(unsigned long clone_flags,
  */
 pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
 {
-	return _do_fork(flags|CLONE_VM|CLONE_UNTRACED, (unsigned long)fn,
-		(unsigned long)arg, NULL, NULL, 0);
+	struct kernel_clone_args args = {
+		.flags		= ((flags | CLONE_VM | CLONE_UNTRACED) & ~CSIGNAL),
+		.exit_signal	= (flags & CSIGNAL),
+		.stack		= (unsigned long)fn,
+		.stack_size	= (unsigned long)arg,
+	};
+
+	return _do_fork(&args);
 }
 
 #ifdef __ARCH_WANT_SYS_FORK
 SYSCALL_DEFINE0(fork)
 {
 #ifdef CONFIG_MMU
-	return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);
+	struct kernel_clone_args args = {
+		.exit_signal = SIGCHLD,
+	};
+
+	return _do_fork(&args);
 #else
 	/* can not support in nommu mode */
 	return -EINVAL;
@@ -2443,8 +2441,12 @@ SYSCALL_DEFINE0(fork)
 #ifdef __ARCH_WANT_SYS_VFORK
 SYSCALL_DEFINE0(vfork)
 {
-	return _do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0,
-			0, NULL, NULL, 0);
+	struct kernel_clone_args args = {
+		.flags		= CLONE_VFORK | CLONE_VM,
+		.exit_signal	= SIGCHLD,
+	};
+
+	return _do_fork(&args);
 }
 #endif
 
@@ -2472,7 +2474,110 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
 		 unsigned long, tls)
 #endif
 {
-	return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);
+	struct kernel_clone_args args = {
+		.flags		= (clone_flags & ~CSIGNAL),
+		.pidfd		= parent_tidptr,
+		.child_tid	= child_tidptr,
+		.parent_tid	= parent_tidptr,
+		.exit_signal	= (clone_flags & CSIGNAL),
+		.stack		= newsp,
+		.tls		= tls,
+	};
+
+	/* clone(CLONE_PIDFD) uses parent_tidptr to return a pidfd */
+	if ((clone_flags & CLONE_PIDFD) && (clone_flags & CLONE_PARENT_SETTID))
+		return -EINVAL;
+
+	return _do_fork(&args);
+}
+
+noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
+					      struct clone_args __user *uargs,
+					      size_t size)
+{
+	struct clone_args args;
+
+	if (unlikely(size > PAGE_SIZE))
+		return -E2BIG;
+
+	if (unlikely(size < sizeof(struct clone_args)))
+		return -EINVAL;
+
+	if (unlikely(!access_ok(uargs, size)))
+		return -EFAULT;
+
+	if (size > sizeof(struct clone_args)) {
+		unsigned char __user *addr;
+		unsigned char __user *end;
+		unsigned char val;
+
+		addr = (void __user *)uargs + sizeof(struct clone_args);
+		end = (void __user *)uargs + size;
+
+		for (; addr < end; addr++) {
+			if (get_user(val, addr))
+				return -EFAULT;
+			if (val)
+				return -E2BIG;
+		}
+
+		size = sizeof(struct clone_args);
+	}
+
+	if (copy_from_user(&args, uargs, size))
+		return -EFAULT;
+
+	*kargs = (struct kernel_clone_args){
+		.flags		= args.flags,
+		.pidfd		= u64_to_user_ptr(args.pidfd),
+		.child_tid	= u64_to_user_ptr(args.child_tid),
+		.parent_tid	= u64_to_user_ptr(args.parent_tid),
+		.exit_signal	= args.exit_signal,
+		.stack		= args.stack,
+		.stack_size	= args.stack_size,
+		.tls		= args.tls,
+	};
+
+	return 0;
+}
+
+static bool clone3_args_valid(const struct kernel_clone_args *kargs)
+{
+	/*
+	 * All lower bits of the flag word are taken.
+	 * Verify that no other unknown flags are passed along.
+	 */
+	if (kargs->flags & ~CLONE_LEGACY_FLAGS)
+		return false;
+
+	/*
+	 * - make the CLONE_DETACHED bit reuseable for clone3
+	 * - make the CSIGNAL bits reuseable for clone3
+	 */
+	if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
+		return false;
+
+	if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) &&
+	    kargs->exit_signal)
+		return false;
+
+	return true;
+}
+
+SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
+{
+	int err;
+
+	struct kernel_clone_args kargs;
+
+	err = copy_clone_args_from_user(&kargs, uargs, size);
+	if (err)
+		return err;
+
+	if (!clone3_args_valid(&kargs))
+		return -EINVAL;
+
+	return _do_fork(&kargs);
 }
 #endif
 
-- 
2.21.0
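
The extensibility of clone3() rests on its size argument. Below is a
userspace model of the size handling in copy_clone_args_from_user() above,
assuming 4 KiB pages and the 64-byte struct clone_args of this series. The
function name and the buffer standing in for the user pointer are
hypothetical, and the access_ok()/get_user() fault paths (EFAULT) are left
out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CLONE_ARGS_SIZE_V0	64	/* sizeof(struct clone_args) here */
#define MODEL_PAGE_SIZE		4096	/* assumption: 4 KiB pages */

/*
 * Hypothetical model of the size checks in copy_clone_args_from_user().
 * buf/size stand in for uargs/size. Returns 0 if the buffer would be
 * accepted, otherwise the negative errno the kernel code returns.
 */
static int check_clone_args_size(const unsigned char *buf, size_t size)
{
	if (size > MODEL_PAGE_SIZE)
		return -E2BIG;		/* unreasonably large struct */
	if (size < CLONE_ARGS_SIZE_V0)
		return -EINVAL;		/* smaller than the first version */
	/* Bytes past the known struct are fields this kernel does not
	 * know about; they are only acceptable if left zero. */
	for (size_t i = CLONE_ARGS_SIZE_V0; i < size; i++) {
		if (buf[i] != 0)
			return -E2BIG;
	}
	return 0;
}
```

Rejecting non-zero trailing bytes with E2BIG means a newer userspace can
pass a larger struct to an older kernel as long as it leaves the new fields
zero, while a request that actually uses an unknown field fails loudly
instead of being silently ignored.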



* [PATCH v3 2/2] arch: wire-up clone3() syscall
@ 2019-06-04 16:09 Christian Brauner
From: Christian Brauner @ 2019-06-04 16:09 UTC
  To: viro, linux-kernel, torvalds, jannh
  Cc: keescook, fweimer, oleg, arnd, dhowells, Christian Brauner,
	Andrew Morton, Adrian Reber, linux-api, linux-arch, x86

Wire up the clone3() call on all arches that don't require hand-rolled
assembly.

Some of the arches look like they need special assembly massaging, and it is
probably smarter to let the appropriate arch maintainers do the actual
wiring. Arches that are wired-up are:
- x86{_32,64}
- arm{64}
- xtensa

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <adrian@lisas.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
---
v1: unchanged
v2: unchanged
v3:
- Christian Brauner <christian@brauner.io>:
  - wire up clone3 on all arches that don't have hand-rolled entry points
    for clone
---
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl      | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     | 1 +
 include/uapi/asm-generic/unistd.h           | 4 +++-
 8 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index aaf479a9e92d..e99a82bdb93a 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -447,3 +447,4 @@
 431	common	fsconfig			sys_fsconfig
 432	common	fsmount				sys_fsmount
 433	common	fspick				sys_fspick
+436	common	clone3				sys_clone3
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 70e6882853c0..24480c2d95da 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		434
+#define __NR_compat_syscalls		437
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index c39e90600bb3..b144ea675d70 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
 __SYSCALL(__NR_fsmount, sys_fsmount)
 #define __NR_fspick 433
 __SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_clone3 436
+__SYSCALL(__NR_clone3, sys_clone3)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 26339e417695..3110440bcc31 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
 431	common	fsconfig			sys_fsconfig
 432	common	fsmount				sys_fsmount
 433	common	fspick				sys_fspick
+436	common	clone3				sys_clone3
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ad968b7bac72..80e26211feff 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
 431	i386	fsconfig		sys_fsconfig			__ia32_sys_fsconfig
 432	i386	fsmount			sys_fsmount			__ia32_sys_fsmount
 433	i386	fspick			sys_fspick			__ia32_sys_fspick
+436	i386	clone3			sys_clone3			__ia32_sys_clone3
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b4e6f9e6204a..7968f0b5b5e8 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
 431	common	fsconfig		__x64_sys_fsconfig
 432	common	fsmount			__x64_sys_fsmount
 433	common	fspick			__x64_sys_fspick
+436	common	clone3			__x64_sys_clone3/ptregs
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 5fa0ee1c8e00..b2767c8c2b4e 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -404,3 +404,4 @@
 431	common	fsconfig			sys_fsconfig
 432	common	fsmount				sys_fsmount
 433	common	fspick				sys_fspick
+436	common	clone3				sys_clone3
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a87904daf103..45bc87687c47 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
 __SYSCALL(__NR_fsmount, sys_fsmount)
 #define __NR_fspick 433
 __SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_clone3 436
+__SYSCALL(__NR_clone3, sys_clone3)
 
 #undef __NR_syscalls
-#define __NR_syscalls 434
+#define __NR_syscalls 437
 
 /*
  * 32 bit systems traditionally used different
-- 
2.21.0



* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
@ 2019-06-04 18:40 Arnd Bergmann
From: Arnd Bergmann @ 2019-06-04 18:40 UTC
  To: Christian Brauner
  Cc: Al Viro, Linux Kernel Mailing List, Linus Torvalds, Jann Horn,
	Kees Cook, Florian Weimer, Oleg Nesterov, David Howells,
	Andrew Morton, Adrian Reber, Linux API, linux-arch,
	the arch/x86 maintainers

On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <christian@brauner.io> wrote:
>
> Wire up the clone3() call on all arches that don't require hand-rolled
> assembly.
>
> Some of the arches look like they need special assembly massaging and it is
> probably smarter if the appropriate arch maintainers would do the actual
> wiring. Arches that are wired-up are:
> - x86{_32,64}
> - arm{64}
> - xtensa

The ones you did look good to me. I would hope that we can do all other
architectures the same way, even if they have special assembly wrappers
for the old clone(). The most interesting cases appear to be ia64, alpha,
m68k and sparc, so it would be good if their maintainers could take a
look.

What do you use for testing? Would it be possible to override the
internal clone() function in glibc with an LD_PRELOAD library
to quickly test one of the other architectures for regressions?

      Arnd


* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
@ 2019-06-04 21:29 Christian Brauner
From: Christian Brauner @ 2019-06-04 21:29 UTC
  To: Arnd Bergmann
  Cc: Al Viro, Linux Kernel Mailing List, Linus Torvalds, Jann Horn,
	Kees Cook, Florian Weimer, Oleg Nesterov, David Howells,
	Andrew Morton, Adrian Reber, Linux API, linux-arch,
	the arch/x86 maintainers

On Tue, Jun 04, 2019 at 08:40:01PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <christian@brauner.io> wrote:
> >
> > Wire up the clone3() call on all arches that don't require hand-rolled
> > assembly.
> >
> > Some of the arches look like they need special assembly massaging and it is
> > probably smarter if the appropriate arch maintainers would do the actual
> > wiring. Arches that are wired-up are:
> > - x86{_32,64}
> > - arm{64}
> > - xtensa
> 
> The ones you did look good to me. I would hope that we can do all other
> architectures the same way, even if they have special assembly wrappers
> for the old clone(). The most interesting cases appear to be ia64, alpha,
> m68k and sparc, so it would be good if their maintainers could take a
> look.

Yes, agreed. They can sort this out even after this lands.

> 
> What do you use for testing? Would it be possible to override the
> internal clone() function in glibc with an LD_PRELOAD library
> to quickly test one of the other architectures for regressions?

I have a test program that is rather horrendously ugly and I compiled
kernels for x86 and the arms and tested in qemu. The program basically
looks like [1].

Christian

[1]:
#define _GNU_SOURCE
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t raw_clone(struct clone_args *args)
{
	return syscall(__NR_clone3, args, sizeof(struct clone_args));
}

static pid_t raw_clone_legacy(int *pidfd, unsigned int flags)
{
	return syscall(__NR_clone, flags, 0, pidfd, 0, 0);
}

static int wait_for_pid(pid_t pid)
{
	int status, ret;

again:
	ret = waitpid(pid, &status, 0);
	if (ret == -1) {
		if (errno == EINTR)
			goto again;

		return -1;
	}

	if (ret != pid)
		goto again;

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	return 0;
}

#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
#define u64_to_ptr(n) ((uintptr_t)((__u64)(n)))

int main(int argc, char *argv[])
{
	int pidfd = -1;
	pid_t parent_tid = -1, pid = -1;
	struct clone_args args = {0};
	args.parent_tid = ptr_to_u64(&parent_tid);
	args.pidfd = ptr_to_u64(&pidfd);
	args.flags = CLONE_PIDFD | CLONE_PARENT_SETTID;
	args.exit_signal = SIGCHLD;

	pid = raw_clone(&args);
	if (pid < 0) {
		fprintf(stderr, "%s - Failed to create new process\n",
			strerror(errno));
		exit(EXIT_FAILURE);
	}

	if (pid == 0) {
		printf("I am the child with pid %d\n", getpid());
		exit(EXIT_SUCCESS);
	}

	printf("raw_clone: I am the parent. My child's pid is   %d\n", pid);
	/* The kernel wrote through the pointers stored in args. */
	printf("raw_clone: I am the parent. My child's pidfd is %d\n", pidfd);
	printf("raw_clone: I am the parent. My child's parent_tid value is %d\n",
	       parent_tid);

	if (wait_for_pid(pid))
		exit(EXIT_FAILURE);

	/* CLONE_PARENT_SETTID must have stored the child's pid in parent_tid. */
	if (pid != parent_tid)
		exit(EXIT_FAILURE);

	close(pidfd);

	printf("\n\n");
	/* Flush so buffered output is not duplicated into the child. */
	fflush(stdout);
	pidfd = -1;
	pid = raw_clone_legacy(&pidfd, CLONE_PIDFD | SIGCHLD);
	if (pid < 0) {
		fprintf(stderr, "%s - Failed to create new process\n",
			strerror(errno));
		exit(EXIT_FAILURE);
	}

	if (pid == 0) {
		printf("I am the child with pid %d\n", getpid());
		exit(EXIT_SUCCESS);
	}

	printf("raw_clone_legacy: I am the parent. My child's pid is   %d\n",
	       pid);
	printf("raw_clone_legacy: I am the parent. My child's pidfd is %d\n",
	       pidfd);

	if (wait_for_pid(pid))
		exit(EXIT_FAILURE);

	close(pidfd);

	return 0;
}


* Re: [PATCH v3 1/2] fork: add clone3
  2019-06-04 16:09 [PATCH v3 1/2] fork: add clone3 Christian Brauner
  2019-06-04 16:09 ` [PATCH v3 2/2] arch: wire-up clone3() syscall Christian Brauner
@ 2019-06-04 21:54 ` Christian Brauner
  2019-06-06 21:46 ` Serge E. Hallyn
  2 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2019-06-04 21:54 UTC
  To: torvalds
  Cc: keescook, fweimer, oleg, arnd, dhowells, Pavel Emelyanov,
	Andrew Morton, Adrian Reber, Andrei Vagin, linux-api, viro,
	linux-kernel, jannh

On Tue, Jun 04, 2019 at 06:09:43PM +0200, Christian Brauner wrote:
> This adds the clone3 system call.
> 
> As mentioned several times already (cf. [7], [8]) here's the promised
> patchset for clone3().
> 
> We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
> free flag from clone().
> 
> Independent of the CLONE_PIDFD patchset, a time namespace was discussed
> at Linux Plumbers Conference last year and has since been sent out and
> reviewed (cf. [5]). It is expected to go upstream in the not too distant
> future. However, it relies on the addition of the CLONE_NEWTIME flag to
> clone(). The only other good candidate, CLONE_DETACHED, is currently not
> recyclable, as we have identified at least two large or widely used
> codebases that currently pass this flag (cf. [2], [3], and [4]). Since
> CLONE_PIDFD grabbed the last clone() flag, the time namespace is
> effectively blocked. clone3() has the advantage that it will unblock this
> patchset again. In general, clone3() is extensible and allows for the
> implementation of new features.
> 
> The idea is to keep clone3() very simple and close to the original clone(),
> specifically, to keep on supporting old clone()-based workloads.
> We know there have been various creative proposals for what a new process
> creation syscall or even API should look like. Some people even go so far
> as to argue that the traditional fork()+exec() split should be abandoned in
> favor of an in-kernel version of spawn(). Independent of whether or not we
> personally think spawn() is a good idea, this patchset has nothing to do
> with it and does not want to.
> One stance we take is that there's no really good alternative to
> clone()+exec() and we need and want to support this model going forward,
> independent of spawn().
> The following requirements guided clone3():
> - bump the number of available flags
> - move arguments that are currently passed as separate arguments
>   in clone() into a dedicated struct clone_args
>   - choose a struct layout that is easy to handle on 32 and on 64 bit
>   - choose a struct layout that is extensible
>   - give new flags that currently need to abuse another flag's dedicated
>     return argument in clone() their own dedicated return argument
>     (e.g. CLONE_PIDFD)
>   - use a separate kernel internal struct kernel_clone_args that is
>     properly typed according to current kernel conventions in fork.c and is
>     different from the uapi struct clone_args
> - port _do_fork() to use kernel_clone_args so that all process creation
>   syscalls such as fork(), vfork(), clone(), and clone3() behave identically
>   (Arnd suggested that we can probably also port do_fork() itself in a
>    separate patchset.)
> - ease of transition for userspace from clone() to clone3()
>   This very much means that we do *not* remove functionality that userspace
>   currently relies on, as doing so is a good way of creating a syscall
>   that won't be adopted.
> - do not try to be clever or complex: keep clone3() as dumb as possible
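The "easy to handle on 32 and on 64 bit" requirement above can be checked at compile time. A minimal sketch, assuming GCC/Clang attribute syntax; `struct clone_args_v0` and `ALIGNED_U64` are hypothetical userspace names that mirror the uapi struct quoted in this message, not the real header:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * ALIGNED_U64 plays the role of the kernel's __aligned_u64: on i386 a
 * plain uint64_t member is only 4-byte aligned, so forcing 8-byte
 * alignment keeps the 32-bit and 64-bit layouts identical.
 */
#define ALIGNED_U64 uint64_t __attribute__((aligned(8)))

/* Hypothetical mirror of this version's struct clone_args. */
struct clone_args_v0 {
	ALIGNED_U64 flags;
	ALIGNED_U64 pidfd;       /* pointers travel as 64-bit integers */
	ALIGNED_U64 child_tid;
	ALIGNED_U64 parent_tid;
	ALIGNED_U64 exit_signal;
	ALIGNED_U64 stack;
	ALIGNED_U64 stack_size;
	ALIGNED_U64 tls;
};

/* 8 members x 8 bytes, no padding, same on every ABI. */
_Static_assert(sizeof(struct clone_args_v0) == 64, "unexpected size");
_Static_assert(offsetof(struct clone_args_v0, exit_signal) == 32,
	       "unexpected exit_signal offset");
_Static_assert(offsetof(struct clone_args_v0, tls) == 56,
	       "unexpected tls offset");
_Static_assert(_Alignof(struct clone_args_v0) == 8,
	       "unexpected alignment");
```

Because every member is an 8-byte-aligned 64-bit slot, a 32-bit caller and a 64-bit kernel agree on the layout without a compat translation, which is why user pointers are passed as u64 values.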
> 
> In accordance with Linus's suggestions (cf. [11]), clone3() has the following
> signature:
> 
> /* uapi */
> struct clone_args {
>         __aligned_u64 flags;
>         __aligned_u64 pidfd;
>         __aligned_u64 child_tid;
>         __aligned_u64 parent_tid;
>         __aligned_u64 exit_signal;
>         __aligned_u64 stack;
>         __aligned_u64 stack_size;
>         __aligned_u64 tls;
> };
> 
> /* kernel internal */
> struct kernel_clone_args {
>         u64 flags;
>         int __user *pidfd;
>         int __user *child_tid;
>         int __user *parent_tid;
>         int exit_signal;
>         unsigned long stack;
>         unsigned long stack_size;
>         unsigned long tls;
> };
> 
> long sys_clone3(struct clone_args __user *uargs, size_t size)
> 
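Keeping a separate kernel_clone_args means the uapi struct carries only fixed-width integers, while the kernel-side copy holds properly typed pointers and integers. The following userspace sketch models that conversion; `struct uapi_clone_args`, `struct typed_clone_args`, `u64_to_ptr_sketch()` and `to_typed()` are hypothetical stand-ins for the structs above and for the kernel's u64_to_user_ptr(), not kernel code:

```c
#include <stdint.h>

/* Hypothetical stand-in for the uapi struct (fixed-width fields only). */
struct uapi_clone_args {
	uint64_t flags, pidfd, child_tid, parent_tid;
	uint64_t exit_signal, stack, stack_size, tls;
};

/* Hypothetical stand-in for kernel_clone_args (typed fields). */
struct typed_clone_args {
	uint64_t flags;
	int *pidfd;
	int *child_tid;
	int *parent_tid;
	int exit_signal;
	unsigned long stack;
	unsigned long stack_size;
	unsigned long tls;
};

/* Like u64_to_user_ptr(): narrow a u64 to a pointer via uintptr_t. */
static void *u64_to_ptr_sketch(uint64_t v)
{
	return (void *)(uintptr_t)v;
}

/* Field-by-field conversion, mirroring copy_clone_args_from_user(). */
static struct typed_clone_args to_typed(const struct uapi_clone_args *u)
{
	return (struct typed_clone_args){
		.flags       = u->flags,
		.pidfd       = u64_to_ptr_sketch(u->pidfd),
		.child_tid   = u64_to_ptr_sketch(u->child_tid),
		.parent_tid  = u64_to_ptr_sketch(u->parent_tid),
		.exit_signal = (int)u->exit_signal,
		.stack       = (unsigned long)u->stack,
		.stack_size  = (unsigned long)u->stack_size,
		.tls         = (unsigned long)u->tls,
	};
}
```

Note that exit_signal is narrowed to int during this conversion, and clone3_args_valid() in this version does not range-check exit_signal itself.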
> clone3() cleanly supports all of the supported flags from clone() and thus
> all legacy workloads.
> The advantage of sticking close to the old clone() is the low cost for
> userspace to switch to this new API. Quite a lot of userspace APIs (e.g.
> pthreads) are based on the clone() syscall. Since the new clone3() syscall
> supports all of the old workloads and opens up the ability to add new
> features, switching to it should be more appealing for userspace. In
> essence, glibc can just write a simple wrapper to switch from clone() to
> clone3().
> 
> There has been some interest in this patchset already. We have received a
> patch from the CRIU corner for clone3() that would set the PID/TID of a
> restored process without /proc/sys/kernel/ns_last_pid to eliminate a race.
> 
> /* User visible differences from legacy clone() */
> - CLONE_DETACHED will cause EINVAL with clone3()
> - CSIGNAL is deprecated
>   It is superseded by a dedicated "exit_signal" argument in struct
>   clone_args, freeing up space for additional flags.
>   This is based on a suggestion from Andrei and Linus (cf. [9] and [10]).
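The flag rules listed above, as enforced by clone3_args_valid() in the patch, can be restated as a plain predicate. This is a sketch only: `clone3_flags_ok()` and `SKETCH_LEGACY_FLAGS` are hypothetical names, and copy_process() applies further checks (for example, CLONE_THREAD without CLONE_SIGHAND) that are not modeled here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from include/uapi/linux/sched.h, in case no header defines them. */
#ifndef CSIGNAL
#define CSIGNAL 0x000000ff
#endif
#ifndef CLONE_PARENT
#define CLONE_PARENT 0x00008000
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD 0x00010000
#endif
#ifndef CLONE_DETACHED
#define CLONE_DETACHED 0x00400000
#endif

/* Same value as the patch's kernel-internal CLONE_LEGACY_FLAGS. */
#define SKETCH_LEGACY_FLAGS 0xffffffffULL

/* Returns false wherever clone3_args_valid() would make clone3() fail
 * with EINVAL. */
static bool clone3_flags_ok(uint64_t flags, uint64_t exit_signal)
{
	/* No flags above bit 31 are defined yet. */
	if (flags & ~SKETCH_LEGACY_FLAGS)
		return false;
	/* CLONE_DETACHED and the CSIGNAL bits are kept free for reuse. */
	if (flags & (CLONE_DETACHED | CSIGNAL))
		return false;
	/* Threads and CLONE_PARENT children take no own exit signal. */
	if ((flags & (CLONE_THREAD | CLONE_PARENT)) && exit_signal)
		return false;
	return true;
}
```

Passing SIGCHLD in the flags word, which legacy clone() callers routinely do, is therefore rejected by clone3(); the signal belongs in exit_signal instead.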
> 
> /* References */
> [1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
> [2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
> [3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
> [4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
> [5]: https://lore.kernel.org/lkml/20190425161416.26600-1-dima@arista.com/
> [6]: https://lore.kernel.org/lkml/20190425161416.26600-2-dima@arista.com/
> [7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/
> [8]: https://lore.kernel.org/lkml/20190524102756.qjsjxukuq2f4t6bo@brauner.io/
> [9]: https://lore.kernel.org/lkml/20190529222414.GA6492@gmail.com/
> [10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/
> [11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Christian Brauner <christian@brauner.io>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Pavel Emelyanov <xemul@virtuozzo.com>
> Cc: Jann Horn <jannh@google.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Adrian Reber <adrian@lisas.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: linux-api@vger.kernel.org

Linus,

Would you in principle be fine receiving this for 5.3 through my tree
together with the pidfd_open() and pidfd polling patches, or would you
prefer a separate PR for it, or would you rather have this go through
someone else's tree altogether (all assuming no NACK, of course)?
(I'd let Al handle close_range() as this seems to be VFS territory.)

Thanks!
Christian

> ---
> v1:
> - Linus Torvalds <torvalds@linux-foundation.org>:
>   - redesign based on Linus's proposal
>   - switch from arg-based to revision-based naming scheme: s/clone6/clone3/
> - Arnd Bergmann <arnd@arndb.de>:
>   - use a single copy_from_user() instead of multiple get_user() calls
>     since the latter have a constant overhead on some architectures
>   - a range of other tweaks and suggestions
> v2:
> - Linus Torvalds <torvalds@linux-foundation.org>,
>   Andrei Vagin <avagin@gmail.com>:
>   - replace CSIGNAL flag with dedicated exit_signal argument in struct
>     clone_args
> - Christian Brauner <christian@brauner.io>:
>   - improve naming for some struct clone_args members
> v3:
> - Arnd Bergmann <arnd@arndb.de>:
>   - replace memset with constructor for clarity and better object code
>   - call flag verification function clone3_flags_valid() on
>     kernel_clone_args instead of clone_args
>   - remove __ARCH_WANT_SYS_CLONE #ifdef around sys_clone3()
> - Christian Brauner <christian@brauner.io>:
>   - replace clone3_flags_valid() with clone3_args_valid() and call in
>     clone3() directly rather than in copy_clone_args_from_user()
>     This cleanly separates copying the args from userspace from the
>     verification whether those args are sane.
> - David Howells <dhowells@redhat.com>:
>   - align new struct member assignments with tabs
>   - replace CLONE_MAX with a non-uapi-exported CLONE_LEGACY_FLAGS and
>     define it as 0xffffffffULL for clarity
>   - make copy_clone_args_from_user() noinline
>   - avoid assigning to local variables from struct kernel_clone_args
>     members in cases where it makes sense
> ---
>  arch/x86/ia32/sys_ia32.c   |  12 ++-
>  include/linux/sched/task.h |  17 +++-
>  include/linux/syscalls.h   |   4 +
>  include/uapi/linux/sched.h |  16 +++
>  kernel/fork.c              | 201 ++++++++++++++++++++++++++++---------
>  5 files changed, 199 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
> index a43212036257..64a6c952091e 100644
> --- a/arch/x86/ia32/sys_ia32.c
> +++ b/arch/x86/ia32/sys_ia32.c
> @@ -237,6 +237,14 @@ COMPAT_SYSCALL_DEFINE5(x86_clone, unsigned long, clone_flags,
>  		       unsigned long, newsp, int __user *, parent_tidptr,
>  		       unsigned long, tls_val, int __user *, child_tidptr)
>  {
> -	return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr,
> -			tls_val);
> +	struct kernel_clone_args args = {
> +		.flags		= (clone_flags & ~CSIGNAL),
> +		.child_tid	= child_tidptr,
> +		.parent_tid	= parent_tidptr,
> +		.exit_signal	= (clone_flags & CSIGNAL),
> +		.stack		= newsp,
> +		.tls		= tls_val,
> +	};
> +
> +	return _do_fork(&args);
>  }
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index f1227f2c38a4..109a0df5af39 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -8,11 +8,26 @@
>   */
>  
>  #include <linux/sched.h>
> +#include <linux/uaccess.h>
>  
>  struct task_struct;
>  struct rusage;
>  union thread_union;
>  
> +/* All the bits taken by the old clone syscall. */
> +#define CLONE_LEGACY_FLAGS 0xffffffffULL
> +
> +struct kernel_clone_args {
> +	u64 flags;
> +	int __user *pidfd;
> +	int __user *child_tid;
> +	int __user *parent_tid;
> +	int exit_signal;
> +	unsigned long stack;
> +	unsigned long stack_size;
> +	unsigned long tls;
> +};
> +
>  /*
>   * This serializes "schedule()" and also protects
>   * the run-queue from deletions/modifications (but
> @@ -73,7 +88,7 @@ extern void do_group_exit(int);
>  extern void exit_files(struct task_struct *);
>  extern void exit_itimers(struct signal_struct *);
>  
> -extern long _do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *, unsigned long);
> +extern long _do_fork(struct kernel_clone_args *kargs);
>  extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
>  struct task_struct *fork_idle(int);
>  struct mm_struct *copy_init_mm(void);
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index e2870fe1be5b..60a81f374ca3 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -70,6 +70,7 @@ struct sigaltstack;
>  struct rseq;
>  union bpf_attr;
>  struct io_uring_params;
> +struct clone_args;
>  
>  #include <linux/types.h>
>  #include <linux/aio_abi.h>
> @@ -852,6 +853,9 @@ asmlinkage long sys_clone(unsigned long, unsigned long, int __user *,
>  	       int __user *, unsigned long);
>  #endif
>  #endif
> +
> +asmlinkage long sys_clone3(struct clone_args __user *uargs, size_t size);
> +
>  asmlinkage long sys_execve(const char __user *filename,
>  		const char __user *const __user *argv,
>  		const char __user *const __user *envp);
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index ed4ee170bee2..f5331dbdcaa2 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -2,6 +2,8 @@
>  #ifndef _UAPI_LINUX_SCHED_H
>  #define _UAPI_LINUX_SCHED_H
>  
> +#include <linux/types.h>
> +
>  /*
>   * cloning flags:
>   */
> @@ -31,6 +33,20 @@
>  #define CLONE_NEWNET		0x40000000	/* New network namespace */
>  #define CLONE_IO		0x80000000	/* Clone io context */
>  
> +/*
> + * Arguments for the clone3 syscall
> + */
> +struct clone_args {
> +	__aligned_u64 flags;
> +	__aligned_u64 pidfd;
> +	__aligned_u64 child_tid;
> +	__aligned_u64 parent_tid;
> +	__aligned_u64 exit_signal;
> +	__aligned_u64 stack;
> +	__aligned_u64 stack_size;
> +	__aligned_u64 tls;
> +};
> +
>  /*
>   * Scheduling policies
>   */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b4cba953040a..08ff131f26b4 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1760,19 +1760,15 @@ static __always_inline void delayed_free_task(struct task_struct *tsk)
>   * flags). The actual kick-off is left to the caller.
>   */
>  static __latent_entropy struct task_struct *copy_process(
> -					unsigned long clone_flags,
> -					unsigned long stack_start,
> -					unsigned long stack_size,
> -					int __user *parent_tidptr,
> -					int __user *child_tidptr,
>  					struct pid *pid,
>  					int trace,
> -					unsigned long tls,
> -					int node)
> +					int node,
> +					struct kernel_clone_args *args)
>  {
>  	int pidfd = -1, retval;
>  	struct task_struct *p;
>  	struct multiprocess_signals delayed;
> +	u64 clone_flags = args->flags;
>  
>  	/*
>  	 * Don't allow sharing the root directory with processes in a different
> @@ -1821,27 +1817,12 @@ static __latent_entropy struct task_struct *copy_process(
>  	}
>  
>  	if (clone_flags & CLONE_PIDFD) {
> -		int reserved;
> -
>  		/*
> -		 * - CLONE_PARENT_SETTID is useless for pidfds and also
> -		 *   parent_tidptr is used to return pidfds.
>  		 * - CLONE_DETACHED is blocked so that we can potentially
>  		 *   reuse it later for CLONE_PIDFD.
>  		 * - CLONE_THREAD is blocked until someone really needs it.
>  		 */
> -		if (clone_flags &
> -		    (CLONE_DETACHED | CLONE_PARENT_SETTID | CLONE_THREAD))
> -			return ERR_PTR(-EINVAL);
> -
> -		/*
> -		 * Verify that parent_tidptr is sane so we can potentially
> -		 * reuse it later.
> -		 */
> -		if (get_user(reserved, parent_tidptr))
> -			return ERR_PTR(-EFAULT);
> -
> -		if (reserved != 0)
> +		if (clone_flags & (CLONE_DETACHED | CLONE_THREAD))
>  			return ERR_PTR(-EINVAL);
>  	}
>  
> @@ -1874,11 +1855,11 @@ static __latent_entropy struct task_struct *copy_process(
>  	 * p->set_child_tid which is (ab)used as a kthread's data pointer for
>  	 * kernel threads (PF_KTHREAD).
>  	 */
> -	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
> +	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? args->child_tid : NULL;
>  	/*
>  	 * Clear TID on mm_release()?
>  	 */
> -	p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr : NULL;
> +	p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? args->child_tid : NULL;
>  
>  	ftrace_graph_init_task(p);
>  
> @@ -2037,7 +2018,8 @@ static __latent_entropy struct task_struct *copy_process(
>  	retval = copy_io(clone_flags, p);
>  	if (retval)
>  		goto bad_fork_cleanup_namespaces;
> -	retval = copy_thread_tls(clone_flags, stack_start, stack_size, p, tls);
> +	retval = copy_thread_tls(clone_flags, args->stack, args->stack_size, p,
> +				 args->tls);
>  	if (retval)
>  		goto bad_fork_cleanup_io;
>  
> @@ -2062,7 +2044,7 @@ static __latent_entropy struct task_struct *copy_process(
>  			goto bad_fork_free_pid;
>  
>  		pidfd = retval;
> -		retval = put_user(pidfd, parent_tidptr);
> +		retval = put_user(pidfd, args->pidfd);
>  		if (retval)
>  			goto bad_fork_put_pidfd;
>  	}
> @@ -2105,7 +2087,7 @@ static __latent_entropy struct task_struct *copy_process(
>  		if (clone_flags & CLONE_PARENT)
>  			p->exit_signal = current->group_leader->exit_signal;
>  		else
> -			p->exit_signal = (clone_flags & CSIGNAL);
> +			p->exit_signal = args->exit_signal;
>  		p->group_leader = p;
>  		p->tgid = p->pid;
>  	}
> @@ -2313,8 +2295,11 @@ static inline void init_idle_pids(struct task_struct *idle)
>  struct task_struct *fork_idle(int cpu)
>  {
>  	struct task_struct *task;
> -	task = copy_process(CLONE_VM, 0, 0, NULL, NULL, &init_struct_pid, 0, 0,
> -			    cpu_to_node(cpu));
> +	struct kernel_clone_args args = {
> +		.flags = CLONE_VM,
> +	};
> +
> +	task = copy_process(&init_struct_pid, 0, cpu_to_node(cpu), &args);
>  	if (!IS_ERR(task)) {
>  		init_idle_pids(task);
>  		init_idle(task, cpu);
> @@ -2334,13 +2319,9 @@ struct mm_struct *copy_init_mm(void)
>   * It copies the process, and if successful kick-starts
>   * it and waits for it to finish using the VM if required.
>   */
> -long _do_fork(unsigned long clone_flags,
> -	      unsigned long stack_start,
> -	      unsigned long stack_size,
> -	      int __user *parent_tidptr,
> -	      int __user *child_tidptr,
> -	      unsigned long tls)
> +long _do_fork(struct kernel_clone_args *args)
>  {
> +	u64 clone_flags = args->flags;
>  	struct completion vfork;
>  	struct pid *pid;
>  	struct task_struct *p;
> @@ -2356,7 +2337,7 @@ long _do_fork(unsigned long clone_flags,
>  	if (!(clone_flags & CLONE_UNTRACED)) {
>  		if (clone_flags & CLONE_VFORK)
>  			trace = PTRACE_EVENT_VFORK;
> -		else if ((clone_flags & CSIGNAL) != SIGCHLD)
> +		else if (args->exit_signal != SIGCHLD)
>  			trace = PTRACE_EVENT_CLONE;
>  		else
>  			trace = PTRACE_EVENT_FORK;
> @@ -2365,8 +2346,7 @@ long _do_fork(unsigned long clone_flags,
>  			trace = 0;
>  	}
>  
> -	p = copy_process(clone_flags, stack_start, stack_size, parent_tidptr,
> -			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
> +	p = copy_process(NULL, trace, NUMA_NO_NODE, args);
>  	add_latent_entropy();
>  
>  	if (IS_ERR(p))
> @@ -2382,7 +2362,7 @@ long _do_fork(unsigned long clone_flags,
>  	nr = pid_vnr(pid);
>  
>  	if (clone_flags & CLONE_PARENT_SETTID)
> -		put_user(nr, parent_tidptr);
> +		put_user(nr, args->parent_tid);
>  
>  	if (clone_flags & CLONE_VFORK) {
>  		p->vfork_done = &vfork;
> @@ -2414,8 +2394,16 @@ long do_fork(unsigned long clone_flags,
>  	      int __user *parent_tidptr,
>  	      int __user *child_tidptr)
>  {
> -	return _do_fork(clone_flags, stack_start, stack_size,
> -			parent_tidptr, child_tidptr, 0);
> +	struct kernel_clone_args args = {
> +		.flags		= (clone_flags & ~CSIGNAL),
> +		.child_tid	= child_tidptr,
> +		.parent_tid	= parent_tidptr,
> +		.exit_signal	= (clone_flags & CSIGNAL),
> +		.stack		= stack_start,
> +		.stack_size	= stack_size,
> +	};
> +
> +	return _do_fork(&args);
>  }
>  #endif
>  
> @@ -2424,15 +2412,25 @@ long do_fork(unsigned long clone_flags,
>   */
>  pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
>  {
> -	return _do_fork(flags|CLONE_VM|CLONE_UNTRACED, (unsigned long)fn,
> -		(unsigned long)arg, NULL, NULL, 0);
> +	struct kernel_clone_args args = {
> +		.flags		= ((flags | CLONE_VM | CLONE_UNTRACED) & ~CSIGNAL),
> +		.exit_signal	= (flags & CSIGNAL),
> +		.stack		= (unsigned long)fn,
> +		.stack_size	= (unsigned long)arg,
> +	};
> +
> +	return _do_fork(&args);
>  }
>  
>  #ifdef __ARCH_WANT_SYS_FORK
>  SYSCALL_DEFINE0(fork)
>  {
>  #ifdef CONFIG_MMU
> -	return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);
> +	struct kernel_clone_args args = {
> +		.exit_signal = SIGCHLD,
> +	};
> +
> +	return _do_fork(&args);
>  #else
>  	/* can not support in nommu mode */
>  	return -EINVAL;
> @@ -2443,8 +2441,12 @@ SYSCALL_DEFINE0(fork)
>  #ifdef __ARCH_WANT_SYS_VFORK
>  SYSCALL_DEFINE0(vfork)
>  {
> -	return _do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0,
> -			0, NULL, NULL, 0);
> +	struct kernel_clone_args args = {
> +		.flags		= CLONE_VFORK | CLONE_VM,
> +		.exit_signal	= SIGCHLD,
> +	};
> +
> +	return _do_fork(&args);
>  }
>  #endif
>  
> @@ -2472,7 +2474,110 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
>  		 unsigned long, tls)
>  #endif
>  {
> -	return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);
> +	struct kernel_clone_args args = {
> +		.flags		= (clone_flags & ~CSIGNAL),
> +		.pidfd		= parent_tidptr,
> +		.child_tid	= child_tidptr,
> +		.parent_tid	= parent_tidptr,
> +		.exit_signal	= (clone_flags & CSIGNAL),
> +		.stack		= newsp,
> +		.tls		= tls,
> +	};
> +
> +	/* clone(CLONE_PIDFD) uses parent_tidptr to return a pidfd */
> +	if ((clone_flags & CLONE_PIDFD) && (clone_flags & CLONE_PARENT_SETTID))
> +		return -EINVAL;
> +
> +	return _do_fork(&args);
> +}
> +
> +noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> +					      struct clone_args __user *uargs,
> +					      size_t size)
> +{
> +	struct clone_args args;
> +
> +	if (unlikely(size > PAGE_SIZE))
> +		return -E2BIG;
> +
> +	if (unlikely(size < sizeof(struct clone_args)))
> +		return -EINVAL;
> +
> +	if (unlikely(!access_ok(uargs, size)))
> +		return -EFAULT;
> +
> +	if (size > sizeof(struct clone_args)) {
> +		unsigned char __user *addr;
> +		unsigned char __user *end;
> +		unsigned char val;
> +
> +		addr = (void __user *)uargs + sizeof(struct clone_args);
> +		end = (void __user *)uargs + size;
> +
> +		for (; addr < end; addr++) {
> +			if (get_user(val, addr))
> +				return -EFAULT;
> +			if (val)
> +				return -E2BIG;
> +		}
> +
> +		size = sizeof(struct clone_args);
> +	}
> +
> +	if (copy_from_user(&args, uargs, size))
> +		return -EFAULT;
> +
> +	*kargs = (struct kernel_clone_args){
> +		.flags		= args.flags,
> +		.pidfd		= u64_to_user_ptr(args.pidfd),
> +		.child_tid	= u64_to_user_ptr(args.child_tid),
> +		.parent_tid	= u64_to_user_ptr(args.parent_tid),
> +		.exit_signal	= args.exit_signal,
> +		.stack		= args.stack,
> +		.stack_size	= args.stack_size,
> +		.tls		= args.tls,
> +	};
> +
> +	return 0;
> +}
> +
> +static bool clone3_args_valid(const struct kernel_clone_args *kargs)
> +{
> +	/*
> +	 * All lower bits of the flag word are taken.
> +	 * Verify that no other unknown flags are passed along.
> +	 */
> +	if (kargs->flags & ~CLONE_LEGACY_FLAGS)
> +		return false;
> +
> +	/*
> +	 * - make the CLONE_DETACHED bit reusable for clone3
> +	 * - make the CSIGNAL bits reusable for clone3
> +	 */
> +	if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
> +		return false;
> +
> +	if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) &&
> +	    kargs->exit_signal)
> +		return false;
> +
> +	return true;
> +}
> +
> +SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
> +{
> +	int err;
> +
> +	struct kernel_clone_args kargs;
> +
> +	err = copy_clone_args_from_user(&kargs, uargs, size);
> +	if (err)
> +		return err;
> +
> +	if (!clone3_args_valid(&kargs))
> +		return -EINVAL;
> +
> +	return _do_fork(&kargs);
>  }
>  #endif
>  
> -- 
> 2.21.0
> 
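copy_clone_args_from_user() above is what makes struct clone_args extensible: a caller built against a larger, future struct is accepted as long as every byte the kernel does not know about is zero. A userspace model of that size handling, with hypothetical names (`check_args_size()`, and `known` standing in for sizeof(struct clone_args), 64 bytes in this patch); the real code additionally calls access_ok() and reads user memory with get_user():

```c
#include <errno.h>
#include <stddef.h>

/*
 * Model of the size rules in copy_clone_args_from_user(): returns 0 or a
 * negative errno, in the same order of checks as the patch.
 */
static int check_args_size(const unsigned char *buf, size_t size,
			   size_t known, size_t page_size)
{
	if (size > page_size)   /* absurdly large struct */
		return -E2BIG;
	if (size < known)       /* older than the oldest supported layout */
		return -EINVAL;
	/* Trailing bytes from a newer layout must all be zero. */
	for (size_t i = known; i < size; i++)
		if (buf[i] != 0)
			return -E2BIG;
	return 0;
}
```

An old binary passing exactly 64 bytes and a newer binary passing a larger, zero-padded struct both get 0; only non-zero trailing bytes, which would request features this kernel lacks, are rejected with E2BIG.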


* Re: [PATCH v3 1/2] fork: add clone3
  2019-06-04 16:09 [PATCH v3 1/2] fork: add clone3 Christian Brauner
  2019-06-04 16:09 ` [PATCH v3 2/2] arch: wire-up clone3() syscall Christian Brauner
  2019-06-04 21:54 ` [PATCH v3 1/2] fork: add clone3 Christian Brauner
@ 2019-06-06 21:46 ` Serge E. Hallyn
  2019-06-08  8:15   ` Christian Brauner
  2 siblings, 1 reply; 23+ messages in thread
From: Serge E. Hallyn @ 2019-06-06 21:46 UTC
  To: Christian Brauner
  Cc: viro, linux-kernel, torvalds, jannh, keescook, fweimer, oleg,
	arnd, dhowells, Pavel Emelyanov, Andrew Morton, Adrian Reber,
	Andrei Vagin, linux-api

On Tue, Jun 04, 2019 at 06:09:43PM +0200, Christian Brauner wrote:
> This adds the clone3 system call.
> 
> As mentioned several times already (cf. [7], [8]) here's the promised
> patchset for clone3().
> 
> We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
> free flag from clone().
> 
> Independent of the CLONE_PIDFD patchset, a time namespace was discussed
> at Linux Plumbers Conference last year and has since been sent out and
> reviewed (cf. [5]). It is expected to go upstream in the not too distant
> future. However, it relies on the addition of the CLONE_NEWTIME flag to
> clone(). The only other good candidate, CLONE_DETACHED, is currently not
> recyclable, as we have identified at least two large or widely used
> codebases that currently pass this flag (cf. [2], [3], and [4]). Since
> CLONE_PIDFD grabbed the last clone() flag, the time namespace is
> effectively blocked. clone3() has the advantage that it will unblock this
> patchset again. In general, clone3() is extensible and allows for the
> implementation of new features.
> 
> The idea is to keep clone3() very simple and close to the original clone(),
> specifically, to keep on supporting old clone()-based workloads.
> We know there have been various creative proposals for what a new process
> creation syscall or even API should look like. Some people even go so far
> as to argue that the traditional fork()+exec() split should be abandoned in
> favor of an in-kernel version of spawn(). Independent of whether or not we
> personally think spawn() is a good idea, this patchset has nothing to do
> with it and does not want to.
> One stance we take is that there's no really good alternative to
> clone()+exec() and we need and want to support this model going forward,
> independent of spawn().
> The following requirements guided clone3():
> - bump the number of available flags
> - move arguments that are currently passed as separate arguments
>   in clone() into a dedicated struct clone_args
>   - choose a struct layout that is easy to handle on 32 and on 64 bit
>   - choose a struct layout that is extensible
>   - give new flags that currently need to abuse another flag's dedicated
>     return argument in clone() their own dedicated return argument
>     (e.g. CLONE_PIDFD)
>   - use a separate kernel internal struct kernel_clone_args that is
>     properly typed according to current kernel conventions in fork.c and is
>     different from the uapi struct clone_args
> - port _do_fork() to use kernel_clone_args so that all process creation
>   syscalls such as fork(), vfork(), clone(), and clone3() behave identically
>   (Arnd suggested that we can probably also port do_fork() itself in a
>    separate patchset.)
> - ease of transition for userspace from clone() to clone3()
>   This very much means that we do *not* remove functionality that userspace
>   currently relies on, as doing so is a good way of creating a syscall
>   that won't be adopted.
> - do not try to be clever or complex: keep clone3() as dumb as possible
> 
> In accordance with Linus's suggestions (cf. [11]), clone3() has the following
> signature:
> 
> /* uapi */
> struct clone_args {
>         __aligned_u64 flags;
>         __aligned_u64 pidfd;
>         __aligned_u64 child_tid;
>         __aligned_u64 parent_tid;
>         __aligned_u64 exit_signal;
>         __aligned_u64 stack;
>         __aligned_u64 stack_size;
>         __aligned_u64 tls;
> };
> 
> /* kernel internal */
> struct kernel_clone_args {
>         u64 flags;
>         int __user *pidfd;
>         int __user *child_tid;
>         int __user *parent_tid;
>         int exit_signal;
>         unsigned long stack;
>         unsigned long stack_size;
>         unsigned long tls;
> };
> 
> long sys_clone3(struct clone_args __user *uargs, size_t size)
> 
> clone3() cleanly supports all of the supported flags from clone() and thus
> all legacy workloads.
> The advantage of sticking close to the old clone() is the low cost for
> userspace to switch to this new API. Quite a lot of userspace APIs (e.g.
> pthreads) are based on the clone() syscall. Since the new clone3() syscall
> supports all of the old workloads and opens up the ability to add new
> features, switching to it should be more appealing for userspace. In
> essence, glibc can just write a simple wrapper to switch from clone() to
> clone3().
> 
> There has been some interest in this patchset already. We have received a
> patch from the CRIU corner for clone3() that would set the PID/TID of a
> restored process without /proc/sys/kernel/ns_last_pid to eliminate a race.
> 
> /* User visible differences from legacy clone() */
> - CLONE_DETACHED will cause EINVAL with clone3()
> - CSIGNAL is deprecated
>   It is superseded by a dedicated "exit_signal" argument in struct
>   clone_args, freeing up space for additional flags.
>   This is based on a suggestion from Andrei and Linus (cf. [9] and [10]).
> 
> /* References */
> [1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
> [2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
> [3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
> [4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
> [5]: https://lore.kernel.org/lkml/20190425161416.26600-1-dima@arista.com/
> [6]: https://lore.kernel.org/lkml/20190425161416.26600-2-dima@arista.com/
> [7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/
> [8]: https://lore.kernel.org/lkml/20190524102756.qjsjxukuq2f4t6bo@brauner.io/
> [9]: https://lore.kernel.org/lkml/20190529222414.GA6492@gmail.com/
> [10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/
> [11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Christian Brauner <christian@brauner.io>

Acked-by: Serge Hallyn <serge@hallyn.com>

> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Pavel Emelyanov <xemul@virtuozzo.com>
> Cc: Jann Horn <jannh@google.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Adrian Reber <adrian@lisas.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: linux-api@vger.kernel.org
> ---
> v1:
> - Linus Torvalds <torvalds@linux-foundation.org>:
>   - redesign based on Linus's proposal
>   - switch from arg-based to revision-based naming scheme: s/clone6/clone3/
> - Arnd Bergmann <arnd@arndb.de>:
>   - use a single copy_from_user() instead of multiple get_user() calls
>     since the latter have a constant overhead on some architectures
>   - a range of other tweaks and suggestions
> v2:
> - Linus Torvalds <torvalds@linux-foundation.org>,
>   Andrei Vagin <avagin@gmail.com>:
>   - replace CSIGNAL flag with dedicated exit_signal argument in struct
>     clone_args
> - Christian Brauner <christian@brauner.io>:
>   - improve naming for some struct clone_args members
> v3:
> - Arnd Bergmann <arnd@arndb.de>:
>   - replace memset with constructor for clarity and better object code
>   - call flag verification function clone3_flags_valid() on
>     kernel_clone_args instead of clone_args
>   - remove __ARCH_WANT_SYS_CLONE #ifdef around sys_clone3()
> - Christian Brauner <christian@brauner.io>:
>   - replace clone3_flags_valid() with clone3_args_valid() and call in
>     clone3() directly rather than in copy_clone_args_from_user()
>     This cleanly separates copying the args from userspace from the
>     verification whether those args are sane.
> - David Howells <dhowells@redhat.com>:
>   - align new struct member assignments with tabs
>   - replace CLONE_MAX with a non-uapi-exported CLONE_LEGACY_FLAGS and
>     define it as 0xffffffffULL for clarity
>   - make copy_clone_args_from_user() noinline
>   - avoid assigning to local variables from struct kernel_clone_args
>     members in cases where it makes sense
> ---
>  arch/x86/ia32/sys_ia32.c   |  12 ++-
>  include/linux/sched/task.h |  17 +++-
>  include/linux/syscalls.h   |   4 +
>  include/uapi/linux/sched.h |  16 +++
>  kernel/fork.c              | 201 ++++++++++++++++++++++++++++---------
>  5 files changed, 199 insertions(+), 51 deletions(-)
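
The v3 changelog above moves argument checking into a dedicated clone3_args_valid(), and the cover letter lists the user-visible rules: CLONE_DETACHED is rejected with EINVAL, and CSIGNAL is superseded by a dedicated exit_signal member. The following is only a sketch of what such a check could look like, not the patch's code. The sketch_ names and the reduced struct are hypothetical, rejecting the CSIGNAL byte as a flag is an assumption about how the deprecation is enforced, and _NSIG is assumed to be 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in <linux/sched.h>; redefined so the sketch stands alone. */
#define SKETCH_CSIGNAL            0x000000ffULL /* legacy exit-signal byte */
#define SKETCH_CLONE_DETACHED     0x00400000ULL /* ignored by clone() */
#define SKETCH_CLONE_LEGACY_FLAGS 0xffffffffULL /* value from the v3 changelog */
#define SKETCH_NSIG               64            /* assumption: _NSIG on most arches */

/* Reduced, hypothetical stand-in for struct kernel_clone_args. */
struct sketch_kernel_clone_args {
	uint64_t flags;
	int exit_signal;
};

/*
 * Hypothetical clone3_args_valid()-style check:
 *  - no flag bits above the legacy 32-bit range (none are defined yet);
 *  - CLONE_DETACHED fails instead of being silently ignored;
 *  - the CSIGNAL byte may not be used, since exit_signal replaces it;
 *  - exit_signal must be 0 (no signal) or a valid signal number.
 */
static bool sketch_clone3_args_valid(const struct sketch_kernel_clone_args *k)
{
	if (k->flags & ~SKETCH_CLONE_LEGACY_FLAGS)
		return false;
	if (k->flags & (SKETCH_CLONE_DETACHED | SKETCH_CSIGNAL))
		return false;
	if (k->exit_signal < 0 || k->exit_signal > SKETCH_NSIG)
		return false;
	return true;
}
```

Under these rules a fork()-like request sets flags to 0 and exit_signal to SIGCHLD, rather than OR-ing SIGCHLD into the flags word as clone() callers do.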

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] fork: add clone3
  2019-06-06 21:46 ` Serge E. Hallyn
@ 2019-06-08  8:15   ` Christian Brauner
  0 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2019-06-08  8:15 UTC (permalink / raw)
  To: viro, linux-kernel, torvalds
  Cc: jannh, Serge E. Hallyn, keescook, fweimer, oleg, arnd, dhowells,
	Pavel Emelyanov, Andrew Morton, Adrian Reber, Andrei Vagin,
	linux-api

On Thu, Jun 06, 2019 at 04:46:45PM -0500, Serge Hallyn wrote:
> On Tue, Jun 04, 2019 at 06:09:43PM +0200, Christian Brauner wrote:
> > This adds the clone3 system call.
> > 
> > As mentioned several times already (cf. [7], [8]) here's the promised
> > patchset for clone3().
> > 
> > We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last
> > free flag from clone().
> > 
> > Independent of the CLONE_PIDFD patchset a time namespace has been discussed
> > at Linux Plumbers Conference last year and has been sent out and reviewed
> > (cf. [5]). It is expected that it will go upstream in the not too distant
> > future. However, it relies on the addition of the CLONE_NEWTIME flag to
> > clone(). The only other good candidate - CLONE_DETACHED - is currently not
> > recyclable as we have identified at least two large or widely used
> > codebases that currently pass this flag (cf. [2], [3], and [4]). Given that
> > CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively
> > blocked. clone3() has the advantage that it will unblock this patchset
> > again. In general, clone3() is extensible and allows for the implementation
> > of new features.
> > 
> > The idea is to keep clone3() very simple and close to the original clone(),
> > specifically, to keep on supporting old clone()-based workloads.
> > We know there have been various creative proposals for what a new process
> > creation syscall or even api should look like, with some people even
> > going so far as to argue that the traditional fork()+exec() split should be
> > abandoned in favor of an in-kernel version of spawn(). Independent of
> > whether or not we personally think spawn() is a good idea, this patchset has
> > nothing to do with it and does not want to.
> > One stance we take is that there's no really good alternative to
> > clone()+exec() and we need and want to support this model going forward;
> > independent of spawn().
> > The following requirements guided clone3():
> > - bump the number of available flags
> > - move arguments that are currently passed as separate arguments
> >   in clone() into a dedicated struct clone_args
> >   - choose a struct layout that is easy to handle on 32 and on 64 bit
> >   - choose a struct layout that is extensible
> >   - give new flags that currently need to abuse another flag's dedicated
> >     return argument in clone() their own dedicated return argument
> >     (e.g. CLONE_PIDFD)
> >   - use a separate kernel internal struct kernel_clone_args that is
> >     properly typed according to current kernel conventions in fork.c and is
> >     different from the uapi struct clone_args
> > - port _do_fork() to use kernel_clone_args so that all process creation
> >   syscalls such as fork(), vfork(), clone(), and clone3() behave identically
> >   (Arnd suggested that we can probably also port do_fork() itself in a
> >   separate patchset.)
> > - ease of transition for userspace from clone() to clone3()
> >   This very much means that we do *not* remove functionality that userspace
> >   currently relies on, as removing it is a good way of creating a syscall
> >   that won't be adopted.
> > - do not try to be clever or complex: keep clone3() as dumb as possible
> > 
> > In accordance with Linus' suggestions (cf. [11]), clone3() has the following
> > signature:
> > 
> > /* uapi */
> > struct clone_args {
> >         __aligned_u64 flags;
> >         __aligned_u64 pidfd;
> >         __aligned_u64 child_tid;
> >         __aligned_u64 parent_tid;
> >         __aligned_u64 exit_signal;
> >         __aligned_u64 stack;
> >         __aligned_u64 stack_size;
> >         __aligned_u64 tls;
> > };
> > 
> > /* kernel internal */
> > struct kernel_clone_args {
> >         u64 flags;
> >         int __user *pidfd;
> >         int __user *child_tid;
> >         int __user *parent_tid;
> >         int exit_signal;
> >         unsigned long stack;
> >         unsigned long stack_size;
> >         unsigned long tls;
> > };
> > 
> > long sys_clone3(struct clone_args __user *uargs, size_t size)
> > 
> > clone3() cleanly supports all of the supported flags from clone() and thus
> > all legacy workloads.
> > The advantage of sticking close to the old clone() is the low cost for
> > userspace to switch to this new api. Quite a lot of userspace apis (e.g.
> > pthreads) are based on the clone() syscall. Since the new clone3() syscall
> > supports all of the old workloads and opens up the ability to add new
> > features, switching to it should be more appealing for userspace. In
> > essence, glibc can just write a simple wrapper to switch from clone() to
> > clone3().
> > 
> > There has been some interest in this patchset already. We have received a
> > patch from the CRIU corner for clone3() that would set the PID/TID of a
> > restored process without /proc/sys/kernel/ns_last_pid to eliminate a race.
> > 
> > /* User visible differences to legacy clone() */
> > - CLONE_DETACHED will cause EINVAL with clone3()
> > - CSIGNAL is deprecated
> >   It is superseded by a dedicated "exit_signal" argument in struct
> >   clone_args freeing up space for additional flags.
> >   This is based on a suggestion from Andrei and Linus (cf. [9] and [10])
> > 
> > /* References */
> > [1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be
> > [2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343
> > [3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233
> > [4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740
> > [5]: https://lore.kernel.org/lkml/20190425161416.26600-1-dima@arista.com/
> > [6]: https://lore.kernel.org/lkml/20190425161416.26600-2-dima@arista.com/
> > [7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/
> > [8]: https://lore.kernel.org/lkml/20190524102756.qjsjxukuq2f4t6bo@brauner.io/
> > [9]: https://lore.kernel.org/lkml/20190529222414.GA6492@gmail.com/
> > [10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/
> > [11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/
> > 
> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Christian Brauner <christian@brauner.io>
> 
> Acked-by: Serge Hallyn <serge@hallyn.com>

This also carries an Ack by Arnd and there don't seem to be technical
issues anymore.
So I'm going to move this over into my for-next branch targeting 5.3 to
see some testing.

Thanks!
Christian

> 
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Pavel Emelyanov <xemul@virtuozzo.com>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Adrian Reber <adrian@lisas.de>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Andrei Vagin <avagin@gmail.com>
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Florian Weimer <fweimer@redhat.com>
> > Cc: linux-api@vger.kernel.org
> > ---
> > v1:
> > - Linus Torvalds <torvalds@linux-foundation.org>:
> >   - redesign based on Linus' proposal
> >   - switch from arg-based to revision-based naming scheme: s/clone6/clone3/
> > - Arnd Bergmann <arnd@arndb.de>:
> >   - use a single copy_from_user() instead of multiple get_user() calls
> >     since the latter have a constant overhead on some architectures
> >   - a range of other tweaks and suggestions
> > v2:
> > - Linus Torvalds <torvalds@linux-foundation.org>,
> >   Andrei Vagin <avagin@gmail.com>:
> >   - replace CSIGNAL flag with dedicated exit_signal argument in struct
> >     clone_args
> > - Christian Brauner <christian@brauner.io>:
> >   - improve naming for some struct clone_args members
> > v3:
> > - Arnd Bergmann <arnd@arndb.de>:
> >   - replace memset with constructor for clarity and better object code
> >   - call flag verification function clone3_flags_valid() on
> >     kernel_clone_args instead of clone_args
> >   - remove __ARCH_WANT_SYS_CLONE #ifdef around sys_clone3()
> > - Christian Brauner <christian@brauner.io>:
> >   - replace clone3_flags_valid() with clone3_args_valid() and call in
> >     clone3() directly rather than in copy_clone_args_from_user()
> >     This cleanly separates copying the args from userspace from the
> >     verification whether those args are sane.
> > - David Howells <dhowells@redhat.com>:
> >   - align new struct member assignments with tabs
> >   - replace CLONE_MAX with a non-uapi-exported CLONE_LEGACY_FLAGS and
> >     define it as 0xffffffffULL for clarity
> >   - make copy_clone_args_from_user() noinline
> >   - avoid assigning to local variables from struct kernel_clone_args
> >     members in cases where it makes sense
> > ---
> >  arch/x86/ia32/sys_ia32.c   |  12 ++-
> >  include/linux/sched/task.h |  17 +++-
> >  include/linux/syscalls.h   |   4 +
> >  include/uapi/linux/sched.h |  16 +++
> >  kernel/fork.c              | 201 ++++++++++++++++++++++++++++---------
> >  5 files changed, 199 insertions(+), 51 deletions(-)
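
The quoted cover letter stores every clone3() argument, including user pointers, in an 8-byte-aligned 64-bit field and passes the struct size alongside it. The sketch below illustrates why that layout is identical for 32-bit and 64-bit callers and how a size argument can make the struct extensible. It is an illustration under stated assumptions, not the patch's copy_clone_args_from_user(). The sketch_ names are hypothetical, memcpy() stands in for copy_from_user(), 4096 stands in for PAGE_SIZE, and the trailing-zero rule follows the usual extensible-struct convention; the patch's exact checks may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Like the kernel's __aligned_u64: 8-byte alignment even on 32-bit x86. */
typedef uint64_t sketch_aligned_u64 __attribute__((aligned(8)));

/* Userspace mirror of the uapi struct clone_args quoted above. */
struct sketch_clone_args {
	sketch_aligned_u64 flags;
	sketch_aligned_u64 pidfd;       /* int *, stored as a u64 */
	sketch_aligned_u64 child_tid;   /* int *, stored as a u64 */
	sketch_aligned_u64 parent_tid;  /* int *, stored as a u64 */
	sketch_aligned_u64 exit_signal;
	sketch_aligned_u64 stack;
	sketch_aligned_u64 stack_size;
	sketch_aligned_u64 tls;
};

/* Pointers are widened to 64 bits in userspace and narrowed back on the
 * kernel side (cf. the int __user * members of struct kernel_clone_args). */
static uint64_t sketch_ptr_to_u64(const void *p) { return (uint64_t)(uintptr_t)p; }
static int *sketch_u64_to_int_ptr(uint64_t v) { return (int *)(uintptr_t)v; }

/*
 * Hypothetical size handling for sys_clone3(uargs, size):
 *  - more than a page is refused outright (assumed limit: 4096 bytes);
 *  - anything smaller than the known struct is an error;
 *  - a larger struct from newer userspace is accepted only if every
 *    byte this kernel does not understand is zero.
 */
static int sketch_copy_clone_args(struct sketch_clone_args *dst,
				  const void *src, size_t size)
{
	const unsigned char *bytes = src;

	if (size > 4096)
		return -E2BIG;
	if (size < sizeof(*dst))
		return -EINVAL;
	for (size_t i = sizeof(*dst); i < size; i++)
		if (bytes[i] != 0)
			return -E2BIG;
	memcpy(dst, src, sizeof(*dst));
	return 0;
}
```

With headers and a kernel that provide clone3(), a caller would fill such a struct and pass it as syscall(__NR_clone3, &args, sizeof(args)); that call is not exercised by this sketch.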

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-06-04 16:09 ` [PATCH v3 2/2] arch: wire-up clone3() syscall Christian Brauner
  2019-06-04 18:40   ` Arnd Bergmann
@ 2019-06-20 18:44   ` Guenter Roeck
  2019-06-20 22:10     ` Christian Brauner
  1 sibling, 1 reply; 23+ messages in thread
From: Guenter Roeck @ 2019-06-20 18:44 UTC (permalink / raw)
  To: Christian Brauner
  Cc: viro, linux-kernel, torvalds, jannh, keescook, fweimer, oleg,
	arnd, dhowells, Andrew Morton, Adrian Reber, linux-api,
	linux-arch, x86

On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
> Wire up the clone3() call on all arches that don't require hand-rolled
> assembly.
> 
> Some of the arches look like they need special assembly massaging and it is
> probably smarter to let the appropriate arch maintainers do the actual
> wiring. Arches that are wired up are:
> - x86{_32,64}
> - arm{64}
> - xtensa
> 

This patch results in build failures on various architectures.

h8300-linux-ld: arch/h8300/kernel/syscalls.o:(.data+0x6d0): undefined reference to `sys_clone3'

nios2-linux-ld: arch/nios2/kernel/syscall_table.o:(.data+0x6d0): undefined reference to `sys_clone3'

There may be others; -next is in too bad shape right now to get a complete
picture. Wondering, though: What is special about this syscall? Normally
one would only get a warning that a syscall is not wired up.

Guenter

> Signed-off-by: Christian Brauner <christian@brauner.io>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Adrian Reber <adrian@lisas.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: linux-api@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: x86@kernel.org
> ---
> v1: unchanged
> v2: unchanged
> v3:
> - Christian Brauner <christian@brauner.io>:
>   - wire up clone3 on all arches that don't have hand-rolled entry points
>     for clone
> ---
>  arch/arm/tools/syscall.tbl                  | 1 +
>  arch/arm64/include/asm/unistd.h             | 2 +-
>  arch/arm64/include/asm/unistd32.h           | 2 ++
>  arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
>  arch/x86/entry/syscalls/syscall_32.tbl      | 1 +
>  arch/x86/entry/syscalls/syscall_64.tbl      | 1 +
>  arch/xtensa/kernel/syscalls/syscall.tbl     | 1 +
>  include/uapi/asm-generic/unistd.h           | 4 +++-
>  8 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index aaf479a9e92d..e99a82bdb93a 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -447,3 +447,4 @@
>  431	common	fsconfig			sys_fsconfig
>  432	common	fsmount				sys_fsmount
>  433	common	fspick				sys_fspick
> +436	common	clone3				sys_clone3
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index 70e6882853c0..24480c2d95da 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -44,7 +44,7 @@
>  #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
>  #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
>  
> -#define __NR_compat_syscalls		434
> +#define __NR_compat_syscalls		437
>  #endif
>  
>  #define __ARCH_WANT_SYS_CLONE
> diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> index c39e90600bb3..b144ea675d70 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
>  __SYSCALL(__NR_fsmount, sys_fsmount)
>  #define __NR_fspick 433
>  __SYSCALL(__NR_fspick, sys_fspick)
> +#define __NR_clone3 436
> +__SYSCALL(__NR_clone3, sys_clone3)
>  
>  /*
>   * Please add new compat syscalls above this comment and update
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> index 26339e417695..3110440bcc31 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -439,3 +439,4 @@
>  431	common	fsconfig			sys_fsconfig
>  432	common	fsmount				sys_fsmount
>  433	common	fspick				sys_fspick
> +436	common	clone3				sys_clone3
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ad968b7bac72..80e26211feff 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -438,3 +438,4 @@
>  431	i386	fsconfig		sys_fsconfig			__ia32_sys_fsconfig
>  432	i386	fsmount			sys_fsmount			__ia32_sys_fsmount
>  433	i386	fspick			sys_fspick			__ia32_sys_fspick
> +436	i386	clone3			sys_clone3			__ia32_sys_clone3
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index b4e6f9e6204a..7968f0b5b5e8 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -355,6 +355,7 @@
>  431	common	fsconfig		__x64_sys_fsconfig
>  432	common	fsmount			__x64_sys_fsmount
>  433	common	fspick			__x64_sys_fspick
> +436	common	clone3			__x64_sys_clone3/ptregs
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> index 5fa0ee1c8e00..b2767c8c2b4e 100644
> --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> @@ -404,3 +404,4 @@
>  431	common	fsconfig			sys_fsconfig
>  432	common	fsmount				sys_fsmount
>  433	common	fspick				sys_fspick
> +436	common	clone3				sys_clone3
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index a87904daf103..45bc87687c47 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
>  __SYSCALL(__NR_fsmount, sys_fsmount)
>  #define __NR_fspick 433
>  __SYSCALL(__NR_fspick, sys_fspick)
> +#define __NR_clone3 436
> +__SYSCALL(__NR_clone3, sys_clone3)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 434
> +#define __NR_syscalls 437
>  
>  /*
>   * 32 bit systems traditionally used different
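
In the diff above, __NR_syscalls (and __NR_compat_syscalls on arm64) is the size of the syscall table, i.e. one more than the highest assigned number, which is why adding clone3 at 436 bumps it from 434 to 437. Slots the diff skips still need a defined behaviour. Kernel tables typically pre-fill every slot with sys_ni_syscall(), which returns -ENOSYS. The following userspace sketch uses hypothetical sketch_ names and placeholder return values, and treats an empty (NULL) slot as "not implemented" to avoid compiler-specific range initializers.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_NR_fspick   433
#define SKETCH_NR_clone3   436
#define SKETCH_NR_syscalls 437  /* highest number + 1, as in the diff */

typedef long (*sketch_syscall_fn)(void);

static long sketch_sys_ni_syscall(void) { return -ENOSYS; }
static long sketch_sys_fspick(void)     { return 0; } /* placeholder body */
static long sketch_sys_clone3(void)     { return 0; } /* placeholder body */

/* Only wired-up numbers get an entry; 434 and 435 stay empty (NULL). */
static const sketch_syscall_fn sketch_table[SKETCH_NR_syscalls] = {
	[SKETCH_NR_fspick] = sketch_sys_fspick,
	[SKETCH_NR_clone3] = sketch_sys_clone3,
};

/* Out-of-range numbers and empty slots both report ENOSYS. */
static long sketch_dispatch(unsigned int nr)
{
	if (nr >= SKETCH_NR_syscalls || sketch_table[nr] == NULL)
		return sketch_sys_ni_syscall();
	return sketch_table[nr]();
}
```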

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-06-20 18:44   ` [PATCH v3 2/2] arch: wire-up clone3() syscall Guenter Roeck
@ 2019-06-20 22:10     ` Christian Brauner
  2019-06-21  9:37       ` Arnd Bergmann
  0 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2019-06-20 22:10 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: viro, linux-kernel, torvalds, jannh, keescook, fweimer, oleg,
	arnd, dhowells, Andrew Morton, Adrian Reber, linux-api,
	linux-arch, x86

On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
> > Wire up the clone3() call on all arches that don't require hand-rolled
> > assembly.
> > 
> > Some of the arches look like they need special assembly massaging and it is
> > probably smarter to let the appropriate arch maintainers do the actual
> > wiring. Arches that are wired up are:
> > - x86{_32,64}
> > - arm{64}
> > - xtensa
> > 
> 
> This patch results in build failures on various architectures.
> 
> h8300-linux-ld: arch/h8300/kernel/syscalls.o:(.data+0x6d0): undefined reference to `sys_clone3'
> 
> nios2-linux-ld: arch/nios2/kernel/syscall_table.o:(.data+0x6d0): undefined reference to `sys_clone3'
> 
> There may be others; -next is in too bad shape right now to get a complete
> picture. Wondering, though: What is special about this syscall? Normally
> one would only get a warning that a syscall is not wired up.

clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
simply define __ARCH_WANT_SYS_CLONE and are done with it.
Some however, such as nios2 and h8300 don't define it but instead
provide a sys_clone stub of their own because of architectural
requirements (or tweaks) and they are mostly written in assembly. (That
should be left to arch maintainers for sys_clone3.)

The build failures were on my radar already. I hadn't replied yet
since I hadn't pushed the fixup below.
The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
cond_syscall(clone3) so that all architectures that do not yet
provide clone3 return ENOSYS until their maintainers have added it.

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index 7a39e77984ef..aa35aa5d68dc 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -40,6 +40,7 @@
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_VFORK
 #define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3
 
 /*
  * Unimplemented (or alternatively implemented) syscalls
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 24480c2d95da..e4e0523102e2 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -48,6 +48,7 @@
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3
 
 #ifndef __COMPAT_SYSCALL_NR
 #include <uapi/asm/unistd.h>
diff --git a/arch/x86/include/asm/unistd.h b/arch/x86/include/asm/unistd.h
index 146859efd83c..097589753fec 100644
--- a/arch/x86/include/asm/unistd.h
+++ b/arch/x86/include/asm/unistd.h
@@ -54,5 +54,6 @@
 # define __ARCH_WANT_SYS_FORK
 # define __ARCH_WANT_SYS_VFORK
 # define __ARCH_WANT_SYS_CLONE
+# define __ARCH_WANT_SYS_CLONE3
 
 #endif /* _ASM_X86_UNISTD_H */
diff --git a/arch/xtensa/include/asm/unistd.h b/arch/xtensa/include/asm/unistd.h
index 30af4dc3ce7b..b52236245e51 100644
--- a/arch/xtensa/include/asm/unistd.h
+++ b/arch/xtensa/include/asm/unistd.h
@@ -3,6 +3,7 @@
 #define _XTENSA_UNISTD_H
 
 #define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3
 #include <uapi/asm/unistd.h>
 
 #define __ARCH_WANT_NEW_STAT
diff --git a/kernel/fork.c b/kernel/fork.c
index 08ff131f26b4..98abea995629 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2490,7 +2490,9 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
 
 	return _do_fork(&args);
 }
+#endif
 
+#ifdef __ARCH_WANT_SYS_CLONE3
 noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 					      struct clone_args __user *uargs,
 					      size_t size)
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 4d9ae5ea6caf..34b76895b81e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -137,6 +137,8 @@ COND_SYSCALL(capset);
 /* kernel/exit.c */
 
 /* kernel/fork.c */
+/* __ARCH_WANT_SYS_CLONE3 */
+COND_SYSCALL(clone3);
 
 /* kernel/futex.c */
 COND_SYSCALL(futex);
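
The COND_SYSCALL(clone3) line is what lets the h8300 and nios2 builds link again. Roughly, it provides sys_clone3 as a weak symbol that falls back to sys_ni_syscall(), which returns -ENOSYS. A strong definition from kernel/fork.c therefore wins wherever __ARCH_WANT_SYS_CLONE3 is set. The following userspace analogue uses hypothetical sketch_ names and the GCC/Clang weak attribute. It shows only the case of an architecture that has not opted in, where the weak fallback is the sole definition.

```c
#include <errno.h>

/* Stand-in for sys_ni_syscall(), the generic "not implemented" entry. */
long sketch_sys_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * Weak default for sys_clone3. A strong definition in another object
 * file (fork.c built with __ARCH_WANT_SYS_CLONE3) would replace it at
 * link time; without one, the linker resolves references to this.
 */
__attribute__((weak)) long sketch_sys_clone3(void)
{
	return sketch_sys_ni_syscall();
}

/* A syscall table generated from the asm-generic list references the
 * symbol unconditionally; before the fix that reference was undefined. */
typedef long (*sketch_fn)(void);
static const sketch_fn sketch_table_entry = sketch_sys_clone3;
```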

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-06-20 22:10     ` Christian Brauner
@ 2019-06-21  9:37       ` Arnd Bergmann
  2019-06-21 11:18         ` Christian Brauner
  0 siblings, 1 reply; 23+ messages in thread
From: Arnd Bergmann @ 2019-06-21  9:37 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List,
	Linus Torvalds, Jann Horn, Kees Cook, Florian Weimer,
	Oleg Nesterov, David Howells, Andrew Morton, Adrian Reber,
	Linux API, linux-arch, the arch/x86 maintainers

On Fri, Jun 21, 2019 at 12:10 AM Christian Brauner <christian@brauner.io> wrote:
> On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> > On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
>
> clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
> simply define __ARCH_WANT_SYS_CLONE and are done with it.
> Some however, such as nios2 and h8300 don't define it but instead
> provide a sys_clone stub of their own because of architectural
> requirements (or tweaks) and they are mostly written in assembly. (That
> should be left to arch maintainers for sys_clone3.)
>
> The build failures were on my radar already. I hadn't replied yet
> since I hadn't pushed the fixup below.
> The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
> cond_syscall(clone3) so that all architectures that do not yet
> provide clone3 return ENOSYS until their maintainers have added it.
>
> diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
> index 7a39e77984ef..aa35aa5d68dc 100644
> --- a/arch/arm/include/asm/unistd.h
> +++ b/arch/arm/include/asm/unistd.h
> @@ -40,6 +40,7 @@
>  #define __ARCH_WANT_SYS_FORK
>  #define __ARCH_WANT_SYS_VFORK
>  #define __ARCH_WANT_SYS_CLONE
> +#define __ARCH_WANT_SYS_CLONE3

I never really liked having __ARCH_WANT_SYS_CLONE here
because it was the only one that a new architecture needed to
set: all the other __ARCH_WANT_* are for system calls that
are already superseded by newer ones, so a new architecture
would start out with an empty list.

Since __ARCH_WANT_SYS_CLONE3 replaces
__ARCH_WANT_SYS_CLONE for new architectures, how about
leaving __ARCH_WANT_SYS_CLONE untouched but instead
coming up with the reverse for clone3 and mark the architectures
that specifically don't want it (if any)?

       Arnd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-06-21  9:37       ` Arnd Bergmann
@ 2019-06-21 11:18         ` Christian Brauner
  2019-06-21 14:20             ` Arnd Bergmann
  0 siblings, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2019-06-21 11:18 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List,
	Linus Torvalds, Jann Horn, Kees Cook, Florian Weimer,
	Oleg Nesterov, David Howells, Andrew Morton, Adrian Reber,
	Linux API, linux-arch, the arch/x86 maintainers

On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 12:10 AM Christian Brauner <christian@brauner.io> wrote:
> > On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> > > On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
> >
> > clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
> > simply define __ARCH_WANT_SYS_CLONE and are done with it.
> > Some however, such as nios2 and h8300 don't define it but instead
> > provide a sys_clone stub of their own because of architectural
> > requirements (or tweaks) and they are mostly written in assembly. (That
> > should be left to arch maintainers for sys_clone3.)
> >
> > The build failures were on my radar already. I hadn't replied yet
> > since I hadn't pushed the fixup below.
> > The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
> > cond_syscall(clone3) so that all architectures that do not yet
> > provide clone3 return ENOSYS until their maintainers have added it.
> >
> > diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
> > index 7a39e77984ef..aa35aa5d68dc 100644
> > --- a/arch/arm/include/asm/unistd.h
> > +++ b/arch/arm/include/asm/unistd.h
> > @@ -40,6 +40,7 @@
> >  #define __ARCH_WANT_SYS_FORK
> >  #define __ARCH_WANT_SYS_VFORK
> >  #define __ARCH_WANT_SYS_CLONE
> > +#define __ARCH_WANT_SYS_CLONE3
> 
> I never really liked having __ARCH_WANT_SYS_CLONE here
> because it was the only one that a new architecture needed to
> set: all the other __ARCH_WANT_* are for system calls that
> are already superseded by newer ones, so a new architecture
> would start out with an empty list.
> 
> Since __ARCH_WANT_SYS_CLONE3 replaces
> __ARCH_WANT_SYS_CLONE for new architectures, how about
> leaving __ARCH_WANT_SYS_CLONE untouched but instead

__ARCH_WANT_SYS_CLONE is left untouched. :)

> coming up with the reverse for clone3 and mark the architectures
> that specifically don't want it (if any)?

Afaict, your suggestion is more or less the same as what is done
here. So I'm not sure it buys us anything apart from future
architectures not needing to set __ARCH_WANT_SYS_CLONE3.

I expect the macro above to be only here temporarily until all arches
have caught up and we're sure that they don't require assembly stubs
(cf. [1]). A decision I'd leave to the maintainers (since even for
nios2 we were kind of on the fence what exactly the sys_clone stub was
supposed to do).

But I'm happy to take a patch from you if it's equally or more simple
than this one right here.

In any case, linux-next should be fine on all arches with this fixup
now.

Christian


[1]: Architectures such as nios2 or h8300 simply take the asm-generic
     syscall definitions and generate their syscall table from it. But
     since they don't define __ARCH_WANT_SYS_CLONE the build would fail
     complaining about sys_clone3 missing. The reason this doesn't
     happen for legacy clone is that nios2 and h8300 provide assembly
     stubs for sys_clone but they don't for sys_clone3.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-06-21 11:18         ` Christian Brauner
@ 2019-06-21 14:20             ` Arnd Bergmann
  0 siblings, 0 replies; 23+ messages in thread
From: Arnd Bergmann @ 2019-06-21 14:20 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List,
	Linus Torvalds, Jann Horn, Kees Cook, Florian Weimer,
	Oleg Nesterov, David Howells, Andrew Morton, Adrian Reber,
	Linux API, linux-arch, the arch/x86 maintainers, Ley Foon Tan,
	moderated list:NIOS2 ARCHITECTURE

On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <christian@brauner.io> wrote:
> On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> >
> > I never really liked having __ARCH_WANT_SYS_CLONE here
> > because it was the only one that a new architecture needed to
> > set: all the other __ARCH_WANT_* are for system calls that
> > are already superseded by newer ones, so a new architecture
> > would start out with an empty list.
> >
> > Since __ARCH_WANT_SYS_CLONE3 replaces
> > __ARCH_WANT_SYS_CLONE for new architectures, how about
> > leaving __ARCH_WANT_SYS_CLONE untouched but instead
>
> __ARCH_WANT_SYS_CLONE is left untouched. :)
>
> > coming up with the reverse for clone3 and mark the architectures
> > that specifically don't want it (if any)?
>
> Afaict, your suggestion is more or less the same as what is done
> here. So I'm not sure it buys us anything apart from future
> architectures not needing to set __ARCH_WANT_SYS_CLONE3.
>
> I expect the macro above to be only here temporarily until all arches
> have caught up and we're sure that they don't require assembly stubs
> (cf. [1]). A decision I'd leave to the maintainers (since even for
> nios2 we were kind of on the fence what exactly the sys_clone stub was
> supposed to do).
>
> But I'm happy to take a patch from you if it's equally or more simple
> than this one right here.
>
> In any case, linux-next should be fine on all arches with this fixup
> now.

I've looked a bit more closely at the nios2 implementation, and I
believe this is purely an artifact of this file being copied over
from m68k, which also has an odd definition. The glibc side
of nios2 clone() is also odd in other ways, but that appears
to be unrelated to the kernel ABI.

I think the best option here would be to not have any special
cases and just hook up clone3() the same way on all
architectures, with no #ifdef at all. If it turns out to not work
on a particular architecture later, they can still disable the
syscall then.

      Arnd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-06-21 14:20             ` Arnd Bergmann
@ 2019-06-21 15:30               ` Christian Brauner
  -1 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2019-06-21 15:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List,
	Linus Torvalds, Jann Horn, Kees Cook, Florian Weimer,
	Oleg Nesterov, David Howells, Andrew Morton, Adrian Reber,
	Linux API, linux-arch, the arch/x86 maintainers, Ley Foon Tan,
	moderated list:NIOS2 ARCHITECTURE

On Fri, Jun 21, 2019 at 04:20:15PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <christian@brauner.io> wrote:
> > On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> > >
> > > I never really liked having __ARCH_WANT_SYS_CLONE here
> > > because it was the only one that a new architecture needed to
> > > set: all the other __ARCH_WANT_* are for system calls that
> > > are already superseded by newer ones, so a new architecture
> > > would start out with an empty list.
> > >
> > > Since __ARCH_WANT_SYS_CLONE3 replaces
> > > __ARCH_WANT_SYS_CLONE for new architectures, how about
> > > leaving __ARCH_WANT_SYS_CLONE untouched but instead
> >
> > __ARCH_WANT_SYS_CLONE is left untouched. :)
> >
> > > coming up with the reverse for clone3 and mark the architectures
> > > that specifically don't want it (if any)?
> >
> > AFAICT, your suggestion is more or less the same as what is done
> > here. So I'm not sure it buys us anything apart from future
> > architectures not needing to set __ARCH_WANT_SYS_CLONE3.
> >
> > I expect the macro above to be only here temporarily until all arches
> > have caught up and we're sure that they don't require assembly stubs
> > (cf. [1]). A decision I'd leave to the maintainers (since even for
> > nios2 we were kind of on the fence what exactly the sys_clone stub was
> > supposed to do).
> >
> > But I'm happy to take a patch from you if it's equally or more simple
> > than this one right here.
> >
> > In any case, linux-next should be fine on all arches with this fixup
> > now.
> 
> I've looked a bit more closely at the nios2 implementation, and I
> believe this is purely an artifact of this file being copied over
> from m68k, which also has an odd definition. The glibc side
> of nios2 clone() is also odd in other ways, but that appears
> to be unrelated to the kernel ABI.
> 
> I think the best option here would be to not have any special
> cases and just hook up clone3() the same way on all
> architectures, with no #ifdef at all. If it turns out to not work
> on a particular architecture later, they can still disable the
> syscall then.

Hm, if you believe that this is fine and want to "vouch" for it by
whipping up a patch that replaces the wiring up done in [1] I'm happy to
take it. :) Otherwise I'd feel more comfortable not adding all arches at
once.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone

Christian

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-06-21 15:30               ` Christian Brauner
@ 2019-07-01 15:14                 ` Arnd Bergmann
  -1 siblings, 0 replies; 23+ messages in thread
From: Arnd Bergmann @ 2019-07-01 15:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List,
	Linus Torvalds, Jann Horn, Kees Cook, Florian Weimer,
	Oleg Nesterov, David Howells, Andrew Morton, Adrian Reber,
	Linux API, linux-arch, the arch/x86 maintainers, Ley Foon Tan,
	moderated list:NIOS2 ARCHITECTURE

On Fri, Jun 21, 2019 at 5:30 PM Christian Brauner <christian@brauner.io> wrote:
> On Fri, Jun 21, 2019 at 04:20:15PM +0200, Arnd Bergmann wrote:
> > On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <christian@brauner.io> wrote:
> Hm, if you believe that this is fine and want to "vouch" for it by
> whipping up a patch that replaces the wiring up done in [1] I'm happy to
> take it. :) Otherwise I'd feel more comfortable not adding all arches at
> once.
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone

Sorry for my late reply. I had actually looked at the implementations in a
little more detail, and I think you are right that adding these is better
left to the arch maintainers in the case of clone3.

      Arnd

* Re: [PATCH v3 2/2] arch: wire-up clone3() syscall
  2019-07-01 15:14                 ` Arnd Bergmann
@ 2019-07-01 15:24                   ` Christian Brauner
  -1 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2019-07-01 15:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Guenter Roeck, Al Viro, Linux Kernel Mailing List,
	Linus Torvalds, Jann Horn, Kees Cook, Florian Weimer,
	Oleg Nesterov, David Howells, Andrew Morton, Adrian Reber,
	Linux API, linux-arch, the arch/x86 maintainers, Ley Foon Tan,
	moderated list:NIOS2 ARCHITECTURE

On Mon, Jul 01, 2019 at 05:14:51PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 5:30 PM Christian Brauner <christian@brauner.io> wrote:
> > On Fri, Jun 21, 2019 at 04:20:15PM +0200, Arnd Bergmann wrote:
> > > On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <christian@brauner.io> wrote:
> > Hm, if you believe that this is fine and want to "vouch" for it by
> > whipping up a patch that replaces the wiring up done in [1] I'm happy to
> > take it. :) Otherwise I'd feel more comfortable not adding all arches at
> > once.
> >
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone
> 
> Sorry for my late reply. I had actually looked at the implementations in a
> little more detail, and I think you are right that adding these is better
> left to the arch maintainers in the case of clone3.

Perfect, thanks!

Christian

* clone3 on ARC (was Re: [PATCH v3 2/2] arch: wire-up clone3() syscall)
  2019-06-04 21:29     ` Christian Brauner
@ 2020-01-15 22:41         ` Vineet Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Vineet Gupta @ 2020-01-15 22:41 UTC (permalink / raw)
  To: Christian Brauner, Arnd Bergmann
  Cc: Al Viro, Linux Kernel Mailing List, Linus Torvalds, Jann Horn,
	Kees Cook, Florian Weimer, Oleg Nesterov, David Howells,
	Andrew Morton, Adrian Reber, Linux API, linux-arch,
	the arch/x86 maintainers, arcml

On 6/4/19 2:29 PM, Christian Brauner wrote:
> On Tue, Jun 04, 2019 at 08:40:01PM +0200, Arnd Bergmann wrote:
>> On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <christian@brauner.io> wrote:
>>>
>>> Wire up the clone3() call on all arches that don't require hand-rolled
>>> assembly.
>>>
>>> Some of the arches look like they need special assembly massaging and it is
>>> probably smarter if the appropriate arch maintainers would do the actual
>>> wiring. Arches that are wired-up are:
>>> - x86{_32,64}
>>> - arm{64}
>>> - xtensa
>>
>> The ones you did look good to me. I would hope that we can do all other
>> architectures the same way, even if they have special assembly wrappers
>> for the old clone(). The most interesting cases appear to be ia64, alpha,
>> m68k and sparc, so it would be good if their maintainers could take a
>> look.
> 
> Yes, agreed. They can sort this out even after this lands.
> 
>>
>> What do you use for testing? Would it be possible to override the
>> internal clone() function in glibc with an LD_PRELOAD library
>> to quickly test one of the other architectures for regressions?
> 
> I have a test program that is rather horrendously ugly and I compiled
> kernels for x86 and the arms and tested in qemu. The program basically
> looks like [1].

I just got around to fixing this for ARC (patch to follow after we sort out the
testing) and was trying to use the test case below for a quick and dirty smoke
test (the existing toolchain's headers lack __NR_clone3, struct clone_args,
etc.). I did hack those up, but then spotted the following in

uapi/linux/sched.h:

|    struct clone_args {
|	__aligned_u64 flags;
|	__aligned_u64 pidfd;
|	__aligned_u64 child_tid;
|	__aligned_u64 parent_tid;
..
..

Are all clone3 arg fields supposed to be 64 bits wide, even things like
@child_tid and @tls, which are traditionally architecture-word wide?


> 
> Christian
> 
> [1]:
> #define _GNU_SOURCE
> #include <err.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <linux/sched.h>
> #include <linux/types.h>
> #include <sched.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mount.h>
> #include <sys/socket.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/sysmacros.h>
> #include <sys/types.h>
> #include <sys/un.h>
> #include <sys/wait.h>
> #include <unistd.h>
> 
> /* Fallbacks for older headers; 435 is clone3's number in the generic
>  * syscall table (assumption: not valid on every architecture). */
> #ifndef __NR_clone3
> #define __NR_clone3 435
> #endif
> #ifndef CLONE_PIDFD
> #define CLONE_PIDFD 0x00001000
> #endif
> 
> static pid_t raw_clone(struct clone_args *args)
> {
> 	return syscall(__NR_clone3, args, sizeof(struct clone_args));
> }
> 
> /* Assumes the common clone() argument order (flags, stack, parent_tid, ...),
>  * as on x86 and arm; s390 and microblaze order the arguments differently.
>  * With CLONE_PIDFD the kernel stores the pidfd through parent_tid. */
> static pid_t raw_clone_legacy(int *pidfd, unsigned int flags)
> {
> 	return syscall(__NR_clone, flags, 0, pidfd, 0, 0);
> }
> 
> static int wait_for_pid(pid_t pid)
> {
> 	int status, ret;
> 
> again:
> 	ret = waitpid(pid, &status, 0);
> 	if (ret == -1) {
> 		if (errno == EINTR)
> 			goto again;
> 
> 		return -1;
> 	}
> 
> 	if (ret != pid)
> 		goto again;
> 
> 	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
> 		return -1;
> 
> 	return 0;
> }
> 
> #define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
> #define u64_to_ptr(n) ((void *)((uintptr_t)((__u64)(n))))
> 
> int main(int argc, char *argv[])
> {
> 	int pidfd = -1;
> 	pid_t parent_tid = -1, pid = -1;
> 	struct clone_args args = {0};
> 	args.parent_tid = ptr_to_u64(&parent_tid);
> 	args.pidfd = ptr_to_u64(&pidfd);
> 	args.flags = CLONE_PIDFD | CLONE_PARENT_SETTID;
> 	args.exit_signal = SIGCHLD;
> 
> 	pid = raw_clone(&args);
> 	if (pid < 0) {
> 		fprintf(stderr, "%s - Failed to create new process\n",
> 			strerror(errno));
> 		exit(EXIT_FAILURE);
> 	}
> 
> 	if (pid == 0) {
> 		printf("I am the child with pid %d\n", getpid());
> 		exit(EXIT_SUCCESS);
> 	}
> 
> 	printf("raw_clone: I am the parent. My child's pid is   %d\n", pid);
> 	printf("raw_clone: I am the parent. My child's pidfd is %d\n",
> 	       *(int *)u64_to_ptr(args.pidfd));
> 	printf("raw_clone: I am the parent. My child's parent_tid value is %d\n",
> 	       *(pid_t *)u64_to_ptr(args.parent_tid));
> 
> 	if (wait_for_pid(pid))
> 		exit(EXIT_FAILURE);
> 
> 	if (pid != *(pid_t *)u64_to_ptr(args.parent_tid))
> 		exit(EXIT_FAILURE);
> 
> 	close(pidfd);
> 
> 	printf("\n\n");
> 	/* Flush so the next child does not inherit and re-print buffered output. */
> 	fflush(stdout);
> 	pidfd = -1;
> 	pid = raw_clone_legacy(&pidfd, CLONE_PIDFD | SIGCHLD);
> 	if (pid < 0) {
> 		fprintf(stderr, "%s - Failed to create new process\n",
> 			strerror(errno));
> 		exit(EXIT_FAILURE);
> 	}
> 
> 	if (pid == 0) {
> 		printf("I am the child with pid %d\n", getpid());
> 		exit(EXIT_SUCCESS);
> 	}
> 
> 	printf("raw_clone_legacy: I am the parent. My child's pid is   %d\n",
> 	       pid);
> 	printf("raw_clone_legacy: I am the parent. My child's pidfd is %d\n",
> 	       pidfd);
> 
> 	if (wait_for_pid(pid))
> 		exit(EXIT_FAILURE);
> 
> 	/* The legacy clone() call does not touch args.parent_tid, so there is
> 	 * no tid to compare against here; just release the pidfd. */
> 	close(pidfd);
> 
> 	return 0;
> }
> 


* Re: clone3 on ARC (was Re: [PATCH v3 2/2] arch: wire-up clone3() syscall)
  2020-01-15 22:41         ` Vineet Gupta
@ 2020-01-16 11:25           ` Christian Brauner
  -1 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2020-01-16 11:25 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Christian Brauner, Arnd Bergmann, Al Viro,
	Linux Kernel Mailing List, Linus Torvalds, Jann Horn, Kees Cook,
	Florian Weimer, Oleg Nesterov, David Howells, Andrew Morton,
	Adrian Reber, Linux API, linux-arch, the arch/x86 maintainers,
	arcml

On Wed, Jan 15, 2020 at 10:41:20PM +0000, Vineet Gupta wrote:
> On 6/4/19 2:29 PM, Christian Brauner wrote:
> > On Tue, Jun 04, 2019 at 08:40:01PM +0200, Arnd Bergmann wrote:
> >> On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <christian@brauner.io> wrote:
> >>>
> >>> Wire up the clone3() call on all arches that don't require hand-rolled
> >>> assembly.
> >>>
> >>> Some of the arches look like they need special assembly massaging and it is
> >>> probably smarter if the appropriate arch maintainers would do the actual
> >>> wiring. Arches that are wired-up are:
> >>> - x86{_32,64}
> >>> - arm{64}
> >>> - xtensa
> >>
> >> The ones you did look good to me. I would hope that we can do all other
> >> architectures the same way, even if they have special assembly wrappers
> >> for the old clone(). The most interesting cases appear to be ia64, alpha,
> >> m68k and sparc, so it would be good if their maintainers could take a
> >> look.
> > 
> > Yes, agreed. They can sort this out even after this lands.
> > 
> >>
> >> What do you use for testing? Would it be possible to override the
> >> internal clone() function in glibc with an LD_PRELOAD library
> >> to quickly test one of the other architectures for regressions?
> > 
> > I have a test program that is rather horrendously ugly and I compiled
> > kernels for x86 and the arms and tested in qemu. The program basically
> > looks like [1].
> 
> I just got around to fixing this for ARC (patch to follow after we sort out the
> testing) and was trying to use the test case below for a quick and dirty smoke
> test (the existing toolchain's headers lack __NR_clone3, struct clone_args,
> etc.). I did hack those up, but then spotted the following in
> 
> uapi/linux/sched.h:
> 
> |    struct clone_args {
> |	__aligned_u64 flags;
> |	__aligned_u64 pidfd;
> |	__aligned_u64 child_tid;
> |	__aligned_u64 parent_tid;
> ..
> ..
> 
> Are all clone3 arg fields supposed to be 64 bits wide, even things like
> @child_tid and @tls, which are traditionally architecture-word wide?

This is just the kernel ABI we expose to userspace, designed so that we can
handle 32-bit and 64-bit callers the same way. A libc like glibc is
expected to expose a properly typed struct to userspace. The kernel-internal
struct kernel_clone_args has "correct" typing.

Christian

end of thread, other threads:[~2020-01-16 11:25 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-04 16:09 [PATCH v3 1/2] fork: add clone3 Christian Brauner
2019-06-04 16:09 ` [PATCH v3 2/2] arch: wire-up clone3() syscall Christian Brauner
2019-06-04 18:40   ` Arnd Bergmann
2019-06-04 21:29     ` Christian Brauner
2020-01-15 22:41       ` clone3 on ARC (was Re: [PATCH v3 2/2] arch: wire-up clone3() syscall) Vineet Gupta
2020-01-15 22:41         ` Vineet Gupta
2020-01-16 11:25         ` Christian Brauner
2020-01-16 11:25           ` Christian Brauner
2019-06-20 18:44   ` [PATCH v3 2/2] arch: wire-up clone3() syscall Guenter Roeck
2019-06-20 22:10     ` Christian Brauner
2019-06-21  9:37       ` Arnd Bergmann
2019-06-21 11:18         ` Christian Brauner
2019-06-21 14:20           ` Arnd Bergmann
2019-06-21 14:20             ` Arnd Bergmann
2019-06-21 15:30             ` Christian Brauner
2019-06-21 15:30               ` Christian Brauner
2019-07-01 15:14               ` Arnd Bergmann
2019-07-01 15:14                 ` Arnd Bergmann
2019-07-01 15:24                 ` Christian Brauner
2019-07-01 15:24                   ` Christian Brauner
2019-06-04 21:54 ` [PATCH v3 1/2] fork: add clone3 Christian Brauner
2019-06-06 21:46 ` Serge E. Hallyn
2019-06-08  8:15   ` Christian Brauner
