linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v11 1/2] fork: extend clone3() to support setting a PID
@ 2019-11-15 12:36 Adrian Reber
  2019-11-15 12:36 ` [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid Adrian Reber
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Adrian Reber @ 2019-11-15 12:36 UTC (permalink / raw)
  To: Christian Brauner, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Dmitry Safonov, Andrei Vagin, Rasmus Villemoes
  Cc: linux-kernel, Mike Rapoport, Radostin Stoyanov, Adrian Reber

The main motivation to add set_tid to clone3() is CRIU.

To restore a process with the same PID/TID CRIU currently uses
/proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
ns_last_pid and then (quickly) does a clone(). This works most of the
time, but it is racy. It is also slow as it requires multiple syscalls.

Extending clone3() to support *set_tid makes it possible restore a
process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
race free (as long as the desired PID/TID is available).

This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
on clone3() with *set_tid as they are currently in place for ns_last_pid.

The original version of this change was using a single value for
set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
decided to change set_tid to an array to enable setting the PID of a
process in multiple PID namespaces at the same time. If a process is
created in a PID namespace it is possible to influence the PID inside
and outside of the PID namespace. Details also in the corresponding
selftest.

To create a process with the following PIDs:

      PID NS level         Requested PID
        0 (host)              31496
        1                        42
        2                         1

For that example the two newly introduced parameters to struct
clone_args (set_tid and set_tid_size) would need to be:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid[2] = 31496;
  set_tid_size = 3;

If only the PIDs of the two innermost nested PID namespaces should be
defined it would look like this:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid_size = 2;

The PID of the newly created process would then be the next available
free PID in the PID namespace level 0 (host) and 42 in the PID namespace
at level 1 and the PID of the process in the innermost PID namespace
would be 1.

The set_tid array is used to specify the PID of a process starting
from the innermost nested PID namespaces up to set_tid_size PID namespaces.

set_tid_size cannot be larger then the current PID namespace level.

Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
---
v2:
 - Removed (size < sizeof(struct clone_args)) as discussed with
   Christian and Dmitry
 - Added comment to ((set_tid != 1) && idr_get_cursor() <= 1) (Oleg)
 - Use idr_alloc() instead of idr_alloc_cyclic() (Oleg)

v3:
 - Return EEXIST if PID is already in use (Christian)
 - Drop CLONE_SET_TID (Christian and Oleg)
 - Use idr_is_empty() instead of idr_get_cursor() (Oleg)
 - Handle different `struct clone_args` sizes (Dmitry)

v4:
 - Rework struct size check with defines (Christian)
 - Reduce number of set_tid checks (Oleg)
 - Less parentheses and more robust code (Oleg)
 - Do ns_capable() on correct user_ns (Oleg, Christian)

v5:
 - make set_tid checks earlier in alloc_pid() (Christian)
 - remove unnecessary comment and struct size check (Christian)

v6:
 - remove CLONE_SET_TID from description (Christian)
 - add clone3() tests for different clone_args sizes (Christian)
 - move more set_tid checks to alloc_pid() (Oleg)
 - make all set_tid checks lockless (Oleg)

v7:
 - changed set_tid to be an array to set the PID of a process
   in multiple nested PID namespaces at the same time as discussed
   at LPC 2019 (container MC)

v8:
 - skip unnecessary memset() (Rasmus)
 - replace set_tid copy loop with a single copy (Christian)
 - more parameter error checking (Christian)
 - cache set_tid in alloc_pid() (Oleg)
 - move code in "else" branch (Oleg)

v9:
 - added kernel-doc to include/uapi/linux/sched.h (Christian)
 - moved a variable to limit its scope; keep all set_tid_size
   related changes in one place (Oleg)

v10:
 - added CLONE_ARGS_SIZE_VER1 to sched.h (Christian)

v11:
 - abort alloc_pid() correctly if one of the PIDs specified in
   set_pid[] is invalid (Andrei)
---
 include/linux/pid.h           |  3 +-
 include/linux/pid_namespace.h |  2 +
 include/linux/sched/task.h    |  3 ++
 include/uapi/linux/sched.h    | 53 +++++++++++++++++---------
 kernel/fork.c                 | 24 +++++++++++-
 kernel/pid.c                  | 72 +++++++++++++++++++++++++++--------
 kernel/pid_namespace.c        |  2 -
 7 files changed, 122 insertions(+), 37 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 034e3cd60dc0..998ae7d24450 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -124,7 +124,8 @@ extern struct pid *find_vpid(int nr);
 extern struct pid *find_get_pid(int nr);
 extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 
-extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
+			     size_t set_tid_size);
 extern void free_pid(struct pid *pid);
 extern void disable_pid_allocation(struct pid_namespace *ns);
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 49538b172483..2ed6af88794b 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -12,6 +12,8 @@
 #include <linux/ns_common.h>
 #include <linux/idr.h>
 
+/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
+#define MAX_PID_NS_LEVEL 32
 
 struct fs_pin;
 
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 4b1c3b664f51..f1879884238e 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -26,6 +26,9 @@ struct kernel_clone_args {
 	unsigned long stack;
 	unsigned long stack_size;
 	unsigned long tls;
+	pid_t *set_tid;
+	/* Number of elements in *set_tid */
+	size_t set_tid_size;
 };
 
 /*
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 1d500ed03c63..2e649cfa07f4 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -39,24 +39,38 @@
 #ifndef __ASSEMBLY__
 /**
  * struct clone_args - arguments for the clone3 syscall
- * @flags:       Flags for the new process as listed above.
- *               All flags are valid except for CSIGNAL and
- *               CLONE_DETACHED.
- * @pidfd:       If CLONE_PIDFD is set, a pidfd will be
- *               returned in this argument.
- * @child_tid:   If CLONE_CHILD_SETTID is set, the TID of the
- *               child process will be returned in the child's
- *               memory.
- * @parent_tid:  If CLONE_PARENT_SETTID is set, the TID of
- *               the child process will be returned in the
- *               parent's memory.
- * @exit_signal: The exit_signal the parent process will be
- *               sent when the child exits.
- * @stack:       Specify the location of the stack for the
- *               child process.
- * @stack_size:  The size of the stack for the child process.
- * @tls:         If CLONE_SETTLS is set, the tls descriptor
- *               is set to tls.
+ * @flags:        Flags for the new process as listed above.
+ *                All flags are valid except for CSIGNAL and
+ *                CLONE_DETACHED.
+ * @pidfd:        If CLONE_PIDFD is set, a pidfd will be
+ *                returned in this argument.
+ * @child_tid:    If CLONE_CHILD_SETTID is set, the TID of the
+ *                child process will be returned in the child's
+ *                memory.
+ * @parent_tid:   If CLONE_PARENT_SETTID is set, the TID of
+ *                the child process will be returned in the
+ *                parent's memory.
+ * @exit_signal:  The exit_signal the parent process will be
+ *                sent when the child exits.
+ * @stack:        Specify the location of the stack for the
+ *                child process.
+ * @stack_size:   The size of the stack for the child process.
+ * @tls:          If CLONE_SETTLS is set, the tls descriptor
+ *                is set to tls.
+ * @set_tid:      Pointer to an array of type *pid_t. The size
+ *                of the array is defined using @set_tid_size.
+ *                This array is used select PIDs/TIDs for newly
+ *                created processes. The first element in this
+ *                defines the PID in the most nested PID
+ *                namespace. Each additional element in the array
+ *                defines the PID in the parent PID namespace of
+ *                the original PID namespace. If the array has
+ *                less entries than the number of currently
+ *                nested PID namespaces only the PIDs in the
+ *                corresponding namespaces are set.
+ * @set_tid_size: This defines the size of the array referenced
+ *                in @set_tid. This cannot be larger than the
+ *                kernel's limit of nested PID namespaces.
  *
  * The structure is versioned by size and thus extensible.
  * New struct members must go at the end of the struct and
@@ -71,10 +85,13 @@ struct clone_args {
 	__aligned_u64 stack;
 	__aligned_u64 stack_size;
 	__aligned_u64 tls;
+	__aligned_u64 set_tid;
+	__aligned_u64 set_tid_size;
 };
 #endif
 
 #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
+#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
 
 /*
  * Scheduling policies
diff --git a/kernel/fork.c b/kernel/fork.c
index 954e875e72b1..417570263f1f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2087,7 +2087,8 @@ static __latent_entropy struct task_struct *copy_process(
 	stackleak_task_init(p);
 
 	if (pid != &init_struct_pid) {
-		pid = alloc_pid(p->nsproxy->pid_ns_for_children);
+		pid = alloc_pid(p->nsproxy->pid_ns_for_children, args->set_tid,
+				args->set_tid_size);
 		if (IS_ERR(pid)) {
 			retval = PTR_ERR(pid);
 			goto bad_fork_cleanup_thread;
@@ -2590,6 +2591,7 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 {
 	int err;
 	struct clone_args args;
+	pid_t *kset_tid = kargs->set_tid;
 
 	if (unlikely(usize > PAGE_SIZE))
 		return -E2BIG;
@@ -2600,6 +2602,15 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 	if (err)
 		return err;
 
+	if (unlikely(args.set_tid_size > MAX_PID_NS_LEVEL))
+		return -EINVAL;
+
+	if (unlikely(!args.set_tid && args.set_tid_size > 0))
+		return -EINVAL;
+
+	if (unlikely(args.set_tid && args.set_tid_size == 0))
+		return -EINVAL;
+
 	/*
 	 * Verify that higher 32bits of exit_signal are unset and that
 	 * it is a valid signal
@@ -2617,8 +2628,16 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
 		.stack		= args.stack,
 		.stack_size	= args.stack_size,
 		.tls		= args.tls,
+		.set_tid_size	= args.set_tid_size,
 	};
 
+	if (args.set_tid &&
+		copy_from_user(kset_tid, u64_to_user_ptr(args.set_tid),
+			(kargs->set_tid_size * sizeof(pid_t))))
+		return -EFAULT;
+
+	kargs->set_tid = kset_tid;
+
 	return 0;
 }
 
@@ -2662,6 +2681,9 @@ SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
 	int err;
 
 	struct kernel_clone_args kargs;
+	pid_t set_tid[MAX_PID_NS_LEVEL];
+
+	kargs.set_tid = set_tid;
 
 	err = copy_clone_args_from_user(&kargs, uargs, size);
 	if (err)
diff --git a/kernel/pid.c b/kernel/pid.c
index 7b5f6c963d72..2278e249141d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -157,7 +157,8 @@ void free_pid(struct pid *pid)
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 
-struct pid *alloc_pid(struct pid_namespace *ns)
+struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
+		      size_t set_tid_size)
 {
 	struct pid *pid;
 	enum pid_type type;
@@ -166,6 +167,17 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 	struct upid *upid;
 	int retval = -ENOMEM;
 
+	/*
+	 * set_tid_size contains the size of the set_tid array. Starting at
+	 * the most nested currently active PID namespace it tells alloc_pid()
+	 * which PID to set for a process in that most nested PID namespace
+	 * up to set_tid_size PID namespaces. It does not have to set the PID
+	 * for a process in all nested PID namespaces but set_tid_size must
+	 * never be greater than the current ns->level + 1.
+	 */
+	if (set_tid_size > ns->level + 1)
+		return ERR_PTR(-EINVAL);
+
 	pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
 	if (!pid)
 		return ERR_PTR(retval);
@@ -174,24 +186,54 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 	pid->level = ns->level;
 
 	for (i = ns->level; i >= 0; i--) {
-		int pid_min = 1;
+		int tid = 0;
+
+		if (set_tid_size) {
+			tid = set_tid[ns->level - i];
+
+			retval = -EINVAL;
+			if (tid < 1 || tid >= pid_max)
+				goto out_free;
+			/*
+			 * Also fail if a PID != 1 is requested and
+			 * no PID 1 exists.
+			 */
+			if (tid != 1 && !tmp->child_reaper)
+				goto out_free;
+			retval = -EPERM;
+			if (!ns_capable(tmp->user_ns, CAP_SYS_ADMIN))
+				goto out_free;
+			set_tid_size--;
+		}
 
 		idr_preload(GFP_KERNEL);
 		spin_lock_irq(&pidmap_lock);
 
-		/*
-		 * init really needs pid 1, but after reaching the maximum
-		 * wrap back to RESERVED_PIDS
-		 */
-		if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
-			pid_min = RESERVED_PIDS;
-
-		/*
-		 * Store a null pointer so find_pid_ns does not find
-		 * a partially initialized PID (see below).
-		 */
-		nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
-				      pid_max, GFP_ATOMIC);
+		if (tid) {
+			nr = idr_alloc(&tmp->idr, NULL, tid,
+				       tid + 1, GFP_ATOMIC);
+			/*
+			 * If ENOSPC is returned it means that the PID is
+			 * alreay in use. Return EEXIST in that case.
+			 */
+			if (nr == -ENOSPC)
+				nr = -EEXIST;
+		} else {
+			int pid_min = 1;
+			/*
+			 * init really needs pid 1, but after reaching the
+			 * maximum wrap back to RESERVED_PIDS
+			 */
+			if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
+				pid_min = RESERVED_PIDS;
+
+			/*
+			 * Store a null pointer so find_pid_ns does not find
+			 * a partially initialized PID (see below).
+			 */
+			nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
+					      pid_max, GFP_ATOMIC);
+		}
 		spin_unlock_irq(&pidmap_lock);
 		idr_preload_end();
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a6a79f85c81a..d40017e79ebe 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -26,8 +26,6 @@
 
 static DEFINE_MUTEX(pid_caches_mutex);
 static struct kmem_cache *pid_ns_cachep;
-/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
-#define MAX_PID_NS_LEVEL 32
 /* Write once array, filled from the beginning. */
 static struct kmem_cache *pid_cache[MAX_PID_NS_LEVEL];
 

base-commit: 7acdfe534e726450fe8051abc2f36380fb2c2c0e
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid
  2019-11-15 12:36 [PATCH v11 1/2] fork: extend clone3() to support setting a PID Adrian Reber
@ 2019-11-15 12:36 ` Adrian Reber
  2019-11-15 22:20   ` Andrei Vagin
  2019-11-18  1:46   ` Andrei Vagin
  2019-11-15 16:33 ` [PATCH v11 1/2] fork: extend clone3() to support setting a PID Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Adrian Reber @ 2019-11-15 12:36 UTC (permalink / raw)
  To: Christian Brauner, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Dmitry Safonov, Andrei Vagin, Rasmus Villemoes
  Cc: linux-kernel, Mike Rapoport, Radostin Stoyanov, Adrian Reber

This tests clone3() with *set_tid to see if all desired PIDs are working
as expected. The tests are trying multiple invalid input parameters as
well as creating processes while specifying a certain PID in multiple
PID namespaces at the same time.

Additionally this moves common clone3() test code into clone3_selftests.h.

Signed-off-by: Adrian Reber <areber@redhat.com>
---
v9:
 - applied all changes from Christian's review (except using the
   NSpid: parsing code from selftests/pidfd/pidfd_fdinfo_test.c)

v10:
 - added even more '\n' and include file fixes (Christian)

v11:
 - added more return code checking at multiple places (Andrei)
 - also add set_tid/set_tid_size to internal struct (Andrei)
---
 tools/testing/selftests/clone3/.gitignore     |   1 +
 tools/testing/selftests/clone3/Makefile       |   2 +-
 tools/testing/selftests/clone3/clone3.c       |   8 +-
 .../selftests/clone3/clone3_clear_sighand.c   |  20 +-
 .../selftests/clone3/clone3_selftests.h       |  35 ++
 .../testing/selftests/clone3/clone3_set_tid.c | 381 ++++++++++++++++++
 6 files changed, 421 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/clone3/clone3_selftests.h
 create mode 100644 tools/testing/selftests/clone3/clone3_set_tid.c

diff --git a/tools/testing/selftests/clone3/.gitignore b/tools/testing/selftests/clone3/.gitignore
index 2a30ae18b06e..0dc4f32c6cb8 100644
--- a/tools/testing/selftests/clone3/.gitignore
+++ b/tools/testing/selftests/clone3/.gitignore
@@ -1,2 +1,3 @@
 clone3
 clone3_clear_sighand
+clone3_set_tid
diff --git a/tools/testing/selftests/clone3/Makefile b/tools/testing/selftests/clone3/Makefile
index eb26eb793c80..cf976c732906 100644
--- a/tools/testing/selftests/clone3/Makefile
+++ b/tools/testing/selftests/clone3/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 CFLAGS += -g -I../../../../usr/include/
 
-TEST_GEN_PROGS := clone3 clone3_clear_sighand
+TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid
 
 include ../lib.mk
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 0f8a9ef40117..4669b3d418e7 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -18,6 +18,7 @@
 #include <sched.h>
 
 #include "../kselftest.h"
+#include "clone3_selftests.h"
 
 /*
  * Different sizes of struct clone_args
@@ -35,11 +36,6 @@ enum test_mode {
 	CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
 };
 
-static pid_t raw_clone(struct clone_args *args, size_t size)
-{
-	return syscall(__NR_clone3, args, size);
-}
-
 static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 {
 	struct clone_args args = {
@@ -83,7 +79,7 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 
 	memcpy(&args_ext.args, &args, sizeof(struct clone_args));
 
-	pid = raw_clone((struct clone_args *)&args_ext, size);
+	pid = sys_clone3((struct clone_args *)&args_ext, size);
 	if (pid < 0) {
 		ksft_print_msg("%s - Failed to create new process\n",
 				strerror(errno));
diff --git a/tools/testing/selftests/clone3/clone3_clear_sighand.c b/tools/testing/selftests/clone3/clone3_clear_sighand.c
index 0d957be1bdc5..456783ad19d6 100644
--- a/tools/testing/selftests/clone3/clone3_clear_sighand.c
+++ b/tools/testing/selftests/clone3/clone3_clear_sighand.c
@@ -14,30 +14,12 @@
 #include <sys/wait.h>
 
 #include "../kselftest.h"
+#include "clone3_selftests.h"
 
 #ifndef CLONE_CLEAR_SIGHAND
 #define CLONE_CLEAR_SIGHAND 0x100000000ULL
 #endif
 
-#ifndef __NR_clone3
-#define __NR_clone3 -1
-struct clone_args {
-	__aligned_u64 flags;
-	__aligned_u64 pidfd;
-	__aligned_u64 child_tid;
-	__aligned_u64 parent_tid;
-	__aligned_u64 exit_signal;
-	__aligned_u64 stack;
-	__aligned_u64 stack_size;
-	__aligned_u64 tls;
-};
-#endif
-
-static pid_t sys_clone3(struct clone_args *args, size_t size)
-{
-	return syscall(__NR_clone3, args, size);
-}
-
 static void test_clone3_supported(void)
 {
 	pid_t pid;
diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
new file mode 100644
index 000000000000..1a270390766a
--- /dev/null
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _CLONE3_SELFTESTS_H
+#define _CLONE3_SELFTESTS_H
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdint.h>
+#include <syscall.h>
+#include <linux/types.h>
+
+#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
+
+#ifndef __NR_clone3
+#define __NR_clone3 -1
+struct clone_args {
+	__aligned_u64 flags;
+	__aligned_u64 pidfd;
+	__aligned_u64 child_tid;
+	__aligned_u64 parent_tid;
+	__aligned_u64 exit_signal;
+	__aligned_u64 stack;
+	__aligned_u64 stack_size;
+	__aligned_u64 tls;
+	__aligned_u64 set_tid;
+	__aligned_u64 set_tid_size;
+};
+#endif
+
+static pid_t sys_clone3(struct clone_args *args, size_t size)
+{
+	return syscall(__NR_clone3, args, size);
+}
+
+#endif /* _CLONE3_SELFTESTS_H */
diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c b/tools/testing/selftests/clone3/clone3_set_tid.c
new file mode 100644
index 000000000000..3480e1c46983
--- /dev/null
+++ b/tools/testing/selftests/clone3/clone3_set_tid.c
@@ -0,0 +1,381 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Based on Christian Brauner's clone3() example.
+ * These tests are assuming to be running in the host's
+ * PID namespace.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <linux/types.h>
+#include <linux/sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/un.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <sched.h>
+
+#include "../kselftest.h"
+#include "clone3_selftests.h"
+
+#ifndef MAX_PID_NS_LEVEL
+#define MAX_PID_NS_LEVEL 32
+#endif
+
+static int pipe_1[2];
+static int pipe_2[2];
+
+static int call_clone3_set_tid(pid_t *set_tid,
+			       size_t set_tid_size,
+			       int flags,
+			       int expected_pid,
+			       bool wait_for_it)
+{
+	int status;
+	pid_t pid = -1;
+
+	struct clone_args args = {
+		.flags = flags,
+		.exit_signal = SIGCHLD,
+		.set_tid = ptr_to_u64(set_tid),
+		.set_tid_size = set_tid_size,
+	};
+
+	pid = sys_clone3(&args, sizeof(struct clone_args));
+	if (pid < 0) {
+		ksft_print_msg("%s - Failed to create new process\n",
+			       strerror(errno));
+		return -errno;
+	}
+
+	if (pid == 0) {
+		int ret;
+		char tmp = 0;
+		int exit_code = EXIT_SUCCESS;
+
+		ksft_print_msg("I am the child, my PID is %d (expected %d)\n",
+			       getpid(), set_tid[0]);
+		if (wait_for_it) {
+			ksft_print_msg("[%d] Child is ready and waiting\n",
+				       getpid());
+
+			/* Signal the parent that the child is ready */
+			close(pipe_1[0]);
+			ret = write(pipe_1[1], &tmp, 1);
+			if (ret != 1) {
+				ksft_print_msg(
+					"Writing to pipe returned %d", ret);
+				exit_code = EXIT_FAILURE;
+			}
+			close(pipe_1[1]);
+			close(pipe_2[1]);
+			ret = read(pipe_2[0], &tmp, 1);
+			if (ret != 1) {
+				ksft_print_msg(
+					"Reading from pipe returned %d", ret);
+				exit_code = EXIT_FAILURE;
+			}
+			close(pipe_2[0]);
+		}
+
+		if (set_tid[0] != getpid())
+			_exit(EXIT_FAILURE);
+		_exit(exit_code);
+	}
+
+	if (expected_pid == 0 || expected_pid == pid) {
+		ksft_print_msg("I am the parent (%d). My child's pid is %d\n",
+			       getpid(), pid);
+	} else {
+		ksft_print_msg(
+			"Expected child pid %d does not match actual pid %d\n",
+			expected_pid, pid);
+		return -1;
+	}
+
+	if (waitpid(pid, &status, 0) < 0) {
+		ksft_print_msg("Child returned %s\n", strerror(errno));
+		return -errno;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static void test_clone3_set_tid(pid_t *set_tid,
+				size_t set_tid_size,
+				int flags,
+				int expected,
+				int expected_pid,
+				bool wait_for_it)
+{
+	int ret;
+
+	ksft_print_msg(
+		"[%d] Trying clone3() with CLONE_SET_TID to %d and 0x%x\n",
+		getpid(), set_tid[0], flags);
+	ret = call_clone3_set_tid(set_tid, set_tid_size, flags, expected_pid,
+				  wait_for_it);
+	ksft_print_msg(
+		"[%d] clone3() with CLONE_SET_TID %d says :%d - expected %d\n",
+		getpid(), set_tid[0], ret, expected);
+	if (ret != expected)
+		ksft_test_result_fail(
+			"[%d] Result (%d) is different than expected (%d)\n",
+			getpid(), ret, expected);
+	else
+		ksft_test_result_pass(
+			"[%d] Result (%d) matches expectation (%d)\n",
+			getpid(), ret, expected);
+}
+int main(int argc, char *argv[])
+{
+	FILE *f;
+	char buf;
+	char *line;
+	int status;
+	int ret = -1;
+	size_t len = 0;
+	int pid_max = 0;
+	uid_t uid = getuid();
+	char proc_path[100] = {0};
+	pid_t pid, ns1, ns2, ns3, ns_pid;
+	pid_t set_tid[MAX_PID_NS_LEVEL * 2];
+
+	if (pipe(pipe_1) < 0 || pipe(pipe_2) < 0)
+		ksft_exit_fail_msg("pipe() failed\n");
+
+	ksft_print_header();
+	ksft_set_plan(27);
+
+	f = fopen("/proc/sys/kernel/pid_max", "r");
+	if (f == NULL)
+		ksft_exit_fail_msg(
+			"%s - Could not open /proc/sys/kernel/pid_max\n",
+			strerror(errno));
+	fscanf(f, "%d", &pid_max);
+	fclose(f);
+	ksft_print_msg("/proc/sys/kernel/pid_max %d\n", pid_max);
+
+	/* Try invalid settings */
+	memset(&set_tid, 0, sizeof(set_tid));
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
+			-EINVAL, 0, 0);
+
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
+
+	/*
+	 * This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
+	 * nested PID namespace.
+	 */
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
+
+	memset(&set_tid, 0xff, sizeof(set_tid));
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
+			-EINVAL, 0, 0);
+
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
+
+	/*
+	 * This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
+	 * nested PID namespace.
+	 */
+	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
+
+	memset(&set_tid, 0, sizeof(set_tid));
+	/* Try with an invalid PID */
+	set_tid[0] = 0;
+	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+
+	set_tid[0] = -1;
+	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+
+	/* Claim that the set_tid array actually contains 2 elements. */
+	test_clone3_set_tid(set_tid, 2, 0, -EINVAL, 0, 0);
+
+	/* Try it in a new PID namespace */
+	if (uid == 0)
+		test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+	else
+		ksft_test_result_skip("Clone3() with set_tid requires root\n");
+
+	/* Try with a valid PID (1) this should return -EEXIST. */
+	set_tid[0] = 1;
+	if (uid == 0)
+		test_clone3_set_tid(set_tid, 1, 0, -EEXIST, 0, 0);
+	else
+		ksft_test_result_skip("Clone3() with set_tid requires root\n");
+
+	/* Try it in a new PID namespace */
+	if (uid == 0)
+		test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, 0, 0, 0);
+	else
+		ksft_test_result_skip("Clone3() with set_tid requires root\n");
+
+	/* pid_max should fail everywhere */
+	set_tid[0] = pid_max;
+	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+
+	if (uid == 0)
+		test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+	else
+		ksft_test_result_skip("Clone3() with set_tid requires root\n");
+
+	if (uid != 0) {
+		/*
+		 * All remaining tests require root. Tell the framework
+		 * that all those tests are skipped as non-root.
+		 */
+		ksft_cnt.ksft_xskip += ksft_plan - ksft_test_num();
+		goto out;
+	}
+
+	/* Find the current active PID */
+	pid = fork();
+	if (pid == 0) {
+		ksft_print_msg("Child has PID %d\n", getpid());
+		_exit(EXIT_SUCCESS);
+	}
+	if (waitpid(pid, &status, 0) < 0)
+		ksft_exit_fail_msg("Waiting for child %d failed", pid);
+
+	/* After the child has finished, its PID should be free. */
+	set_tid[0] = pid;
+	test_clone3_set_tid(set_tid, 1, 0, 0, 0, 0);
+
+	/* This should fail as there is no PID 1 in that namespace */
+	test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+
+	/*
+	 * Creating a process with PID 1 in the newly created most nested
+	 * PID namespace and PID 'pid' in the parent PID namespace. This
+	 * needs to work.
+	 */
+	set_tid[0] = 1;
+	set_tid[1] = pid;
+	test_clone3_set_tid(set_tid, 2, CLONE_NEWPID, 0, pid, 0);
+
+	ksft_print_msg("unshare PID namespace\n");
+	if (unshare(CLONE_NEWPID) == -1)
+		ksft_exit_fail_msg("unshare(CLONE_NEWPID) failed: %s\n",
+				strerror(errno));
+
+	set_tid[0] = pid;
+
+	/* This should fail as there is no PID 1 in that namespace */
+	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+
+	/* Let's create a PID 1 */
+	ns_pid = fork();
+	if (ns_pid == 0) {
+		ksft_print_msg("Child in PID namespace has PID %d\n", getpid());
+		set_tid[0] = 2;
+		test_clone3_set_tid(set_tid, 1, 0, 0, 2, 0);
+
+		set_tid[0] = 1;
+		set_tid[1] = -1;
+		set_tid[2] = pid;
+		/* This should fail as there is invalid PID at level '1'. */
+		test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, -EINVAL, 0, 0);
+
+		set_tid[0] = 1;
+		set_tid[1] = 42;
+		set_tid[2] = pid;
+		/*
+		 * This should fail as there are not enough active PID
+		 * namespaces. Again assuming this is running in the host's
+		 * PID namespace. Not yet nested.
+		 */
+		test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
+
+		/*
+		 * This should work and from the parent we should see
+		 * something like 'NSpid:	pid	42	1'.
+		 */
+		test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, 0, 42, true);
+
+		_exit(ksft_cnt.ksft_pass);
+	}
+
+	close(pipe_1[1]);
+	close(pipe_2[0]);
+	while (read(pipe_1[0], &buf, 1) > 0) {
+		ksft_print_msg("[%d] Child is ready and waiting\n", getpid());
+		break;
+	}
+
+	snprintf(proc_path, sizeof(proc_path), "/proc/%d/status", pid);
+	f = fopen(proc_path, "r");
+	if (f == NULL)
+		ksft_exit_fail_msg(
+			"%s - Could not open %s\n",
+			strerror(errno), proc_path);
+
+	while (getline(&line, &len, f) != -1) {
+		if (strstr(line, "NSpid")) {
+			int i;
+
+			/* Verify that all generated PIDs are as expected. */
+			i = sscanf(line, "NSpid:\t%d\t%d\t%d",
+				   &ns3, &ns2, &ns1);
+			if (i != 3) {
+				ksft_print_msg(
+					"Unexpected 'NSPid:' entry: %s",
+					line);
+				ns1 = ns2 = ns3 = 0;
+			}
+			break;
+		}
+	}
+	fclose(f);
+	free(line);
+	close(pipe_2[0]);
+
+	/* Tell the clone3()'d child to finish. */
+	write(pipe_2[1], &buf, 1);
+	close(pipe_2[1]);
+
+	if (waitpid(ns_pid, &status, 0) < 0) {
+		ksft_print_msg("Child returned %s\n", strerror(errno));
+		ret = -errno;
+		goto out;
+	}
+
+	if (!WIFEXITED(status))
+		ksft_test_result_fail("Child error\n");
+
+	if (WEXITSTATUS(status))
+		/*
+		 * Update the number of total tests with the tests from the
+		 * child processes.
+		 */
+		ksft_cnt.ksft_pass = WEXITSTATUS(status);
+
+	if (ns3 == pid && ns2 == 42 && ns1 == 1)
+		ksft_test_result_pass(
+			"PIDs in all namespaces as expected (%d,%d,%d)\n",
+			ns3, ns2, ns1);
+	else
+		ksft_test_result_fail(
+			"PIDs in all namespaces not as expected (%d,%d,%d)\n",
+			ns3, ns2, ns1);
+out:
+	ret = 0;
+
+	return !ret ? ksft_exit_pass() : ksft_exit_fail();
+}
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 1/2] fork: extend clone3() to support setting a PID
  2019-11-15 12:36 [PATCH v11 1/2] fork: extend clone3() to support setting a PID Adrian Reber
  2019-11-15 12:36 ` [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid Adrian Reber
@ 2019-11-15 16:33 ` Oleg Nesterov
  2019-11-15 16:54 ` Dmitry Safonov
  2019-11-15 21:54 ` Andrei Vagin
  3 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2019-11-15 16:33 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Dmitry Safonov, Andrei Vagin, Rasmus Villemoes, linux-kernel,
	Mike Rapoport, Radostin Stoyanov

On 11/15, Adrian Reber wrote:
>
> v11:
>  - abort alloc_pid() correctly if one of the PIDs specified in
>    set_pid[] is invalid (Andrei)
> ---
>  include/linux/pid.h           |  3 +-
>  include/linux/pid_namespace.h |  2 +
>  include/linux/sched/task.h    |  3 ++
>  include/uapi/linux/sched.h    | 53 +++++++++++++++++---------
>  kernel/fork.c                 | 24 +++++++++++-
>  kernel/pid.c                  | 72 +++++++++++++++++++++++++++--------
>  kernel/pid_namespace.c        |  2 -
>  7 files changed, 122 insertions(+), 37 deletions(-)

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 1/2] fork: extend clone3() to support setting a PID
  2019-11-15 12:36 [PATCH v11 1/2] fork: extend clone3() to support setting a PID Adrian Reber
  2019-11-15 12:36 ` [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid Adrian Reber
  2019-11-15 16:33 ` [PATCH v11 1/2] fork: extend clone3() to support setting a PID Oleg Nesterov
@ 2019-11-15 16:54 ` Dmitry Safonov
  2019-11-15 17:02   ` Dmitry Safonov
  2019-11-15 22:51   ` Christian Brauner
  2019-11-15 21:54 ` Andrei Vagin
  3 siblings, 2 replies; 11+ messages in thread
From: Dmitry Safonov @ 2019-11-15 16:54 UTC (permalink / raw)
  To: Adrian Reber, Christian Brauner, Eric Biederman, Pavel Emelyanov,
	Jann Horn, Oleg Nesterov, Andrei Vagin, Rasmus Villemoes
  Cc: linux-kernel, Mike Rapoport, Radostin Stoyanov

On 11/15/19 12:36 PM, Adrian Reber wrote:
[..]
> Signed-off-by: Adrian Reber <areber@redhat.com>
> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>

Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>

[though, I have 2 minor nits below]

[..]
> + * @set_tid:      Pointer to an array of type *pid_t. The size
> + *                of the array is defined using @set_tid_size.
> + *                This array is used select PIDs/TIDs for newly

/is used select/is used to select/s


[..]
> +	size_t set_tid_size;
> +	__aligned_u64 set_tid_size;

[..]
> +		.set_tid_size	= args.set_tid_size,

Is sizeof(size_t) == 32 on native 32-bit platforms?
Maybe `args.set_tid_size` should be checked?

Thanks,
          Dmitry

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 1/2] fork: extend clone3() to support setting a PID
  2019-11-15 16:54 ` Dmitry Safonov
@ 2019-11-15 17:02   ` Dmitry Safonov
  2019-11-15 22:51   ` Christian Brauner
  1 sibling, 0 replies; 11+ messages in thread
From: Dmitry Safonov @ 2019-11-15 17:02 UTC (permalink / raw)
  To: Adrian Reber, Christian Brauner, Eric Biederman, Pavel Emelyanov,
	Jann Horn, Oleg Nesterov, Andrei Vagin, Rasmus Villemoes
  Cc: linux-kernel, Mike Rapoport, Radostin Stoyanov

On 11/15/19 4:54 PM, Dmitry Safonov wrote:

> [..]
>> +	size_t set_tid_size;
>> +	__aligned_u64 set_tid_size;
> 
> [..]
>> +		.set_tid_size	= args.set_tid_size,
> 
> Is sizeof(size_t) == 32 on native 32-bit platforms?
> Maybe `args.set_tid_size` should be checked?

Nevermind, I missed that
+	if (unlikely(args.set_tid_size > MAX_PID_NS_LEVEL))
+		return -EINVAL;

is checked for `args' - it should be good enough.

Thanks,
          Dmitry

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 1/2] fork: extend clone3() to support setting a PID
  2019-11-15 12:36 [PATCH v11 1/2] fork: extend clone3() to support setting a PID Adrian Reber
                   ` (2 preceding siblings ...)
  2019-11-15 16:54 ` Dmitry Safonov
@ 2019-11-15 21:54 ` Andrei Vagin
  3 siblings, 0 replies; 11+ messages in thread
From: Andrei Vagin @ 2019-11-15 21:54 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Dmitry Safonov, Rasmus Villemoes, linux-kernel,
	Mike Rapoport, Radostin Stoyanov

On Fri, Nov 15, 2019 at 01:36:20PM +0100, Adrian Reber wrote:
> 
> v11:
>  - abort alloc_pid() correctly if one of the PIDs specified in
>    set_pid[] is invalid (Andrei)
> ---
>  include/linux/pid.h           |  3 +-
>  include/linux/pid_namespace.h |  2 +
>  include/linux/sched/task.h    |  3 ++
>  include/uapi/linux/sched.h    | 53 +++++++++++++++++---------
>  kernel/fork.c                 | 24 +++++++++++-
>  kernel/pid.c                  | 72 +++++++++++++++++++++++++++--------
>  kernel/pid_namespace.c        |  2 -
>  7 files changed, 122 insertions(+), 37 deletions(-)

Acked-by: Andrei Vagin <avagin@gmail.com>

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid
  2019-11-15 12:36 ` [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid Adrian Reber
@ 2019-11-15 22:20   ` Andrei Vagin
  2019-11-15 22:53     ` Christian Brauner
  2019-11-18  1:46   ` Andrei Vagin
  1 sibling, 1 reply; 11+ messages in thread
From: Andrei Vagin @ 2019-11-15 22:20 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Dmitry Safonov, Rasmus Villemoes, linux-kernel,
	Mike Rapoport, Radostin Stoyanov

[-- Attachment #1: Type: text/plain, Size: 1534 bytes --]

On Fri, Nov 15, 2019 at 01:36:21PM +0100, Adrian Reber wrote:
> This tests clone3() with *set_tid to see if all desired PIDs are working
> as expected. The tests are trying multiple invalid input parameters as
> well as creating processes while specifying a certain PID in multiple
> PID namespaces at the same time.
> 
> Additionally this moves common clone3() test code into clone3_selftests.h.
> 
> Signed-off-by: Adrian Reber <areber@redhat.com>
> ---
> v9:
>  - applied all changes from Christian's review (except using the
>    NSpid: parsing code from selftests/pidfd/pidfd_fdinfo_test.c)
> 
> v10:
>  - added even more '\n' and include file fixes (Christian)
> 
> v11:
>  - added more return code checking at multiple places (Andrei)
>  - also add set_tid/set_tid_size to internal struct (Andrei)

I think we can add a test case to trigger the issue what I found in the
previous version of the kernel patch. You can find my version of this
test case in the attached patch.

nit: we need to flush stdout and stderr buffers before calling the raw
clone3 syscall and _exit(). Otherwise, some log messages can be lost and
some of them can be printed twice.

To trigger this issue, you can run the test and redirect its output to
file or pipe:

$ ./clone3_set_tid | cat

I have attached the patch to address both these problems. It is a draft
version and may require some work.

Adrian and Christian, it is up to you to decide whether we want to
update the current patch or to fix this on top by a separate patch.

Thanks,
Andrei


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 2586 bytes --]

diff --git a/tools/testing/selftests/clone3/Makefile b/tools/testing/selftests/clone3/Makefile
index cf976c732906..a7c7d8d15e1c 100644
--- a/tools/testing/selftests/clone3/Makefile
+++ b/tools/testing/selftests/clone3/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-CFLAGS += -g -I../../../../usr/include/
+CFLAGS += -Wall -g -I../../../../usr/include/
 
 TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid
 
diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c b/tools/testing/selftests/clone3/clone3_set_tid.c
index 3480e1c46983..ab1df5ce201f 100644
--- a/tools/testing/selftests/clone3/clone3_set_tid.c
+++ b/tools/testing/selftests/clone3/clone3_set_tid.c
@@ -30,6 +30,12 @@
 static int pipe_1[2];
 static int pipe_2[2];
 
+static void flush()
+{
+	fflush(stdout);
+	fflush(stderr);
+}
+
 static int call_clone3_set_tid(pid_t *set_tid,
 			       size_t set_tid_size,
 			       int flags,
@@ -46,6 +52,7 @@ static int call_clone3_set_tid(pid_t *set_tid,
 		.set_tid_size = set_tid_size,
 	};
 
+	flush();
 	pid = sys_clone3(&args, sizeof(struct clone_args));
 	if (pid < 0) {
 		ksft_print_msg("%s - Failed to create new process\n",
@@ -83,6 +90,7 @@ static int call_clone3_set_tid(pid_t *set_tid,
 			close(pipe_2[0]);
 		}
 
+		flush();
 		if (set_tid[0] != getpid())
 			_exit(EXIT_FAILURE);
 		_exit(exit_code);
@@ -153,7 +161,7 @@ int main(int argc, char *argv[])
 		ksft_exit_fail_msg("pipe() failed\n");
 
 	ksft_print_header();
-	ksft_set_plan(27);
+	ksft_set_plan(29);
 
 	f = fopen("/proc/sys/kernel/pid_max", "r");
 	if (f == NULL)
@@ -249,6 +257,7 @@ int main(int argc, char *argv[])
 	pid = fork();
 	if (pid == 0) {
 		ksft_print_msg("Child has PID %d\n", getpid());
+		flush();
 		_exit(EXIT_SUCCESS);
 	}
 	if (waitpid(pid, &status, 0) < 0)
@@ -283,6 +292,19 @@ int main(int argc, char *argv[])
 	/* Let's create a PID 1 */
 	ns_pid = fork();
 	if (ns_pid == 0) {
+		set_tid[0] = 43;
+		set_tid[1] = -1;
+		/*
+		 * This should fail as there are not enough active PID
+		 * namespaces. Again assuming this is running in the host's
+		 * PID namespace. Not yet nested.
+		 */
+		test_clone3_set_tid(set_tid, 2, 0, -EINVAL, 0, 0);
+
+		set_tid[0] = 43;
+		set_tid[1] = pid;
+		test_clone3_set_tid(set_tid, 2, 0, 0, 43, 0);
+
 		ksft_print_msg("Child in PID namespace has PID %d\n", getpid());
 		set_tid[0] = 2;
 		test_clone3_set_tid(set_tid, 1, 0, 0, 2, 0);
@@ -309,6 +331,8 @@ int main(int argc, char *argv[])
 		 */
 		test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, 0, 42, true);
 
+
+		flush();
 		_exit(ksft_cnt.ksft_pass);
 	}
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 1/2] fork: extend clone3() to support setting a PID
  2019-11-15 16:54 ` Dmitry Safonov
  2019-11-15 17:02   ` Dmitry Safonov
@ 2019-11-15 22:51   ` Christian Brauner
  1 sibling, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2019-11-15 22:51 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Adrian Reber, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Andrei Vagin, Rasmus Villemoes, linux-kernel,
	Mike Rapoport, Radostin Stoyanov

On Fri, Nov 15, 2019 at 04:54:06PM +0000, Dmitry Safonov wrote:
> On 11/15/19 12:36 PM, Adrian Reber wrote:
> [..]
> > Signed-off-by: Adrian Reber <areber@redhat.com>
> > Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
> 
> [though, I have 2 minor nits below]
> 
> [..]
> > + * @set_tid:      Pointer to an array of type *pid_t. The size
> > + *                of the array is defined using @set_tid_size.
> > + *                This array is used select PIDs/TIDs for newly
> 
> /is used select/is used to select/s

I fixed this up while applying.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid
  2019-11-15 22:20   ` Andrei Vagin
@ 2019-11-15 22:53     ` Christian Brauner
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2019-11-15 22:53 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Adrian Reber, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Dmitry Safonov, Rasmus Villemoes, linux-kernel,
	Mike Rapoport, Radostin Stoyanov

On Fri, Nov 15, 2019 at 02:20:18PM -0800, Andrei Vagin wrote:
> On Fri, Nov 15, 2019 at 01:36:21PM +0100, Adrian Reber wrote:
> > This tests clone3() with *set_tid to see if all desired PIDs are working
> > as expected. The tests are trying multiple invalid input parameters as
> > well as creating processes while specifying a certain PID in multiple
> > PID namespaces at the same time.
> > 
> > Additionally this moves common clone3() test code into clone3_selftests.h.
> > 
> > Signed-off-by: Adrian Reber <areber@redhat.com>
> > ---
> > v9:
> >  - applied all changes from Christian's review (except using the
> >    NSpid: parsing code from selftests/pidfd/pidfd_fdinfo_test.c)
> > 
> > v10:
> >  - added even more '\n' and include file fixes (Christian)
> > 
> > v11:
> >  - added more return code checking at multiple places (Andrei)
> >  - also add set_tid/set_tid_size to internal struct (Andrei)
> 
> I think we can add a test case to trigger the issue what I found in the
> previous version of the kernel patch. You can find my version of this
> test case in the attached patch.
> 
> nit: we need to flush stdout and stderr buffers before calling the raw
> clone3 syscall and _exit(). Otherwise, some log messages can be lost and
> some of them can be printed twice.
> 
> To trigger this issue, you can run the test and redirect its output to
> file or pipe:
> 
> $ ./clone3_set_tid | cat
> 
> I have attached the patch to address both these problems. It is a draft
> version and may require some work.
> 
> Adrian and Christian, it is up to you to decide whether we want to
> update the current patch or to fix this on top by a separate patch.

If you give me a proper commit with a commit message I'll put it on top
as another patch. :)

Christian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid
  2019-11-15 12:36 ` [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid Adrian Reber
  2019-11-15 22:20   ` Andrei Vagin
@ 2019-11-18  1:46   ` Andrei Vagin
  2019-11-18  7:42     ` Christian Brauner
  1 sibling, 1 reply; 11+ messages in thread
From: Andrei Vagin @ 2019-11-18  1:46 UTC (permalink / raw)
  To: Adrian Reber
  Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Dmitry Safonov, Rasmus Villemoes, linux-kernel,
	Mike Rapoport, Radostin Stoyanov

On Fri, Nov 15, 2019 at 01:36:21PM +0100, Adrian Reber wrote:
> This tests clone3() with *set_tid to see if all desired PIDs are working
> as expected. The tests are trying multiple invalid input parameters as
> well as creating processes while specifying a certain PID in multiple
> PID namespaces at the same time.
> 
> Additionally this moves common clone3() test code into clone3_selftests.h.
> 
> Signed-off-by: Adrian Reber <areber@redhat.com>
> ---
> v9:
>  - applied all changes from Christian's review (except using the
>    NSpid: parsing code from selftests/pidfd/pidfd_fdinfo_test.c)
> 
> v10:
>  - added even more '\n' and include file fixes (Christian)
> 
> v11:
>  - added more return code checking at multiple places (Andrei)
>  - also add set_tid/set_tid_size to internal struct (Andrei)
> ---
>  tools/testing/selftests/clone3/.gitignore     |   1 +
>  tools/testing/selftests/clone3/Makefile       |   2 +-
>  tools/testing/selftests/clone3/clone3.c       |   8 +-
>  .../selftests/clone3/clone3_clear_sighand.c   |  20 +-
>  .../selftests/clone3/clone3_selftests.h       |  35 ++
>  .../testing/selftests/clone3/clone3_set_tid.c | 381 ++++++++++++++++++
>  6 files changed, 421 insertions(+), 26 deletions(-)
>  create mode 100644 tools/testing/selftests/clone3/clone3_selftests.h
>  create mode 100644 tools/testing/selftests/clone3/clone3_set_tid.c
> 
> diff --git a/tools/testing/selftests/clone3/.gitignore b/tools/testing/selftests/clone3/.gitignore
> index 2a30ae18b06e..0dc4f32c6cb8 100644
> --- a/tools/testing/selftests/clone3/.gitignore
> +++ b/tools/testing/selftests/clone3/.gitignore
> @@ -1,2 +1,3 @@
>  clone3
>  clone3_clear_sighand
> +clone3_set_tid
> diff --git a/tools/testing/selftests/clone3/Makefile b/tools/testing/selftests/clone3/Makefile
> index eb26eb793c80..cf976c732906 100644
> --- a/tools/testing/selftests/clone3/Makefile
> +++ b/tools/testing/selftests/clone3/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  CFLAGS += -g -I../../../../usr/include/
>  
> -TEST_GEN_PROGS := clone3 clone3_clear_sighand
> +TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid
>  
>  include ../lib.mk
> diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
> index 0f8a9ef40117..4669b3d418e7 100644
> --- a/tools/testing/selftests/clone3/clone3.c
> +++ b/tools/testing/selftests/clone3/clone3.c
> @@ -18,6 +18,7 @@
>  #include <sched.h>
>  
>  #include "../kselftest.h"
> +#include "clone3_selftests.h"
>  
>  /*
>   * Different sizes of struct clone_args
> @@ -35,11 +36,6 @@ enum test_mode {
>  	CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
>  };
>  
> -static pid_t raw_clone(struct clone_args *args, size_t size)
> -{
> -	return syscall(__NR_clone3, args, size);
> -}
> -
>  static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
>  {
>  	struct clone_args args = {
> @@ -83,7 +79,7 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
>  
>  	memcpy(&args_ext.args, &args, sizeof(struct clone_args));
>  
> -	pid = raw_clone((struct clone_args *)&args_ext, size);
> +	pid = sys_clone3((struct clone_args *)&args_ext, size);
>  	if (pid < 0) {
>  		ksft_print_msg("%s - Failed to create new process\n",
>  				strerror(errno));
> diff --git a/tools/testing/selftests/clone3/clone3_clear_sighand.c b/tools/testing/selftests/clone3/clone3_clear_sighand.c
> index 0d957be1bdc5..456783ad19d6 100644
> --- a/tools/testing/selftests/clone3/clone3_clear_sighand.c
> +++ b/tools/testing/selftests/clone3/clone3_clear_sighand.c
> @@ -14,30 +14,12 @@
>  #include <sys/wait.h>
>  
>  #include "../kselftest.h"
> +#include "clone3_selftests.h"
>  
>  #ifndef CLONE_CLEAR_SIGHAND
>  #define CLONE_CLEAR_SIGHAND 0x100000000ULL
>  #endif
>  
> -#ifndef __NR_clone3
> -#define __NR_clone3 -1
> -struct clone_args {
> -	__aligned_u64 flags;
> -	__aligned_u64 pidfd;
> -	__aligned_u64 child_tid;
> -	__aligned_u64 parent_tid;
> -	__aligned_u64 exit_signal;
> -	__aligned_u64 stack;
> -	__aligned_u64 stack_size;
> -	__aligned_u64 tls;
> -};
> -#endif
> -
> -static pid_t sys_clone3(struct clone_args *args, size_t size)
> -{
> -	return syscall(__NR_clone3, args, size);
> -}
> -
>  static void test_clone3_supported(void)
>  {
>  	pid_t pid;
> diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
> new file mode 100644
> index 000000000000..1a270390766a
> --- /dev/null
> +++ b/tools/testing/selftests/clone3/clone3_selftests.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _CLONE3_SELFTESTS_H
> +#define _CLONE3_SELFTESTS_H
> +
> +#define _GNU_SOURCE
> +#include <sched.h>
> +#include <stdint.h>
> +#include <syscall.h>
> +#include <linux/types.h>
> +
> +#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
> +
> +#ifndef __NR_clone3
> +#define __NR_clone3 -1
> +struct clone_args {
> +	__aligned_u64 flags;
> +	__aligned_u64 pidfd;
> +	__aligned_u64 child_tid;
> +	__aligned_u64 parent_tid;
> +	__aligned_u64 exit_signal;
> +	__aligned_u64 stack;
> +	__aligned_u64 stack_size;
> +	__aligned_u64 tls;
> +	__aligned_u64 set_tid;
> +	__aligned_u64 set_tid_size;
> +};
> +#endif
> +
> +static pid_t sys_clone3(struct clone_args *args, size_t size)
> +{
> +	return syscall(__NR_clone3, args, size);
> +}
> +
> +#endif /* _CLONE3_SELFTESTS_H */
> diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c b/tools/testing/selftests/clone3/clone3_set_tid.c
> new file mode 100644
> index 000000000000..3480e1c46983
> --- /dev/null
> +++ b/tools/testing/selftests/clone3/clone3_set_tid.c
> @@ -0,0 +1,381 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Based on Christian Brauner's clone3() example.
> + * These tests are assuming to be running in the host's
> + * PID namespace.
> + */
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <linux/types.h>
> +#include <linux/sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <sys/un.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <sched.h>
> +
> +#include "../kselftest.h"
> +#include "clone3_selftests.h"
> +
> +#ifndef MAX_PID_NS_LEVEL
> +#define MAX_PID_NS_LEVEL 32
> +#endif
> +
> +static int pipe_1[2];
> +static int pipe_2[2];
> +
> +static int call_clone3_set_tid(pid_t *set_tid,
> +			       size_t set_tid_size,
> +			       int flags,
> +			       int expected_pid,
> +			       bool wait_for_it)
> +{
> +	int status;
> +	pid_t pid = -1;
> +
> +	struct clone_args args = {
> +		.flags = flags,
> +		.exit_signal = SIGCHLD,
> +		.set_tid = ptr_to_u64(set_tid),
> +		.set_tid_size = set_tid_size,
> +	};
> +
> +	pid = sys_clone3(&args, sizeof(struct clone_args));
> +	if (pid < 0) {
> +		ksft_print_msg("%s - Failed to create new process\n",
> +			       strerror(errno));
> +		return -errno;
> +	}
> +
> +	if (pid == 0) {
> +		int ret;
> +		char tmp = 0;
> +		int exit_code = EXIT_SUCCESS;
> +
> +		ksft_print_msg("I am the child, my PID is %d (expected %d)\n",
> +			       getpid(), set_tid[0]);
> +		if (wait_for_it) {
> +			ksft_print_msg("[%d] Child is ready and waiting\n",
> +				       getpid());
> +
> +			/* Signal the parent that the child is ready */
> +			close(pipe_1[0]);
> +			ret = write(pipe_1[1], &tmp, 1);
> +			if (ret != 1) {
> +				ksft_print_msg(
> +					"Writing to pipe returned %d", ret);
> +				exit_code = EXIT_FAILURE;
> +			}
> +			close(pipe_1[1]);
> +			close(pipe_2[1]);
> +			ret = read(pipe_2[0], &tmp, 1);
> +			if (ret != 1) {
> +				ksft_print_msg(
> +					"Reading from pipe returned %d", ret);
> +				exit_code = EXIT_FAILURE;
> +			}
> +			close(pipe_2[0]);
> +		}
> +
> +		if (set_tid[0] != getpid())
> +			_exit(EXIT_FAILURE);
> +		_exit(exit_code);
> +	}
> +
> +	if (expected_pid == 0 || expected_pid == pid) {
> +		ksft_print_msg("I am the parent (%d). My child's pid is %d\n",
> +			       getpid(), pid);
> +	} else {
> +		ksft_print_msg(
> +			"Expected child pid %d does not match actual pid %d\n",
> +			expected_pid, pid);
> +		return -1;
> +	}
> +
> +	if (waitpid(pid, &status, 0) < 0) {
> +		ksft_print_msg("Child returned %s\n", strerror(errno));
> +		return -errno;
> +	}
> +
> +	if (!WIFEXITED(status))
> +		return -1;
> +
> +	return WEXITSTATUS(status);
> +}
> +
> +static void test_clone3_set_tid(pid_t *set_tid,
> +				size_t set_tid_size,
> +				int flags,
> +				int expected,
> +				int expected_pid,
> +				bool wait_for_it)
> +{
> +	int ret;
> +
> +	ksft_print_msg(
> +		"[%d] Trying clone3() with CLONE_SET_TID to %d and 0x%x\n",
> +		getpid(), set_tid[0], flags);
> +	ret = call_clone3_set_tid(set_tid, set_tid_size, flags, expected_pid,
> +				  wait_for_it);
> +	ksft_print_msg(
> +		"[%d] clone3() with CLONE_SET_TID %d says :%d - expected %d\n",
> +		getpid(), set_tid[0], ret, expected);
> +	if (ret != expected)
> +		ksft_test_result_fail(
> +			"[%d] Result (%d) is different than expected (%d)\n",
> +			getpid(), ret, expected);
> +	else
> +		ksft_test_result_pass(
> +			"[%d] Result (%d) matches expectation (%d)\n",
> +			getpid(), ret, expected);
> +}
> +int main(int argc, char *argv[])
> +{
> +	FILE *f;
> +	char buf;
> +	char *line;
> +	int status;
> +	int ret = -1;
> +	size_t len = 0;
> +	int pid_max = 0;
> +	uid_t uid = getuid();
> +	char proc_path[100] = {0};
> +	pid_t pid, ns1, ns2, ns3, ns_pid;
> +	pid_t set_tid[MAX_PID_NS_LEVEL * 2];
> +
> +	if (pipe(pipe_1) < 0 || pipe(pipe_2) < 0)
> +		ksft_exit_fail_msg("pipe() failed\n");
> +
> +	ksft_print_header();
> +	ksft_set_plan(27);
> +
> +	f = fopen("/proc/sys/kernel/pid_max", "r");
> +	if (f == NULL)
> +		ksft_exit_fail_msg(
> +			"%s - Could not open /proc/sys/kernel/pid_max\n",
> +			strerror(errno));
> +	fscanf(f, "%d", &pid_max);
> +	fclose(f);
> +	ksft_print_msg("/proc/sys/kernel/pid_max %d\n", pid_max);
> +
> +	/* Try invalid settings */
> +	memset(&set_tid, 0, sizeof(set_tid));
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
> +
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
> +
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
> +			-EINVAL, 0, 0);
> +
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
> +
> +	/*
> +	 * This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
> +	 * nested PID namespace.
> +	 */
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
> +
> +	memset(&set_tid, 0xff, sizeof(set_tid));
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
> +
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
> +
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
> +			-EINVAL, 0, 0);
> +
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
> +
> +	/*
> +	 * This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
> +	 * nested PID namespace.
> +	 */
> +	test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
> +
> +	memset(&set_tid, 0, sizeof(set_tid));
> +	/* Try with an invalid PID */
> +	set_tid[0] = 0;
> +	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
> +
> +	set_tid[0] = -1;
> +	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
> +
> +	/* Claim that the set_tid array actually contains 2 elements. */
> +	test_clone3_set_tid(set_tid, 2, 0, -EINVAL, 0, 0);
> +
> +	/* Try it in a new PID namespace */
> +	if (uid == 0)
> +		test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
> +	else
> +		ksft_test_result_skip("Clone3() with set_tid requires root\n");
> +
> +	/* Try with a valid PID (1) this should return -EEXIST. */
> +	set_tid[0] = 1;
> +	if (uid == 0)
> +		test_clone3_set_tid(set_tid, 1, 0, -EEXIST, 0, 0);
> +	else
> +		ksft_test_result_skip("Clone3() with set_tid requires root\n");
> +
> +	/* Try it in a new PID namespace */
> +	if (uid == 0)
> +		test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, 0, 0, 0);
> +	else
> +		ksft_test_result_skip("Clone3() with set_tid requires root\n");
> +
> +	/* pid_max should fail everywhere */
> +	set_tid[0] = pid_max;
> +	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
> +
> +	if (uid == 0)
> +		test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
> +	else
> +		ksft_test_result_skip("Clone3() with set_tid requires root\n");
> +
> +	if (uid != 0) {
> +		/*
> +		 * All remaining tests require root. Tell the framework
> +		 * that all those tests are skipped as non-root.
> +		 */
> +		ksft_cnt.ksft_xskip += ksft_plan - ksft_test_num();
> +		goto out;
> +	}
> +
> +	/* Find the current active PID */
> +	pid = fork();
> +	if (pid == 0) {
> +		ksft_print_msg("Child has PID %d\n", getpid());
> +		_exit(EXIT_SUCCESS);
> +	}
> +	if (waitpid(pid, &status, 0) < 0)
> +		ksft_exit_fail_msg("Waiting for child %d failed", pid);
> +
> +	/* After the child has finished, its PID should be free. */
> +	set_tid[0] = pid;
> +	test_clone3_set_tid(set_tid, 1, 0, 0, 0, 0);
> +
> +	/* This should fail as there is no PID 1 in that namespace */
> +	test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
> +
> +	/*
> +	 * Creating a process with PID 1 in the newly created most nested
> +	 * PID namespace and PID 'pid' in the parent PID namespace. This
> +	 * needs to work.
> +	 */
> +	set_tid[0] = 1;
> +	set_tid[1] = pid;
> +	test_clone3_set_tid(set_tid, 2, CLONE_NEWPID, 0, pid, 0);
> +
> +	ksft_print_msg("unshare PID namespace\n");
> +	if (unshare(CLONE_NEWPID) == -1)
> +		ksft_exit_fail_msg("unshare(CLONE_NEWPID) failed: %s\n",
> +				strerror(errno));
> +
> +	set_tid[0] = pid;
> +
> +	/* This should fail as there is no PID 1 in that namespace */
> +	test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
> +
> +	/* Let's create a PID 1 */
> +	ns_pid = fork();
> +	if (ns_pid == 0) {
> +		ksft_print_msg("Child in PID namespace has PID %d\n", getpid());
> +		set_tid[0] = 2;
> +		test_clone3_set_tid(set_tid, 1, 0, 0, 2, 0);
> +
> +		set_tid[0] = 1;
> +		set_tid[1] = -1;
> +		set_tid[2] = pid;
> +		/* This should fail as there is invalid PID at level '1'. */
> +		test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, -EINVAL, 0, 0);
> +
> +		set_tid[0] = 1;
> +		set_tid[1] = 42;
> +		set_tid[2] = pid;
> +		/*
> +		 * This should fail as there are not enough active PID
> +		 * namespaces. Again assuming this is running in the host's
> +		 * PID namespace. Not yet nested.
> +		 */
> +		test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
> +
> +		/*
> +		 * This should work and from the parent we should see
> +		 * something like 'NSpid:	pid	42	1'.
> +		 */
> +		test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, 0, 42, true);
> +
> +		_exit(ksft_cnt.ksft_pass);
> +	}
> +
> +	close(pipe_1[1]);
> +	close(pipe_2[0]);
> +	while (read(pipe_1[0], &buf, 1) > 0) {
> +		ksft_print_msg("[%d] Child is ready and waiting\n", getpid());
> +		break;
> +	}
> +
> +	snprintf(proc_path, sizeof(proc_path), "/proc/%d/status", pid);
> +	f = fopen(proc_path, "r");
> +	if (f == NULL)
> +		ksft_exit_fail_msg(
> +			"%s - Could not open %s\n",
> +			strerror(errno), proc_path);
> +
> +	while (getline(&line, &len, f) != -1) {
> +		if (strstr(line, "NSpid")) {
> +			int i;
> +
> +			/* Verify that all generated PIDs are as expected. */
> +			i = sscanf(line, "NSpid:\t%d\t%d\t%d",
> +				   &ns3, &ns2, &ns1);
> +			if (i != 3) {
> +				ksft_print_msg(
> +					"Unexpected 'NSPid:' entry: %s",
> +					line);
> +				ns1 = ns2 = ns3 = 0;
> +			}
> +			break;
> +		}
> +	}
> +	fclose(f);
> +	free(line);
> +	close(pipe_2[0]);
> +
> +	/* Tell the clone3()'d child to finish. */
> +	write(pipe_2[1], &buf, 1);
> +	close(pipe_2[1]);
> +
> +	if (waitpid(ns_pid, &status, 0) < 0) {
> +		ksft_print_msg("Child returned %s\n", strerror(errno));
> +		ret = -errno;
> +		goto out;
> +	}
> +
> +	if (!WIFEXITED(status))
> +		ksft_test_result_fail("Child error\n");
> +
> +	if (WEXITSTATUS(status))
> +		/*
> +		 * Update the number of total tests with the tests from the
> +		 * child processes.
> +		 */
> +		ksft_cnt.ksft_pass = WEXITSTATUS(status);

I just found that accounting of failed test cases in this test doesn't
work properly. And if one of the test cases fails in a child process,
the whole test will have the pass status.


For example, if we add a fake fail:

diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c b/tools/testing/selftests/clone3/clone3_set_tid.c
index 3480e1c46..82c99c42f 100644
--- a/tools/testing/selftests/clone3/clone3_set_tid.c
+++ b/tools/testing/selftests/clone3/clone3_set_tid.c
@@ -301,7 +301,7 @@ int main(int argc, char *argv[])
                 * namespaces. Again assuming this is running in the host's
                 * PID namespace. Not yet nested.
                 */
-               test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
+               test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EPERM, 0, 0);
 
                /*
                 * This should work and from the parent we should see

$ make run_tests 
....
# ok 21 [21104] Result (0) matches expectation (0)
# # unshare PID namespace
# # [21104] Trying clone3() with CLONE_SET_TID to 21106 and 0x0
# # Invalid argument - Failed to create new process
# # [21104] clone3() with CLONE_SET_TID 21106 says :-22 - expected -22
# ok 22 [21104] Result (-22) matches expectation (-22)
# # [21104] Child is ready and waiting
# ok 26 PIDs in all namespaces as expected (21106,42,1)
# # Planned tests != run tests (27 != 26)
# # Pass 26 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0
ok 3 selftests: clone3: clone3_set_tid


> +
> +	if (ns3 == pid && ns2 == 42 && ns1 == 1)
> +		ksft_test_result_pass(
> +			"PIDs in all namespaces as expected (%d,%d,%d)\n",
> +			ns3, ns2, ns1);
> +	else
> +		ksft_test_result_fail(
> +			"PIDs in all namespaces not as expected (%d,%d,%d)\n",
> +			ns3, ns2, ns1);
> +out:
> +	ret = 0;
> +
> +	return !ret ? ksft_exit_pass() : ksft_exit_fail();
> +}
> -- 
> 2.23.0
> 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid
  2019-11-18  1:46   ` Andrei Vagin
@ 2019-11-18  7:42     ` Christian Brauner
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2019-11-18  7:42 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Adrian Reber, Eric Biederman, Pavel Emelyanov, Jann Horn,
	Oleg Nesterov, Dmitry Safonov, Rasmus Villemoes, linux-kernel,
	Mike Rapoport, Radostin Stoyanov

On Sun, Nov 17, 2019 at 05:46:42PM -0800, Andrei Vagin wrote:
> On Fri, Nov 15, 2019 at 01:36:21PM +0100, Adrian Reber wrote:
> > +	if (!WIFEXITED(status))
> > +		ksft_test_result_fail("Child error\n");
> > +
> > +	if (WEXITSTATUS(status))
> > +		/*
> > +		 * Update the number of total tests with the tests from the
> > +		 * child processes.
> > +		 */
> > +		ksft_cnt.ksft_pass = WEXITSTATUS(status);
> 
> I just found that accounting of failed test cases in this test doesn't
> work properly. And if one of the test cases fails in a child process,
> the whole test will have the pass status.
> 
> 
> For example, if we add a fake fail:
> 
> diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c b/tools/testing/selftests/clone3/clone3_set_tid.c
> index 3480e1c46..82c99c42f 100644
> --- a/tools/testing/selftests/clone3/clone3_set_tid.c
> +++ b/tools/testing/selftests/clone3/clone3_set_tid.c
> @@ -301,7 +301,7 @@ int main(int argc, char *argv[])
>                  * namespaces. Again assuming this is running in the host's
>                  * PID namespace. Not yet nested.
>                  */
> -               test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
> +               test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EPERM, 0, 0);
>  
>                 /*
>                  * This should work and from the parent we should see
> 
> $ make run_tests 
> ....
> # ok 21 [21104] Result (0) matches expectation (0)
> # # unshare PID namespace
> # # [21104] Trying clone3() with CLONE_SET_TID to 21106 and 0x0
> # # Invalid argument - Failed to create new process
> # # [21104] clone3() with CLONE_SET_TID 21106 says :-22 - expected -22
> # ok 22 [21104] Result (-22) matches expectation (-22)
> # # [21104] Child is ready and waiting
> # ok 26 PIDs in all namespaces as expected (21106,42,1)
> # # Planned tests != run tests (27 != 26)
> # # Pass 26 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0
> ok 3 selftests: clone3: clone3_set_tid

Thanks for reporting this.
With your patch series you just sent this problem should be addressed.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-11-18  7:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-15 12:36 [PATCH v11 1/2] fork: extend clone3() to support setting a PID Adrian Reber
2019-11-15 12:36 ` [PATCH v11 2/2] selftests: add tests for clone3() with *set_tid Adrian Reber
2019-11-15 22:20   ` Andrei Vagin
2019-11-15 22:53     ` Christian Brauner
2019-11-18  1:46   ` Andrei Vagin
2019-11-18  7:42     ` Christian Brauner
2019-11-15 16:33 ` [PATCH v11 1/2] fork: extend clone3() to support setting a PID Oleg Nesterov
2019-11-15 16:54 ` Dmitry Safonov
2019-11-15 17:02   ` Dmitry Safonov
2019-11-15 22:51   ` Christian Brauner
2019-11-15 21:54 ` Andrei Vagin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).