linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] Introduce CABA helper process tree
@ 2022-06-10 16:32 Pavel Tikhomirov
  2022-06-10 16:32 ` [PATCH 1/2] Add CABA tree to task_struct Pavel Tikhomirov
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Pavel Tikhomirov @ 2022-06-10 16:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pavel Tikhomirov, Eric Biederman, Kees Cook, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Andrew Morton,
	linux-ia64, linux-mm, linux-fsdevel

Please see "Add CABA tree to task_struct" for a deeper explanation, and
"tests: Add CABA selftest" for a small test and an actual case for which
we might need CABA.

The original problem of restoring a process tree with complex sessions
could probably be solved by allowing sessions to be copied, like we do
for process groups, but I'm not sure that would be secure enough, or
that another similar resource would not show up in the future.

CABA is useful not only to CRIU for restoring processes. In everyday
use, when processes detach, CABA helps to understand from which place in
the process tree they were originally started: sshd, crond or something
else.

Hope my idea is not completely insane =)

CC: Eric Biederman <ebiederm@xmission.com>
CC: Kees Cook <keescook@chromium.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Ingo Molnar <mingo@redhat.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Juri Lelli <juri.lelli@redhat.com>
CC: Vincent Guittot <vincent.guittot@linaro.org>
CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ben Segall <bsegall@google.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Daniel Bristot de Oliveira <bristot@redhat.com>
CC: Valentin Schneider <vschneid@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-ia64@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-mm@kvack.org
CC: linux-fsdevel@vger.kernel.org

Pavel Tikhomirov (2):
  Add CABA tree to task_struct
  tests: Add CABA selftest

 arch/ia64/kernel/mca.c                   |   3 +
 fs/exec.c                                |   1 +
 fs/proc/array.c                          |  18 +
 include/linux/sched.h                    |   7 +
 init/init_task.c                         |   3 +
 kernel/exit.c                            |  50 ++-
 kernel/fork.c                            |   4 +
 tools/testing/selftests/Makefile         |   1 +
 tools/testing/selftests/caba/.gitignore  |   1 +
 tools/testing/selftests/caba/Makefile    |   7 +
 tools/testing/selftests/caba/caba_test.c | 501 +++++++++++++++++++++++
 tools/testing/selftests/caba/config      |   1 +
 12 files changed, 591 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/caba/.gitignore
 create mode 100644 tools/testing/selftests/caba/Makefile
 create mode 100644 tools/testing/selftests/caba/caba_test.c
 create mode 100644 tools/testing/selftests/caba/config

-- 
2.35.3



* [PATCH 1/2] Add CABA tree to task_struct
  2022-06-10 16:32 [PATCH 0/2] Introduce CABA helper process tree Pavel Tikhomirov
@ 2022-06-10 16:32 ` Pavel Tikhomirov
  2022-06-10 21:02   ` kernel test robot
  2022-06-10 16:32 ` [PATCH 2/2] tests: Add CABA selftest Pavel Tikhomirov
  2022-06-10 16:38 ` [PATCH 0/2] Introduce CABA helper process tree Pavel Tikhomirov
  2 siblings, 1 reply; 5+ messages in thread
From: Pavel Tikhomirov @ 2022-06-10 16:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pavel Tikhomirov, Eric Biederman, Kees Cook, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Andrew Morton,
	linux-ia64, linux-mm, linux-fsdevel

In Linux, after a parent (father) process dies, its child processes are
moved (reparented) to a reaper process. Roughly speaking:

1) If the father has another thread which is still alive, that thread
becomes the reaper.

2) Else, if the father has an ancestor (with no pidns level change in
between) which has PR_SET_CHILD_SUBREAPER set, that ancestor becomes the
reaper.

3) Else the father's pidns init becomes the reaper for the father's
children.
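
As an illustration only, the three rules above can be sketched as a small
userspace C model. This is not the kernel code: struct model_task, its
fields and pick_reaper() are hypothetical names, and a surviving thread
is modeled by simply returning the father itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a process; not the kernel's task_struct. */
struct model_task {
	struct model_task *parent;     /* current parent, NULL for the root */
	struct model_task *pidns_init; /* init of the pid namespace we live in */
	int pidns_level;               /* nesting level of that pid namespace */
	bool alive_thread;             /* another thread of the process is alive */
	bool subreaper;                /* models PR_SET_CHILD_SUBREAPER */
};

/* Pick the reaper for the children of a dying father, following rules 1-3. */
static struct model_task *pick_reaper(struct model_task *father)
{
	struct model_task *p;

	assert(father != NULL);

	/* Rule 1: a surviving thread of the father keeps the children. */
	if (father->alive_thread)
		return father;

	/* Rule 2: nearest subreaper ancestor with no pidns level change. */
	for (p = father->parent; p != NULL; p = p->parent) {
		if (p->pidns_level != father->pidns_level)
			break;
		if (p->subreaper)
			return p;
	}

	/* Rule 3: fall back to the init of the father's pid namespace. */
	return father->pidns_init;
}
```

In this model, a father with no other alive thread and a subreaper
grandparent at the same pidns level gets that grandparent as the reaper;
once the subreaper flag is cleared, the pidns init is returned instead.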

The problem with this for CRIU is that when CRIU comes to dump processes
it does not know the order in which processes and their resources were
created. Processes can have resources which a) can only be inherited
when a process is cloned, b) can only be created by specific processes
and c) are shared between several processes (a process session is an
example of such a resource). For such resources CRIU restore has to
reinvent an order of process creation which both produces the desired
process tree topology and lets every process inherit the right
resources.

When process reparenting involves child subreapers, processes can be
mixed up in the process tree so drastically that it is not obvious how
to restore everything correctly.

So this is what we came up with to help CRIU overcome this problem:

CABA = Closest Alive Born Ancestor
CABD = Closest Alive Born Descendant

We want to put processes in one more tree, the CABA tree. This tree does
not affect reparenting or process creation in any way; it only provides
new information to CRIU, so that it can understand where a reparented
child was reparented from. Even though the original father is already
dead, and probably the father's father too, we still have information
about the process which is still alive and was originally the parent of
the sequence of (already dead) processes which led to us: the CABA.
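
To make the definition concrete, here is a minimal userspace sketch. It
is a simplified model and not how the patch works: the patch keeps a
caba pointer per task and updates it when a process exits, while this
sketch keeps dead processes around and walks the chain of creators.
struct born_node and find_caba() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each process remembers the process that created it. */
struct born_node {
	struct born_node *born_parent; /* creator, NULL for the root */
	bool alive;
};

/* Closest Alive Born Ancestor: first alive process on the creator chain. */
static struct born_node *find_caba(const struct born_node *p)
{
	struct born_node *a;

	assert(p != NULL);
	for (a = p->born_parent; a != NULL; a = a->born_parent) {
		if (a->alive)
			return a;
	}
	return NULL; /* only reached for the root of the model */
}
```

With a creation chain 1 -> 2 -> 3 -> 4, where 4 creates 5 and 6, and 6
creates 7, marking 6 and 3 as dead gives find_caba(7) == 4 and
find_caba(4) == 2, which matches the CABA tree expected by the selftest
in patch 2/2.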

CC: Eric Biederman <ebiederm@xmission.com>
CC: Kees Cook <keescook@chromium.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Ingo Molnar <mingo@redhat.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Juri Lelli <juri.lelli@redhat.com>
CC: Vincent Guittot <vincent.guittot@linaro.org>
CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ben Segall <bsegall@google.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Daniel Bristot de Oliveira <bristot@redhat.com>
CC: Valentin Schneider <vschneid@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-ia64@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-mm@kvack.org
CC: linux-fsdevel@vger.kernel.org

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 arch/ia64/kernel/mca.c |  3 +++
 fs/exec.c              |  1 +
 fs/proc/array.c        | 18 +++++++++++++++
 include/linux/sched.h  |  7 ++++++
 init/init_task.c       |  3 +++
 kernel/exit.c          | 50 +++++++++++++++++++++++++++++++++++++-----
 kernel/fork.c          |  4 ++++
 7 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index c62a66710ad6..74bf75fef9df 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -1793,6 +1793,9 @@ format_mca_init_stack(void *mca_data, unsigned long offset,
 	p->parent = p->real_parent = p->group_leader = p;
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
+	p->caba = p->real_parent;
+	INIT_LIST_HEAD(&p->cabds);
+	INIT_LIST_HEAD(&p->cabd);
 	strncpy(p->comm, type, sizeof(p->comm)-1);
 }
 
diff --git a/fs/exec.c b/fs/exec.c
index 0989fb8472a1..23e48db6c5b1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1136,6 +1136,7 @@ static int de_thread(struct task_struct *tsk)
 
 		list_replace_rcu(&leader->tasks, &tsk->tasks);
 		list_replace_init(&leader->sibling, &tsk->sibling);
+		list_replace_init(&leader->cabd, &tsk->cabd);
 
 		tsk->group_leader = tsk;
 		leader->group_leader = tsk;
diff --git a/fs/proc/array.c b/fs/proc/array.c
index eb815759842c..6c43a8d64f65 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -151,11 +151,26 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 	const struct cred *cred;
 	pid_t ppid, tpid = 0, tgid, ngid;
 	unsigned int max_fds = 0;
+	struct task_struct *caba;
+	struct pid *caba_pid;
+	int caba_level = 0;
+	pid_t caba_pids[MAX_PID_NS_LEVEL] = {};
 
 	rcu_read_lock();
 	ppid = pid_alive(p) ?
 		task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
 
+#ifdef CONFIG_PID_NS
+	caba = rcu_dereference(p->caba);
+	caba_pid = get_task_pid(caba, PIDTYPE_PID);
+	if (caba_pid) {
+		caba_level = caba_pid->level;
+		for (g = ns->level; g <= caba_level; g++)
+			caba_pids[g] = task_pid_nr_ns(caba, caba_pid->numbers[g].ns);
+		put_pid(caba_pid);
+	}
+#endif
+
 	tracer = ptrace_parent(p);
 	if (tracer)
 		tpid = task_pid_nr_ns(tracer, ns);
@@ -214,6 +229,9 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 	seq_puts(m, "\nNSsid:");
 	for (g = ns->level; g <= pid->level; g++)
 		seq_put_decimal_ull(m, "\t", task_session_nr_ns(p, pid->numbers[g].ns));
+	seq_puts(m, "\nNScaba:");
+	for (g = ns->level; g <= caba_level; g++)
+		seq_put_decimal_ull(m, "\t", caba_pids[g]);
 #endif
 	seq_putc(m, '\n');
 }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c46f3a63b758..358af0cf8f73 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -973,6 +973,13 @@ struct task_struct {
 	struct list_head		sibling;
 	struct task_struct		*group_leader;
 
+	/* Closest Alive Born Ancestor process: */
+	struct task_struct __rcu	*caba;
+
+	/* Closest Alive Born Descendants list: */
+	struct list_head		cabds;
+	struct list_head		cabd;
+
 	/*
 	 * 'ptraced' is the list of tasks this task is using ptrace() on.
 	 *
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f03511a..a0b206dd74ef 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -109,6 +109,9 @@ struct task_struct init_task
 	.children	= LIST_HEAD_INIT(init_task.children),
 	.sibling	= LIST_HEAD_INIT(init_task.sibling),
 	.group_leader	= &init_task,
+	.caba		= &init_task,
+	.cabds		= LIST_HEAD_INIT(init_task.cabds),
+	.cabd		= LIST_HEAD_INIT(init_task.cabd),
 	RCU_POINTER_INITIALIZER(real_cred, &init_cred),
 	RCU_POINTER_INITIALIZER(cred, &init_cred),
 	.comm		= INIT_TASK_COMM,
diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..5eae2ff93576 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -82,6 +82,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
+		list_del_init(&p->cabd);
 		__this_cpu_dec(process_counts);
 	}
 	list_del_rcu(&p->thread_group);
@@ -562,11 +563,11 @@ static struct task_struct *find_child_reaper(struct task_struct *father,
  * 3. give it to the init process (PID 1) in our pid namespace
  */
 static struct task_struct *find_new_reaper(struct task_struct *father,
-					   struct task_struct *child_reaper)
+					   struct task_struct *child_reaper,
+					   struct task_struct *thread)
 {
-	struct task_struct *thread, *reaper;
+	struct task_struct *reaper;
 
-	thread = find_alive_thread(father);
 	if (thread)
 		return thread;
 
@@ -620,6 +621,31 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 	kill_orphaned_pgrp(p, father);
 }
 
+static struct task_struct *find_new_caba(struct task_struct *father,
+					 struct task_struct *thread)
+{
+	struct task_struct *caba;
+
+	if (thread)
+		return thread;
+
+	caba = father->caba;
+	while (1) {
+		if (caba == &init_task)
+			break;
+		if (WARN_ON_ONCE(caba->caba == caba))
+			break;
+
+		thread = find_alive_thread(caba);
+		if (thread)
+			return thread;
+
+		caba = caba->caba;
+	}
+
+	return caba;
+}
+
 /*
  * This does two things:
  *
@@ -631,17 +657,19 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 static void forget_original_parent(struct task_struct *father,
 					struct list_head *dead)
 {
-	struct task_struct *p, *t, *reaper;
+	struct task_struct *p, *t, *reaper, *thread, *caba;
 
 	if (unlikely(!list_empty(&father->ptraced)))
 		exit_ptrace(father, dead);
 
 	/* Can drop and reacquire tasklist_lock */
 	reaper = find_child_reaper(father, dead);
+	thread = find_alive_thread(father);
+
 	if (list_empty(&father->children))
-		return;
+		goto caba;
 
-	reaper = find_new_reaper(father, reaper);
+	reaper = find_new_reaper(father, reaper, thread);
 	list_for_each_entry(p, &father->children, sibling) {
 		for_each_thread(p, t) {
 			RCU_INIT_POINTER(t->real_parent, reaper);
@@ -661,6 +689,16 @@ static void forget_original_parent(struct task_struct *father,
 			reparent_leader(father, p, dead);
 	}
 	list_splice_tail_init(&father->children, &reaper->children);
+caba:
+	if (list_empty(&father->cabds))
+		return;
+
+	caba = find_new_caba(father, thread);
+	list_for_each_entry(p, &father->cabds, cabd) {
+		for_each_thread(p, t)
+			RCU_INIT_POINTER(t->caba, caba);
+	}
+	list_splice_tail_init(&father->cabds, &caba->cabds);
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c69..e397122721ff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2123,6 +2123,8 @@ static __latent_entropy struct task_struct *copy_process(
 	p->flags |= PF_FORKNOEXEC;
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
+	INIT_LIST_HEAD(&p->cabds);
+	INIT_LIST_HEAD(&p->cabd);
 	rcu_copy_process(p);
 	p->vfork_done = NULL;
 	spin_lock_init(&p->alloc_lock);
@@ -2386,6 +2388,7 @@ static __latent_entropy struct task_struct *copy_process(
 		p->parent_exec_id = current->self_exec_id;
 		p->exit_signal = args->exit_signal;
 	}
+	p->caba = p->real_parent;
 
 	klp_copy_process(p);
 
@@ -2437,6 +2440,7 @@ static __latent_entropy struct task_struct *copy_process(
 			p->signal->has_child_subreaper = p->real_parent->signal->has_child_subreaper ||
 							 p->real_parent->signal->is_child_subreaper;
 			list_add_tail(&p->sibling, &p->real_parent->children);
+			list_add_tail(&p->cabd, &p->caba->cabds);
 			list_add_tail_rcu(&p->tasks, &init_task.tasks);
 			attach_pid(p, PIDTYPE_TGID);
 			attach_pid(p, PIDTYPE_PGID);
-- 
2.35.3



* [PATCH 2/2] tests: Add CABA selftest
  2022-06-10 16:32 [PATCH 0/2] Introduce CABA helper process tree Pavel Tikhomirov
  2022-06-10 16:32 ` [PATCH 1/2] Add CABA tree to task_struct Pavel Tikhomirov
@ 2022-06-10 16:32 ` Pavel Tikhomirov
  2022-06-10 16:38 ` [PATCH 0/2] Introduce CABA helper process tree Pavel Tikhomirov
  2 siblings, 0 replies; 5+ messages in thread
From: Pavel Tikhomirov @ 2022-06-10 16:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pavel Tikhomirov, Eric Biederman, Kees Cook, Alexander Viro,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Andrew Morton,
	linux-ia64, linux-mm, linux-fsdevel

This test creates a "tricky" example process tree where the session
leaders of two sessions are children of the pid namespace init and have
children of their own: the leader of session A has a child in session B
and the leader of session B has a child in session A.

We check that the Closest Alive Born Ancestor tree is right for this
case. This case illustrates how the CABA tree helps to understand the
order of creation between sessions.
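
The check reads the "NScaba:" line which patch 1/2 adds to
/proc/<pid>/status (tab-separated pids, one per pid namespace level). A
minimal, standalone sketch of parsing one such line follows;
parse_nscaba() is a hypothetical helper, and treating an empty list as
pid 0 is an assumption taken from how the test handles that case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/<pid>/status. Returns 1 and stores the first
 * listed pid if this is an "NScaba:" line, 0 for any other line, and -1
 * if the NScaba value is malformed. An empty list is stored as pid 0.
 */
static int parse_nscaba(const char *line, int *caba)
{
	static const char key[] = "NScaba:";
	const char *val;

	assert(line != NULL && caba != NULL);
	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return 0;

	/* Skip the separator written before each pid. */
	val = line + sizeof(key) - 1;
	val += strspn(val, " \t");
	if (*val == '\0' || *val == '\n') {
		*caba = 0; /* assumption: empty list means "no visible CABA" */
		return 1;
	}
	return sscanf(val, "%d", caba) == 1 ? 1 : -1;
}
```

Feeding it each line returned by getline() on the status file and
stopping at the first non-zero return would give the pid of the CABA as
seen from the reader's pid namespace.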

CC: Eric Biederman <ebiederm@xmission.com>
CC: Kees Cook <keescook@chromium.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Ingo Molnar <mingo@redhat.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Juri Lelli <juri.lelli@redhat.com>
CC: Vincent Guittot <vincent.guittot@linaro.org>
CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ben Segall <bsegall@google.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Daniel Bristot de Oliveira <bristot@redhat.com>
CC: Valentin Schneider <vschneid@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-ia64@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-mm@kvack.org
CC: linux-fsdevel@vger.kernel.org

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 tools/testing/selftests/Makefile         |   1 +
 tools/testing/selftests/caba/.gitignore  |   1 +
 tools/testing/selftests/caba/Makefile    |   7 +
 tools/testing/selftests/caba/caba_test.c | 501 +++++++++++++++++++++++
 tools/testing/selftests/caba/config      |   1 +
 5 files changed, 511 insertions(+)
 create mode 100644 tools/testing/selftests/caba/.gitignore
 create mode 100644 tools/testing/selftests/caba/Makefile
 create mode 100644 tools/testing/selftests/caba/caba_test.c
 create mode 100644 tools/testing/selftests/caba/config

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index de11992dc577..e231bd93b4c4 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -3,6 +3,7 @@ TARGETS += alsa
 TARGETS += arm64
 TARGETS += bpf
 TARGETS += breakpoints
+TARGETS += caba
 TARGETS += capabilities
 TARGETS += cgroup
 TARGETS += clone3
diff --git a/tools/testing/selftests/caba/.gitignore b/tools/testing/selftests/caba/.gitignore
new file mode 100644
index 000000000000..aa2c55b774e2
--- /dev/null
+++ b/tools/testing/selftests/caba/.gitignore
@@ -0,0 +1 @@
+caba_test
diff --git a/tools/testing/selftests/caba/Makefile b/tools/testing/selftests/caba/Makefile
new file mode 100644
index 000000000000..4260145c3747
--- /dev/null
+++ b/tools/testing/selftests/caba/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for caba selftests.
+CFLAGS = -g -I../../../../usr/include/ -Wall -O2
+
+TEST_GEN_FILES += caba_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/caba/caba_test.c b/tools/testing/selftests/caba/caba_test.c
new file mode 100644
index 000000000000..7a2e3f0f39db
--- /dev/null
+++ b/tools/testing/selftests/caba/caba_test.c
@@ -0,0 +1,501 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sched.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/mount.h>
+#include <sys/user.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef CLONE_NEWPID
+#define CLONE_NEWPID 0x20000000	/* New pid namespace */
+#endif
+
+/* Attempt to de-conflict with the selftests tree. */
+#ifndef SKIP
+#define SKIP(s, ...)	XFAIL(s, ##__VA_ARGS__)
+#endif
+
+struct process
+{
+	pid_t pid;
+	pid_t real;
+	pid_t caba;
+	int sks[2];
+	int dead;
+};
+
+struct process *processes;
+int nr_processes = 8;
+int current = 0;
+
+static void cleanup(void)
+{
+	kill(processes[0].pid, SIGKILL);
+	/* It's enough to kill pidns init for others to die */
+	kill(processes[1].pid, SIGKILL);
+}
+
+enum commands
+{
+	TEST_FORK,
+	TEST_WAIT,
+	TEST_SUBREAPER,
+	TEST_SETSID,
+	TEST_DIE,
+	/* unused */
+	TEST_GETSID,
+	TEST_SETNS,
+	TEST_SETPGID,
+	TEST_GETPGID,
+	TEST_GETPPID,
+};
+
+struct command
+{
+	enum commands	cmd;
+	int		arg1;
+	int		arg2;
+};
+
+static void handle_command(void);
+
+static void mainloop(void)
+{
+	while (1)
+		handle_command();
+}
+
+#define CLONE_STACK_SIZE 4096
+#define __stack_aligned__ __attribute__((aligned(16)))
+/* All arguments should be above stack, because it grows down */
+struct clone_args {
+	char stack[CLONE_STACK_SIZE] __stack_aligned__;
+	char stack_ptr[0];
+	int id;
+};
+
+static int get_real_pid()
+{
+	char buf[11];
+	int ret;
+
+	ret = readlink("/proc/self", buf, sizeof(buf)-1);
+	if (ret <= 0) {
+		fprintf(stderr, "%d: readlink /proc/self :%m", current);
+		return -1;
+	}
+	buf[ret] = '\0';
+
+	processes[current].real = atoi(buf);
+	return 0;
+}
+
+static int clone_func(void *_arg)
+{
+	struct clone_args *args = (struct clone_args *) _arg;
+
+	current = args->id;
+
+	if (get_real_pid())
+		exit(1);
+
+	printf("%3d: Hello. My pid is %d\n", args->id, getpid());
+	mainloop();
+	exit(0);
+}
+
+static int make_child(int id, int flags)
+{
+	struct clone_args args;
+	pid_t cid;
+
+	args.id = id;
+
+	cid = clone(clone_func, args.stack_ptr,
+			flags | SIGCHLD, &args);
+
+	if (cid < 0)
+		fprintf(stderr, "clone(%d, %d) :%m", id, flags);
+
+	processes[id].pid = cid;
+
+	return cid;
+}
+
+static int open_proc(void)
+{
+	int fd;
+	char proc_mountpoint[] = "/tmp/.caba_test.proc.XXXXXX";
+
+	if (mkdtemp(proc_mountpoint) == NULL) {
+		fprintf(stderr, "mkdtemp failed %s :%m\n", proc_mountpoint);
+		return -1;
+	}
+
+	if (mount("proc", proc_mountpoint, "proc", MS_MGC_VAL | MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL)) {
+		fprintf(stderr, "mount proc failed :%m\n");
+		rmdir(proc_mountpoint);
+		return -1;
+	}
+
+	fd = open(proc_mountpoint, O_RDONLY | O_DIRECTORY, 0);
+	if (fd < 0)
+		fprintf(stderr, "can't open proc :%m\n");
+
+	if (umount2(proc_mountpoint, MNT_DETACH)) {
+		fprintf(stderr, "can't umount proc :%m\n");
+		goto err_close;
+	}
+
+	if (rmdir(proc_mountpoint)) {
+		fprintf(stderr, "can't remove tmp dir :%m\n");
+		goto err_close;
+	}
+
+	return fd;
+err_close:
+	if (fd >= 0)
+		close(fd);
+	return -1;
+}
+
+static int open_pidns(int pid)
+{
+	int proc, fd;
+	char pidns_path[PATH_MAX];
+
+	proc = open_proc();
+	if (proc < 0) {
+		fprintf(stderr, "open proc\n");
+		return -1;
+	}
+
+	sprintf(pidns_path, "%d/ns/pid", pid);
+	fd = openat(proc, pidns_path, O_RDONLY);
+	if (fd == -1)
+		fprintf(stderr, "open pidns fd\n");
+
+	close(proc);
+	return fd;
+}
+
+static int setns_pid(int pid, int nstype)
+{
+	int pidns, ret;
+
+	pidns = open_pidns(pid);
+	if (pidns < 0)
+		return -1;
+
+	ret = setns(pidns, nstype);
+	if (ret == -1)
+		fprintf(stderr, "setns :%m\n");
+
+	close(pidns);
+	return ret;
+}
+
+static void handle_command(void)
+{
+	int sk = processes[current].sks[0], ret, status = 0;
+	struct command cmd;
+
+	ret = read(sk, &cmd, sizeof(cmd));
+	if (ret != sizeof(cmd)) {
+		fprintf(stderr, "Unable to get command :%m\n");
+		goto err;
+	}
+
+	switch (cmd.cmd) {
+	case TEST_FORK:
+		{
+			pid_t pid;
+
+			pid = make_child(cmd.arg1, cmd.arg2);
+			if (pid == -1) {
+				status = -1;
+				goto err;
+			}
+
+			printf("%3d: fork(%d, %x) = %d\n",
+					current, cmd.arg1, cmd.arg2, pid);
+			processes[cmd.arg1].pid = pid;
+		}
+		break;
+	case TEST_WAIT:
+		printf("%3d: wait(%d) = %d\n", current,
+				cmd.arg1, processes[cmd.arg1].pid);
+
+		if (waitpid(processes[cmd.arg1].pid, NULL, 0) == -1) {
+			fprintf(stderr, "waitpid(%d) :%m\n", processes[cmd.arg1].pid);
+			status = -1;
+		}
+		break;
+	case TEST_SUBREAPER:
+		printf("%3d: subreaper(%d)\n", current, cmd.arg1);
+		if (prctl(PR_SET_CHILD_SUBREAPER, cmd.arg1, 0, 0, 0) == -1) {
+			fprintf(stderr, "PR_SET_CHILD_SUBREAPER :%m\n");
+			status = -1;
+		}
+		break;
+	case TEST_SETSID:
+		printf("%3d: setsid()\n", current);
+		if(setsid() == -1) {
+			fprintf(stderr, "setsid :%m\n");
+			status = -1;
+		}
+		break;
+	case TEST_GETSID:
+		printf("%3d: getsid()\n", current);
+		status = getsid(getpid());
+		if(status == -1)
+			fprintf(stderr, "getsid :%m\n");
+		break;
+	case TEST_SETPGID:
+		printf("%3d: setpgid(%d, %d)\n", current, cmd.arg1, cmd.arg2);
+		if(setpgid(processes[cmd.arg1].pid, processes[cmd.arg2].pid) == -1) {
+			fprintf(stderr, "setpgid :%m\n");
+			status = -1;
+		}
+		break;
+	case TEST_GETPGID:
+		printf("%3d: getpgid()\n", current);
+		status = getpgid(0);
+		if(status == -1)
+			fprintf(stderr, "getpgid :%m\n");
+		break;
+	case TEST_GETPPID:
+		printf("%3d: getppid()\n", current);
+		status = getppid();
+		if(status == -1)
+			fprintf(stderr, "getppid :%m\n");
+		break;
+	case TEST_SETNS:
+		printf("%3d: setns(%d, %d) = %d\n", current,
+				cmd.arg1, cmd.arg2, processes[cmd.arg1].pid);
+		setns_pid(processes[cmd.arg1].pid, cmd.arg2);
+
+		break;
+	case TEST_DIE:
+		printf("%3d: die()\n", current);
+		processes[current].dead = 1;
+		shutdown(sk, SHUT_RDWR);
+		exit(0);
+	}
+
+	ret = write(sk, &status, sizeof(status));
+	if (ret != sizeof(status)) {
+		fprintf(stderr, "Unable to answer :%m\n");
+		goto err;
+	}
+
+	if (status < 0)
+		goto err;
+
+	return;
+err:
+	shutdown(sk, SHUT_RDWR);
+	exit(1);
+}
+
+static int send_command(int id, enum commands op, int arg1, int arg2)
+{
+	int sk = processes[id].sks[1], ret, status;
+	struct command cmd = {op, arg1, arg2};
+
+	if (op == TEST_FORK) {
+		if (processes[arg1].pid) {
+			fprintf(stderr, "%d is busy :%m\n", arg1);
+			return -1;
+		}
+	}
+
+	ret = write(sk, &cmd, sizeof(cmd));
+	if (ret != sizeof(cmd)) {
+		fprintf(stderr, "Unable to send command :%m\n");
+		goto err;
+	}
+
+	status = 0;
+	ret = read(sk, &status, sizeof(status));
+	if (ret != sizeof(status) && !(status == 0 && op == TEST_DIE)) {
+		fprintf(stderr, "Unable to get answer :%m\n");
+		goto err;
+	}
+
+	if (status != -1 && (op == TEST_GETSID || op == TEST_GETPGID || op == TEST_GETPPID))
+		return status;
+
+	if (status) {
+		fprintf(stderr, "The command(%d, %d, %d) failed :%m\n", op, arg1, arg2);
+		goto err;
+	}
+
+	return 0;
+err:
+	cleanup();
+	exit(1);
+}
+
+static int get_caba(int pid, int *caba) {
+	char buf[64], *str;
+	FILE *fp;
+	size_t n;
+
+	if (!pid)
+		snprintf(buf, sizeof(buf), "/proc/self/status");
+	else
+		snprintf(buf, sizeof(buf), "/proc/%d/status", pid);
+
+	fp = fopen(buf, "r");
+	if (!fp) {
+		perror("fopen");
+		return -1;
+	}
+
+	str = NULL;
+	while (getline(&str, &n, fp) != -1) {
+		if (strncmp(str, "NScaba:", 7) == 0) {
+			if (str[7] == '\0' || str[7] == '\n') {
+				*caba = 0;
+			} else {
+				if (sscanf(str+7, "%d", caba) != 1) {
+					perror("sscanf");
+					goto err;
+				}
+			}
+
+			fclose(fp);
+			free(str);
+			return 0;
+		}
+	}
+err:
+	free(str);
+	fclose(fp);
+	return -1;
+}
+
+static bool caba_supported(void)
+{
+	int caba;
+
+	return !get_caba(0, &caba);
+}
+
+FIXTURE(caba) {
+};
+
+FIXTURE_SETUP(caba)
+{
+	bool ret;
+
+	ret = caba_supported();
+	ASSERT_GE(ret, 0);
+	if (!ret)
+		SKIP(return, "CABA is not supported");
+}
+
+FIXTURE_TEARDOWN(caba)
+{
+	bool ret;
+
+	ret = caba_supported();
+	ASSERT_GE(ret, 0);
+	if (!ret)
+		SKIP(return, "CABA is not supported");
+
+	cleanup();
+}
+
+TEST_F(caba, complex_sessions)
+{
+	int ret, i, pid, caba;
+
+	ret = caba_supported();
+	ASSERT_GE(ret, 0);
+	if (!ret)
+		SKIP(return, "CABA is not supported");
+
+	processes = mmap(NULL, PAGE_SIZE, PROT_WRITE | PROT_READ, MAP_SHARED | MAP_ANONYMOUS, 0, 0); ASSERT_NE(processes, MAP_FAILED);
+	for (i = 0; i < nr_processes; i++) {
+		ret = socketpair(PF_UNIX, SOCK_STREAM, 0, processes[i].sks); ASSERT_EQ(ret, 0);
+
+	}
+
+	/*
+	 * Create init:
+	 * (pid, sid)
+	 * (1, 1)
+	 */
+	pid = make_child(0, 0); ASSERT_GT(pid, 0);
+	ret = send_command(0, TEST_FORK,	  1, CLONE_NEWPID); ASSERT_EQ(ret, 0);
+	ret = send_command(1, TEST_SETSID,	  0, 0); ASSERT_EQ(ret, 0);
+
+	/*
+	 * Create sequence of processes from one session:
+	 * (pid, sid)
+	 * (1, 1)---(2, 2)---(3, 2)---(4, 2)---(5, 2)
+	 */
+	ret = send_command(1, TEST_FORK,	  2, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(2, TEST_SETSID,	  0, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(2, TEST_FORK,	  3, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(3, TEST_FORK,	  4, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(4, TEST_FORK,	  5, 0); ASSERT_EQ(ret, 0);
+	/*
+	 * Create another session in the middle of first one:
+	 * (pid, sid)
+	 * (1, 1)---(2, 2)---(3, 2)---(4, 4)-+-(5, 2)
+	 *                                   `-(6, 4)---(7, 4)
+	 */
+	ret = send_command(4, TEST_SETSID,	  0, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(4, TEST_FORK,	  6, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(6, TEST_FORK,	  7, 0); ASSERT_EQ(ret, 0);
+
+	/*
+	 * Kill 6 while having 2 as child-sub-reaper:
+	 * (pid, sid)
+	 * (1, 1)---(2, 2)---(3, 2)---(4, 4)-+-(5, 2)
+	 *                 `-(7, 4)
+	 */
+	ret = send_command(2, TEST_SUBREAPER, 1, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(6, TEST_DIE,	  0, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(4, TEST_WAIT,	  6, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(2, TEST_SUBREAPER, 0, 0); ASSERT_EQ(ret, 0);
+
+	/*
+	 * Kill 3:
+	 * (pid, sid)
+	 * (1, 1)-+-(2, 2)---(7, 4)
+	 *        `-(4, 4)---(5, 2)
+	 * note: This is a "tricky" session tree example where it's not obvious
+	 * whether sid 2 was created first or sid 4 when creating the tree.
+	 */
+	ret = send_command(3, TEST_DIE,	  0, 0); ASSERT_EQ(ret, 0);
+	ret = send_command(2, TEST_WAIT,	  3, 0); ASSERT_EQ(ret, 0);
+
+	/*
+	 * CABA tree for this would be:
+	 * (pid, sid)
+	 * (1, 1)---(2, 2)---(4, 4)-+-(5, 2)
+	 *                          `-(7, 4)
+	 * note: CABA allows us to understand that session 2 was created first.
+	 */
+	ret = get_caba(processes[2].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[1].real);
+	ret = get_caba(processes[4].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[2].real);
+	ret = get_caba(processes[5].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[4].real);
+	ret = get_caba(processes[7].real, &caba); ASSERT_EQ(ret, 0); ASSERT_EQ(caba, processes[4].real);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/caba/config b/tools/testing/selftests/caba/config
new file mode 100644
index 000000000000..eae7bdaa3790
--- /dev/null
+++ b/tools/testing/selftests/caba/config
@@ -0,0 +1 @@
+CONFIG_PID_NS=y
-- 
2.35.3



* Re: [PATCH 0/2] Introduce CABA helper process tree
  2022-06-10 16:32 [PATCH 0/2] Introduce CABA helper process tree Pavel Tikhomirov
  2022-06-10 16:32 ` [PATCH 1/2] Add CABA tree to task_struct Pavel Tikhomirov
  2022-06-10 16:32 ` [PATCH 2/2] tests: Add CABA selftest Pavel Tikhomirov
@ 2022-06-10 16:38 ` Pavel Tikhomirov
  2 siblings, 0 replies; 5+ messages in thread
From: Pavel Tikhomirov @ 2022-06-10 16:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Eric Biederman, Kees Cook, Alexander Viro, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Andrew Morton,
	linux-ia64, linux-mm, linux-fsdevel, kernel

CC: kernel@openvz.org


-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.


* Re: [PATCH 1/2] Add CABA tree to task_struct
  2022-06-10 16:32 ` [PATCH 1/2] Add CABA tree to task_struct Pavel Tikhomirov
@ 2022-06-10 21:02   ` kernel test robot
  0 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2022-06-10 21:02 UTC (permalink / raw)
  To: Pavel Tikhomirov, linux-kernel
  Cc: kbuild-all, Pavel Tikhomirov, Eric Biederman, Kees Cook,
	Alexander Viro, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Andrew Morton,
	Linux Memory Management List, linux-ia64, linux-fsdevel

Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on shuah-kselftest/next]
[also build test WARNING on kees/for-next/execve tip/sched/core linus/master v5.19-rc1 next-20220610]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Pavel-Tikhomirov/Introduce-CABA-helper-process-tree/20220611-003433
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20220611/202206110409.b8UJYnuq-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/0875a2bed5ff95643c487dfcc28a550db06ea418
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Pavel-Tikhomirov/Introduce-CABA-helper-process-tree/20220611-003433
        git checkout 0875a2bed5ff95643c487dfcc28a550db06ea418
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/proc/

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/proc/array.c: In function 'task_state':
>> fs/proc/array.c:157:15: warning: unused variable 'caba_pids' [-Wunused-variable]
     157 |         pid_t caba_pids[MAX_PID_NS_LEVEL] = {};
         |               ^~~~~~~~~
>> fs/proc/array.c:156:13: warning: unused variable 'caba_level' [-Wunused-variable]
     156 |         int caba_level = 0;
         |             ^~~~~~~~~~
>> fs/proc/array.c:155:21: warning: unused variable 'caba_pid' [-Wunused-variable]
     155 |         struct pid *caba_pid;
         |                     ^~~~~~~~
>> fs/proc/array.c:154:29: warning: unused variable 'caba' [-Wunused-variable]
     154 |         struct task_struct *caba;
         |                             ^~~~


vim +/caba_pids +157 fs/proc/array.c

   143	
   144	static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
   145					struct pid *pid, struct task_struct *p)
   146	{
   147		struct user_namespace *user_ns = seq_user_ns(m);
   148		struct group_info *group_info;
   149		int g, umask = -1;
   150		struct task_struct *tracer;
   151		const struct cred *cred;
   152		pid_t ppid, tpid = 0, tgid, ngid;
   153		unsigned int max_fds = 0;
 > 154		struct task_struct *caba;
 > 155		struct pid *caba_pid;
 > 156		int caba_level = 0;
 > 157		pid_t caba_pids[MAX_PID_NS_LEVEL] = {};
   158	
   159		rcu_read_lock();
   160		ppid = pid_alive(p) ?
   161			task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
   162	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


Thread overview: 5+ messages
2022-06-10 16:32 [PATCH 0/2] Introduce CABA helper process tree Pavel Tikhomirov
2022-06-10 16:32 ` [PATCH 1/2] Add CABA tree to task_struct Pavel Tikhomirov
2022-06-10 21:02   ` kernel test robot
2022-06-10 16:32 ` [PATCH 2/2] tests: Add CABA selftest Pavel Tikhomirov
2022-06-10 16:38 ` [PATCH 0/2] Introduce CABA helper process tree Pavel Tikhomirov
