All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] prctl: propagate has_child_subreaper flag to every descendant
@ 2017-01-19 16:43 Pavel Tikhomirov
  2017-01-20 18:14 ` Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Pavel Tikhomirov @ 2017-01-19 16:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, Cyrill Gorcunov, John Stultz,
	Thomas Gleixner, Oleg Nesterov, Nicolas Pitre, Pavel Tikhomirov,
	Michal Hocko, Stanislav Kinsburskiy, Mateusz Guzik, linux-kernel,
	Pavel Emelyanov, Konstantin Khorenko

If process forks some children when it has is_child_subreaper
flag enabled they will inherit has_child_subreaper flag - first
group, when is_child_subreaper is disabled forked children will
not inherit it - second group. So child-subreaper does not reparent
all his descendants when their parents die. Having these two
differently behaving groups can lead to confusion. Also it is
a problem for CRIU, as when we restore process tree we need to
somehow determine which descendants belong to which group and
much harder - to put them exactly to these group.

To simplify these we can add a propagation of has_child_subreaper
flag on PR_SET_CHILD_SUBREAPER, walking all descendants of child-
subreaper to setup has_child_subreaper flag.

In common cases when process like systemd first sets itself to
be a child-subreaper and only after that forks its services, we will
have zero-length list of descendants to walk. Testing with binary
subtree of 2^15 processes prctl took < 0.007 sec and has shown close
to linear dependency(~0.2 * n * usec) on lower numbers of processes.

Using csr_descendant list to collect descendants and do tree walk
without recursion.

Optimize:

a) When descendant already has has_child_subreaper flag all his subtree
has it too already.

b) When some descendant is child_reaper, it's subtree is in different
pidns from us(original child-subreaper) and processes from other pidns
will never reparent to us.

So we can skip their(a,b) subtree from walk.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 include/linux/sched.h |  2 ++
 kernel/sys.c          | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4d19052..9cb44c4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1715,6 +1715,8 @@ struct task_struct {
 	struct signal_struct *signal;
 	struct sighand_struct *sighand;
 
+	struct list_head csr_descendant;
+
 	sigset_t blocked, real_blocked;
 	sigset_t saved_sigmask;	/* restored if set_restore_sigmask() was used */
 	struct sigpending pending;
diff --git a/kernel/sys.c b/kernel/sys.c
index 842914e..05b6d7d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2063,6 +2063,55 @@ static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 }
 #endif
 
+static DEFINE_SPINLOCK(descendants_lock);
+
+static void prctl_set_child_subreaper(struct task_struct *reaper, bool arg2)
+{
+	LIST_HEAD(descendants);
+
+	reaper->signal->is_child_subreaper = arg2;
+	if (!arg2)
+		return;
+
+	spin_lock(&descendants_lock);
+	read_lock(&tasklist_lock);
+
+	list_add(&reaper->csr_descendant, &descendants);
+
+	while (!list_empty(&descendants)) {
+		struct task_struct *tsk;
+		struct task_struct *p;
+
+		tsk = list_first_entry(&descendants, struct task_struct,
+				csr_descendant);
+
+		list_for_each_entry(p, &tsk->children, sibling) {
+			/*
+			 * If tsk has has_child_subreaper - all its decendants
+			 * already have these flag too and new decendants will
+			 * inherit it on fork, so nothing to be done here.
+			 */
+			if (p->signal->has_child_subreaper)
+				continue;
+
+			/*
+			 * If we've found child_reaper - skip descendants in
+			 * it's subtree as they will never get out pidns
+			 */
+			if (is_child_reaper(task_pid(p)))
+				continue;
+
+			p->signal->has_child_subreaper = 1;
+			list_add(&p->csr_descendant, &descendants);
+		}
+
+		list_del_init(&tsk->csr_descendant);
+	}
+
+	read_unlock(&tasklist_lock);
+	spin_unlock(&descendants_lock);
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2213,7 +2262,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = prctl_get_tid_address(me, (int __user **)arg2);
 		break;
 	case PR_SET_CHILD_SUBREAPER:
-		me->signal->is_child_subreaper = !!arg2;
+		prctl_set_child_subreaper(me, !!arg2);
 		break;
 	case PR_GET_CHILD_SUBREAPER:
 		error = put_user(me->signal->is_child_subreaper,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-01-30 18:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-19 16:43 [PATCH] prctl: propagate has_child_subreaper flag to every descendant Pavel Tikhomirov
2017-01-20 18:14 ` Oleg Nesterov
2017-01-22 10:00   ` Pavel Tikhomirov
2017-01-22 10:11   ` Pavel Tikhomirov
2017-01-23 11:55     ` Oleg Nesterov
2017-01-23 12:52       ` task_is_descendant() cleanup Oleg Nesterov
2017-01-25 21:59         ` Kees Cook
2017-01-30 13:49           ` Oleg Nesterov
2017-01-23 14:30       ` [PATCH] prctl: propagate has_child_subreaper flag to every descendant Pavel Tikhomirov
2017-01-23 16:06         ` Oleg Nesterov
2017-01-23 11:57 ` [PATCH] introduce the walk_process_tree() helper Oleg Nesterov
2017-01-23 12:07   ` Oleg Nesterov
2017-01-24 15:01   ` Pavel Tikhomirov
2017-01-23 16:44 ` setns() && PR_SET_CHILD_SUBREAPER Oleg Nesterov
2017-01-23 18:21   ` Eric W. Biederman
2017-01-24 14:07     ` Oleg Nesterov
2017-01-24 15:24       ` Eric W. Biederman
2017-01-30 18:16         ` Oleg Nesterov
2017-01-30 18:17         ` [PATCH] exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.