All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic
@ 2017-01-30 14:48 Pavel Tikhomirov
  2017-01-30 14:48 ` [PATCH 1/2] introduce the walk_process_tree() helper Pavel Tikhomirov
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Pavel Tikhomirov @ 2017-01-30 14:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, Cyrill Gorcunov, John Stultz,
	Thomas Gleixner, Oleg Nesterov, Nicolas Pitre, Michal Hocko,
	Stanislav Kinsburskiy, Mateusz Guzik, linux-kernel,
	Pavel Emelyanov, Konstantin Khorenko, Pavel Tikhomirov,
	Lennart Poettering, Eric W . Biederman, Kay Sievers

Oleg Nesterov (1):
  introduce the walk_process_tree() helper

Pavel Tikhomirov (1):
  prctl: propagate has_child_subreaper flag to every descendant

 include/linux/sched.h |  3 +++
 kernel/fork.c         | 42 +++++++++++++++++++++++++++++++++++++++---
 kernel/sys.c          | 22 ++++++++++++++++++++++
 3 files changed, 64 insertions(+), 3 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] introduce the walk_process_tree() helper
  2017-01-30 14:48 [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic Pavel Tikhomirov
@ 2017-01-30 14:48 ` Pavel Tikhomirov
  2017-01-30 14:48 ` [PATCH v3 2/2] prctl: propagate has_child_subreaper flag to every descendant Pavel Tikhomirov
  2017-01-30 14:59 ` [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic Pavel Tikhomirov
  2 siblings, 0 replies; 4+ messages in thread
From: Pavel Tikhomirov @ 2017-01-30 14:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, Cyrill Gorcunov, John Stultz,
	Thomas Gleixner, Oleg Nesterov, Nicolas Pitre, Michal Hocko,
	Stanislav Kinsburskiy, Mateusz Guzik, linux-kernel,
	Pavel Emelyanov, Konstantin Khorenko, Pavel Tikhomirov,
	Lennart Poettering, Eric W . Biederman, Kay Sievers

From: Oleg Nesterov <oleg@redhat.com>

Add the new helper to walk the process tree, the next patch adds a user.
Note that it visits the group leaders only, proc_visitor can do
for_each_thread itself or we can trivially extend walk_process_tree() to
do this.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 include/linux/sched.h |  3 +++
 kernel/fork.c         | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ad3ec9e..7f8ab91 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3067,6 +3067,9 @@ extern bool current_is_single_threaded(void);
 #define for_each_process_thread(p, t)	\
 	for_each_process(p) for_each_thread(p, t)
 
+typedef int (*proc_visitor)(struct task_struct *p, void *data);
+void walk_process_tree(struct task_struct *top, proc_visitor, void *);
+
 static inline int get_nr_threads(struct task_struct *tsk)
 {
 	return tsk->signal->nr_threads;
diff --git a/kernel/fork.c b/kernel/fork.c
index 11c5c8a..135b7a4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,6 +2053,38 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
 }
 #endif
 
+void walk_process_tree(struct task_struct *top, proc_visitor visitor, void *data)
+{
+	struct task_struct *leader, *parent, *child;
+	int res;
+
+	read_lock(&tasklist_lock);
+	leader = top = top->group_leader;
+down:
+	for_each_thread(leader, parent) {
+		list_for_each_entry(child, &parent->children, sibling) {
+			res = visitor(child, data);
+			if (res) {
+				if (res < 0)
+					goto out;
+				leader = child;
+				goto down;
+			}
+up:
+			;
+		}
+	}
+
+	if (leader != top) {
+		child = leader;
+		parent = child->real_parent;
+		leader = parent->group_leader;
+		goto up;
+	}
+out:
+	read_unlock(&tasklist_lock);
+}
+
 #ifndef ARCH_MIN_MMSTRUCT_ALIGN
 #define ARCH_MIN_MMSTRUCT_ALIGN 0
 #endif
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v3 2/2] prctl: propagate has_child_subreaper flag to every descendant
  2017-01-30 14:48 [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic Pavel Tikhomirov
  2017-01-30 14:48 ` [PATCH 1/2] introduce the walk_process_tree() helper Pavel Tikhomirov
@ 2017-01-30 14:48 ` Pavel Tikhomirov
  2017-01-30 14:59 ` [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic Pavel Tikhomirov
  2 siblings, 0 replies; 4+ messages in thread
From: Pavel Tikhomirov @ 2017-01-30 14:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, Cyrill Gorcunov, John Stultz,
	Thomas Gleixner, Oleg Nesterov, Nicolas Pitre, Michal Hocko,
	Stanislav Kinsburskiy, Mateusz Guzik, linux-kernel,
	Pavel Emelyanov, Konstantin Khorenko, Pavel Tikhomirov,
	Lennart Poettering, Eric W . Biederman, Kay Sievers

If process forks some children when it has is_child_subreaper
flag enabled they will inherit has_child_subreaper flag - first
group, when is_child_subreaper is disabled forked children will
not inherit it - second group. So child-subreaper does not reparent
all his descendants when their parents die. Having these two
differently behaving groups can lead to confusion. Also it is
a problem for CRIU, as when we restore process tree we need to
somehow determine which descendants belong to which group and
much harder - to put them exactly to these group.

To simplify these we can add a propagation of has_child_subreaper
flag on PR_SET_CHILD_SUBREAPER, walking all descendants of child-
subreaper to setup has_child_subreaper flag.

In common cases when process like systemd first sets itself to
be a child-subreaper and only after that forks its services, we will
have zero-length list of descendants to walk. Testing with binary
subtree of 2^15 processes prctl took < 0.007 sec and has shown close
to linear dependency(~0.2 * n * usec) on lower numbers of processes.

Moreover, I doubt someone intentionaly pre-forks the children whitch
should reparent to previous reaper(e.g. init) before becoming
subreaper, because some our ancestor migh have had is_child_subreaper
flag while forking our sub-tree and our childs will all inherit
has_child_subreaper flag, and we have no way to influence it. And only
way to check if we have no has_child_subreaper flag is to create some
childs, kill them and see where they will reparent to.

Using walk_process_tree helper to walk subtree, thanks to Oleg! Timing
seems to be the same.

Optimize:

a) When descendant already has has_child_subreaper flag all his subtree
has it too already.

* for a) to be true need to move has_child_subreaper inheritance under
the same tasklist_lock with adding task to its ->real_parent->children
as without it process can inherit zero has_child_subreaper, then we
set 1 to it's parent flag, check that it has no more parents, and only
after child with wrong flag is added to the tree.

* also make these inheritance more clear by using real_parent instead of
current, as on clone(CLONE_PARENT) if current has is_child_subreaper
and real_parent has no is_child_subreaper or has_child_subreaper, child
will have has_child_subreaper flag set without actually having a
subreaper in it's ancestors.

b) When some descendant is child_reaper, it's subtree is in different
pidns from us(original child-subreaper) and processes from other pidns
will never reparent to us.

So we can skip their(a,b) subtree from walk.

v2: switch to walk_process_tree() general helper, move
has_child_subreaper inheritance
v3: remove csr_descendant leftover, change current to real_parent
in has_child_subreaper inheritance

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 kernel/fork.c | 10 +++++++---
 kernel/sys.c  | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 135b7a4..c814e59 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1367,9 +1367,6 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	sig->oom_score_adj = current->signal->oom_score_adj;
 	sig->oom_score_adj_min = current->signal->oom_score_adj_min;
 
-	sig->has_child_subreaper = current->signal->has_child_subreaper ||
-				   current->signal->is_child_subreaper;
-
 	mutex_init(&sig->cred_guard_mutex);
 
 	return 0;
@@ -1800,6 +1797,13 @@ static __latent_entropy struct task_struct *copy_process(
 
 			p->signal->leader_pid = pid;
 			p->signal->tty = tty_kref_get(current->signal->tty);
+			/*
+			 * Inherit has_child_subreaper flag under the same
+			 * tasklist_lock with adding child to the process tree
+			 * for propagate_has_child_subreaper optimization.
+			 */
+			p->signal->has_child_subreaper = p->real_parent->signal->has_child_subreaper ||
+							 p->real_parent->signal->is_child_subreaper;
 			list_add_tail(&p->sibling, &p->real_parent->children);
 			list_add_tail_rcu(&p->tasks, &init_task.tasks);
 			attach_pid(p, PIDTYPE_PGID);
diff --git a/kernel/sys.c b/kernel/sys.c
index 842914e..0e4d566 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2063,6 +2063,24 @@ static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 }
 #endif
 
+static int propagate_has_child_subreaper(struct task_struct *p, void *data)
+{
+	/*
+	 * If task has has_child_subreaper - all its decendants
+	 * already have these flag too and new decendants will
+	 * inherit it on fork, skip them.
+	 *
+	 * If we've found child_reaper - skip descendants in
+	 * it's subtree as they will never get out pidns.
+	 */
+	if (p->signal->has_child_subreaper ||
+	    is_child_reaper(task_pid(p)))
+		return 0;
+
+	p->signal->has_child_subreaper = 1;
+	return 1;
+}
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2214,6 +2232,10 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		break;
 	case PR_SET_CHILD_SUBREAPER:
 		me->signal->is_child_subreaper = !!arg2;
+		if (!arg2)
+			break;
+
+		walk_process_tree(me, propagate_has_child_subreaper, NULL);
 		break;
 	case PR_GET_CHILD_SUBREAPER:
 		error = put_user(me->signal->is_child_subreaper,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic
  2017-01-30 14:48 [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic Pavel Tikhomirov
  2017-01-30 14:48 ` [PATCH 1/2] introduce the walk_process_tree() helper Pavel Tikhomirov
  2017-01-30 14:48 ` [PATCH v3 2/2] prctl: propagate has_child_subreaper flag to every descendant Pavel Tikhomirov
@ 2017-01-30 14:59 ` Pavel Tikhomirov
  2 siblings, 0 replies; 4+ messages in thread
From: Pavel Tikhomirov @ 2017-01-30 14:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, Cyrill Gorcunov, John Stultz,
	Thomas Gleixner, Oleg Nesterov, Nicolas Pitre, Michal Hocko,
	Stanislav Kinsburskiy, Mateusz Guzik, linux-kernel,
	Pavel Emelyanov, Konstantin Khorenko, Lennart Poettering,
	Eric W . Biederman, Kay Sievers

please drop it, errors in commit message

On 01/30/2017 05:48 PM, Pavel Tikhomirov wrote:
> Oleg Nesterov (1):
>   introduce the walk_process_tree() helper
>
> Pavel Tikhomirov (1):
>   prctl: propagate has_child_subreaper flag to every descendant
>
>  include/linux/sched.h |  3 +++
>  kernel/fork.c         | 42 +++++++++++++++++++++++++++++++++++++++---
>  kernel/sys.c          | 22 ++++++++++++++++++++++
>  3 files changed, 64 insertions(+), 3 deletions(-)
>

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-01-30 15:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-30 14:48 [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic Pavel Tikhomirov
2017-01-30 14:48 ` [PATCH 1/2] introduce the walk_process_tree() helper Pavel Tikhomirov
2017-01-30 14:48 ` [PATCH v3 2/2] prctl: propagate has_child_subreaper flag to every descendant Pavel Tikhomirov
2017-01-30 14:59 ` [PATCH v3 0/2] prctl: make PR_SET_CHILD_SUBREAPER deterministic Pavel Tikhomirov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.