* [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
@ 2010-02-02 12:04 Lennart Poettering
  2010-02-03  8:24 ` KOSAKI Motohiro
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Lennart Poettering @ 2010-02-02 12:04 UTC
  To: linux-kernel

[ I already sent this patch half a year ago or so, as an RFC. I didn't
really get any comments back then; however, I am still interested in
seeing this patch in the kernel tree. So here I go again: please
comment! I have updated the patch to apply to the current upstream git
master. ]

Right now, if a process dies all its children are reparented to init.
This logic has good uses, e.g. for double forking when daemonizing.
However, it also allows child processes to "escape" their parents, which
is a problem for software like session managers (such as gnome-session)
or other process supervisors.

This patch adds a simple flag for each process that marks it as an
"anchor" process for all its children and grandchildren. If a child of
such an anchor dies, its children are not reparented to init but to
this anchor instead, so escaping the anchor process is not possible. A
task with this flag set hence acts as a little "sub-init".

Anchors are fully recursive: if an anchor dies, all its children are
reparented to the next higher anchor in the process tree.
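A minimal user-space model may make the reparenting rule easier to follow. It assumes a simplified process record; `struct task`, `find_new_reaper()` and `task_exit()` below are illustrative names, not the kernel's. It ignores locking and threads and simply walks the parent chain upwards from the dead task's parent, stopping at the first anchor or at init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a process: only what the reparenting rule
 * needs. This is not the kernel's task_struct. */
struct task {
	int pid;
	bool child_anchor;	/* models the proposed PR_SET_ANCHOR flag */
	struct task *parent;
};

/* New parent for the orphaned children of `father`: the nearest
 * ancestor marked as an anchor, or `init` if there is none. The walk
 * starts at father->parent, so a dying anchor passes its children on
 * to the next anchor above it. */
static struct task *find_new_reaper(struct task *father, struct task *init)
{
	assert(father != NULL && init != NULL);
	for (struct task *a = father->parent; a != NULL && a != init; a = a->parent)
		if (a->child_anchor)
			return a;
	return init;
}

/* Model the exit of `father`: reparent each of its n children. */
static void task_exit(struct task *father, struct task **children,
		      size_t n, struct task *init)
{
	struct task *reaper = find_new_reaper(father, init);

	for (size_t i = 0; i < n; i++)
		children[i]->parent = reaper;
}
```

In a chain init -> anchor -> helper -> child, exiting the helper hands the child to the anchor, and exiting the anchor hands it on to init (or to the next anchor, if there is one).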

This is orthogonal to PID namespaces. PID namespaces virtualize the
actual IDs in addition to introducing "sub-inits". This patch introduces
"sub-inits" inside the same PID namespace.

This patch is compile tested only. It's relatively trivial, and is
written in ignorance of the expected locking logic for accessing
task_struct->parent. This mail is primarily intended as a request for
comments. So please, I'd be happy about any comments!

Lennart

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index a3baeb2..e9b3dd1 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,7 @@
 
 #define PR_MCE_KILL_GET 34
 
+#define PR_SET_ANCHOR 35
+#define PR_GET_ANCHOR 36
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index abdfacc..e9ab271 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1294,6 +1294,9 @@ struct task_struct {
 				 * execve */
 	unsigned in_iowait:1;
 
+	/* When a child of one of our children dies, reparent it to me, instead
+	 * of init. */
+	unsigned child_anchor:1;
 
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
@@ -1306,6 +1309,7 @@ struct task_struct {
 	unsigned long stack_canary;
 #endif
 
+
 	/* 
 	 * pointers to (original) parent process, youngest child, younger sibling,
 	 * older sibling, respectively.  (p->father can be replaced with 
diff --git a/kernel/exit.c b/kernel/exit.c
index 546774a..416883e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -704,7 +704,7 @@ static void exit_mm(struct task_struct * tsk)
 static struct task_struct *find_new_reaper(struct task_struct *father)
 {
 	struct pid_namespace *pid_ns = task_active_pid_ns(father);
-	struct task_struct *thread;
+	struct task_struct *thread, *anchor;
 
 	thread = father;
 	while_each_thread(father, thread) {
@@ -715,6 +715,11 @@ static struct task_struct *find_new_reaper(struct task_struct *father)
 		return thread;
 	}
 
+	/* find the first ancestor which is marked child_anchor */
+	for (anchor = father->parent; anchor != &init_task; anchor = anchor->parent)
+		if (anchor->child_anchor)
+			return anchor;
+
 	if (unlikely(pid_ns->child_reaper == father)) {
 		write_unlock_irq(&tasklist_lock);
 		if (unlikely(pid_ns == &init_pid_ns))
diff --git a/kernel/fork.c b/kernel/fork.c
index 5b2959b..3d11673 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1265,6 +1265,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		p->parent_exec_id = current->self_exec_id;
 	}
 
+	p->child_anchor = 0;
+
 	spin_lock(&current->sighand->siglock);
 
 	/*
diff --git a/kernel/sys.c b/kernel/sys.c
index 26a6b73..8a1dfb1 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1578,6 +1578,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			else
 				error = PR_MCE_KILL_DEFAULT;
 			break;
+		case PR_SET_ANCHOR:
+			me->child_anchor = !!arg2;
+			error = 0;
+			break;
+		case PR_GET_ANCHOR:
+			error = put_user(me->child_anchor, (int __user *) arg2);
+			break;
 		default:
 			error = -EINVAL;
 			break;
-- 
1.6.6



Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/           GnuPG 0x1A015CC4


* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-02 12:04 [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes Lennart Poettering
@ 2010-02-03  8:24 ` KOSAKI Motohiro
  2010-02-03  9:53   ` Lennart Poettering
  2010-02-03 15:31 ` Américo Wang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: KOSAKI Motohiro @ 2010-02-03  8:24 UTC
  To: Lennart Poettering; +Cc: kosaki.motohiro, linux-kernel

> [ I already sent this patch half a year ago or so, as an RFC. I didn't
> really get any comments back then, however I am still interested in
> seeing this patch in the kernel tree. So here I go again: please
> comment! I have updated the patch to apply to the current upstream git
> master. ]
> 
> Right now, if a process dies all its children are reparented to init.
> This logic has good uses, i.e. for double forking when daemonizing.
> However it also allows child processes to "escape" their parents, which
> is a problem for software like session managers (such as gnome-session)
> or other process supervisors.

I think you need to explain how this patch improves gnome-session.

 - What happens with the current gnome-session, and when?
 - After the patch, which behavior will change?
 - Why do you think gnome-session can ignore older kernels?
 - etc.

Without that, we have no basis for judgement or advice.



> 
> This patch adds a simple flag for each process that marks it as an
> "anchor" process for all its children and grandchildren. If a child of
> such an anchor dies all its children will not be reparented to init, but
> instead to this anchor, escaping this anchor process is not possible. A
> task with this flag set hence acts is little "sub-init".
> 
> Anchors are fully recursive: if an anchor dies, all its children are
> reparented to next higher anchor in the process tree.
> 
> This is orthogonal to PID namespaces. PID namespaces virtualize the
> actual IDs in addition to introducing "sub-inits". This patch introduces
> "sub-inits" inside the same PID namespace.
> 
> This patch is compile tested only. It's relatively trivial, and is
> written in ignorance of the expected locking logic for accessing
> task_struct->parent. This mail is primarily intended as a request for
> comments. So please, I'd be happy about any comments!




* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-03  8:24 ` KOSAKI Motohiro
@ 2010-02-03  9:53   ` Lennart Poettering
  0 siblings, 0 replies; 29+ messages in thread
From: Lennart Poettering @ 2010-02-03  9:53 UTC
  To: KOSAKI Motohiro; +Cc: linux-kernel

On Wed, 03.02.10 17:24, KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:

> 
> > [ I already sent this patch half a year ago or so, as an RFC. I didn't
> > really get any comments back then, however I am still interested in
> > seeing this patch in the kernel tree. So here I go again: please
> > comment! I have updated the patch to apply to the current upstream git
> > master. ]
> > 
> > Right now, if a process dies all its children are reparented to init.
> > This logic has good uses, i.e. for double forking when daemonizing.
> > However it also allows child processes to "escape" their parents, which
> > is a problem for software like session managers (such as gnome-session)
> > or other process supervisors.
> 
> I think you need to explain why this patch improve gnome-session.
> 
>  - What's happen on current gnome-session. and When?

If a child of a supervisor daemon such as g-s does a double fork (and
unfortunately most existing user daemons do), then that supervisor
daemon is no longer able to monitor that child, i.e. it cannot react
when the child dies, for example by restarting it, tearing the
session down, or handling a segfault.

Also, if g-s itself dies, clients that escaped it via double-forking
will stay around even if PR_SET_PDEATHSIG is used. With this patch
applied PR_SET_PDEATHSIG will work for them too, because child
processes can no longer escape their parents if the parent wants
that. And getrusage(RUSAGE_CHILDREN), which only accounts for
descendants that have terminated and been waited for, will start to
return useful results in g-s too.

Also, as a minor side-effect the output of "ps xawf" or similar tools
becomes much more useful since processes belonging to a session will
actually show up as children of g-s in the tree instead of as
unattached processes.

Right now, only init itself can do process supervising properly, since
it will be getting the SIGCHLD for those processes that escaped their
parents by double forking. With this patch I want to extend this
power to non-init supervisor daemons, such as g-s.

Also, this makes it easier to write and test init daemons, because
you can run them with PID != 1 and still get very similar behavior
regarding SIGCHLD.
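The escape described above can be reproduced on a stock kernel using only POSIX calls. The sketch below (the helper name `grandchild_escapes()` is hypothetical) double-forks the way a daemon does and then asks whether the original parent can still wait for the grandchild. Normally it cannot: `waitpid(-1, ...)` fails with `ECHILD` because the grandchild has already been handed to init. The exception is a caller that is itself the init of its PID namespace.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: double-fork like a daemon, then check whether
 * the caller can still reap the grandchild.
 * Returns 1 if it escaped (ECHILD), 0 if we reaped it, -1 on error. */
static int grandchild_escapes(void)
{
	int status;
	pid_t helper = fork();

	if (helper < 0)
		return -1;
	if (helper == 0) {
		/* intermediate process: start the "daemon" and exit at once */
		pid_t grandchild = fork();

		if (grandchild == 0) {
			sleep(1);	/* the daemon outlives its parent */
			_exit(0);
		}
		_exit(grandchild < 0 ? 1 : 0);
	}

	/* Reap the intermediate process. By the time it can be reaped,
	 * its own children have already been reparented away from it. */
	if (waitpid(helper, &status, 0) != helper ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	/* Any children left? Normally not: the grandchild went to init. */
	if (waitpid(-1, &status, 0) > 0)
		return 0;	/* we reaped it: we are its reaper */
	return errno == ECHILD ? 1 : -1;
}
```

A supervisor in this position never sees the grandchild's exit status or its SIGCHLD; only the process that adopted it does.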

>  - After the patch, Which behavior will be changed?

For normal processes, nothing. For those which use the new
PR_SET_ANCHOR call, children won't be able to escape them anymore
via a double fork. Or, as I already tried to explain:

> > This patch adds a simple flag for each process that marks it as an
> > "anchor" process for all its children and grandchildren. If a child of
> > such an anchor dies all its children will not be reparented to init, but
> > instead to this anchor, escaping this anchor process is not possible. A
> > task with this flag set hence acts as little "sub-init".

>  - Why do you think gnome-session can ignore old kernel?

Did I say that?

On new kernels supervisor daemons can make use of this and children
won't be able to escape them. On old kernels they cannot, and children
will continue to escape them. That should be fine: on newer kernels
g-s can supervise all user daemons nicely, and on old kernels we
continue with the status quo.

Lennart



* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-02 12:04 [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes Lennart Poettering
  2010-02-03  8:24 ` KOSAKI Motohiro
@ 2010-02-03 15:31 ` Américo Wang
  2010-02-03 17:49   ` Lennart Poettering
  2010-02-04 15:42 ` Kay Sievers
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Américo Wang @ 2010-02-03 15:31 UTC
  To: Lennart Poettering; +Cc: linux-kernel

On Tue, Feb 02, 2010 at 01:04:57PM +0100, Lennart Poettering wrote:
>
>This patch adds a simple flag for each process that marks it as an
>"anchor" process for all its children and grandchildren. If a child of
>such an anchor dies all its children will not be reparented to init, but
>instead to this anchor, escaping this anchor process is not possible. A
>task with this flag set hence acts is little "sub-init".
>

This will break applications which use 'getppid() == 1' to check
whether their real parent has died...
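The idiom in question, and one way to avoid depending on PID 1, can be sketched as follows. Both function names are hypothetical, and the second variant assumes the process records its parent's PID at startup, before the parent can have died.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper showing the fragile idiom: it assumes orphans
 * are always adopted by PID 1. Under the proposed patch the new parent
 * may be an anchor, and then this check never fires. */
static bool parent_died_fragile(void)
{
	return getppid() == 1;
}

/* Hypothetical sturdier helper: compare against the parent PID that
 * was recorded at startup. Any reparenting is detected, whoever the
 * new parent is. */
static bool parent_died(pid_t original_ppid)
{
	return getppid() != original_ppid;
}
```

The saved-PID comparison keeps working whether the orphan is adopted by init or by an anchor, because the adopting ancestor was alive alongside the original parent and so has a different PID.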

-- 
Live like a child, think like the god.
 


* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-03 15:31 ` Américo Wang
@ 2010-02-03 17:49   ` Lennart Poettering
  2010-02-05  9:54     ` Américo Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Lennart Poettering @ 2010-02-03 17:49 UTC
  To: Américo Wang; +Cc: linux-kernel

On Wed, 03.02.10 23:31, Américo Wang (xiyou.wangcong@gmail.com) wrote:

> On Tue, Feb 02, 2010 at 01:04:57PM +0100, Lennart Poettering wrote:
> >
> >This patch adds a simple flag for each process that marks it as an
> >"anchor" process for all its children and grandchildren. If a child of
> >such an anchor dies all its children will not be reparented to init, but
> >instead to this anchor, escaping this anchor process is not possible. A
> >task with this flag set hence acts is little "sub-init".
> 
> This will break the applictions which using 'getppid() == 1' to check
> if its real parent is dead or not...

Usage of the PR_SET_ANCHOR flag is optional for a process. It won't
break anything unless enabled. So I don't really see a problem here.

Of course, when this flag is used the behaviour differs from what
traditional Unix says happens to the children of a process when it
dies. But uh, that's the whole point, and that's why this flag is
only enabled optionally.

Also, on a side note: code that checks whether its parent process died
should most likely be rewritten to use PR_SET_PDEATHSIG or something
like that anyway, so that it is notified about the parent dying
instead of polling for it manually.
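A minimal sketch of the notification approach suggested here, using the existing `PR_SET_PDEATHSIG` option of prctl(2). The function name `arm_parent_death_signal()` is hypothetical. Checking `getppid()` after arming covers the window in which the parent may already have exited before the call.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to deliver `sig` when our parent
 * dies, instead of polling getppid(). `expected_parent` is the parent
 * PID recorded before this call (e.g. by the parent just before fork()).
 * If the parent died before the signal was armed, no signal will come,
 * so that case is detected with getppid().
 * Returns 0 when armed, 1 if the parent is already gone, -1 on error. */
static int arm_parent_death_signal(pid_t expected_parent, int sig)
{
	if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig, 0UL, 0UL, 0UL) == -1)
		return -1;
	return getppid() != expected_parent ? 1 : 0;
}
```

Note that prctl(2) documents the "parent" for this purpose as the thread that created the child, so the signal can fire when that thread exits even though the parent process lives on.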

Lennart



* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-02 12:04 [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes Lennart Poettering
  2010-02-03  8:24 ` KOSAKI Motohiro
  2010-02-03 15:31 ` Américo Wang
@ 2010-02-04 15:42 ` Kay Sievers
  2010-02-04 20:59   ` Kay Sievers
  2010-03-04 14:08 ` Oleg Nesterov
  2010-12-20 14:26 ` Scott James Remnant
  4 siblings, 1 reply; 29+ messages in thread
From: Kay Sievers @ 2010-02-04 15:42 UTC
  To: Lennart Poettering; +Cc: linux-kernel

On Tue, 2010-02-02 at 13:04 +0100, Lennart Poettering wrote:
> Right now, if a process dies all its children are reparented to init.
> This logic has good uses, i.e. for double forking when daemonizing.
> However it also allows child processes to "escape" their parents,
> which
> is a problem for software like session managers (such as
> gnome-session)
> or other process supervisors.
> 
> This patch adds a simple flag for each process that marks it as an
> "anchor" process for all its children and grandchildren. If a child of
> such an anchor dies all its children will not be reparented to init,
> but instead to this anchor, escaping this anchor process is not possible.
> A task with this flag set hence acts is little "sub-init".
> 
> Anchors are fully recursive: if an anchor dies, all its children are
> reparented to next higher anchor in the process tree.
> 
> This is orthogonal to PID namespaces. PID namespaces virtualize the
> actual IDs in addition to introducing "sub-inits". This patch
> introduces
> "sub-inits" inside the same PID namespace.

Sounds good to me. And seems useful for all sorts of session tracking
and "prettifying ps". :)

It seems to work fine here. With a double-fork, the child gets the
intermediate-fork pid as its parent, and when that process dies, it
gets re-parented to the anchor pid instead of directly to pid 1. Only
when the anchor dies is it re-parented to pid 1.

Thanks,
Kay

$ ./sub-init 1
[26209] main: anchor=1
[26209] main: forked 'help' 26210
[26209] main: wait for 'help' to exit 26210
[26210] help: has parent 26209
[26210] help: forked 'child' 26211, sleep
[26211] child: has parent 26210, sleep
[26211] child: has parent 26210, sleep
[26210] help: exit
[26209] main: 'help' 26210 returned, sleep
[26211] child: has parent 26209, sleep
[26211] child: has parent 26209, sleep
[26209] main: exit
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: has parent 1, sleep
[26211] child: exit



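#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/wait.h>

/* Option numbers proposed by this patch. They only mean something on a
 * patched kernel; mainline later assigned 35 and 36 to other options. */
#define PR_SET_ANCHOR 35
#define PR_GET_ANCHOR 36

int main(int argc, char *argv[])
{
	int is_anch = 0;
	pid_t pid;

	/* Line-buffer stdout so pending output is not duplicated by fork()
	 * when it is redirected to a file or pipe. */
	setvbuf(stdout, NULL, _IOLBF, 0);

	/* Any command-line argument makes this process an anchor. */
	if (argc > 1 && prctl(PR_SET_ANCHOR, 1UL) < 0) {
		perror("prctl(PR_SET_ANCHOR)");
		return EXIT_FAILURE;
	}
	if (prctl(PR_GET_ANCHOR, &is_anch) < 0) {
		perror("prctl(PR_GET_ANCHOR)");
		return EXIT_FAILURE;
	}
	printf("[%i] main: anchor=%i\n", getpid(), is_anch);

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return EXIT_FAILURE;
	}
	if (pid == 0) {
		/* 'help': the intermediate process of a double fork */
		pid_t pid2;

		printf("[%i] help: has parent %i\n", getpid(), getppid());
		pid2 = fork();
		if (pid2 == 0) {
			/* 'child': outlives 'help' and reports its parent */
			int i;

			for (i = 0; i < 10; i++) {
				printf("[%i] child: has parent %i, sleep\n", getpid(), getppid());
				sleep(1);
			}
			printf("[%i] child: exit\n", getpid());
		} else {
			printf("[%i] help: forked 'child' %i, sleep\n", getpid(), pid2);
			sleep(2);
			printf("[%i] help: exit\n", getpid());
		}
	} else {
		printf("[%i] main: forked 'help' %i\n", getpid(), pid);
		printf("[%i] main: wait for 'help' to exit %i\n", getpid(), pid);
		waitpid(pid, NULL, 0);
		printf("[%i] main: 'help' %i returned, sleep\n", getpid(), pid);
		sleep(2);
		printf("[%i] main: exit\n", getpid());
	}

	return 0;
}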



* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-04 15:42 ` Kay Sievers
@ 2010-02-04 20:59   ` Kay Sievers
  0 siblings, 0 replies; 29+ messages in thread
From: Kay Sievers @ 2010-02-04 20:59 UTC
  To: Kay Sievers; +Cc: Lennart Poettering, linux-kernel

On Thu, 2010-02-04 at 16:42 +0100, Kay Sievers wrote:

> Sounds good to me. And seems useful for all sorts of session tracking
> and "prettifying ps". :)

Here is the output of 'ps' with a wrapped gnome-session with the anchor
flag set. All the started programs stay children of the session instead
of becoming children of init:

Thanks,
Kay

  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00  \_ [migration/0]
    4 ?        S      0:00  \_ [ksoftirqd/0]
    5 ?        S      0:00  \_ [migration/1]
    6 ?        S      0:00  \_ [ksoftirqd/1]
    7 ?        S      0:00  \_ [events/0]
    8 ?        S      0:00  \_ [events/1]
    9 ?        S      0:00  \_ [khelper]
   10 ?        S      0:00  \_ [async/mgr]
   11 ?        S      0:00  \_ [sync_supers]
   12 ?        S      0:00  \_ [bdi-default]
   13 ?        S      0:00  \_ [kblockd/0]
   14 ?        S      0:00  \_ [kblockd/1]
   15 ?        S      0:00  \_ [kacpid]
   16 ?        S      0:00  \_ [kacpi_notify]
   17 ?        S      0:00  \_ [kacpi_hotplug]
   18 ?        S      0:00  \_ [ata/0]
   19 ?        S      0:00  \_ [ata/1]
   20 ?        S      0:00  \_ [ata_aux]
   21 ?        S      0:00  \_ [kseriod]
   24 ?        S      0:00  \_ [kondemand/0]
   25 ?        S      0:00  \_ [kondemand/1]
   26 ?        S      0:00  \_ [kswapd0]
   27 ?        S      0:00  \_ [aio/0]
   28 ?        S      0:00  \_ [aio/1]
   29 ?        S      0:00  \_ [crypto/0]
   30 ?        S      0:00  \_ [crypto/1]
   33 ?        S      0:00  \_ [scsi_eh_0]
   34 ?        S      0:00  \_ [scsi_eh_1]
   35 ?        S      0:00  \_ [scsi_eh_2]
   36 ?        S      0:00  \_ [scsi_eh_3]
   41 ?        S      0:00  \_ [kpsmoused]
   43 ?        S      0:00  \_ [jbd2/sda1-8]
   44 ?        S      0:00  \_ [ext4-dio-unwrit]
   45 ?        S      0:00  \_ [ext4-dio-unwrit]
  233 ?        S      0:00  \_ [ksuspend_usbd]
  238 ?        S      0:00  \_ [khubd]
  272 ?        S      0:00  \_ [cfg80211]
  283 ?        S      0:00  \_ [kvm-irqfd-clean]
  324 ?        S      0:00  \_ [ktpacpid]
  339 ?        S      0:00  \_ [iwlagn]
  340 ?        S      0:00  \_ [phy0]
  364 ?        S      0:00  \_ [i915]
  425 ?        S      0:00  \_ [hd-audio0]
  471 ?        S      0:00  \_ [flush-259:0]
  489 ?        S      0:00  \_ [usbhid_resumer]
  502 ?        S      0:00  \_ [scsi_eh_4]
  503 ?        S      0:00  \_ [usb-storage]
  514 ?        S      0:00  \_ [kauditd]
  526 ?        S      0:00  \_ [kstriped]
  564 ?        S      0:00  \_ [kjournald]
    1 ?        Ss     0:00 init [5]  
   96 ?        S<s    0:00 /sbin/udevd --daemon
  212 ?        S<     0:00  \_ /sbin/udevd --daemon
  213 ?        S<     0:00  \_ /sbin/udevd --daemon
  913 ?        Ss     0:00 /sbin/acpid
  920 ?        Ss     0:00 /bin/dbus-daemon --system
 1068 ?        Ss     0:00 avahi-daemon: running [yio.local]
 1086 ?        Sl     0:00 /sbin/rsyslogd -c 4 -f /etc/rsyslog.conf
 1091 ?        Ssl    0:00 /usr/sbin/console-kit-daemon
 1139 ?        Ss     0:00 /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
 1234 ?        Ssl    0:00 /usr/sbin/nscd
 1252 ?        Ss     0:00 /usr/sbin/cupsd -C /etc/cups/cupsd.conf
 1255 ?        S      0:00 /usr/sbin/gdm
 1263 ?        S      0:00  \_ /usr/lib/gdm/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1
 1290 tty7     Ss+    0:15      \_ /usr/bin/Xorg :0 -br -verbose -auth /var/run/gdm/auth-for-gdm-t73y8a/database -nolisten tcp vt7
 1445 ?        S      0:00      \_ /usr/lib/gdm/gdm-session-worker
 1455 ?        Ssl    0:00          \_ /usr/bin/gnome-session
 1535 ?        Ss     0:00              \_ /usr/bin/gpg-agent --sh --daemon --write-env-file /home/kay/.gnupg/agent.info /usr/bin/ssh-agent /bin/bash /etc/X11/xinit/xinitrc
 1536 ?        Ss     0:00              \_ /usr/bin/ssh-agent /bin/bash /etc/X11/xinit/xinitrc
 1546 ?        S      0:00              \_ dbus-launch --exit-with-session /usr/bin/gnome-session
 1547 ?        Ss     0:00              \_ /bin/dbus-daemon --fork --print-pid 5 --print-address 9 --session
 1556 ?        S      0:00              \_ /usr/lib/GConf/2/gconfd-2
 1588 ?        Sl     0:00              \_ gnome-keyring-daemon --start --components=pkcs11
 1589 ?        SLl    0:00              \_ gnome-keyring-daemon --start --components=secrets
 1592 ?        Sl     0:00              \_ gnome-keyring-daemon --start --components=ssh
 1597 ?        Ssl    0:01              \_ /usr/lib/gnome-settings-daemon/gnome-settings-daemon
 1598 ?        Ss     0:00              \_ seahorse-daemon
 1604 ?        S      0:00              \_ /usr/lib64/gvfs/gvfsd
 1611 ?        Ssl    0:00              \_ /usr/lib64/gvfs//gvfs-fuse-daemon /home/kay/.gvfs
 1636 ?        S      0:01              \_ /usr/bin/metacity
 1642 ?        Ssl    0:00              \_ /usr/bin/pulseaudio --start --log-target=syslog
 1740 ?        S      0:00              |   \_ /usr/lib/pulse/gconf-helper
 1649 ?        S      0:01              \_ gnome-panel
 1651 ?        S      0:02              \_ nautilus
 1653 ?        Ssl    0:00              \_ /usr/lib/bonobo/bonobo-activation-server --ac-activate --ior-output-fd=18
 1668 ?        S      0:00              \_ python /usr/share/system-config-printer/applet.py
 1669 ?        S      0:03              \_ /usr/lib/gnome-main-menu/main-menu --oaf-activate-iid=OAFIID:GNOME_MainMenu_Factory --oaf-ior-fd=18
 1672 ?        S      0:00              \_ evolution-alarm-notify
 1673 ?        S      0:00              \_ /usr/lib/polkit-gnome/polkit-gnome-authentication-agent-1
 1676 ?        S      0:00              \_ gnome-power-manager
 1678 ?        S      0:00              \_ gnome-volume-control-applet
 1681 ?        S      0:01              \_ nm-applet --sm-disable
 1684 ?        S      0:00              \_ /usr/lib/gdu-notification-daemon
 1687 ?        S      0:00              \_ bluetooth-applet
 1705 ?        S      0:00              \_ /usr/lib/notification-daemon-1.0/notification-daemon
 1712 ?        S      0:00              \_ /usr/lib/evolution-data-server/e-calendar-factory
 1714 ?        Ss     0:00              \_ gnome-screensaver
 1719 ?        S      0:00              \_ /usr/lib/evolution-data-server/e-addressbook-factory
 1726 ?        S      0:00              \_ /usr/lib64/gvfs/gvfs-gdu-volume-monitor
 1737 ?        S      0:00              \_ /usr/lib64/gvfs/gvfs-gphoto2-volume-monitor
 1745 ?        S      0:00              \_ /usr/lib64/gvfs/gvfsd-trash --spawner :1.8 /org/gtk/gvfs/exec_spaw/0
 1774 ?        S      0:00              \_ /usr/lib64/gvfs/gvfsd-burn --spawner :1.8 /org/gtk/gvfs/exec_spaw/1
 1786 ?        S      0:00              \_ /usr/lib64/gvfs/gvfsd-metadata
 1885 ?        Sl     0:01              \_ /usr/bin/gnome-terminal -x /bin/sh -c cd '/home/kay/Desktop' && exec $SHELL
 1927 ?        S      0:00              |   \_ gnome-pty-helper
 1928 pts/1    Ss     0:00              |   \_ /bin/bash
 2124 pts/1    R+     0:00              |       \_ ps afx
 1981 ?        S      0:01              \_ pidgin
 2014 ?        SLl    0:06              \_ evolution
 2065 ?        S      0:00              \_ /bin/sh /usr/bin/firefox
 2070 ?        Rl     0:04              |   \_ /usr/lib64/firefox/firefox
 2111 ?        S      0:01              \_ xchat
 2112 ?        S      0:00              |   \_ xchat
 2123 ?        S      0:00              |   \_ xchat
 2117 ?        S      0:00              \_ palimpsest
 1356 ?        Ss     0:00 /usr/lib/postfix/master
 1377 ?        S      0:00  \_ pickup -l -t fifo -u
 1390 ?        Ss     0:00 /usr/sbin/crond
 1434 ?        Ssl    0:00 /usr/sbin/NetworkManager
 1695 ?        S      0:00  \_ /sbin/dhclient -d -sf /usr/lib/NetworkManager/nm-dhcp-client.action -pf /var/run/dhclient-eth0.pid -lf /var/lib/dhcp/dhclient-73a36e75-368a-434c-b6c0-cfda0e3f1b50-eth0.lease -cf /var/run/nm-dhclient-eth0.conf eth0
 1438 ?        S      0:00 /usr/sbin/modem-manager
 1441 ?        S      0:00 /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -u -f /var/log/wpa_supplicant.log
 1443 ?        S      0:00 /usr/sbin/nm-system-settings --config /etc/NetworkManager/nm-system-settings.conf
 1550 ?        S      0:00 /usr/lib/DeviceKit-power/devkit-power-daemon
 1644 ?        SNl    0:00 /usr/lib/rtkit/rtkit-daemon
 1648 ?        S      0:06 /usr/lib/polkit-1/polkitd
 1707 ?        S      0:00 /usr/lib/DeviceKit-disks/devkit-disks-daemon
 1708 ?        S      0:00  \_ devkit-disks-daemon: polling /dev/sdb /dev/sdc
 1807 tty1     Ss+    0:00 /sbin/mingetty --noclear tty1
 1808 tty2     Ss+    0:00 /sbin/mingetty tty2
 1809 tty3     Ss+    0:00 /sbin/mingetty tty3
 1810 tty4     Ss+    0:00 /sbin/mingetty tty4
 1811 tty5     Ss+    0:00 /sbin/mingetty tty5
 1812 tty6     Ss+    0:00 /sbin/mingetty tty6




* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for  child processes
  2010-02-03 17:49   ` Lennart Poettering
@ 2010-02-05  9:54     ` Américo Wang
  2010-02-11 10:21       ` Kay Sievers
  0 siblings, 1 reply; 29+ messages in thread
From: Américo Wang @ 2010-02-05  9:54 UTC
  To: Lennart Poettering; +Cc: linux-kernel

On Thu, Feb 4, 2010 at 1:49 AM, Lennart Poettering <mzxreary@0pointer.de> wrote:
> On Wed, 03.02.10 23:31, Américo Wang (xiyou.wangcong@gmail.com) wrote:
>
>> On Tue, Feb 02, 2010 at 01:04:57PM +0100, Lennart Poettering wrote:
>> >
>> >This patch adds a simple flag for each process that marks it as an
>> >"anchor" process for all its children and grandchildren. If a child of
>> >such an anchor dies all its children will not be reparented to init, but
>> >instead to this anchor, escaping this anchor process is not possible. A
>> >task with this flag set hence acts is little "sub-init".
>>
>> This will break the applictions which using 'getppid() == 1' to check
>> if its real parent is dead or not...
>
> Usage of the PR_SETANCHOR flag is optional for a process. It won't
> break anything unless enabled. So I don't really see a problem here.
>
> Of course, when this flag is used the behaviour is different from what
> traditional Unix says what happens with the children of a process when
> it dies. But uh, that's the whole point and that's why this flag is
> enabled optionally only.


As for the example you mentioned: with your patch applied,
gnome-session will use this to mark itself as an "anchor", so all the
programs that are started within it will stay its children. If one of
these programs uses the "getppid() == 1" trick, it will break.

>
> Also, on a side note: code that checks if its parent process died most
> likely should rewritten to use PR_DEATHSIG or something like that
> anyway, so that it is notified about the parent dying instead of
> polling for it manually.
>

I agree, but this is not a reason for you to break compatibility.


* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for  child processes
  2010-02-05  9:54     ` Américo Wang
@ 2010-02-11 10:21       ` Kay Sievers
  0 siblings, 0 replies; 29+ messages in thread
From: Kay Sievers @ 2010-02-11 10:21 UTC
  To: Américo Wang; +Cc: Lennart Poettering, linux-kernel

On Fri, Feb 5, 2010 at 10:54, Américo Wang <xiyou.wangcong@gmail.com> wrote:
>> Also, on a side note: code that checks if its parent process died most
>> likely should rewritten to use PR_DEATHSIG or something like that
>> anyway, so that it is notified about the parent dying instead of
>> polling for it manually.
>>
>
> I agree, but this is not a reason for you to break the compatiblity.

Any substantial comments instead?

Thanks,
Kay


* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-02 12:04 [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes Lennart Poettering
                   ` (2 preceding siblings ...)
  2010-02-04 15:42 ` Kay Sievers
@ 2010-03-04 14:08 ` Oleg Nesterov
  2010-03-04 22:14   ` Roland McGrath
                     ` (2 more replies)
  2010-12-20 14:26 ` Scott James Remnant
  4 siblings, 3 replies; 29+ messages in thread
From: Oleg Nesterov @ 2010-03-04 14:08 UTC
  To: Lennart Poettering
  Cc: linux-kernel, Americo Wang, James Morris, Kay Sievers,
	KOSAKI Motohiro, Kyle McMartin, Linus Torvalds, Michael Kerrisk,
	Roland McGrath

On 02/02, Lennart Poettering wrote:
>
> This patch adds a simple flag for each process that marks it as an
> "anchor" process for all its children and grandchildren. If a child of
> such an anchor dies all its children will not be reparented to init, but
> instead to this anchor, escaping this anchor process is not possible. A
> task with this flag set hence acts is little "sub-init".

Lennart, this patch adds a noticeable Linux-only feature. I see
your point, but imho your idea needs "strong" acks. I cc'ed
some heavyweights; if someone dislikes your idea, he can nack it
right now.


Security. This is beyond my understanding, hopefully the cc'ed
experts can help.

Should we clear ->child_anchor flags when the "sub-init" execs? Or,
at least, when the task changes its credentials? Probably not, but
dunno.

The more problematic case is when the descendant of the "sub-init"
execs the setuid application. Should we allow the reparenting to
!/sbin/init task in this case?

Should we clear ->pdeath_signal after reparenting to sub-init ?

Do we need the new security_operations->task_reparent() method ?
Or, perhaps we can reuse ->task_wait() if we add the "parent"
argument?

Something else we should think about?


As for the patch itself,

>  static struct task_struct *find_new_reaper(struct task_struct *father)
>  {
>  	struct pid_namespace *pid_ns = task_active_pid_ns(father);
> -	struct task_struct *thread;
> +	struct task_struct *thread, *anchor;
>
>  	thread = father;
>  	while_each_thread(father, thread) {
> @@ -715,6 +715,11 @@ static struct task_struct *find_new_reaper(struct task_struct *father)
>  		return thread;
>  	}
>
> +	/* find the first ancestor which is marked child_anchor */
> +	for (anchor = father->parent; anchor != &init_task; anchor = anchor->parent)
> +		if (anchor->child_anchor)
> +			return anchor;
> +
>  	if (unlikely(pid_ns->child_reaper == father)) {
>  		write_unlock_irq(&tasklist_lock);
>  		if (unlikely(pid_ns == &init_pid_ns))

This is not exactly right:

	- We can race with an exiting anchor. IOW, we must not reparent
	  to the anchor if it has already passed exit_notify(). You can check
	  the PF_EXITING flag, as the while_each_thread() loop above does.

	- "anchor != &init_task" is not correct; the task must not escape
	  its container. We should stop walking the ->parent chain when we
	  hit ->child_reaper, not init_task.

	- If a sub-namespace init dies, we shouldn't skip the
	  zap_pid_ns_processes() logic, so move the "for" loop below it. This
	  also closes another possible race: the anchor can already be dead
	  when we take tasklist_lock again.
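
A minimal userspace model of the ancestor walk with the first two points applied may help make them concrete. It is a sketch, not kernel code: `struct task_model`, its `exiting` field (standing in for PF_EXITING) and `find_anchor()` are hypothetical names, and tasklist_lock as well as the zap_pid_ns_processes() ordering from the third point are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for task_struct; not kernel code. */
struct task_model {
	struct task_model *parent;
	bool child_anchor;
	bool exiting;		/* models PF_EXITING being set */
};

/*
 * Return the nearest live anchor above the dying task "father".
 * The walk stops at the namespace's child reaper (not the global init),
 * so a task is never handed to an anchor outside its container; if no
 * suitable anchor is found, the child reaper itself is returned.
 */
static struct task_model *find_anchor(struct task_model *father,
				      struct task_model *child_reaper)
{
	struct task_model *t;

	assert(father != NULL && child_reaper != NULL);
	for (t = father->parent; t != NULL && t != child_reaper; t = t->parent)
		if (t->child_anchor && !t->exiting)	/* skip dying anchors */
			return t;
	return child_reaper;
}
```

In the model, an anchored session task inside a PID namespace is found first; once it is marked as exiting, the walk falls back to the namespace's child reaper instead of climbing past it to an anchor outside the container.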

> @@ -1578,6 +1578,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  			else
>  				error = PR_MCE_KILL_DEFAULT;
>  			break;
> +		case PR_SET_ANCHOR:
> +			me->child_anchor = !!arg2;
> +			error = 0;
> +			break;

It is a bit strange that PR_SET_ANCHOR acts per-thread, not per process.

Suppose that a task A does prctl(PR_SET_ANCHOR) and marks itself as a local
child reaper. Then its sub-thread B fork()s a process C, which in turn forks
a child X. When C dies, X will be re-parented to init. Is this what we
really want?

To me, it looks more natural if PR_SET_ANCHOR marks the whole process as
a local reaper, not only the thread which called PR_SET_ANCHOR.
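
The scenario above can be modeled in userspace to compare a per-thread flag with one stored in state shared by all threads of a process (signal_struct in the kernel). This is a sketch under assumptions: the struct and function names are hypothetical, and threads of one process are assumed to share the same real parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-process signal_struct. */
struct process_model {
	bool child_anchor;		/* per-process flag */
};

/* Hypothetical stand-in for a thread's task_struct. */
struct thread_model {
	struct thread_model *parent;	/* real parent of this thread */
	struct process_model *proc;	/* shared by all threads of a process */
	bool thread_anchor;		/* per-thread flag, as in the patch */
};

/*
 * Walk up from the dying task and return the first ancestor that counts
 * as an anchor, or "init" if none does. per_process selects whether the
 * per-thread flag or the shared per-process flag is consulted.
 */
static struct thread_model *nearest_anchor(const struct thread_model *dying,
					   struct thread_model *init,
					   bool per_process)
{
	struct thread_model *t;

	for (t = dying->parent; t != NULL && t != init; t = t->parent) {
		bool anchor = per_process ? t->proc->child_anchor
					  : t->thread_anchor;
		if (anchor)
			return t;
	}
	return init;
}
```

With A anchored, B forking C and C dying, the per-thread lookup hands X to init, while the per-process lookup hands it to B, i.e. to A's process.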

Oleg.


* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-04 14:08 ` Oleg Nesterov
@ 2010-03-04 22:14   ` Roland McGrath
  2010-03-05 18:51     ` Kay Sievers
  2010-03-06  0:20     ` Lennart Poettering
  2010-03-05  4:47   ` KOSAKI Motohiro
  2010-03-06  0:16   ` Lennart Poettering
  2 siblings, 2 replies; 29+ messages in thread
From: Roland McGrath @ 2010-03-04 22:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Lennart Poettering, linux-kernel, Americo Wang, James Morris,
	Kay Sievers, KOSAKI Motohiro, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk

> Security. This is beyond my understanding, hopefully the cc'ed
> experts can help.

There are a few different aspects of behavior change to think about.

1. Who can get a SIGCHLD and wait result they weren't expecting.
2. Who sees some PID for getppid() when they are expecting 1.
3. What ps shows.

When I start thinking through what might be security issues, they are
almost all #1 questions.  There is a hairy nest of many variations of #1
questions.  The #2 question is pretty simple, but it also could be an issue
for security when setuid is involved (or just correctness for any
application).

My impression is that #3 is the only actual motivation for this feature.
So perhaps we should consider an approach that leaves the rest of the
semantics alone and only affects that.

Lennart, am I right that this is all you are looking for?  Does it even
matter to you that this changes the PPID that ps groks today?  How about if
it's just an entirely new kind of association that ps et al can learn to
display, and not even the traditional PPID field changes?

> To me, it looks more natural if PR_SET_ANCHOR marks the whole process as
> a local reaper, not only the thread which called PR_SET_ANCHOR.

Agreed.  It could probably be a bit in signal_struct.flags,
which also means no memory cost for adding the feature.
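
A sketch of that storage choice, modeled in userspace: the anchor becomes one bit in an existing flags word, set and cleared with the same `!!arg2` semantics as the patch's prctl case. The bit value `MODEL_SIGNAL_CHILD_ANCHOR` and both helpers are hypothetical; the real signal_struct flag values are not part of this thread.

```c
#include <assert.h>

/* Hypothetical bit value; real signal_struct flags are not shown here. */
#define MODEL_SIGNAL_CHILD_ANCHOR 0x00000100u

/* Stand-in for the flags word of signal_struct. */
struct signal_model {
	unsigned int flags;
};

/* Mirrors "child_anchor = !!arg2": any non-zero arg2 sets the bit. */
static int model_set_anchor(struct signal_model *sig, unsigned long arg2)
{
	if (arg2)
		sig->flags |= MODEL_SIGNAL_CHILD_ANCHOR;
	else
		sig->flags &= ~MODEL_SIGNAL_CHILD_ANCHOR;
	return 0;
}

/* Non-zero if the process is marked as an anchor. */
static int model_is_anchor(const struct signal_model *sig)
{
	return (sig->flags & MODEL_SIGNAL_CHILD_ANCHOR) != 0;
}
```

Other bits in the word are left untouched, which is why the flag costs no extra memory.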


Thanks,
Roland

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-04 14:08 ` Oleg Nesterov
  2010-03-04 22:14   ` Roland McGrath
@ 2010-03-05  4:47   ` KOSAKI Motohiro
  2010-03-05 18:55     ` Kay Sievers
  2010-03-06  0:16   ` Lennart Poettering
  2 siblings, 1 reply; 29+ messages in thread
From: KOSAKI Motohiro @ 2010-03-05  4:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: kosaki.motohiro, Lennart Poettering, linux-kernel, Americo Wang,
	James Morris, Kay Sievers, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk, Roland McGrath

> On 02/02, Lennart Poettering wrote:
> >
> > This patch adds a simple flag for each process that marks it as an
> > "anchor" process for all its children and grandchildren. If a child of
> > such an anchor dies all its children will not be reparented to init, but
> > instead to this anchor, escaping this anchor process is not possible. A
> > task with this flag set hence acts as a little "sub-init".
> 
> Lennart, this patch adds a noticeable linux-only feature. I see
> your point, but imho your idea needs the "strong" acks. I cc'ed
> some heavyweights, if someone dislikes your idea he can nack it
> right now.
> 
> 
> Security. This is beyond my understanding, hopefully the cc'ed
> experts can help.
> 
> Should we clear ->child_anchor flags when the "sub-init" execs? Or,
> at least, when the task changes its credentials? Probably not, but
> dunno.
> 
> The more problematic case is when the descendant of the "sub-init"
> execs the setuid application. Should we allow the reparenting to
> !/sbin/init task in this case?
> 
> Should we clear ->pdeath_signal after reparenting to sub-init ?
> 
> Do we need the new security_operations->task_reparent() method ?
> Or, perhaps we can reuse ->task_wait() if we add the "parent"
> argument?
> 
> Something else we should think about?

I think changing the reparent rule is a bit risky. Instead, I propose
exporting the ANCHOR flag via /proc and having ps parse it.

What do you think?
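
For illustration, if the flag were exported via /proc as suggested, ps would have to read it from a /proc/<pid>/status-style buffer. A minimal parser sketch follows; `PPid` is an existing field in that file, while `ChildAnchor` (used only in the test data) is a hypothetical name for the proposed export, and `status_field()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "key:" at the start of a line in a /proc/<pid>/status-style
 * buffer and parse the integer after it. Returns 0 on success, or -1 if
 * the key is missing or its value is not a number.
 */
static int status_field(const char *text, const char *key, long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			const char *start = line + klen + 1;
			char *end;
			long v = strtol(start, &end, 10);	/* skips the tab */

			if (end == start)			/* not a number */
				return -1;
			*out = v;
			return 0;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line != NULL)
			line++;
	}
	return -1;
}
```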




* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for  child processes
  2010-03-04 22:14   ` Roland McGrath
@ 2010-03-05 18:51     ` Kay Sievers
  2010-03-05 19:18       ` Roland McGrath
  2010-03-06  0:20     ` Lennart Poettering
  1 sibling, 1 reply; 29+ messages in thread
From: Kay Sievers @ 2010-03-05 18:51 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Oleg Nesterov, Lennart Poettering, linux-kernel, Americo Wang,
	James Morris, KOSAKI Motohiro, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk

On Thu, Mar 4, 2010 at 14:14, Roland McGrath <roland@redhat.com> wrote:
>> Security. This is beyond my understanding, hopefully the cc'ed
>> experts can help.
>
> There are a few different aspects of behavior change to think about.
>
> 1. Who can get a SIGCHLD and wait result they weren't expecting.
> 2. Who sees some PID for getppid() when they are expecting 1.
> 3. What ps shows.
>
> When I start thinking through what might be security issues, they are
> almost all #1 questions.  There is a hairy nest of many variations of #1
> questions.  The #2 question is pretty simple, but it also could be an issue
> for security when setuid is involved (or just correctness for any
> application).
>
> My impression is that #3 is the only actual motivation for this feature.
> So perhaps we should consider an approach that leaves the rest of the
> semantics alone and only affects that.

Oh, no. Actually getting the SIGCHLD is the needed feature here. A
process that sets the ANCHOR flag is surely expected to handle these
signals. It's all about a user "init-like" process that can do
similar things for a logged-in user as /sbin/init does for the
system. So, it's all about 1.), and 3.) is a nice side-effect, but not
the motivation to do this.

And 2.) is just very broken behavior that should be fixed in the
application, and it can be worked around in the sub-init process if
needed.
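
For applications that detect parent death by testing getppid() == 1, the fix on the application side can be as small as comparing against the parent's PID recorded up front. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Fragile: assumes an orphan is always adopted by PID 1. */
static bool parent_died_fragile(void)
{
	return getppid() == 1;
}

/*
 * Robust: compare against the parent's PID as recorded before fork()
 * (e.g. via getpid() in the parent, so there is no window in which the
 * parent could already be gone). Any change means the process was
 * reparented, whichever task adopted it.
 */
static bool parent_died(pid_t original_ppid)
{
	return getppid() != original_ppid;
}
```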

Thanks,
Kay

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for  child processes
  2010-03-05  4:47   ` KOSAKI Motohiro
@ 2010-03-05 18:55     ` Kay Sievers
  0 siblings, 0 replies; 29+ messages in thread
From: Kay Sievers @ 2010-03-05 18:55 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Oleg Nesterov, Lennart Poettering, linux-kernel, Americo Wang,
	James Morris, Kyle McMartin, Linus Torvalds, Michael Kerrisk,
	Roland McGrath

On Thu, Mar 4, 2010 at 20:47, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> On 02/02, Lennart Poettering wrote:
>> >
>> > This patch adds a simple flag for each process that marks it as an
>> > "anchor" process for all its children and grandchildren. If a child of
>> > such an anchor dies all its children will not be reparented to init, but
>> > instead to this anchor, escaping this anchor process is not possible. A
>> > task with this flag set hence acts as a little "sub-init".
>>
>> Lennart, this patch adds a noticeable linux-only feature. I see
>> your point, but imho your idea needs the "strong" acks. I cc'ed
>> some heavyweights, if someone dislikes your idea he can nack it
>> right now.
>>
>>
>> Security. This is beyond my understanding, hopefully the cc'ed
>> experts can help.
>>
>> Should we clear ->child_anchor flags when the "sub-init" execs? Or,
>> at least, when the task changes its credentials? Probably not, but
>> dunno.
>>
>> The more problematic case is when the descendant of the "sub-init"
>> execs the setuid application. Should we allow the reparenting to
>> !/sbin/init task in this case?
>>
>> Should we clear ->pdeath_signal after reparenting to sub-init ?
>>
>> Do we need the new security_operations->task_reparent() method ?
>> Or, perhaps we can reuse ->task_wait() if we add the "parent"
>> argument?
>>
>> Something else we should think about?
>
> I think changing the reparent rule is a bit risky. Instead, I propose
> exporting the ANCHOR flag via /proc and having ps parse it.
>
> What do you think?

No, that does not help anything, as mentioned earlier. It's not about
making 'ps' look nice, we need the signals if processes die, so we
want to be the parent of the process to keep the usual semantics.

Thanks,
Kay

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for  child processes
  2010-03-05 18:51     ` Kay Sievers
@ 2010-03-05 19:18       ` Roland McGrath
  2010-03-06  0:24         ` Lennart Poettering
  0 siblings, 1 reply; 29+ messages in thread
From: Roland McGrath @ 2010-03-05 19:18 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Oleg Nesterov, Lennart Poettering, linux-kernel, Americo Wang,
	James Morris, KOSAKI Motohiro, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk

> Oh, no. Actually getting the SIGCHLD is the needed feature here. A
> process that sets the ANCHOR flag is surely expected to handle these
> signals. It's all about a user "init-like" process that can do
> similar things for a logged-in user as /sbin/init does for the
> system. So, it's all about 1.), and 3.) is a nice side-effect, but not
> the motivation to do this.

Please explain this more explicitly.  What the actual init does with
miscellaneous reparented processes is just reap them and ignore their
status.  What do you intend an "anchor" process to do other than that?


Thanks,
Roland

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-04 14:08 ` Oleg Nesterov
  2010-03-04 22:14   ` Roland McGrath
  2010-03-05  4:47   ` KOSAKI Motohiro
@ 2010-03-06  0:16   ` Lennart Poettering
  2010-03-11  4:14     ` Eric W. Biederman
  2 siblings, 1 reply; 29+ messages in thread
From: Lennart Poettering @ 2010-03-06  0:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Americo Wang, James Morris, Kay Sievers,
	KOSAKI Motohiro, Kyle McMartin, Linus Torvalds, Michael Kerrisk,
	Roland McGrath

On Thu, 04.03.10 15:08, Oleg Nesterov (oleg@redhat.com) wrote:

> Should we clear ->child_anchor flags when the "sub-init" execs? Or,
> at least, when the task changes its credentials? Probably not, but
> dunno.

Since this flag is only useful for a very well defined type of processes
(i.e. session managers, supervising daemons, init systems) it might make
sense to reset it automatically when privs are dropped or we exec
something. After all, I don't see how we'd gain any useful functionality
by allowing this flag to remain set. However, we would
certainly be on the safer side if we reset it, because that way it can
never leak to processes that are differently privileged or do not
expect it.

So, for the sake of being on the safe side, I think we should reset the
flag on exec()/setuid().
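
The proposed reset rule, sketched as a userspace model: the struct and both hook names are hypothetical, and treating a setuid() that does not actually change the ids as a no-op is an assumption about what "when privs are dropped" should mean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process state; a userspace model, not kernel code. */
struct proc_model {
	bool child_anchor;
	unsigned int uid;
	unsigned int euid;
};

/* Hook run after a successful exec: the new image never inherits the flag. */
static void model_on_exec(struct proc_model *p)
{
	p->child_anchor = false;
}

/* Hook run on a credential change: clear the flag if the ids actually change. */
static void model_on_setuid(struct proc_model *p, unsigned int uid,
			    unsigned int euid)
{
	if (p->uid != uid || p->euid != euid)
		p->child_anchor = false;
	p->uid = uid;
	p->euid = euid;
}
```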

> It is a bit strange that PR_SET_ANCHOR acts per-thread, not per
> process.

Yes, I agree, this should be per-process indeed.

Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/           GnuPG 0x1A015CC4

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-04 22:14   ` Roland McGrath
  2010-03-05 18:51     ` Kay Sievers
@ 2010-03-06  0:20     ` Lennart Poettering
  2010-03-08 23:11       ` Roland McGrath
  1 sibling, 1 reply; 29+ messages in thread
From: Lennart Poettering @ 2010-03-06  0:20 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Oleg Nesterov, linux-kernel, Americo Wang, James Morris,
	Kay Sievers, KOSAKI Motohiro, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk

On Thu, 04.03.10 14:14, Roland McGrath (roland@redhat.com) wrote:

> There are a few different aspects of behavior change to think about.
> 
> 1. Who can get a SIGCHLD and wait result they weren't expecting.
> 2. Who sees some PID for getppid() when they are expecting 1.
> 3. What ps shows.
> 
> When I start thinking through what might be security issues, they are
> almost all #1 questions.  There is a hairy nest of many variations of #1
> questions.  The #2 question is pretty simple, but it also could be an issue
> for security when setuid is involved (or just correctness for any
> application).
> 
> My impression is that #3 is the only actual motivation for this feature.
> So perhaps we should consider an approach that leaves the rest of the
> semantics alone and only affects that.
> 
> Lennart, am I right that this is all you are looking for?  Does it even
> matter to you that this changes the PPID that ps groks today?  How about if
> it's just an entirely new kind of association that ps et al can learn to
> display, and not even the traditional PPID field changes?

Uh, no. Actually it's the fact that my sub-init gets the SIGCHLD, which
I am looking for. The clean ps tree is just a side-effect.

When the sub-init gets the SIGCHLD for its "grandchildren" then we can
supervise double-forking daemons, and properly handle daemons that die
due to SIGSEGV and suchlike. 

So what I am after is the SIGCHLD delivered to the grandparents; the
clean ps tree is kinda boring.

Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/           GnuPG 0x1A015CC4

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for   child processes
  2010-03-05 19:18       ` Roland McGrath
@ 2010-03-06  0:24         ` Lennart Poettering
  2010-03-09  0:45           ` Ray Lee
  0 siblings, 1 reply; 29+ messages in thread
From: Lennart Poettering @ 2010-03-06  0:24 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Kay Sievers, Oleg Nesterov, linux-kernel, Americo Wang,
	James Morris, KOSAKI Motohiro, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk

On Fri, 05.03.10 11:18, Roland McGrath (roland@redhat.com) wrote:

> 
> > Oh, no. Actually getting the SIGCHLD is the needed feature here. A
> > process that sets the ANCHOR flag is surely expected to handle these
> > signals. It's all about a user "init-like" process that can do
> > similar things for a logged-in user as /sbin/init does for the
> > system. So, it's all about 1.), and 3.) is a nice side-effect, but not
> > the motivation to do this.
> 
> Please explain this more explicitly.  What the actual init does with
> miscellaneous reparented processes is just reap them and ignore their
> status.  What do you intend an "anchor" process to do other than that?

It could use the grandchildren's SIGCHLDs for various task management
issues: e.g. watching double-forking daemons, or catching SIGSEGVs so that
you can crosslink that service state to systems like abrt. Or even just so
that you can implement safe restart logic, i.e. so that we can easily
wait until a process and its children are fully dead before we restart
the service.
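
The "wait until the process and its children are fully dead" step could be a blocking drain that calls waitpid() until it reports ECHILD. A sketch with a hypothetical helper name, assuming the supervised service is the supervisor's only child; treating reparented grandchildren as children relies on the proposed anchor semantics.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Block until every child of the calling process has been reaped.
 * Returns the number of children reaped, or -1 on an unexpected error.
 * ECHILD means nothing is left to wait for, so the subtree is gone.
 */
static int wait_for_all_children(void)
{
	int reaped = 0;

	for (;;) {
		pid_t pid = waitpid(-1, NULL, 0);	/* blocks */

		if (pid > 0) {
			reaped++;
			continue;
		}
		if (errno == EINTR)	/* interrupted by a signal: retry */
			continue;
		return errno == ECHILD ? reaped : -1;
	}
}
```

Only after this returns would the supervisor start the service again.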

Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/           GnuPG 0x1A015CC4

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-06  0:20     ` Lennart Poettering
@ 2010-03-08 23:11       ` Roland McGrath
  0 siblings, 0 replies; 29+ messages in thread
From: Roland McGrath @ 2010-03-08 23:11 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Oleg Nesterov, linux-kernel, Americo Wang, James Morris,
	Kay Sievers, KOSAKI Motohiro, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk

> Uh, no. Actually it's the fact that my sub-init gets the SIGCHLD, which
> I am looking for. The clean ps tree is just a side-effect.

Ok.  In that case, there is a substantial can of worms in considering
compatibility breakages, and especially the various wrinkles of setuid that
could be security issues.  It will take a lot of careful thought to be sure
how we can do this without opening new ways to confuse and abuse a setuid
program.


Thanks,
Roland

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for  child processes
  2010-03-06  0:24         ` Lennart Poettering
@ 2010-03-09  0:45           ` Ray Lee
  2010-03-09 13:19             ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Ray Lee @ 2010-03-09  0:45 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Roland McGrath, Kay Sievers, Oleg Nesterov, linux-kernel,
	Americo Wang, James Morris, KOSAKI Motohiro, Kyle McMartin,
	Linus Torvalds, Michael Kerrisk

On Fri, Mar 5, 2010 at 4:24 PM, Lennart Poettering <mzxreary@0pointer.de> wrote:
> On Fri, 05.03.10 11:18, Roland McGrath (roland@redhat.com) wrote:
>
>>
>> > Oh, no. Actually getting the SIGCHLD is the needed feature here. A
>> > process that sets the ANCHOR flag is surely expected to handle these
>> > signals. It's all about a user "init-like" process that can do
>> > similar things for a logged-in user as /sbin/init does for the
>> > system. So, it's all about 1.), and 3.) is a nice side-effect, but not
>> > the motivation to do this.
>>
>> Please explain this more explicitly.  What the actual init does with
>> miscellaneous reparented processes is just reap them and ignore their
>> status.  What do you intend an "anchor" process to do other than that?
>
> It could use the grandchildren's SIGCHLDs for various task management
> issues: e.g. watching double-forking daemons, or catching SIGSEGVs so that
> you can crosslink that service state to systems like abrt. Or even just so
> that you can implement safe restart logic, i.e. so that we can easily
> wait until a process and its children are fully dead before we restart
> the service.

The kernel already offers system-wide process exit notification via
taskstats (a netlink interface), though unfortunately I believe it's
optional. It's pretty easy to use (as these things go, anyway -- I was
able to hack up an arbitrary process exit watcher in about a half hour
based on Documentation/accounting/getdelays.c).

Would this existing mechanism cover what you need?

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-09  0:45           ` Ray Lee
@ 2010-03-09 13:19             ` Oleg Nesterov
  0 siblings, 0 replies; 29+ messages in thread
From: Oleg Nesterov @ 2010-03-09 13:19 UTC (permalink / raw)
  To: Ray Lee
  Cc: Lennart Poettering, Roland McGrath, Kay Sievers, linux-kernel,
	Americo Wang, James Morris, KOSAKI Motohiro, Kyle McMartin,
	Linus Torvalds, Michael Kerrisk

On 03/08, Ray Lee wrote:
>
> The kernel already offers system-wide process exit notification via
> taskstats (a netlink interface), though unfortunately I believe it's
> optional. It's pretty easy to use (as these things go, anyway -- I was
> able to hack up an arbitrary process exit watcher in about a half hour
> based on Documentation/accounting/getdelays.c).

Or the proc connector (optional too). Unlike taskstats, it notifies about
fork() as well. But, IIRC, it doesn't allow filtering out unwanted pids.

Actually, I don't really understand how a PR_SET_ANCHOR task can monitor
several daemons. I mean, when the grandchild dies, the sub-init doesn't
know who forked this child during daemonize().


Cough, can't resist... With utrace it would be very simple to create a
module which allows monitoring the child's fork/exit/etc with almost
zero overhead, and this overhead only applies to the "traced" tasks.

Oleg.


* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-06  0:16   ` Lennart Poettering
@ 2010-03-11  4:14     ` Eric W. Biederman
  2010-03-11  7:56       ` KOSAKI Motohiro
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2010-03-11  4:14 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Oleg Nesterov, linux-kernel, Americo Wang, James Morris,
	Kay Sievers, KOSAKI Motohiro, Kyle McMartin, Linus Torvalds,
	Michael Kerrisk, Roland McGrath

Lennart Poettering <mzxreary@0pointer.de> writes:

> On Thu, 04.03.10 15:08, Oleg Nesterov (oleg@redhat.com) wrote:
>
>> Should we clear ->child_anchor flags when the "sub-init" execs? Or,
>> at least, when the task changes its credentials? Probably not, but
>> dunno.
>
> Since this flag is only useful for a very well defined type of processes
> (i.e. session managers, supervising daemons, init systems) it might make
> sense to reset it automatically when privs are dropped or we exec
> something. After all, I don't see how we'd gain any useful functionality
> by allowing this flag to remain set. However, we would
> certainly be on the safer side if we reset it, because that way it can
> never leak to processes that are differently privileged or do not
> expect it.
>
> So, for the sake of being on the safe side, I think we should reset the
> flag on exec()/setuid().
>
>> It is a bit strange that PR_SET_ANCHOR acts per-thread, not per
>> process.
>
> Yes, I agree, this should be per-process indeed.

Have you taken a look at the pid namespace?

Except for the fact that it requires privilege to create, it seems to do
what you want.  It is certainly what I have been using when I want
an inescapable environment.

If nothing else I get the feeling that what you are after is
a generalization of the child_reaper feature in the pid namespace
and yet you haven't touched any of that code.

Eric

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-03-11  4:14     ` Eric W. Biederman
@ 2010-03-11  7:56       ` KOSAKI Motohiro
  0 siblings, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2010-03-11  7:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: kosaki.motohiro, Lennart Poettering, Oleg Nesterov, linux-kernel,
	Americo Wang, James Morris, Kay Sievers, Kyle McMartin,
	Linus Torvalds, Michael Kerrisk, Roland McGrath

> Lennart Poettering <mzxreary@0pointer.de> writes:
> 
> > On Thu, 04.03.10 15:08, Oleg Nesterov (oleg@redhat.com) wrote:
> >
> >> Should we clear ->child_anchor flags when the "sub-init" execs? Or,
> >> at least, when the task changes its credentials? Probably not, but
> >> dunno.
> >
> > Since this flag is only useful for a very well defined type of processes
> > (i.e. session managers, supervising daemons, init systems) it might make
> > sense to reset it automatically when privs are dropped or we exec
> > something. After all, I don't see how we'd gain any useful functionality
> > by allowing this flag to remain set. However, we would
> > certainly be on the safer side if we reset it, because that way it can
> > never leak to processes that are differently privileged or do not
> > expect it.
> >
> > So, for the sake of being on the safe side, I think we should reset the
> > flag on exec()/setuid().
> >
> >> It is a bit strange that PR_SET_ANCHOR acts per-thread, not per
> >> process.
> >
> > Yes, I agree, this should be per-process indeed.
> 
> Have you taken a look at the pid namespace?
> 
> Except for the fact that it requires privilege to create, it seems to do
> what you want.  It is certainly what I have been using when I want
> an inescapable environment.
> 
> If nothing else I get the feeling that what you are after is
> a generalization of the child_reaper feature in the pid namespace
> and yet you haven't touched any of that code.

I guess it doesn't fit gnome-session, because gtop and similar
system monitoring processes assume they can see all processes in the system.

thanks.


* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-02-02 12:04 [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes Lennart Poettering
                   ` (3 preceding siblings ...)
  2010-03-04 14:08 ` Oleg Nesterov
@ 2010-12-20 14:26 ` Scott James Remnant
  2010-12-20 14:51   ` Kay Sievers
  2010-12-21  9:56   ` Lennart Poettering
  4 siblings, 2 replies; 29+ messages in thread
From: Scott James Remnant @ 2010-12-20 14:26 UTC (permalink / raw)
  To: Lennart Poettering; +Cc: linux-kernel

On Tue, Feb 2, 2010 at 12:04 PM, Lennart Poettering
<lennart@poettering.net> wrote:

> Right now, if a process dies all its children are reparented to init.
> This logic has good uses, i.e. for double forking when daemonizing.
> However it also allows child processes to "escape" their parents, which
> is a problem for software like session managers (such as gnome-session)
> or other process supervisors.
>
> This patch adds a simple flag for each process that marks it as an
> "anchor" process for all its children and grandchildren. If a child of
> such an anchor dies all its children will not be reparented to init, but
> instead to this anchor, escaping this anchor process is not possible. A
> task with this flag set hence acts as a little "sub-init".
>
Why can't you simply begin a new pid namespace with the session
manager or other process supervisor?  That way the session
manager/process supervisor is for all intents and purposes an init
daemon, so shouldn't be surprised about getting SIGCHLD.

More to the point, it means that as far as the processes themselves
are concerned, they're being reparented to pid 1 just as they were
before, so you wouldn't be breaking any assumptions there either.

You could use the existing init daemon to create these pid namespaces
when it spawns the session manager.
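
For comparison, a minimal sketch of the PID namespace approach: unshare(CLONE_NEWPID) followed by fork(), so the first child becomes PID 1, and thus the reaper, of a new namespace. The helper name is hypothetical; creating the namespace normally needs CAP_SYS_ADMIN, so the sketch reports failure rather than assuming success.

```c
#define _GNU_SOURCE		/* for unshare() and CLONE_NEWPID */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a new PID namespace and fork its first process, which becomes
 * PID 1 (the child reaper) of that namespace. Returns 0 if the child saw
 * itself as PID 1, 1 if it did not, or -errno if the namespace could not
 * be created or the child could not be forked or waited for.
 */
static int run_as_pid1_in_new_ns(void)
{
	pid_t child;
	int status;

	if (unshare(CLONE_NEWPID) == -1)
		return -errno;		/* typically EPERM without privilege */
	child = fork();			/* first child lands in the new namespace */
	if (child == -1)
		return -errno;
	if (child == 0)
		_exit(getpid() == 1 ? 0 : 1);
	if (waitpid(child, &status, 0) == -1)
		return -errno;
	return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```

The child sees itself as PID 1, i.e. it is renumbered. Also, once that first child exits the namespace has no init, and further fork() calls from the caller fail (pid_namespaces(7) documents ENOMEM for this), so a real supervisor would run the whole session beneath that child.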

Scott

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-12-20 14:26 ` Scott James Remnant
@ 2010-12-20 14:51   ` Kay Sievers
  2010-12-21  9:56   ` Lennart Poettering
  1 sibling, 0 replies; 29+ messages in thread
From: Kay Sievers @ 2010-12-20 14:51 UTC (permalink / raw)
  To: Scott James Remnant; +Cc: Lennart Poettering, linux-kernel

On Mon, Dec 20, 2010 at 15:26, Scott James Remnant <scott@netsplit.com> wrote:
> On Tue, Feb 2, 2010 at 12:04 PM, Lennart Poettering
> <lennart@poettering.net> wrote:
>
>> Right now, if a process dies all its children are reparented to init.
>> This logic has good uses, i.e. for double forking when daemonizing.
>> However it also allows child processes to "escape" their parents, which
>> is a problem for software like session managers (such as gnome-session)
>> or other process supervisors.
>>
>> This patch adds a simple flag for each process that marks it as an
>> "anchor" process for all its children and grandchildren. If a child of
>> such an anchor dies all its children will not be reparented to init, but
>> instead to this anchor, escaping this anchor process is not possible. A
>> task with this flag set hence acts as a little "sub-init".
>>
> Why can't you simply begin a new pid namespace with the session
> manager or other process supervisor?

We do not want to disconnect users from the system. Too much stuff
depends on that for good reasons.

This is only about a "user init" process, which is a much softer
concept which better fits into our current setups. It is not really
about disconnecting the user from the system, by putting him in a
"container".

The system's view from the management/administration perspective, with
users having their own pids, would really get far too complicated, I
think.

> That way the session
> manager/process supervisor is for all intents and purposes an init
> daemon, so shouldn't be surprised about getting SIGCHLD.

That shouldn't be a problem.

> More to the point, it means that as far as the processes themselves
> are concerned, they're being reparented to pid 1 just as they were
> before, so you wouldn't be breaking any assumptions there either.

For now, I don't think that this will break anything. Stuff that
really expects getppid() == 1 should be fixed anyway.

> You could use the existing init daemon to create these pid namespaces
> when it spawns the session manager.

We already use the existing init daemon for that. :)

This is mainly about 'prettifying ps'. The cgroups already provide us
with all the needed information; it would just be nice to localize
SIGCHLD handling to the "user init", where the signal belongs.

Kay

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for  child processes
  2010-12-20 14:26 ` Scott James Remnant
  2010-12-20 14:51   ` Kay Sievers
@ 2010-12-21  9:56   ` Lennart Poettering
  2010-12-21 12:05     ` Scott James Remnant
  1 sibling, 1 reply; 29+ messages in thread
From: Lennart Poettering @ 2010-12-21  9:56 UTC (permalink / raw)
  To: Scott James Remnant; +Cc: linux-kernel

On Mon, 20.12.10 14:26, Scott James Remnant (scott@netsplit.com) wrote:

> > This patch adds a simple flag for each process that marks it as an
> > "anchor" process for all its children and grandchildren. If a child of
> > such an anchor dies all its children will not be reparented to init, but
> > instead to this anchor, escaping this anchor process is not possible. A
> > task with this flag set hence acts as a little "sub-init".
> >
> Why can't you simply begin a new pid namespace with the session
> manager or other process supervisor?  That way the session
> manager/process supervisor is for all intents and purposes an init
> daemon, so shouldn't be surprised about getting SIGCHLD.

PID namespaces primarily provide an independent PID numbering scheme for
a subset of processes, i.e. so that identical PIDs may refer to different
processes depending on the namespace they are running in. As a side
effect this also provides init-like behaviour for processes that aren't
the original PID 1 of the operating system. For systemd we are only
interested in this side effect, not at all in the renumbering of
processes, which we would in fact really dislike. That's why
PR_SET_ANCHOR is useful: it gives us init-like behaviour without
renumbering all processes.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-12-21  9:56   ` Lennart Poettering
@ 2010-12-21 12:05     ` Scott James Remnant
  2010-12-23 15:44       ` Lennart Poettering
  0 siblings, 1 reply; 29+ messages in thread
From: Scott James Remnant @ 2010-12-21 12:05 UTC
  To: Lennart Poettering; +Cc: linux-kernel

On Tue, Dec 21, 2010 at 9:56 AM, Lennart Poettering
<mzxreary@0pointer.de> wrote:
> On Mon, 20.12.10 14:26, Scott James Remnant (scott@netsplit.com) wrote:
>
>> > This patch adds a simple flag for each process that marks it as an
>> > "anchor" process for all its children and grandchildren. If a child of
>> > such an anchor dies all its children will not be reparented to init, but
>> > instead to this anchor, escaping this anchor process is not possible. A
>> > task with this flag set hence acts as a little "sub-init".
>> >
>> Why can't you simply begin a new pid namespace with the session
>> manager or other process supervisor?  That way the session
>> manager/process supervisor is for all intents and purposes an init
>> daemon, so shouldn't be surprised about getting SIGCHLD.
>
> PID namespaces primarily provide an independent PID numbering scheme for
> a subset of processes, i.e. so that identical PIDs may refer to different
> processes depending on the namespace they are running in. As a side
> effect this also provides init-like behaviour for processes that aren't
> the original PID 1 of the operating system. For systemd we are only
> interested in this side effect, not at all in the renumbering of
> processes, which we would in fact really dislike. That's why
> PR_SET_ANCHOR is useful: it gives us init-like behaviour without
> renumbering all processes.
>
Right, but I don't get why you need this behavior to supervise either
system or user processes.  You already get all the functionality you
need to track processes via either cgroups or the proc connector (or a
combination of both).
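
A minimal sketch of what cgroup-based tracking gives a supervisor, assuming a hierarchy that exposes a cgroup.procs file with one PID per line (the mount path and the helper name read_cgroup_pids() are hypothetical): cgroup membership is inherited across fork() and is not changed by reparenting, so double-forked processes stay listed, but the file carries no exit status for them.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read the PIDs listed (one per line) in a cgroup's
 * membership file, e.g. ".../cgroup.procs". Stores up to max PIDs in out
 * and returns how many were listed in total, or -1 if the file cannot be
 * opened. */
int read_cgroup_pids(const char *procs_path, pid_t *out, int max)
{
    FILE *f = fopen(procs_path, "r");
    long pid;
    int n = 0;

    if (!f)
        return -1;
    while (fscanf(f, "%ld", &pid) == 1) {
        if (n < max)
            out[n] = (pid_t)pid; /* keep only what fits in the buffer */
        n++;
    }
    fclose(f);
    return n;
}
```

Returning the total count while storing at most max entries lets a caller detect truncation and retry with a larger buffer.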

So is this really just about making ps look pretty, as Kay says?

Scott

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-12-21 12:05     ` Scott James Remnant
@ 2010-12-23 15:44       ` Lennart Poettering
  2010-12-23 16:00         ` Scott James Remnant
  0 siblings, 1 reply; 29+ messages in thread
From: Lennart Poettering @ 2010-12-23 15:44 UTC
  To: Scott James Remnant; +Cc: linux-kernel

On Tue, 21.12.10 12:05, Scott James Remnant (scott@netsplit.com) wrote:

> > PID namespaces primarily provide an independent PID numbering scheme for
> > a subset of processes, i.e. so that identical PIDs may refer to different
> > processes depending on the namespace they are running in. As a side
> > effect this also provides init-like behaviour for processes that aren't
> > the original PID 1 of the operating system. For systemd we are only
> > interested in this side effect, not at all in the renumbering of
> > processes, which we would in fact really dislike. That's why
> > PR_SET_ANCHOR is useful: it gives us init-like behaviour without
> > renumbering all processes.
> >
> Right, but I don't get why you need this behavior to supervise either
> system or user processes.  You already get all the functionality you
> need to track processes via either cgroups or the proc connector (or a
> combination of both).

Well, we want a clean way to get access to the full siginfo_t of the
SIGCHLD for the main process of a service. The proc connector is awful
and cgroups do not pass siginfo_t structures back to userspace, hence the
cleanest way to get this done properly and beautifully is to make the
session systemd a mini-init via PR_SET_ANCHOR, because then the per-user
systemd instances and the per-system systemd can use the exact same code
to handle process management.
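
For reference, a minimal POSIX sketch (not systemd's code) of the per-child information a reaper gets from the full siginfo_t: waitid() with WEXITED reports si_pid, si_code (CLD_EXITED, CLD_KILLED or CLD_DUMPED) and si_status, which holds either the exit status or the terminating signal depending on si_code. waitid() only reports the caller's own children, which is why the supervisor needs to be the reaper. reap_child() and describe_exit() are illustrative names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for a child to terminate and fill in its full siginfo_t.
 * pid > 0 waits for that child; pid <= 0 waits for any child.
 * Returns 0 on success, -1 on error (e.g. ECHILD). */
int reap_child(pid_t pid, siginfo_t *si)
{
    memset(si, 0, sizeof *si);
    if (pid > 0)
        return waitid(P_PID, (id_t)pid, si, WEXITED);
    return waitid(P_ALL, 0, si, WEXITED);
}

/* Format the SIGCHLD-style fields a supervisor would log. */
void describe_exit(const siginfo_t *si, char *buf, size_t len)
{
    switch (si->si_code) {
    case CLD_EXITED:
        snprintf(buf, len, "pid %d exited with status %d",
                 (int)si->si_pid, si->si_status);
        break;
    case CLD_KILLED:
        snprintf(buf, len, "pid %d killed by signal %d",
                 (int)si->si_pid, si->si_status);
        break;
    case CLD_DUMPED:
        snprintf(buf, len, "pid %d killed by signal %d (core dumped)",
                 (int)si->si_pid, si->si_status);
        break;
    default:
        snprintf(buf, len, "pid %d: unexpected si_code %d",
                 (int)si->si_pid, si->si_code);
        break;
    }
}
```

The exit status and the killing signal share si_status, so si_code must be checked before interpreting it.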

> So is this really just about making ps look pretty, as Kay says?

That's a side effect, but for me it's mostly about having a simple way
to receive the SIGCHLDs, focused on the children of the session manager
and with minimal wakeups.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

* Re: [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes
  2010-12-23 15:44       ` Lennart Poettering
@ 2010-12-23 16:00         ` Scott James Remnant
  0 siblings, 0 replies; 29+ messages in thread
From: Scott James Remnant @ 2010-12-23 16:00 UTC
  To: Lennart Poettering; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1952 bytes --]

On Thu, Dec 23, 2010 at 3:44 PM, Lennart Poettering
<mzxreary@0pointer.de> wrote:
> On Tue, 21.12.10 12:05, Scott James Remnant (scott@netsplit.com) wrote:
>
>> > PID namespaces primarily provide an independent PID numbering scheme for
>> > a subset of processes, i.e. so that identical PIDs may refer to different
>> > processes depending on the namespace they are running in. As a side
>> > effect this also provides init-like behaviour for processes that aren't
>> > the original PID 1 of the operating system. For systemd we are only
>> > interested in this side effect, not at all in the renumbering of
>> > processes, which we would in fact really dislike. That's why
>> > PR_SET_ANCHOR is useful: it gives us init-like behaviour without
>> > renumbering all processes.
>> >
>> Right, but I don't get why you need this behavior to supervise either
>> system or user processes.  You already get all the functionality you
>> need to track processes via either cgroups or the proc connector (or a
>> combination of both).
>
> Well, we want a clean way to get access to the full siginfo_t of the
> SIGCHLD for the main process of a service. The proc connector is awful
> and cgroups do not pass siginfo_t structures back to userspace, hence the
> cleanest way to get this done properly and beautifully is to make the
> session systemd a mini-init via PR_SET_ANCHOR, because then the per-user
> systemd instances and the per-system systemd can use the exact same code
> to handle process management.
>
Ah, if you're after the siginfo_t this makes more sense.

You may or may not be interested in a patch I did a couple of years
ago; it hooked into the same code path, but to notify init via a
signal when it was given a new process through adoption. I assume
this would work with a PR_SET_ANCHOR'd mini-init too?

(Apologies if the attachment screws up, still getting used to gmail :p)

Scott

[-- Attachment #2: notify-adoption-simple.patch --]
[-- Type: text/x-patch, Size: 5654 bytes --]

diff --git a/fs/exec.c b/fs/exec.c
index c5f1a92..07a8782 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1011,6 +1011,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 		suid_keys(current);
 		set_dumpable(current->mm, suid_dumpable);
 		current->pdeath_signal = 0;
+		current->adopt_signal = 0;
 	} else if (file_permission(bprm->file, MAY_READ) ||
 			(bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP)) {
 		suid_keys(current);
@@ -1099,6 +1100,7 @@ void compute_creds(struct linux_binprm *bprm)
 	if (bprm->e_uid != current->uid) {
 		suid_keys(current);
 		current->pdeath_signal = 0;
+		current->adopt_signal = 0;
 	}
 	exec_keys(current);
 
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index 48d887e..1fa1b75 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -85,4 +85,8 @@
 #define PR_SET_TIMERSLACK 29
 #define PR_GET_TIMERSLACK 30
 
+/* Set/get notification of adoption by signal */
+#define PR_SET_ADOPTSIG 31  /* Second arg is a signal */
+#define PR_GET_ADOPTSIG 32  /* Second arg is a ptr to return the signal */
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 55e30d1..bcd2af3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1133,6 +1133,7 @@ struct task_struct {
 	int exit_state;
 	int exit_code, exit_signal;
 	int pdeath_signal;  /*  The signal sent when the parent dies  */
+	int adopt_signal;  /*  The signal sent when a process is reparented  */
 	/* ??? */
 	unsigned int personality;
 	unsigned did_exec:1;
@@ -1829,6 +1830,7 @@ extern int kill_pgrp(struct pid *pid, int sig, int priv);
 extern int kill_pid(struct pid *pid, int sig, int priv);
 extern int kill_proc_info(int, struct siginfo *, pid_t);
 extern int do_notify_parent(struct task_struct *, int);
+extern void do_notify_parent_adopted(struct task_struct *, struct task_struct *);
 extern void force_sig(int, struct task_struct *);
 extern void force_sig_specific(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
diff --git a/kernel/exit.c b/kernel/exit.c
index 2d8be7e..813a232 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -813,6 +813,9 @@ static void reparent_thread(struct task_struct *p, struct task_struct *father)
 		/* We already hold the tasklist_lock here.  */
 		group_send_sig_info(p->pdeath_signal, SEND_SIG_NOINFO, p);
 
+	if (p->real_parent->adopt_signal)
+		do_notify_parent_adopted(p, father);
+
 	list_move_tail(&p->sibling, &p->real_parent->children);
 
 	/* If this is a threaded reparent there is no need to
diff --git a/kernel/signal.c b/kernel/signal.c
index 4530fc6..40228e2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1474,6 +1474,43 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
 	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
+/* Let init know that it has adopted a new child */
+void do_notify_parent_adopted(struct task_struct *tsk, struct task_struct *father)
+{
+	struct siginfo info;
+	unsigned long flags;
+	struct task_struct *reaper;
+	struct sighand_struct *sighand;
+	int ret;
+
+	reaper = tsk->real_parent;
+
+	memset (&info, 0, sizeof info);
+	info.si_signo = reaper->adopt_signal;
+	/*
+	 * set code to the same range as SIGCHLD so the right bits of
+	 * siginfo_t get copied, to userspace this will appear as si_code=0
+	 */
+	info.si_code = __SI_CHLD;
+	/*
+	 * see comment in do_notify_parent() about the following 4 lines
+	 */
+	rcu_read_lock();
+	info.si_pid = task_pid_nr_ns(tsk, reaper->nsproxy->pid_ns);
+	info.si_status = task_pid_nr_ns(father, reaper->nsproxy->pid_ns);
+	rcu_read_unlock();
+
+	info.si_uid = tsk->uid;
+
+	info.si_utime = cputime_to_clock_t(tsk->utime);
+	info.si_stime = cputime_to_clock_t(tsk->stime);
+
+	sighand = reaper->sighand;
+	spin_lock_irqsave(&sighand->siglock, flags);
+	__group_send_sig_info(reaper->adopt_signal, &info, reaper);
+	spin_unlock_irqrestore(&sighand->siglock, flags);
+}
+
 static inline int may_ptrace_stop(void)
 {
 	if (!likely(current->ptrace & PT_PTRACED))
diff --git a/kernel/sys.c b/kernel/sys.c
index 31deba8..1720053 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1726,6 +1726,16 @@ asmlinkage long sys_prctl(int option, unsigned long arg2, unsigned long arg3,
 			else
 				current->timer_slack_ns = arg2;
 			break;
+		case PR_SET_ADOPTSIG:
+			if (!valid_signal(arg2)) {
+				error = -EINVAL;
+				break;
+			}
+			current->adopt_signal = arg2;
+			break;
+		case PR_GET_ADOPTSIG:
+			error = put_user(current->adopt_signal, (int __user *)arg2);
+			break;
 		default:
 			error = -EINVAL;
 			break;
diff --git a/security/commoncap.c b/security/commoncap.c
index 6cbec11..a2da3ab 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -365,6 +365,7 @@ void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
 			  current->cap_permitted)) {
 		set_dumpable(current->mm, suid_dumpable);
 		current->pdeath_signal = 0;
+		current->adopt_signal = 0;
 
 		if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 			if (!capable(CAP_SETUID)) {
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 75777cb..8f089c8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2280,8 +2280,10 @@ static void selinux_bprm_post_apply_creds(struct linux_binprm *bprm)
 		spin_unlock_irq(&current->sighand->siglock);
 	}
 
-	/* Always clear parent death signal on SID transitions. */
+	/* Always clear parent death signal and adoption notification
+	 * on SID transitions. */
 	current->pdeath_signal = 0;
+	current->adopt_signal = 0;
 
 	/* Check whether the new SID can inherit resource limits
 	   from the old SID.  If not, reset all soft limits to
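
To illustrate the signal layout this patch produces, here is a minimal userspace sketch of decoding it, based only on the fields filled in by do_notify_parent_adopted(): si_pid is the adopted task, si_status is the PID of its former parent, si_uid is the task's UID, and si_code appears as 0. PR_SET_ADOPTSIG exists only in this patch, so the sketch deliberately does not call prctl(); on mainline kernels option 31 is PR_TASK_PERF_EVENTS_DISABLE, not an adoption signal. struct adoption and decode_adoption() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical decoded form of the adoption notification. */
struct adoption {
    pid_t child;      /* task that was reparented to us (si_pid) */
    pid_t old_parent; /* PID of its exiting former parent (si_status) */
    uid_t uid;        /* UID of the adopted task (si_uid) */
};

/* Decode a siginfo_t as filled in by the patch's do_notify_parent_adopted().
 * Returns 0 on success, -1 if it does not look like an adoption signal. */
int decode_adoption(const siginfo_t *si, int adopt_sig, struct adoption *out)
{
    /* The patch sets si_code to __SI_CHLD, which userspace sees as 0. */
    if (si->si_signo != adopt_sig || si->si_code != 0)
        return -1;
    out->child = si->si_pid;
    out->old_parent = (pid_t)si->si_status;
    out->uid = si->si_uid;
    return 0;
}
```

Because SI_USER is also 0, a plain kill() sending the same signal cannot be told apart from a real adoption notification by si_code alone; a real consumer would have to account for that.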


Thread overview: 29+ messages
2010-02-02 12:04 [PATCH] exit: PR_SET_ANCHOR for marking processes as reapers for child processes Lennart Poettering
2010-02-03  8:24 ` KOSAKI Motohiro
2010-02-03  9:53   ` Lennart Poettering
2010-02-03 15:31 ` Américo Wang
2010-02-03 17:49   ` Lennart Poettering
2010-02-05  9:54     ` Américo Wang
2010-02-11 10:21       ` Kay Sievers
2010-02-04 15:42 ` Kay Sievers
2010-02-04 20:59   ` Kay Sievers
2010-03-04 14:08 ` Oleg Nesterov
2010-03-04 22:14   ` Roland McGrath
2010-03-05 18:51     ` Kay Sievers
2010-03-05 19:18       ` Roland McGrath
2010-03-06  0:24         ` Lennart Poettering
2010-03-09  0:45           ` Ray Lee
2010-03-09 13:19             ` Oleg Nesterov
2010-03-06  0:20     ` Lennart Poettering
2010-03-08 23:11       ` Roland McGrath
2010-03-05  4:47   ` KOSAKI Motohiro
2010-03-05 18:55     ` Kay Sievers
2010-03-06  0:16   ` Lennart Poettering
2010-03-11  4:14     ` Eric W. Biederman
2010-03-11  7:56       ` KOSAKI Motohiro
2010-12-20 14:26 ` Scott James Remnant
2010-12-20 14:51   ` Kay Sievers
2010-12-21  9:56   ` Lennart Poettering
2010-12-21 12:05     ` Scott James Remnant
2010-12-23 15:44       ` Lennart Poettering
2010-12-23 16:00         ` Scott James Remnant