linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7][v7] Container-init signal semantics
@ 2009-01-17 20:26 Sukadev Bhattiprolu
  2009-01-17 20:35 ` [PATCH 1/7][v7] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:26 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


Container-init must behave like global-init to processes within the
container and hence it must be immune to unhandled fatal signals from
within the container (i.e SIG_DFL signals that terminate the process).

But the same container-init must behave like a normal process to 
processes in ancestor namespaces and so if it receives the same fatal
signal from a process in ancestor namespace, the signal must be
processed.

Implementing these semantics requires that send_signal() determine pid
namespace of the sender but since signals can originate from workqueues/
interrupt-handlers, determining pid namespace of sender may not always
be possible or safe.

This patchset implements the design/simplified semantics suggested by
Oleg Nesterov.  The simplified semantics for container-init are:

	- container-init must never be terminated by a signal from a
	  descendant process.

	- container-init must never be immune to SIGKILL from an ancestor
	  namespace (so a process in parent namespace must always be able
	  to terminate a descendant container).

	- container-init may be immune to unhandled fatal signals (like
	  SIGUSR1) even if they are from ancestor namespace (SIGKILL is
	  the only reliable signal from ancestor namespace).

Patches in this set:

	[PATCH 1/7] Remove 'handler' parameter to tracehook functions
	[PATCH 2/7] Protect init from unwanted signals more
	[PATCH 3/7] Add from_ancestor_ns parameter to send_signal()
	[PATCH 4/7] Protect cinit from unblocked SIG_DFL signals
	[PATCH 5/7] Protect cinit from blocked fatal signals
	[PATCH 6/7] SI_USER: Masquerade si_pid when crossing pid ns boundary
	[PATCH 7/7] proc: Show SIG_DFL signals to init as "ignored" signals

Changelog[v7]:
	- siginfo_from_user() and siginfo_from_ancestor_ns() are fairly simple
	  and used only in send_signal(). Remove them and move the logic into
	  send_signal() (Drop old patch 4, update new patch 4/7)

	- Update /proc/pid/status to include SIG_DFL signals to init in the
	  "ignored" set (and remove the TODO in Patch 0/7) (Patch 7/7)

Changelog[v6]:

	- Patches 3,4: Have kill_pid_info_as_uid() pass in 'from_ancestor_ns'
	  parameter to __send_signal() and remove SI_ASYNCIO check in
	  siginfo_from_user().
	- Patches 4,6: Update changelog and simplify code

Changelog[v5]:
	- Patch 2/6: Remove SIG_IGN check in sig_task_ignored() and let
	  sig_handler_ignored() check SIG_IGN.
        - Patch 3/6. Put siginfo_from_ancestor_ns() back under CONFIG_PID_NS
	  and remove warning in rt_sigqueueinfo().
	- (Patch 5/6)Simplify check in get_signal_to_deliver()
	- (Patch 6/6)Simplify masquerading pid
	- LTP-20081219-intermediate showed no new errors on 2.6.28-rc5-mm2.

Changelog[v4]:
	- [Bugfix] Patch 3/7. Check ns == NULL in siginfo_from_ancestor_ns().
	  Although http://lkml.org/lkml/2008/12/16/502 makes it less likely
	  that ns == NULL, looks like an explicit check won't hurt ?
	- Remove SIGNAL_UNKILLABLE_FROM_NS flag and simplify logic as
	  suggested by Oleg Nesterov.
	- Dropped patch that set SIGNAL_UNKILLABLE_FROM_NS and set
	  SIGNAL_UNKILLABLE in patch 5/7 to be bisect-safe.
	- Add a warning in rt_sigqueueinfo() if SI_ASYNCIO is used
	  (patch 3/7)
	- Added two patches (6/7 and 7/7) to masquerade si_pid for
	  SI_USER and SI_TKILL


Changelog[v3]:
	Changes based on discussions of previous version:
		http://lkml.org/lkml/2008/11/25/458

	Major changes:

	- Define SIGNAL_UNKILLABLE_FROM_NS and use in container-inits to
	  skip fatal signals from same namespace but process SIGKILL/SIGSTOP
	  from ancestor namespace.
	- Use SI_FROMUSER() and si_code != SI_ASYNCIO to determine if
	  it is safe to dereference pid-namespace of caller. Highly
	  experimental :-)
	- Masquerading si_pid when crossing namespace boundary: relevant
	  patches merged in -mm and dropped from this set.

	Minor changes:

	- Remove 'handler' parameter to tracehook functions
	- Update sig_ignored() to drop SIG_DFL signals to global init early
	  (tried to address Roland's  and Oleg's comments)
	- Use 'same_ns' flag to drop SIGKILL/SIGSTOP to cinit from same
	  namespace


Limitations/side-effects of current design

	- Container-init is immune to suicide - kill(getpid(), SIGKILL) is
	  ignored. Use exit() :-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/7][v7] Remove 'handler' parameter to tracehook functions
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
@ 2009-01-17 20:35 ` Sukadev Bhattiprolu
  2009-01-17 20:35 ` [PATCH 2/7][v7] Protect init from unwanted signals more Sukadev Bhattiprolu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:35 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 24 Dec 2008 13:35:17 -0800
Subject: [PATCH 1/7][v7] Remove 'handler' parameter to tracehook functions

Based on an earlier patch submitted by Oleg Nesterov and comments
from Roland McGrath (http://lkml.org/lkml/2008/11/19/258).

The handler parameter is currently unused in the tracehook functions.
Besides, the tracehook functions are called with siglock held, so the
functions can check the handler if they later need to.

Removing the parameter simiplifies changes to sig_ignored() in a follow-on
patch.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 arch/x86/kernel/ptrace.c  |    2 +-
 include/linux/tracehook.h |   13 ++++---------
 kernel/signal.c           |    6 +++---
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 0a6d8c1..d6ef716 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1585,6 +1585,6 @@ asmregparm void syscall_trace_leave(struct pt_regs *regs)
 	 * system call instruction.
 	 */
 	if (test_thread_flag(TIF_SINGLESTEP) &&
-	    tracehook_consider_fatal_signal(current, SIGTRAP, SIG_DFL))
+	    tracehook_consider_fatal_signal(current, SIGTRAP))
 		send_sigtrap(current, regs, 0, TRAP_BRKPT);
 }
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6186a78..eb4c654 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -388,17 +388,14 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
  * tracehook_consider_ignored_signal - suppress short-circuit of ignored signal
  * @task:		task receiving the signal
  * @sig:		signal number being sent
- * @handler:		%SIG_IGN or %SIG_DFL
  *
  * Return zero iff tracing doesn't care to examine this ignored signal,
  * so it can short-circuit normal delivery and never even get queued.
- * Either @handler is %SIG_DFL and @sig's default is ignore, or it's %SIG_IGN.
  *
  * Called with @task->sighand->siglock held.
  */
 static inline int tracehook_consider_ignored_signal(struct task_struct *task,
-						    int sig,
-						    void __user *handler)
+						    int sig)
 {
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
@@ -407,19 +404,17 @@ static inline int tracehook_consider_ignored_signal(struct task_struct *task,
  * tracehook_consider_fatal_signal - suppress special handling of fatal signal
  * @task:		task receiving the signal
  * @sig:		signal number being sent
- * @handler:		%SIG_DFL or %SIG_IGN
  *
  * Return nonzero to prevent special handling of this termination signal.
- * Normally @handler is %SIG_DFL.  It can be %SIG_IGN if @sig is ignored,
- * in which case force_sig() is about to reset it to %SIG_DFL.
+ * Normally handler for signal is %SIG_DFL.  It can be %SIG_IGN if @sig is
+ * ignored, in which case force_sig() is about to reset it to %SIG_DFL.
  * When this returns zero, this signal might cause a quick termination
  * that does not give the debugger a chance to intercept the signal.
  *
  * Called with or without @task->sighand->siglock held.
  */
 static inline int tracehook_consider_fatal_signal(struct task_struct *task,
-						  int sig,
-						  void __user *handler)
+						  int sig)
 {
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 2a64304..7945e71 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -72,7 +72,7 @@ static int sig_ignored(struct task_struct *t, int sig)
 	/*
 	 * Tracers may want to know about even ignored signals.
 	 */
-	return !tracehook_consider_ignored_signal(t, sig, handler);
+	return !tracehook_consider_ignored_signal(t, sig);
 }
 
 /*
@@ -316,7 +316,7 @@ int unhandled_signal(struct task_struct *tsk, int sig)
 		return 1;
 	if (handler != SIG_IGN && handler != SIG_DFL)
 		return 0;
-	return !tracehook_consider_fatal_signal(tsk, sig, handler);
+	return !tracehook_consider_fatal_signal(tsk, sig);
 }
 
 
@@ -775,7 +775,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 	    !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
 	    !sigismember(&t->real_blocked, sig) &&
 	    (sig == SIGKILL ||
-	     !tracehook_consider_fatal_signal(t, sig, SIG_DFL))) {
+	     !tracehook_consider_fatal_signal(t, sig))) {
 		/*
 		 * This signal will be fatal to the whole group.
 		 */
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/7][v7] Protect init from unwanted signals more
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
  2009-01-17 20:35 ` [PATCH 1/7][v7] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
@ 2009-01-17 20:35 ` Sukadev Bhattiprolu
  2009-01-17 20:35 ` [PATCH 3/7][v7] Add from_ancestor_ns parameter to send_signal() Sukadev Bhattiprolu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:35 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 24 Dec 2008 13:35:23 -0800
Subject: [PATCH 2/7][v7] Protect init from unwanted signals more

(This is a modified version of the patch submitted by Oleg Nesterov
http://lkml.org/lkml/2008/11/18/249 and tries to address comments
that came up in that discussion)

init ignores the SIG_DFL signals but we queue them anyway, including
SIGKILL. This is mostly OK, the signal will be dropped silently when
dequeued, but the pending SIGKILL has 2 bad implications:

        - it implies fatal_signal_pending(), so we confuse things
          like wait_for_completion_killable/lock_page_killable.

        - for the sub-namespace inits, the pending SIGKILL can
          mask (legacy_queue) the subsequent SIGKILL from the
          parent namespace which must kill cinit reliably.
          (preparation, cinits don't have SIGNAL_UNKILLABLE yet)

The patch can't help when init is ptraced, but ptracing of init is
not "safe" anyway.

Changelog[v5]:
	- (Oleg Nesterov) Remove SIG_IGN check in sig_task_ignored()
	  and let sig_handler_ignored() check SIG_IGN.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7945e71..87f3f30 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -53,10 +53,21 @@ static int sig_handler_ignored(void __user *handler, int sig)
 		(handler == SIG_DFL && sig_kernel_ignore(sig));
 }
 
-static int sig_ignored(struct task_struct *t, int sig)
+static int sig_task_ignored(struct task_struct *t, int sig)
 {
 	void __user *handler;
 
+	handler = sig_handler(t, sig);
+
+	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
+			handler == SIG_DFL)
+		return 1;
+
+	return sig_handler_ignored(handler, sig);
+}
+
+static int sig_ignored(struct task_struct *t, int sig)
+{
 	/*
 	 * Blocked signals are never ignored, since the
 	 * signal handler may change by the time it is
@@ -65,8 +76,7 @@ static int sig_ignored(struct task_struct *t, int sig)
 	if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
 		return 0;
 
-	handler = sig_handler(t, sig);
-	if (!sig_handler_ignored(handler, sig))
+	if (!sig_task_ignored(t, sig))
 		return 0;
 
 	/*
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/7][v7] Add from_ancestor_ns parameter to send_signal()
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
  2009-01-17 20:35 ` [PATCH 1/7][v7] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
  2009-01-17 20:35 ` [PATCH 2/7][v7] Protect init from unwanted signals more Sukadev Bhattiprolu
@ 2009-01-17 20:35 ` Sukadev Bhattiprolu
  2009-01-17 20:36 ` [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:35 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Tue, 6 Jan 2009 17:28:05 -0800
Subject: [PATCH 3/7][v7] Add from_ancestor_ns parameter to send_signal()

send_signal() (or its helper) needs to determine the pid namespace
of the sender. But a signal sent via kill_pid_info_as_uid() comes
from within the kernel and send_signal() does not need to determine
the pid namespace of the sender. So define a helper for send_signal()
which takes an additional parameter, 'from_ancestor_ns' and have
kill_pid_info_as_uid() use that helper directly.

The 'from_ancestor_ns' parameter will be used in a follow-on patch.

Changelog[v6]:
	- New patch added to this patchset, based on suggestions from
	  Roland McGrath and Oleg Nesterov.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/signal.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 87f3f30..bb3b6f5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -821,8 +821,8 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
-			int group)
+static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
+			int group, int from_ancestor_ns)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -896,6 +896,12 @@ out_set:
 	return 0;
 }
 
+static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
+			int group)
+{
+	return __send_signal(sig, info, t, group, 0);
+}
+
 int print_fatal_signals;
 
 static void print_fatal_signal(struct pt_regs *regs, int signr)
@@ -1138,7 +1144,7 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
 	if (sig && p->sighand) {
 		unsigned long flags;
 		spin_lock_irqsave(&p->sighand->siglock, flags);
-		ret = __group_send_sig_info(sig, info, p);
+		ret = __send_signal(sig, info, p, 1, 0);
 		spin_unlock_irqrestore(&p->sighand->siglock, flags);
 	}
 out_unlock:
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2009-01-17 20:35 ` [PATCH 3/7][v7] Add from_ancestor_ns parameter to send_signal() Sukadev Bhattiprolu
@ 2009-01-17 20:36 ` Sukadev Bhattiprolu
  2009-01-17 22:12   ` Oleg Nesterov
  2009-01-20  1:09   ` Sukadev Bhattiprolu
  2009-01-17 20:36 ` [PATCH 5/7][v7] Protect cinit from blocked fatal signals Sukadev Bhattiprolu
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:36 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 14:03:57 -0800
Subject: [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals

Drop early any SIG_DFL or SIG_IGN signals to container-init from within
the same container. But queue SIGSTOP and SIGKILL to the container-init
if they are from an ancestor container.

Blocked, fatal signals (i.e when SIG_DFL is to terminate) from within the
container can still terminate the container-init. That will be addressed
in the next patch.

Note:	To be bisect-safe, SIGNAL_UNKILLABLE will be set for container-inits
   	in a follow-on patch. Until then, this patch is just a preparatory
	step.

Changelog[v7]:
	- siginfo_from_user(), siginfo_from_ancestor_ns() are needed in only
	  one place.  Remove them and move their logic into send_signal().
	  Seems to make code more easier to read :-)

Changelog[v6]:
	- (Roland McGrath) Remove unnecessary helper signal_task_unkillable()
	  and fold checks into sig_task_ignored().

Changelog[v4]:
	- (Oleg Nesterov) Remove SIGNAL_UNKILLABLE_FROM_NS and rename
	  'same_ns' to 'from_ancestor_ns'.
	- SIGNAL_UNKILLABLE is not yet set for container-inits (will be
	  set in follow-on patch).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/signal.c |   28 +++++++++++++++++++---------
 1 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index bb3b6f5..41060ae 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -53,20 +53,21 @@ static int sig_handler_ignored(void __user *handler, int sig)
 		(handler == SIG_DFL && sig_kernel_ignore(sig));
 }
 
-static int sig_task_ignored(struct task_struct *t, int sig)
+static int sig_task_ignored(struct task_struct *t, int sig,
+		int from_ancestor_ns)
 {
 	void __user *handler;
 
 	handler = sig_handler(t, sig);
 
 	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
-			handler == SIG_DFL)
+			handler == SIG_DFL && !from_ancestor_ns)
 		return 1;
 
 	return sig_handler_ignored(handler, sig);
 }
 
-static int sig_ignored(struct task_struct *t, int sig)
+static int sig_ignored(struct task_struct *t, int sig, int from_ancestor_ns)
 {
 	/*
 	 * Blocked signals are never ignored, since the
@@ -76,7 +77,7 @@ static int sig_ignored(struct task_struct *t, int sig)
 	if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
 		return 0;
 
-	if (!sig_task_ignored(t, sig))
+	if (!sig_task_ignored(t, sig, from_ancestor_ns))
 		return 0;
 
 	/*
@@ -632,7 +633,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
  * Returns true if the signal should be actually delivered, otherwise
  * it should be dropped.
  */
-static int prepare_signal(int sig, struct task_struct *p)
+static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 {
 	struct signal_struct *signal = p->signal;
 	struct task_struct *t;
@@ -716,7 +717,7 @@ static int prepare_signal(int sig, struct task_struct *p)
 		}
 	}
 
-	return !sig_ignored(p, sig);
+	return !sig_ignored(p, sig, from_ancestor_ns);
 }
 
 /*
@@ -830,7 +831,8 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 	trace_sched_signal_send(sig, t);
 
 	assert_spin_locked(&t->sighand->siglock);
-	if (!prepare_signal(sig, t))
+
+	if (!prepare_signal(sig, t, from_ancestor_ns))
 		return 0;
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
@@ -899,7 +901,15 @@ out_set:
 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			int group)
 {
-	return __send_signal(sig, info, t, group, 0);
+	int from_ancestor_ns = 0;
+
+#ifdef CONFIG_PID_NS
+	if (!is_si_special(info) && SI_FROMUSER(info) &&
+			task_pid_nr_ns(current, task_active_pid_ns(t)) <= 0)
+		from_ancestor_ns = 1;
+#endif
+
+	return __send_signal(sig, info, t, group, from_ancestor_ns);
 }
 
 int print_fatal_signals;
@@ -1331,7 +1341,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 		goto ret;
 
 	ret = 1; /* the signal is ignored */
-	if (!prepare_signal(sig, t))
+	if (!prepare_signal(sig, t, 1))
 		goto out;
 
 	ret = 0;
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/7][v7] Protect cinit from blocked fatal signals
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
                   ` (3 preceding siblings ...)
  2009-01-17 20:36 ` [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
@ 2009-01-17 20:36 ` Sukadev Bhattiprolu
  2009-01-17 20:37 ` [PATCH 6/7][v7] SI_USER: Masquerade si_pid when crossing pid ns boundary Sukadev Bhattiprolu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:36 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 14:04:24 -0800
Subject: [PATCH 5/7][v7] Protect cinit from blocked fatal signals

Normally SIG_DFL signals to global and container-init are dropped early.
But if a signal is blocked when it is posted, we cannot drop the signal
since the receiver may install a handler before unblocking the signal.
Once this signal is queued however, the receiver container-init has
no way of knowing if the signal was sent from an ancestor or descendant
namespace.  This patch ensures that contianer-init drops all SIG_DFL
signals in get_signal_to_deliver() except SIGKILL/SIGSTOP.

If SIGSTOP/SIGKILL originate from a descendant of container-init they
are never queued (i.e dropped in sig_ignored() in an earler patch).

If SIGSTOP/SIGKILL originate from parent namespace, the signal is queued
and container-init processes the signal.

IOW, if get_signal_to_deliver() sees a sig_kernel_only() signal for global
or container-init, the signal must have been generated internally or must
have come from an ancestor ns and we process the signal.

Further, the signal_group_exit() check was needed to cover the case of
a multi-threaded init sending SIGKILL to other threads when doing an
exit() or exec(). But since the new sig_kernel_only() check covers the
SIGKILL, the signal_group_exit() check is no longer needed and can be
removed.

Finally, now that we have all pieces in place, set SIGNAL_UNKILLABLE for
container-inits.

Changelog[v6]:
	- Add a note regarding the signal_group_exit() in patch description.

Changelog[v5]:
	- (Oleg Nesterov) Drop signal_unkillable(), simplify check in
	  get_signal_to_deliver() and drop check for signal_group_exit()
	  since it is covered by sig_kernel_only().

Changelog[v4]:
	- Rename sig_unkillable() to unkillable_by_sig()
	- Remove SIGNAL_UNKILLABLE_FROM_NS flag and simplify (Oleg Nesterov)
	- Set SIGNAL_UNKILLABLE for container-init in this patch.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/fork.c   |    2 ++
 kernel/signal.c |    9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index dba2d3f..d3e93ef 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -812,6 +812,8 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	atomic_set(&sig->live, 1);
 	init_waitqueue_head(&sig->wait_chldexit);
 	sig->flags = 0;
+	if (clone_flags & CLONE_NEWPID)
+		sig->flags |= SIGNAL_UNKILLABLE;
 	sig->group_exit_code = 0;
 	sig->group_exit_task = NULL;
 	sig->group_stop_count = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index 41060ae..ccc020f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1857,9 +1857,16 @@ relock:
 
 		/*
 		 * Global init gets no signals it doesn't want.
+		 * Container-init gets no signals it doesn't want from same
+		 * container.
+		 *
+		 * Note that if global/container-init sees a sig_kernel_only()
+		 * signal here, the signal must have been generated internally
+		 * or must have come from an ancestor namespace. In either
+		 * case, the signal cannot be dropped.
 		 */
 		if (unlikely(signal->flags & SIGNAL_UNKILLABLE) &&
-		    !signal_group_exit(signal))
+				!sig_kernel_only(signr))
 			continue;
 
 		if (sig_kernel_stop(signr)) {
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/7][v7] SI_USER: Masquerade si_pid when crossing pid ns boundary
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
                   ` (4 preceding siblings ...)
  2009-01-17 20:36 ` [PATCH 5/7][v7] Protect cinit from blocked fatal signals Sukadev Bhattiprolu
@ 2009-01-17 20:37 ` Sukadev Bhattiprolu
  2009-01-17 20:37 ` [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals Sukadev Bhattiprolu
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:37 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 14:14:18 -0800
Subject: [PATCH 6/7][v7] SI_USER: Masquerade si_pid when crossing pid ns boundary

When sending a signal to a descendant namespace, set ->si_pid to 0 since
the sender does not have a pid in the receiver's namespace.

Note:
	- If rt_sigqueueinfo() sets si_code to SI_USER when sending a
	  signal across a pid namespace boundary, the value in ->si_pid
	  will be cleared to 0.

Changelog[v5]:
	- (Oleg Nesterov) Address both sys_kill() and sys_tkill() cases
	  in send_signal() to simplify code (this drops patch 7/7 from
	  earlier version of patchset).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/signal.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ccc020f..81b8603 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -880,6 +880,8 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			break;
 		default:
 			copy_siginfo(&q->info, info);
+			if (from_ancestor_ns)
+				q->info.si_pid = 0;
 			break;
 		}
 	} else if (!is_si_special(info)) {
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
                   ` (5 preceding siblings ...)
  2009-01-17 20:37 ` [PATCH 6/7][v7] SI_USER: Masquerade si_pid when crossing pid ns boundary Sukadev Bhattiprolu
@ 2009-01-17 20:37 ` Sukadev Bhattiprolu
  2009-01-17 22:19   ` Oleg Nesterov
  2009-01-19  2:09 ` [PATCH 0/7][v7] Container-init signal semantics KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-17 20:37 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Sat, 17 Jan 2009 09:26:26 -0800
Subject: [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals

Init processes ignore SIG_DFL signals unless they are from an ancestor
namespace.  Ensure /proc/pid/status correcly reports these signals.

Values for "normal" /bin/bash from /proc/pid/status:

	SigIgn: 0000000000384004
	SigCgt: 000000004b813efb

	Signals 9, 16, 19 etc are neither ignored nor caught

/bin/bash as container-init, Status viewed from outside container:

	SigIgn: 0000000000384004
	SigCgt: 000000004b813efb

	(same as 'normal')

/bin/bash as container-init, Status viewed from inside container:

	SigIgn: ffffffffb47ec104
	SigCgt: 000000004b813efb

	All "uncaught" signals, including SIGKILL(9) SIGSTOP(19) are ignored.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/proc/array.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 7e4877d..16e39db 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -228,15 +228,31 @@ static void render_sigset_t(struct seq_file *m, const char *header,
 	seq_printf(m, "\n");
 }
 
+static int is_unkillable(struct task_struct *p)
+{
+	if (!(p->signal->flags & SIGNAL_UNKILLABLE))
+		return 0;
+
+	/* If caller is an ancestor, we are not unkillable */
+	if (task_pid_nr_ns(current, task_active_pid_ns(p)) <= 0)
+		return 0;
+
+	return 1;
+}
+
 static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign,
 				    sigset_t *catch)
 {
 	struct k_sigaction *k;
 	int i;
+	int unkillable;
+	
+	unkillable = is_unkillable(p);
 
 	k = p->sighand->action;
 	for (i = 1; i <= _NSIG; ++i, ++k) {
-		if (k->sa.sa_handler == SIG_IGN)
+		if (k->sa.sa_handler == SIG_IGN ||
+				(k->sa.sa_handler == SIG_DFL && unkillable))
 			sigaddset(ign, i);
 		else if (k->sa.sa_handler != SIG_DFL)
 			sigaddset(catch, i);
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals
  2009-01-17 20:36 ` [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
@ 2009-01-17 22:12   ` Oleg Nesterov
  2009-01-20  1:07     ` Sukadev Bhattiprolu
  2009-01-20  1:09   ` Sukadev Bhattiprolu
  1 sibling, 1 reply; 26+ messages in thread
From: Oleg Nesterov @ 2009-01-17 22:12 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On 01/17, Sukadev Bhattiprolu wrote:
>
> @@ -1331,7 +1341,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
>  		goto ret;
>
>  	ret = 1; /* the signal is ignored */
> -	if (!prepare_signal(sig, t))
> +	if (!prepare_signal(sig, t, 1))

Hmm, just noticed. This looks wrong, it should be prepare_signal(sig, t, 0),
no?

For example, /sbin/init can create the posix timer with sigev_signo = SIGKILL
and it won't be killed before this patch.

This also looks wrong from the masquerading pov.


Otherwise, the patches 1-6 are imho fine.

Oleg.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals
  2009-01-17 20:37 ` [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals Sukadev Bhattiprolu
@ 2009-01-17 22:19   ` Oleg Nesterov
  2009-01-20  1:04     ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 26+ messages in thread
From: Oleg Nesterov @ 2009-01-17 22:19 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On 01/17, Sukadev Bhattiprolu wrote:
>
> Init processes ignore SIG_DFL signals unless they are from an ancestor
> namespace.  Ensure /proc/pid/status correcly reports these signals.

This is the user-visible change, and I don't really understand why do we
need it.

Imho, this patch can confuse the user-space. Why should we report that,
say, SIGCONT is ignored by the global init?


Even if I am wrong, I believe this change is orthogonal to rhis series,
and should be posted separately.

Oleg.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
                   ` (6 preceding siblings ...)
  2009-01-17 20:37 ` [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals Sukadev Bhattiprolu
@ 2009-01-19  2:09 ` KAMEZAWA Hiroyuki
  2009-01-21  3:05   ` Sukadev Bhattiprolu
  2009-01-21  4:39 ` Bryan Donlan
  2009-02-07 21:20 ` Sukadev Bhattiprolu
  9 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-19  2:09 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: oleg, ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On Sat, 17 Jan 2009 12:26:38 -0800
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> wrote:

> 
> Container-init must behave like global-init to processes within the
> container and hence it must be immune to unhandled fatal signals from
> within the container (i.e SIG_DFL signals that terminate the process).
> 
> But the same container-init must behave like a normal process to 
> processes in ancestor namespaces and so if it receives the same fatal
> signal from a process in ancestor namespace, the signal must be
> processed.
> 
> Implementing these semantics requires that send_signal() determine pid
> namespace of the sender but since signals can originate from workqueues/
> interrupt-handlers, determining pid namespace of sender may not always
> be possible or safe.
> 

Is this feature is for blocking signals from children to name-space
creator(owner) ? And automatically used when namespace/cgroup is created ?
IOW, Container-init is Namespace-Cgroup-init ? 

I'm glad if you add some documentation updates about how-it-works to patch set.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals
  2009-01-17 22:19   ` Oleg Nesterov
@ 2009-01-20  1:04     ` Sukadev Bhattiprolu
  2009-01-20  7:33       ` Oleg Nesterov
  0 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-20  1:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

Oleg Nesterov [oleg@redhat.com] wrote:
| On 01/17, Sukadev Bhattiprolu wrote:
| >
| > Init processes ignore SIG_DFL signals unless they are from an ancestor
| > namespace.  Ensure /proc/pid/status correcly reports these signals.
| 
| This is the user-visible change, and I don't really understand why do we
| need it.

This discussion came up earlier, with Bastian and Roland and my understanding
was that we should fix the SigIgn line in /proc/pid/status - so I had added
a TODO for this patchset.

| 
| Imho, this patch can confuse the user-space. Why should we report that,
| say, SIGCONT is ignored by the global init?

But it is ignored right ?

Also, if user space looks at the SigIgn line and assumes that SIGKILL or
SIGUSR1 will kill init, user space can still be confused when it doesn't
really kill - no ?

| 
| 
| Even if I am wrong, I believe this change is orthogonal to rhis series,
| and should be posted separately.
| 

You are right that its not strictly tied to this patchset. init was
dropping SIGKILL before too.

So, should I just post separately or drop altogether ?

Sukadev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals
  2009-01-17 22:12   ` Oleg Nesterov
@ 2009-01-20  1:07     ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-20  1:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

Oleg Nesterov [oleg@redhat.com] wrote:
| On 01/17, Sukadev Bhattiprolu wrote:
| >
| > @@ -1331,7 +1341,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
| >  		goto ret;
| >
| >  	ret = 1; /* the signal is ignored */
| > -	if (!prepare_signal(sig, t))
| > +	if (!prepare_signal(sig, t, 1))
| 
| Hmm, just noticed. This looks wrong, it should be prepare_signal(sig, t, 0),
| no?

Grr. I missed that when we changed the parameter from 'same-ns' to
'from_ancestor_ns' and inverted the logic.

| 
| For example, /sbin/init can create the posix timer with sigev_signo = SIGKILL
| and it won't be killed before this patch.
| 
| This also looks wrong from the masquerading pov.
| 
| 
| Otherwise, the patches 1-6 are imho fine.

Thanks, Will resend this patch.

Sukadev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals
  2009-01-17 20:36 ` [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
  2009-01-17 22:12   ` Oleg Nesterov
@ 2009-01-20  1:09   ` Sukadev Bhattiprolu
  1 sibling, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-20  1:09 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel

Minor update - set 'from_ancestor_ns' parameter to 0 in send_sigque().

---

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 14:03:57 -0800
Subject: [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals

Drop early any SIG_DFL or SIG_IGN signals to container-init from within
the same container. But queue SIGSTOP and SIGKILL to the container-init
if they are from an ancestor container.

Blocked, fatal signals (i.e when SIG_DFL is to terminate) from within the
container can still terminate the container-init. That will be addressed
in the next patch.

Note:	To be bisect-safe, SIGNAL_UNKILLABLE will be set for container-inits
   	in a follow-on patch. Until then, this patch is just a preparatory
	step.

Changelog[v7]:
	- (Oleg Nesterov) Bugfix: send_sigqueue() should use from_ancestor_ns=0
	- siginfo_from_user(), siginfo_from_ancestor_ns() are needed in only
	  one place.  Remove them and move their logic into send_signal().
	  Seems to make code more easier to read :-)

Changelog[v6]:
	- (Roland McGrath) Remove unnecessary helper signal_task_unkillable()
	  and fold checks into sig_task_ignored().

Changelog[v4]:
	- (Oleg Nesterov) Remove SIGNAL_UNKILLABLE_FROM_NS and rename
	  'same_ns' to 'from_ancestor_ns'.
	- SIGNAL_UNKILLABLE is not yet set for container-inits (will be
	  set in follow-on patch).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/signal.c |   28 +++++++++++++++++++---------
 1 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index bb3b6f5..35556aa 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -53,20 +53,21 @@ static int sig_handler_ignored(void __user *handler, int sig)
 		(handler == SIG_DFL && sig_kernel_ignore(sig));
 }
 
-static int sig_task_ignored(struct task_struct *t, int sig)
+static int sig_task_ignored(struct task_struct *t, int sig,
+		int from_ancestor_ns)
 {
 	void __user *handler;
 
 	handler = sig_handler(t, sig);
 
 	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
-			handler == SIG_DFL)
+			handler == SIG_DFL && !from_ancestor_ns)
 		return 1;
 
 	return sig_handler_ignored(handler, sig);
 }
 
-static int sig_ignored(struct task_struct *t, int sig)
+static int sig_ignored(struct task_struct *t, int sig, int from_ancestor_ns)
 {
 	/*
 	 * Blocked signals are never ignored, since the
@@ -76,7 +77,7 @@ static int sig_ignored(struct task_struct *t, int sig)
 	if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
 		return 0;
 
-	if (!sig_task_ignored(t, sig))
+	if (!sig_task_ignored(t, sig, from_ancestor_ns))
 		return 0;
 
 	/*
@@ -632,7 +633,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
  * Returns true if the signal should be actually delivered, otherwise
  * it should be dropped.
  */
-static int prepare_signal(int sig, struct task_struct *p)
+static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 {
 	struct signal_struct *signal = p->signal;
 	struct task_struct *t;
@@ -716,7 +717,7 @@ static int prepare_signal(int sig, struct task_struct *p)
 		}
 	}
 
-	return !sig_ignored(p, sig);
+	return !sig_ignored(p, sig, from_ancestor_ns);
 }
 
 /*
@@ -830,7 +831,8 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 	trace_sched_signal_send(sig, t);
 
 	assert_spin_locked(&t->sighand->siglock);
-	if (!prepare_signal(sig, t))
+
+	if (!prepare_signal(sig, t, from_ancestor_ns))
 		return 0;
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
@@ -899,7 +901,15 @@ out_set:
 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			int group)
 {
-	return __send_signal(sig, info, t, group, 0);
+	int from_ancestor_ns = 0;
+
+#ifdef CONFIG_PID_NS
+	if (!is_si_special(info) && SI_FROMUSER(info) &&
+			task_pid_nr_ns(current, task_active_pid_ns(t)) <= 0)
+		from_ancestor_ns = 1;
+#endif
+
+	return __send_signal(sig, info, t, group, from_ancestor_ns);
 }
 
 int print_fatal_signals;
@@ -1331,7 +1341,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 		goto ret;
 
 	ret = 1; /* the signal is ignored */
-	if (!prepare_signal(sig, t))
+	if (!prepare_signal(sig, t, 0))
 		goto out;
 
 	ret = 0;
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals
  2009-01-20  1:04     ` Sukadev Bhattiprolu
@ 2009-01-20  7:33       ` Oleg Nesterov
  2009-01-20 16:09         ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 26+ messages in thread
From: Oleg Nesterov @ 2009-01-20  7:33 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On 01/19, Sukadev Bhattiprolu wrote:
>
> Oleg Nesterov [oleg@redhat.com] wrote:
> | On 01/17, Sukadev Bhattiprolu wrote:
> | >
> | > Init processes ignore SIG_DFL signals unless they are from an ancestor
> | > namespace.  Ensure /proc/pid/status correcly reports these signals.
> |
> | This is the user-visible change, and I don't really understand why do we
> | need it.
>
> This discussion came up earlier, with Bastian and Roland and my understanding
> was that we should fix the SigIgn line in /proc/pid/status - so I had added
> a TODO for this patchset.

Hmm. I must admit I don't understand what this patch buys us. Admin should
know that init is special and can't be killed by the SIG_DFL signal.

Now we change the contents if pid/status, and every user-visible change
should have a good reason, imho.

> | Imho, this patch can confuse the user-space. Why should we report that,
> | say, SIGCONT is ignored by the global init?
>
> But it is ignored right ?

Following this logic, we can report that, say, SIG_DFL'ed SIGWINCH is always
ignored by any task, not only by init?

> Also, if user space looks at the SigIgn line and assumes that SIGKILL or
> SIGUSR1 will kill init, user space can still be confused when it doesn't
> really kill - no ?

yes, but this is oddity is very old.

But, otoh, SIGCHLD. There is a huge difference between SIG_IGN and SIG_DFL
in that case, why should we hide it?

More generally, why should we hide from admin what init does with signals?

> So, should I just post separately or drop altogether ?

I guess you already see that personally I dislike this patch ;) At least
I'd ask you to not mix it with this series.

But perhaps I am wrong, I can't "prove" this change is not good, I'd be
ready to agree with majority.

Oleg.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals
  2009-01-20  7:33       ` Oleg Nesterov
@ 2009-01-20 16:09         ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-20 16:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

| 
| > | Imho, this patch can confuse the user-space. Why should we report that,
| > | say, SIGCONT is ignored by the global init?
| >
| > But it is ignored right ?
| 
| Following this logic, we can report that, say, SIG_DFL'ed SIGWINCH is always
| ignored by any task, not only by init?

Small difference with SIGWINCH though. SIGWINCH default action applies
to all processes. But the special behavior for SIGKILL only applies to
init. But yes, its an old oddity.

| I guess you already see that personally I dislike this patch ;) At least
| I'd ask you to not mix it with this series.

Ok.

| 
| But perhaps I am wrong, I can't "prove" this change is not good, I'd be
| ready to agree with majority.

I can't prove the opposite either. Will go with the majority...

Roland, Eric, Bastian. Any comments ? Should I send patches 1-6 to akpm ?

Sukadev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-19  2:09 ` [PATCH 0/7][v7] Container-init signal semantics KAMEZAWA Hiroyuki
@ 2009-01-21  3:05   ` Sukadev Bhattiprolu
  2009-01-21  3:53     ` KAMEZAWA Hiroyuki
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-01-21  3:05 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: oleg, ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

KAMEZAWA Hiroyuki [kamezawa.hiroyu@jp.fujitsu.com] wrote:
| On Sat, 17 Jan 2009 12:26:38 -0800
| Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> wrote:
| 
| > 
| > Container-init must behave like global-init to processes within the
| > container and hence it must be immune to unhandled fatal signals from
| > within the container (i.e SIG_DFL signals that terminate the process).
| > 
| > But the same container-init must behave like a normal process to 
| > processes in ancestor namespaces and so if it receives the same fatal
| > signal from a process in ancestor namespace, the signal must be
| > processed.
| > 
| > Implementing these semantics requires that send_signal() determine pid
| > namespace of the sender but since signals can originate from workqueues/
| > interrupt-handlers, determining pid namespace of sender may not always
| > be possible or safe.
| > 
| 
| Is this feature is for blocking signals from children to name-space
| creator(owner) ?  And automatically used when namespace/cgroup is created ?
| IOW, Container-init is Namespace-Cgroup-init ? 

I am not sure what "Namespace-cgroup-init refers" to.

But, yes, this patchset applies to the first process in a pid namespace
i.e the child of clone(NEWPID) call.

| 
| I'm glad if you add some documentation updates about how-it-works to patch set.

Yes,  when the patchset is accepted, I am planning to add some notes to
/sbin/init man page. 

Thanks,

Sukadev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-21  3:05   ` Sukadev Bhattiprolu
@ 2009-01-21  3:53     ` KAMEZAWA Hiroyuki
  2009-01-21  4:16       ` Eric W. Biederman
  2009-01-21  4:05     ` Serge E. Hallyn
  2009-01-22  5:48     ` Matt Helsley
  2 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-21  3:53 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: oleg, ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On Tue, 20 Jan 2009 19:05:00 -0800
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki [kamezawa.hiroyu@jp.fujitsu.com] wrote:
> | On Sat, 17 Jan 2009 12:26:38 -0800
> | Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> wrote:
> | 
> | > 
> | > Container-init must behave like global-init to processes within the
> | > container and hence it must be immune to unhandled fatal signals from
> | > within the container (i.e SIG_DFL signals that terminate the process).
> | > 
> | > But the same container-init must behave like a normal process to 
> | > processes in ancestor namespaces and so if it receives the same fatal
> | > signal from a process in ancestor namespace, the signal must be
> | > processed.
> | > 
> | > Implementing these semantics requires that send_signal() determine pid
> | > namespace of the sender but since signals can originate from workqueues/
> | > interrupt-handlers, determining pid namespace of sender may not always
> | > be possible or safe.
> | > 
> | 
> | Is this feature is for blocking signals from children to name-space
> | creator(owner) ?  And automatically used when namespace/cgroup is created ?
> | IOW, Container-init is Namespace-Cgroup-init ? 
> 
> I am not sure what "Namespace-cgroup-init refers" to.
> 
> But, yes, this patchset applies to the first process in a pid namespace
> i.e the child of clone(NEWPID) call.
> 
O.K. thank you.

What makes me confused is "Container". There is no "Container" in
the linux kernel, just cgroup and its subsyss.
(Some source codes still use "cont" but new codes all use "cgroup" or "cgrp" )

So, I asked whether "Container" means "Namespace subsys" or something different.

-Kame


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-21  3:05   ` Sukadev Bhattiprolu
  2009-01-21  3:53     ` KAMEZAWA Hiroyuki
@ 2009-01-21  4:05     ` Serge E. Hallyn
  2009-01-22  5:48     ` Matt Helsley
  2 siblings, 0 replies; 26+ messages in thread
From: Serge E. Hallyn @ 2009-01-21  4:05 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: KAMEZAWA Hiroyuki, linux-kernel, bastian, oleg, ebiederm,
	containers, roland, xemul

Quoting Sukadev Bhattiprolu (sukadev@linux.vnet.ibm.com):
> KAMEZAWA Hiroyuki [kamezawa.hiroyu@jp.fujitsu.com] wrote:
> | On Sat, 17 Jan 2009 12:26:38 -0800
> | Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> wrote:
> | 
> | > 
> | > Container-init must behave like global-init to processes within the
> | > container and hence it must be immune to unhandled fatal signals from
> | > within the container (i.e SIG_DFL signals that terminate the process).
> | > 
> | > But the same container-init must behave like a normal process to 
> | > processes in ancestor namespaces and so if it receives the same fatal
> | > signal from a process in ancestor namespace, the signal must be
> | > processed.
> | > 
> | > Implementing these semantics requires that send_signal() determine pid
> | > namespace of the sender but since signals can originate from workqueues/
> | > interrupt-handlers, determining pid namespace of sender may not always
> | > be possible or safe.
> | > 
> | 
> | Is this feature is for blocking signals from children to name-space
> | creator(owner) ?  And automatically used when namespace/cgroup is created ?
> | IOW, Container-init is Namespace-Cgroup-init ? 
> 
> I am not sure what "Namespace-cgroup-init refers" to.

It sounds like he means the first task in a newly created
ns cgroup.  In which case he would mean any task which has
done an unshare with any of CLONE_NEWNS, CLONE_NEWIPC, etc,
or is the result of clone with the same.

> But, yes, this patchset applies to the first process in a pid namespace
> i.e the child of clone(NEWPID) call.
> 
> | 
> | I'm glad if you add some documentation updates about how-it-works to patch set.
> 
> Yes,  when the patchset is accepted, I am planning to add some notes to
> /sbin/init man page. 
> 
> Thanks,
> 
> Sukadev
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-21  3:53     ` KAMEZAWA Hiroyuki
@ 2009-01-21  4:16       ` Eric W. Biederman
  2009-01-21  4:23         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 2009-01-21  4:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Sukadev Bhattiprolu, oleg, roland, bastian, daniel, xemul,
	containers, linux-kernel

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> What makes me confused is "Container". There is no "Container" in
> the linux kernel, just cgroup and its subsyss.
> (Some source codes still use "cont" but new codes all use "cgroup" or "cgrp" )
>
> So, I asked whether "Container" means "Namespace subsys" or something different.

"Container" is what we use to refer to the user space concept, that is built
from a set of namespaces and control groups.  A Container that looks like
a standard linux install from the inside is what we expect the most common
use of namespaces and control groups to be.

Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-21  4:16       ` Eric W. Biederman
@ 2009-01-21  4:23         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-01-21  4:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Sukadev Bhattiprolu, oleg, roland, bastian, daniel, xemul,
	containers, linux-kernel

On Tue, 20 Jan 2009 20:16:40 -0800
ebiederm@xmission.com (Eric W. Biederman) wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> > What makes me confused is "Container". There is no "Container" in
> > the linux kernel, just cgroup and its subsyss.
> > (Some source codes still use "cont" but new codes all use "cgroup" or "cgrp" )
> >
> > So, I asked whether "Container" means "Namespace subsys" or something different.
> 
> "Container" is what we use to refer to the user space concept, that is built
> from a set of namespaces and control groups.  A Container that looks like
> a standard linux install from the inside is what we expect the most common
> use of namespaces and control groups to be.
> 

Thank you for clarification :)

Regards,
-Kame


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
                   ` (7 preceding siblings ...)
  2009-01-19  2:09 ` [PATCH 0/7][v7] Container-init signal semantics KAMEZAWA Hiroyuki
@ 2009-01-21  4:39 ` Bryan Donlan
  2009-01-21  8:31   ` Oleg Nesterov
  2009-02-07 21:20 ` Sukadev Bhattiprolu
  9 siblings, 1 reply; 26+ messages in thread
From: Bryan Donlan @ 2009-01-21  4:39 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: oleg, ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On Sat, Jan 17, 2009 at 3:26 PM, Sukadev Bhattiprolu
<sukadev@linux.vnet.ibm.com> wrote:
>
> Container-init must behave like global-init to processes within the
> container and hence it must be immune to unhandled fatal signals from
> within the container (i.e SIG_DFL signals that terminate the process).
>
> But the same container-init must behave like a normal process to
> processes in ancestor namespaces and so if it receives the same fatal
> signal from a process in ancestor namespace, the signal must be
> processed.
>
> Implementing these semantics requires that send_signal() determine pid
> namespace of the sender but since signals can originate from workqueues/
> interrupt-handlers, determining pid namespace of sender may not always
> be possible or safe.
>
> This patchset implements the design/simplified semantics suggested by
> Oleg Nesterov.  The simplified semantics for container-init are:
>
>        - container-init must never be terminated by a signal from a
>          descendant process.
>
>        - container-init must never be immune to SIGKILL from an ancestor
>          namespace (so a process in parent namespace must always be able
>          to terminate a descendant container).
>
>        - container-init may be immune to unhandled fatal signals (like
>          SIGUSR1) even if they are from ancestor namespace (SIGKILL is
>          the only reliable signal from ancestor namespace).

SIGSTOP is normally uncatchable; I note that patch 4 states that
SIGSTOP is allowed through to container-init, but given this summary
is SIGSTOP still reliable when sent to a container-init from an
ancestor namespace?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-21  4:39 ` Bryan Donlan
@ 2009-01-21  8:31   ` Oleg Nesterov
  0 siblings, 0 replies; 26+ messages in thread
From: Oleg Nesterov @ 2009-01-21  8:31 UTC (permalink / raw)
  To: Bryan Donlan
  Cc: Sukadev Bhattiprolu, ebiederm, roland, bastian, daniel, xemul,
	containers, linux-kernel

On 01/20, Bryan Donlan wrote:
>
> On Sat, Jan 17, 2009 at 3:26 PM, Sukadev Bhattiprolu
> <sukadev@linux.vnet.ibm.com> wrote:
> >
> >        - container-init may be immune to unhandled fatal signals (like
> >          SIGUSR1) even if they are from ancestor namespace (SIGKILL is
> >          the only reliable signal from ancestor namespace).
>
> SIGSTOP is normally uncatchable; I note that patch 4 states that
> SIGSTOP is allowed through to container-init, but given this summary
> is SIGSTOP still reliable when sent to a container-init from an
> ancestor namespace?

Yes we should handle SIGSTOP fine if it sent from the parent namespace.

Also. Currently it is possible to ptrace the global init, but even
ptracer can't stop it (but ptrace_stop() works). With these patches
ptracer can stop init.

I forgot to mention this behaviour change, imho this side-effect
is good.

Oleg.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-21  3:05   ` Sukadev Bhattiprolu
  2009-01-21  3:53     ` KAMEZAWA Hiroyuki
  2009-01-21  4:05     ` Serge E. Hallyn
@ 2009-01-22  5:48     ` Matt Helsley
  2 siblings, 0 replies; 26+ messages in thread
From: Matt Helsley @ 2009-01-22  5:48 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: KAMEZAWA Hiroyuki, linux-kernel, bastian, oleg, ebiederm,
	containers, roland, xemul, Michael Kerrisk [manpages]

On Tue, 2009-01-20 at 19:05 -0800, Sukadev Bhattiprolu wrote:
> KAMEZAWA Hiroyuki [kamezawa.hiroyu@jp.fujitsu.com] wrote:
> | On Sat, 17 Jan 2009 12:26:38 -0800
> | Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> wrote:
> | 
> | > 
> | > Container-init must behave like global-init to processes within the
> | > container and hence it must be immune to unhandled fatal signals from
> | > within the container (i.e SIG_DFL signals that terminate the process).
> | > 
> | > But the same container-init must behave like a normal process to 
> | > processes in ancestor namespaces and so if it receives the same fatal
> | > signal from a process in ancestor namespace, the signal must be
> | > processed.
> | > 
> | > Implementing these semantics requires that send_signal() determine pid
> | > namespace of the sender but since signals can originate from workqueues/
> | > interrupt-handlers, determining pid namespace of sender may not always
> | > be possible or safe.
> | > 
> | 
> | Is this feature is for blocking signals from children to name-space
> | creator(owner) ?  And automatically used when namespace/cgroup is created ?
> | IOW, Container-init is Namespace-Cgroup-init ? 
> 
> I am not sure what "Namespace-cgroup-init refers" to.
> 
> But, yes, this patchset applies to the first process in a pid namespace
> i.e the child of clone(NEWPID) call.
> 
> | 
> | I'm glad if you add some documentation updates about how-it-works to patch set.
> 
> Yes,  when the patchset is accepted, I am planning to add some notes to
> /sbin/init man page. 

When it's accepted I wonder if it might also be good to contact the
manpages list too so they can update the kill/signal manpages.

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
                   ` (8 preceding siblings ...)
  2009-01-21  4:39 ` Bryan Donlan
@ 2009-02-07 21:20 ` Sukadev Bhattiprolu
  2009-02-09  4:04   ` Roland McGrath
  9 siblings, 1 reply; 26+ messages in thread
From: Sukadev Bhattiprolu @ 2009-02-07 21:20 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel

Sukadev Bhattiprolu [sukadev@linux.vnet.ibm.com] wrote:
| Patches in this set:
| 
| 	[PATCH 1/7] Remove 'handler' parameter to tracehook functions
| 	[PATCH 2/7] Protect init from unwanted signals more
| 	[PATCH 3/7] Add from_ancestor_ns parameter to send_signal()
| 	[PATCH 4/7] Protect cinit from unblocked SIG_DFL signals
| 	[PATCH 5/7] Protect cinit from blocked fatal signals
| 	[PATCH 6/7] SI_USER: Masquerade si_pid when crossing pid ns boundary
| 	[PATCH 7/7] proc: Show SIG_DFL signals to init as "ignored" signals

Roland, Eric, Daniel, Pavel

Any comments on this patchset ? We are not sure about patch 7, but should
I send the rest to Andrew ?

Sukadev

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7][v7] Container-init signal semantics
  2009-02-07 21:20 ` Sukadev Bhattiprolu
@ 2009-02-09  4:04   ` Roland McGrath
  0 siblings, 0 replies; 26+ messages in thread
From: Roland McGrath @ 2009-02-09  4:04 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: oleg, ebiederm, bastian, daniel, xemul, containers, linux-kernel

> Any comments on this patchset ? We are not sure about patch 7, but should
> I send the rest to Andrew ?

I think they are fine now if Oleg doesn't see any problems.  The /proc
change in patch 7 is a matter of taste and I just don't care either way.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2009-02-09  4:05 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-17 20:26 [PATCH 0/7][v7] Container-init signal semantics Sukadev Bhattiprolu
2009-01-17 20:35 ` [PATCH 1/7][v7] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
2009-01-17 20:35 ` [PATCH 2/7][v7] Protect init from unwanted signals more Sukadev Bhattiprolu
2009-01-17 20:35 ` [PATCH 3/7][v7] Add from_ancestor_ns parameter to send_signal() Sukadev Bhattiprolu
2009-01-17 20:36 ` [PATCH 4/7][v7] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
2009-01-17 22:12   ` Oleg Nesterov
2009-01-20  1:07     ` Sukadev Bhattiprolu
2009-01-20  1:09   ` Sukadev Bhattiprolu
2009-01-17 20:36 ` [PATCH 5/7][v7] Protect cinit from blocked fatal signals Sukadev Bhattiprolu
2009-01-17 20:37 ` [PATCH 6/7][v7] SI_USER: Masquerade si_pid when crossing pid ns boundary Sukadev Bhattiprolu
2009-01-17 20:37 ` [PATCH 7/7][v7] proc: Show SIG_DFL signals to init as "ignored" signals Sukadev Bhattiprolu
2009-01-17 22:19   ` Oleg Nesterov
2009-01-20  1:04     ` Sukadev Bhattiprolu
2009-01-20  7:33       ` Oleg Nesterov
2009-01-20 16:09         ` Sukadev Bhattiprolu
2009-01-19  2:09 ` [PATCH 0/7][v7] Container-init signal semantics KAMEZAWA Hiroyuki
2009-01-21  3:05   ` Sukadev Bhattiprolu
2009-01-21  3:53     ` KAMEZAWA Hiroyuki
2009-01-21  4:16       ` Eric W. Biederman
2009-01-21  4:23         ` KAMEZAWA Hiroyuki
2009-01-21  4:05     ` Serge E. Hallyn
2009-01-22  5:48     ` Matt Helsley
2009-01-21  4:39 ` Bryan Donlan
2009-01-21  8:31   ` Oleg Nesterov
2009-02-07 21:20 ` Sukadev Bhattiprolu
2009-02-09  4:04   ` Roland McGrath

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).