linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6][v5]: Container-init signal semantics
@ 2008-12-27 20:46 Sukadev Bhattiprolu
  2008-12-27 20:49 ` [PATCH 1/6][v5] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2008-12-27 20:46 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


Container-init must behave like global-init to processes within the
container and hence it must be immune to unhandled fatal signals from
within the container (i.e SIG_DFL signals that terminate the process).

But the same container-init must behave like a normal process to 
processes in ancestor namespaces and so if it receives the same fatal
signal from a process in ancestor namespace, the signal must be
processed.

Implementing these semantics requires that send_signal() determine pid
namespace of the sender but since signals can originate from workqueues/
interrupt-handlers, determining pid namespace of sender may not always
be possible or safe.

This patchset implements the design/simplified semantics suggested by
Oleg Nesterov.  The simplified semantics for container-init are:

	- container-init must never be terminated by a signal from a
	  descendant process.

	- container-init must never be immune to SIGKILL from an ancestor
	  namespace (so a process in parent namespace must always be able
	  to terminate a descendant container).

	- container-init may be immune to unhandled fatal signals (like
	  SIGUSR1) even if they are from ancestor namespace (SIGKILL is
	  the only reliable signal from ancestor namespace).

Patches in this set:

	[PATCH 1/6] Remove 'handler' parameter to tracehook functions
	[PATCH 2/6] Protect init from unwanted signals more
	[PATCH 3/6] Define siginfo_from_ancestor_ns()
	[PATCH 4/6] Protect cinit from unblocked SIG_DFL signals
	[PATCH 5/6] Protect cinit from blocked fatal signals
	[PATCH 6/6] SI_USER: Masquerade si_pid when crossing pid ns boundary

Changelog[v5]:
	- Patch 2/6: Remove SIG_IGN check in sig_task_ignored() and let
	  sig_handler_ignored() check SIG_IGN.
        - Patch 3/6. Put siginfo_from_ancestor_ns() back under CONFIG_PID_NS
	  and remove warning in rt_sigqueueinfo().
	- (Patch 5/6)Simplify check in get_signal_to_deliver()
	- (Patch 6/6)Simplify masquerading pid
	- LTP-20081219-intermediate showed no new errors on 2.6.28-rc5-mm2.
Changelog[v4]:
	- [Bugfix] Patch 3/7. Check ns == NULL in siginfo_from_ancestor_ns().
	  Although http://lkml.org/lkml/2008/12/16/502 makes it less likely
	  that ns == NULL, looks like an explicit check won't hurt ?
	- Remove SIGNAL_UNKILLABLE_FROM_NS flag and simplify logic as
	  suggested by Oleg Nesterov.
	- Dropped patch that set SIGNAL_UNKILLABLE_FROM_NS and set
	  SIGNAL_UNKILLABLE in patch 5/7 to be bisect-safe.
	- Add a warning in rt_sigqueueinfo() if SI_ASYNCIO is used
	  (patch 3/7)
	- Added two patches (6/7 and 7/7) to masquerade si_pid for
	  SI_USER and SI_TKILL


Changelog[v3]:
	Changes based on discussions of previous version:
		http://lkml.org/lkml/2008/11/25/458

	Major changes:

	- Define SIGNAL_UNKILLABLE_FROM_NS and use in container-inits to
	  skip fatal signals from same namespace but process SIGKILL/SIGSTOP
	  from ancestor namespace.
	- Use SI_FROMUSER() and si_code != SI_ASYNCIO to determine if
	  it is safe to dereference pid-namespace of caller. Highly
	  experimental :-)
	- Masquerading si_pid when crossing namespace boundary: relevant
	  patches merged in -mm and dropped from this set.

	Minor changes:

	- Remove 'handler' parameter to tracehook functions
	- Update sig_ignored() to drop SIG_DFL signals to global init early
	  (tried to address Roland's  and Oleg's comments)
	- Use 'same_ns' flag to drop SIGKILL/SIGSTOP to cinit from same
	  namespace


TODO:
	- Use sig_task_unkillable() in fs/proc/array.c:task_sig()
	  to correctly report ignored signals for container/global
	  init.

Limitations/side-effects of current design

	- Container-init is immune to suicide - kill(getpid(), SIGKILL) is
	  ignored. Use exit() :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6][v5] Remove 'handler' parameter to tracehook functions
  2008-12-27 20:46 [PATCH 0/6][v5]: Container-init signal semantics Sukadev Bhattiprolu
@ 2008-12-27 20:49 ` Sukadev Bhattiprolu
  2008-12-27 20:51 ` [PATCH 2/6][v5] Protect init from unwanted signals more Sukadev Bhattiprolu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2008-12-27 20:49 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


>From 7af2b3b090f84741ed7547ca7220a0525bd4f888 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 24 Dec 2008 13:35:17 -0800
Subject: [PATCH 1/6][v5] Remove 'handler' parameter to tracehook functions

Based on an earlier patch submitted by Oleg Nesterov and comments
from Roland McGrath (http://lkml.org/lkml/2008/11/19/258).

The handler parameter is currently unused in the tracehook functions.
Besides, the tracehook functions are called with siglock held, so the
functions can check the handler if they later need to.

Removing the parameter simiplifies changes to sig_ignored() in a follow-on
patch.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 arch/x86/kernel/ptrace.c  |    2 +-
 include/linux/tracehook.h |   13 ++++---------
 kernel/signal.c           |    6 +++---
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 0a6d8c1..d6ef716 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1585,6 +1585,6 @@ asmregparm void syscall_trace_leave(struct pt_regs *regs)
 	 * system call instruction.
 	 */
 	if (test_thread_flag(TIF_SINGLESTEP) &&
-	    tracehook_consider_fatal_signal(current, SIGTRAP, SIG_DFL))
+	    tracehook_consider_fatal_signal(current, SIGTRAP))
 		send_sigtrap(current, regs, 0, TRAP_BRKPT);
 }
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6186a78..eb4c654 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -388,17 +388,14 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
  * tracehook_consider_ignored_signal - suppress short-circuit of ignored signal
  * @task:		task receiving the signal
  * @sig:		signal number being sent
- * @handler:		%SIG_IGN or %SIG_DFL
  *
  * Return zero iff tracing doesn't care to examine this ignored signal,
  * so it can short-circuit normal delivery and never even get queued.
- * Either @handler is %SIG_DFL and @sig's default is ignore, or it's %SIG_IGN.
  *
  * Called with @task->sighand->siglock held.
  */
 static inline int tracehook_consider_ignored_signal(struct task_struct *task,
-						    int sig,
-						    void __user *handler)
+						    int sig)
 {
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
@@ -407,19 +404,17 @@ static inline int tracehook_consider_ignored_signal(struct task_struct *task,
  * tracehook_consider_fatal_signal - suppress special handling of fatal signal
  * @task:		task receiving the signal
  * @sig:		signal number being sent
- * @handler:		%SIG_DFL or %SIG_IGN
  *
  * Return nonzero to prevent special handling of this termination signal.
- * Normally @handler is %SIG_DFL.  It can be %SIG_IGN if @sig is ignored,
- * in which case force_sig() is about to reset it to %SIG_DFL.
+ * Normally handler for signal is %SIG_DFL.  It can be %SIG_IGN if @sig is
+ * ignored, in which case force_sig() is about to reset it to %SIG_DFL.
  * When this returns zero, this signal might cause a quick termination
  * that does not give the debugger a chance to intercept the signal.
  *
  * Called with or without @task->sighand->siglock held.
  */
 static inline int tracehook_consider_fatal_signal(struct task_struct *task,
-						  int sig,
-						  void __user *handler)
+						  int sig)
 {
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 2a64304..7945e71 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -72,7 +72,7 @@ static int sig_ignored(struct task_struct *t, int sig)
 	/*
 	 * Tracers may want to know about even ignored signals.
 	 */
-	return !tracehook_consider_ignored_signal(t, sig, handler);
+	return !tracehook_consider_ignored_signal(t, sig);
 }
 
 /*
@@ -316,7 +316,7 @@ int unhandled_signal(struct task_struct *tsk, int sig)
 		return 1;
 	if (handler != SIG_IGN && handler != SIG_DFL)
 		return 0;
-	return !tracehook_consider_fatal_signal(tsk, sig, handler);
+	return !tracehook_consider_fatal_signal(tsk, sig);
 }
 
 
@@ -775,7 +775,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 	    !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
 	    !sigismember(&t->real_blocked, sig) &&
 	    (sig == SIGKILL ||
-	     !tracehook_consider_fatal_signal(t, sig, SIG_DFL))) {
+	     !tracehook_consider_fatal_signal(t, sig))) {
 		/*
 		 * This signal will be fatal to the whole group.
 		 */
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6][v5] Protect init from unwanted signals more
  2008-12-27 20:46 [PATCH 0/6][v5]: Container-init signal semantics Sukadev Bhattiprolu
  2008-12-27 20:49 ` [PATCH 1/6][v5] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
@ 2008-12-27 20:51 ` Sukadev Bhattiprolu
  2008-12-27 20:52 ` [PATCH 3/6][v5] Define siginfo_from_ancestor_ns() Sukadev Bhattiprolu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2008-12-27 20:51 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 24 Dec 2008 13:35:23 -0800
Subject: [PATCH 2/6][v5] Protect init from unwanted signals more

(This is a modified version of the patch submitted by Oleg Nesterov
http://lkml.org/lkml/2008/11/18/249 and tries to address comments
that came up in that discussion)

init ignores the SIG_DFL signals but we queue them anyway, including
SIGKILL. This is mostly OK, the signal will be dropped silently when
dequeued, but the pending SIGKILL has 2 bad implications:

        - it implies fatal_signal_pending(), so we confuse things
          like wait_for_completion_killable/lock_page_killable.

        - for the sub-namespace inits, the pending SIGKILL can
          mask (legacy_queue) the subsequent SIGKILL from the
          parent namespace which must kill cinit reliably.
          (preparation, cinits don't have SIGNAL_UNKILLABLE yet)

The patch can't help when init is ptraced, but ptracing of init is
not "safe" anyway.

Changelog[v5]:
	- (Oleg Nesterov) Remove SIG_IGN check in sig_task_ignored()
	  and let sig_handler_ignored() check SIG_IGN.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7945e71..30b9c5c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -53,10 +53,21 @@ static int sig_handler_ignored(void __user *handler, int sig)
 		(handler == SIG_DFL && sig_kernel_ignore(sig));
 }
 
-static int sig_ignored(struct task_struct *t, int sig)
+static int sig_task_ignored(struct task_struct *t, int sig)
 {
 	void __user *handler;
 
+	handler = sig_handler(t, sig);
+
+	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
+			handler == SIG_DFL)
+		return 1;
+	
+	return sig_handler_ignored(handler, sig);
+}
+
+static int sig_ignored(struct task_struct *t, int sig)
+{
 	/*
 	 * Blocked signals are never ignored, since the
 	 * signal handler may change by the time it is
@@ -65,8 +76,7 @@ static int sig_ignored(struct task_struct *t, int sig)
 	if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
 		return 0;
 
-	handler = sig_handler(t, sig);
-	if (!sig_handler_ignored(handler, sig))
+	if (!sig_task_ignored(t, sig))
 		return 0;
 
 	/*
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6][v5] Define siginfo_from_ancestor_ns()
  2008-12-27 20:46 [PATCH 0/6][v5]: Container-init signal semantics Sukadev Bhattiprolu
  2008-12-27 20:49 ` [PATCH 1/6][v5] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
  2008-12-27 20:51 ` [PATCH 2/6][v5] Protect init from unwanted signals more Sukadev Bhattiprolu
@ 2008-12-27 20:52 ` Sukadev Bhattiprolu
  2008-12-31  0:12   ` Roland McGrath
                     ` (2 more replies)
  2008-12-27 20:53 ` [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2008-12-27 20:52 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 13:46:02 -0800
Subject: [PATCH 3/6][v5] Define siginfo_from_ancestor_ns()

Determine if sender of a signal is from an ancestor namespace. This
function will be used in a follow-on patch.

This is based on discussions on the patch from Oleg Nesterov and me
http://lkml.org/lkml/2008/11/25/462.

Changelog[v5]:
	- (Oleg Nesterov) Put siginfo_from_ancestor_ns() back under
	  CONFIG_PID_NS.
	- (Oleg Nesterov) Remove the warning in rt_sigqueueinfo().

Changelog[v4]:
	- siginfo_from_ancestor_ns() is fairly clean and it does not need
	  to be under CONFIG_PID_NS. Only siginfo_from_user() needs to be.
	- Warn if rt_sigqueueinfo() uses SI_ASYNCIO.
	- Added a check for pid-ns of receiver being NULL (in case it is
	  exiting).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/signal.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 30b9c5c..f33100d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -820,6 +820,61 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 {
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
+/*
+ * Return 1 if this signal originated directly from a user process (i.e via
+ * kill(), tkill(), sigqueue()).  Return 0 otherwise.
+ *
+ * TODO:
+ * 	  To make this less hacky, make SI_ASYNCIO a kernel signal.
+ */
+#ifdef CONFIG_PID_NS
+static inline int siginfo_from_user(siginfo_t *info)
+{
+	if (!is_si_special(info) && SI_FROMUSER(info) &&
+				info->si_code != SI_ASYNCIO)
+		return 1;
+
+	return 0;
+}
+
+static inline int siginfo_from_ancestor_ns(struct task_struct *t,
+                       siginfo_t *info)
+{
+	struct pid_namespace *ns;
+
+	/*
+	 * Ensure signal is from user-space before checking pid namespace.
+	 * (We maybe called from interrupt context and dereferencing
+	 * pid namespace would be safe).
+	 */
+	if (siginfo_from_user(info)) {
+		/*
+		 * If we do not have a pid in the receiver's namespace,
+		 * we must be from an ancestor namespace.
+		 *
+		 * Note: 
+		 * 	If receiver is exiting, ns == NULL, signal will be
+		 * 	queued but eventually ignored anyway (wants_signal()
+		 * 	is FALSE). Assume here that signal is from ancestor
+		 * 	namespace and queue the signal.
+		 */
+		ns = task_active_pid_ns(t);
+		if (!ns || task_pid_nr_ns(current, ns) <= 0)
+			return 1;
+	}
+
+	return 0;
+}
+
+#else
+
+static inline int siginfo_from_ancestor_ns(struct task_struct *t,
+                       siginfo_t *info)
+{
+	return 0;
+}
+
+#endif
 
 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			int group)
@@ -2324,6 +2379,7 @@ sys_rt_sigqueueinfo(pid_t pid, int sig, siginfo_t __user *uinfo)
 	   Nor can they impersonate a kill(), which adds source info.  */
 	if (info.si_code >= 0)
 		return -EPERM;
+
 	info.si_signo = sig;
 
 	/* POSIX.1b doesn't mention process groups.  */
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals
  2008-12-27 20:46 [PATCH 0/6][v5]: Container-init signal semantics Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2008-12-27 20:52 ` [PATCH 3/6][v5] Define siginfo_from_ancestor_ns() Sukadev Bhattiprolu
@ 2008-12-27 20:53 ` Sukadev Bhattiprolu
  2008-12-31  0:19   ` Roland McGrath
  2008-12-27 20:54 ` [PATCH 5/6][v5] Protect cinit from blocked fatal signals Sukadev Bhattiprolu
  2008-12-27 20:55 ` [PATCH 6/6][v5] SI_USER: Masquerade si_pid when crossing pid ns boundary Sukadev Bhattiprolu
  5 siblings, 1 reply; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2008-12-27 20:53 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 14:03:57 -0800
Subject: [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals

Drop early any SIG_DFL or SIG_IGN signals to container-init from within
the same container. But queue SIGSTOP and SIGKILL to the container-init
if they are from an ancestor container.

Blocked, fatal signals (i.e when SIG_DFL is to terminate) from within the
container can still terminate the container-init. That will be addressed
in the next patch.

Note:	To be bisect-safe, SIGNAL_UNKILLABLE will be set for container-inits
   	in a follow-on patch. Until then, this patch is just a preparatory
	step.

Changelog[v4]:
	- (Oleg Nesterov) Remove SIGNAL_UNKILLABLE_FROM_NS and rename
	  'same_ns' to 'from_ancestor_ns'.
	- SIGNAL_UNKILLABLE is not yet set for container-inits (will be
	  set in follow-on patch).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/signal.c |   36 +++++++++++++++++++++++++++---------
 1 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index f33100d..b51781e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -53,20 +53,34 @@ static int sig_handler_ignored(void __user *handler, int sig)
 		(handler == SIG_DFL && sig_kernel_ignore(sig));
 }
 
-static int sig_task_ignored(struct task_struct *t, int sig)
+/*
+ * Return 1 if task @t is either the global init or the container-init
+ * of the sender's container. Return 0 otherwise.
+ */
+static int sig_task_unkillable(struct task_struct *t, int from_ancestor_ns)
+{
+	int flags = t->signal->flags;
+
+	if (unlikely(flags & SIGNAL_UNKILLABLE) && !from_ancestor_ns)
+		return 1;
+
+	return 0;
+}
+
+static int sig_task_ignored(struct task_struct *t, int sig, int from_ancestor_ns)
 {
 	void __user *handler;
 
 	handler = sig_handler(t, sig);
 
-	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
+	if (sig_task_unkillable(t, from_ancestor_ns) &&
 			handler == SIG_DFL)
 		return 1;
-	
+
 	return sig_handler_ignored(handler, sig);
 }
 
-static int sig_ignored(struct task_struct *t, int sig)
+static int sig_ignored(struct task_struct *t, int sig, int from_ancestor_ns)
 {
 	/*
 	 * Blocked signals are never ignored, since the
@@ -76,7 +90,7 @@ static int sig_ignored(struct task_struct *t, int sig)
 	if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
 		return 0;
 
-	if (!sig_task_ignored(t, sig))
+	if (!sig_task_ignored(t, sig, from_ancestor_ns))
 		return 0;
 
 	/*
@@ -632,7 +646,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
  * Returns true if the signal should be actually delivered, otherwise
  * it should be dropped.
  */
-static int prepare_signal(int sig, struct task_struct *p)
+static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 {
 	struct signal_struct *signal = p->signal;
 	struct task_struct *t;
@@ -716,7 +730,7 @@ static int prepare_signal(int sig, struct task_struct *p)
 		}
 	}
 
-	return !sig_ignored(p, sig);
+	return !sig_ignored(p, sig, from_ancestor_ns);
 }
 
 /*
@@ -881,11 +895,15 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
+	int from_ancestor_ns;
 
 	trace_sched_signal_send(sig, t);
 
 	assert_spin_locked(&t->sighand->siglock);
-	if (!prepare_signal(sig, t))
+
+	from_ancestor_ns = siginfo_from_ancestor_ns(t, info);
+
+	if (!prepare_signal(sig, t, from_ancestor_ns))
 		return 0;
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
@@ -1380,7 +1398,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 		goto ret;
 
 	ret = 1; /* the signal is ignored */
-	if (!prepare_signal(sig, t))
+	if (!prepare_signal(sig, t, 1))
 		goto out;
 
 	ret = 0;
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6][v5] Protect cinit from blocked fatal signals
  2008-12-27 20:46 [PATCH 0/6][v5]: Container-init signal semantics Sukadev Bhattiprolu
                   ` (3 preceding siblings ...)
  2008-12-27 20:53 ` [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
@ 2008-12-27 20:54 ` Sukadev Bhattiprolu
  2009-01-05 15:16   ` Oleg Nesterov
  2008-12-27 20:55 ` [PATCH 6/6][v5] SI_USER: Masquerade si_pid when crossing pid ns boundary Sukadev Bhattiprolu
  5 siblings, 1 reply; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2008-12-27 20:54 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 14:04:24 -0800
Subject: [PATCH 5/6][v5] Protect cinit from blocked fatal signals

Normally SIG_DFL signals to global and container-init are dropped early.
But if a signal is blocked when it is posted, we cannot drop the signal
since the receiver may install a handler before unblocking the signal.
Once this signal is queued however, the receiver container-init has
no way of knowing if the signal was sent from an ancestor or descendant
namespace.  This patch ensures that contianer-init drops all SIG_DFL
signals in get_signal_to_deliver() except SIGKILL/SIGSTOP.

If SIGSTOP/SIGKILL originate from a descendant of container-init they
are never queued (i.e dropped in sig_ignored() in an earler patch).

If SIGSTOP/SIGKILL originate from parent namespace, the signal is queued
and container-init processes the signal.

And now that we have all pieces in place, set SIGNAL_UNKILLABLE for
container-inits.

Changelog[v5]:
	- (Oleg Nesterov) Drop signal_unkillable(), simplify check in
	  get_signal_to_deliver() and drop check for signal_group_exit()
	  since it is covered by sig_kernel_only().

Changelog[v4]:
	- Rename sig_unkillable() to unkillable_by_sig()
	- Remove SIGNAL_UNKILLABLE_FROM_NS flag and simplify (Oleg Nesterov)
	- Set SIGNAL_UNKILLABLE for container-init in this patch.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/fork.c   |    2 ++
 kernel/signal.c |    6 ++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index dba2d3f..d3e93ef 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -812,6 +812,8 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	atomic_set(&sig->live, 1);
 	init_waitqueue_head(&sig->wait_chldexit);
 	sig->flags = 0;
+	if (clone_flags & CLONE_NEWPID)
+		sig->flags |= SIGNAL_UNKILLABLE;
 	sig->group_exit_code = 0;
 	sig->group_exit_task = NULL;
 	sig->group_stop_count = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index b51781e..d90b033 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1914,9 +1914,11 @@ relock:
 
 		/*
 		 * Global init gets no signals it doesn't want.
+		 * Container-init gets no signals it doesn't want from same
+		 * container.
 		 */
-		if (unlikely(signal->flags & SIGNAL_UNKILLABLE) &&
-		    !signal_group_exit(signal))
+		if ((signal->flags & SIGNAL_UNKILLABLE) &&
+				!sig_kernel_only(signr))
 			continue;
 
 		if (sig_kernel_stop(signr)) {
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6][v5] SI_USER: Masquerade si_pid when crossing pid ns boundary
  2008-12-27 20:46 [PATCH 0/6][v5]: Container-init signal semantics Sukadev Bhattiprolu
                   ` (4 preceding siblings ...)
  2008-12-27 20:54 ` [PATCH 5/6][v5] Protect cinit from blocked fatal signals Sukadev Bhattiprolu
@ 2008-12-27 20:55 ` Sukadev Bhattiprolu
  5 siblings, 0 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2008-12-27 20:55 UTC (permalink / raw)
  To: oleg, ebiederm, roland, bastian; +Cc: daniel, xemul, containers, linux-kernel


From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Wed, 24 Dec 2008 14:14:18 -0800
Subject: [PATCH 6/6][v5] SI_USER: Masquerade si_pid when crossing pid ns boundary

When sending a signal to a descendant namespace, set ->si_pid to 0 since
the sender does not have a pid in the receiver's namespace.

Note:
	- If rt_sigqueueinfo() sets si_code to SI_USER when sending a
	  signal across a pid namespace boundary, the value in ->si_pid
	  will be cleared to 0.

Changelog[v5]:
	- (Oleg Nesterov) Address both sys_kill() and sys_tkill() cases
	  in send_signal() to simplify code (this drops patch 7/7 from
	  earlier version of patchset).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/signal.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index d90b033..7ffd044 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -951,6 +951,8 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			break;
 		default:
 			copy_siginfo(&q->info, info);
+			if (from_ancestor_ns)
+				q->info.si_pid = 0;
 			break;
 		}
 	} else if (!is_si_special(info)) {
-- 
1.5.2.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6][v5] Define siginfo_from_ancestor_ns()
  2008-12-27 20:52 ` [PATCH 3/6][v5] Define siginfo_from_ancestor_ns() Sukadev Bhattiprolu
@ 2008-12-31  0:12   ` Roland McGrath
  2009-01-05 12:42   ` Oleg Nesterov
  2009-01-05 14:33   ` Oleg Nesterov
  2 siblings, 0 replies; 13+ messages in thread
From: Roland McGrath @ 2008-12-31  0:12 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: oleg, ebiederm, bastian, daniel, xemul, containers, linux-kernel

Drop the second hunk of the patch (stray whitespace change).

The treatment of SI_ASYNCIO is wrong, as I described in an earlier posting.
(It is indeed a user-generated signal.)  You'll need to avoid using this
function in the path from the USB driver case (interrupt level), since you
can't use SI_ASYNCIO to avoid unsafe use of pid_ns data structures.  If you
do as I suggested in that posting, this is no problem since you just never
do this check in the kill_pid_info_as_uid (perhaps renamed) case.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals
  2008-12-27 20:53 ` [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
@ 2008-12-31  0:19   ` Roland McGrath
  2009-01-05 13:24     ` Oleg Nesterov
  0 siblings, 1 reply; 13+ messages in thread
From: Roland McGrath @ 2008-12-31  0:19 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: oleg, ebiederm, bastian, daniel, xemul, containers, linux-kernel

Since you didn't add the sig_task_unkillable() helper for its one simple
condition in patch 2/6, I don't think you need it here either for two.
Just add '&& !from_ancestor_ns' inline.

This patch has more stray whitespace changes.  
I think checkpatch.pl will spot those for you.

I'd just make the flag a parameter to send_signal() directly.
Then you can just change kill_pid_info_as_uid (or its replacement)
to call send_signal() with that new flag (and the 'group' flag) 
directly instead of using the trivial __group_send_sig_info wrapper.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6][v5] Define siginfo_from_ancestor_ns()
  2008-12-27 20:52 ` [PATCH 3/6][v5] Define siginfo_from_ancestor_ns() Sukadev Bhattiprolu
  2008-12-31  0:12   ` Roland McGrath
@ 2009-01-05 12:42   ` Oleg Nesterov
  2009-01-05 14:33   ` Oleg Nesterov
  2 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2009-01-05 12:42 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On 12/27, Sukadev Bhattiprolu wrote:
>
> +static inline int siginfo_from_ancestor_ns(struct task_struct *t,
> +                       siginfo_t *info)
> +{
> +	struct pid_namespace *ns;
> +
> +	/*
> +	 * Ensure signal is from user-space before checking pid namespace.
> +	 * (We maybe called from interrupt context and dereferencing
> +	 * pid namespace would be safe).
> +	 */
> +	if (siginfo_from_user(info)) {
> +		/*
> +		 * If we do not have a pid in the receiver's namespace,
> +		 * we must be from an ancestor namespace.
> +		 *
> +		 * Note:
> +		 * 	If receiver is exiting, ns == NULL,

Confused. I thought we alread have the patch which ensures
task_active_pid_ns() is never NULL?

If not, we can get ns from task_pid(t). See also below.

>                       signal will be
> +		 * 	queued but eventually ignored anyway (wants_signal()
> +		 * 	is FALSE).

This is only true for thread-specific signals, please remove
this comment,

> +		ns = task_active_pid_ns(t);
> +		if (!ns || task_pid_nr_ns(current, ns) <= 0)
> +			return 1;

See above. In any case, we shouldn't return 1 if ns == NULL.
But afaics we always can know its namespace.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals
  2008-12-31  0:19   ` Roland McGrath
@ 2009-01-05 13:24     ` Oleg Nesterov
  0 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2009-01-05 13:24 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Sukadev Bhattiprolu, ebiederm, bastian, daniel, xemul,
	containers, linux-kernel

On 12/30, Roland McGrath wrote:
>
> I'd just make the flag a parameter to send_signal() directly.
> Then you can just change kill_pid_info_as_uid (or its replacement)
> to call send_signal() with that new flag (and the 'group' flag)
> directly instead of using the trivial __group_send_sig_info wrapper.

Ah, good. Can't understand why I didn't think about this before!

But, perhaps, it is better to add the new helper, __send_signal()
or whatever which has the new "from_ancestor_ns" argument. Then,

	static int send_signal(...)
	{
		bool from_ancestor_ns = 0;

	#ifdef CONFIG_PID_NS
		if (!is_si_special(info) && SI_FROMUSER(info)) {
			from_ancestor_ns = !task_pid_nr_ns(current, task_active_pid_ns(t));
		}
	#endif

		return __send_signal(..., from_ancestor_ns);
	}

but this is cosmetic issue.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6][v5] Define siginfo_from_ancestor_ns()
  2008-12-27 20:52 ` [PATCH 3/6][v5] Define siginfo_from_ancestor_ns() Sukadev Bhattiprolu
  2008-12-31  0:12   ` Roland McGrath
  2009-01-05 12:42   ` Oleg Nesterov
@ 2009-01-05 14:33   ` Oleg Nesterov
  2 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2009-01-05 14:33 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

Really minor nit, just noticed...

On 12/27, Sukadev Bhattiprolu wrote:
>
> +static inline int siginfo_from_ancestor_ns(struct task_struct *t,
> +                       siginfo_t *info)
> +{
> +	struct pid_namespace *ns;
> +
> +	/*
> +	 * Ensure signal is from user-space before checking pid namespace.
> +	 * (We maybe called from interrupt context and dereferencing
> +	 * pid namespace would be safe).
> +	 */
> +	if (siginfo_from_user(info)) {

I can't parse the comment above, and imho it is confusing and
misleading. We can dereference pid namespace even in interrupt
context.

Also, the comment looks as if "when siginfo_from_user() is false,
it is not safe/possible to derive from_ancestor_ns". This is not
true, in that case we know that from_ancestor_ns must be false.

from_ancestor_ns == T means the signal was sent from the user
space, and it was sent to the task in the sub-namespace, so it
is clear why we check siginfo_from_user().

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/6][v5] Protect cinit from blocked fatal signals
  2008-12-27 20:54 ` [PATCH 5/6][v5] Protect cinit from blocked fatal signals Sukadev Bhattiprolu
@ 2009-01-05 15:16   ` Oleg Nesterov
  0 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2009-01-05 15:16 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: ebiederm, roland, bastian, daniel, xemul, containers, linux-kernel

On 12/27, Sukadev Bhattiprolu wrote:
>
> +		 * Container-init gets no signals it doesn't want from same
> +		 * container.
>  		 */
> -		if (unlikely(signal->flags & SIGNAL_UNKILLABLE) &&
> -		    !signal_group_exit(signal))
> +		if ((signal->flags & SIGNAL_UNKILLABLE) &&
> +				!sig_kernel_only(signr))

minor, but why did you remove unlikely() ?

And could you improve the changelog a bit? Please explain
that !signal_group_exit() was needed to handle the case
when the mt init does exit/exec and sends SIGKILL to all
sub-threads. Now this is covered by sig_kernel_only().

In short, please explain that if SIGNAL_UNKILLABLE task
sees the sig_kernel_only() signal here, then it was either
sent by the parent namespace or it was generated internally.
In both cases we should not drop it.

But your current changelog mostly explains why do we need
another SIGNAL_UNKILLABLE check in get_signal_to_deliver(),
this is orthogonal to the change itself.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-01-05 15:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-27 20:46 [PATCH 0/6][v5]: Container-init signal semantics Sukadev Bhattiprolu
2008-12-27 20:49 ` [PATCH 1/6][v5] Remove 'handler' parameter to tracehook functions Sukadev Bhattiprolu
2008-12-27 20:51 ` [PATCH 2/6][v5] Protect init from unwanted signals more Sukadev Bhattiprolu
2008-12-27 20:52 ` [PATCH 3/6][v5] Define siginfo_from_ancestor_ns() Sukadev Bhattiprolu
2008-12-31  0:12   ` Roland McGrath
2009-01-05 12:42   ` Oleg Nesterov
2009-01-05 14:33   ` Oleg Nesterov
2008-12-27 20:53 ` [PATCH 4/6][v5] Protect cinit from unblocked SIG_DFL signals Sukadev Bhattiprolu
2008-12-31  0:19   ` Roland McGrath
2009-01-05 13:24     ` Oleg Nesterov
2008-12-27 20:54 ` [PATCH 5/6][v5] Protect cinit from blocked fatal signals Sukadev Bhattiprolu
2009-01-05 15:16   ` Oleg Nesterov
2008-12-27 20:55 ` [PATCH 6/6][v5] SI_USER: Masquerade si_pid when crossing pid ns boundary Sukadev Bhattiprolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).