All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
@ 2011-08-11 20:23 Daniel Lezcano
       [not found] ` <1313094241-3674-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
                   ` (3 more replies)
  0 siblings, 4 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 20:23 UTC (permalink / raw)
  To: akpm; +Cc: oleg, bonbons, containers, linux-kernel, serge

From: Daniel Lezcano <dlezcano@fr.ibm.com>

In the case of a VPS, when we shutdown/halt/reboot the container, the
reboot utility will invoke the sys_reboot syscall which has the bad
effect to reboot the host. The way to fix that is to drop the
CAP_SYS_REBOOT capability in the container.

In this case, the container shutdowns correctly but, at the end, the
init process is waiting indefinitely and we have the containers stuck
with one process (the init process).

In order to fix that, we used a hypervisor process, parent of the
container's init process, watching for the container's utmp file and
detecting when the runlevel changes. When this runlevel change is
detected we wait for the container to have one process left and then we
kill the container's init.

That works well if we modify the distro configuration files, we make
/var/run to not be a tmpfs and we remove all the files inside this
directory when the container boots. *But* as soon as we upgrade the
container distro, all the tweaks are lost. So this method works but at
the cost of tweaking the containers configuration files again and again,
each time there is an update, which is not tolerable in a production
environment.

This patchset solves the problem by send a SIGCHLD signal to the process
parent of the init process of the child pid namespace. By this way, we know
when a pid namespace wanted to reboot/halt/shutdown and we can take advantage
of that to kill, restart or suspend the container.

Daniel Lezcano (2):
  add SA_CLDREBOOT flag
  Notify container-init parent a 'reboot' occured

 arch/alpha/include/asm/signal.h   |    2 +
 arch/arm/include/asm/signal.h     |    2 +
 arch/avr32/include/asm/signal.h   |    2 +
 arch/cris/include/asm/signal.h    |    2 +
 arch/h8300/include/asm/signal.h   |    2 +
 arch/ia64/include/asm/signal.h    |    2 +
 arch/m32r/include/asm/signal.h    |    2 +
 arch/m68k/include/asm/signal.h    |    2 +
 arch/mips/include/asm/signal.h    |    2 +
 arch/mn10300/include/asm/signal.h |    2 +
 arch/parisc/include/asm/signal.h  |    2 +
 arch/powerpc/include/asm/signal.h |    2 +
 arch/s390/include/asm/signal.h    |    2 +
 arch/sparc/include/asm/signal.h   |    2 +-
 arch/x86/include/asm/signal.h     |    2 +
 arch/xtensa/include/asm/signal.h  |    2 +
 include/asm-generic/siginfo.h     |    3 +-
 include/asm-generic/signal.h      |    2 +
 include/linux/sched.h             |    1 +
 kernel/signal.c                   |   40 +++++++++++++++++++++++++++++++++++++
 kernel/sys.c                      |   20 ++++++++++++++++-
 21 files changed, 94 insertions(+), 4 deletions(-)

-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/2] add SA_CLDREBOOT flag
       [not found] ` <1313094241-3674-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
@ 2011-08-11 20:24   ` Daniel Lezcano
  2011-08-11 20:24   ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
  2011-08-14 16:17   ` [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Oleg Nesterov
  2 siblings, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 20:24 UTC (permalink / raw)
  To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	oleg-6lXkIZvqkOAvJsYlp49lxw, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
when a process of a child pid namespace calls the reboot syscall.

Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
---
 arch/alpha/include/asm/signal.h   |    2 ++
 arch/arm/include/asm/signal.h     |    2 ++
 arch/avr32/include/asm/signal.h   |    2 ++
 arch/cris/include/asm/signal.h    |    2 ++
 arch/h8300/include/asm/signal.h   |    2 ++
 arch/ia64/include/asm/signal.h    |    2 ++
 arch/m32r/include/asm/signal.h    |    2 ++
 arch/m68k/include/asm/signal.h    |    2 ++
 arch/mips/include/asm/signal.h    |    2 ++
 arch/mn10300/include/asm/signal.h |    2 ++
 arch/parisc/include/asm/signal.h  |    2 ++
 arch/powerpc/include/asm/signal.h |    2 ++
 arch/s390/include/asm/signal.h    |    2 ++
 arch/sparc/include/asm/signal.h   |    2 +-
 arch/x86/include/asm/signal.h     |    2 ++
 arch/xtensa/include/asm/signal.h  |    2 ++
 include/asm-generic/signal.h      |    2 ++
 17 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/alpha/include/asm/signal.h b/arch/alpha/include/asm/signal.h
index a938830..40a2285 100644
--- a/arch/alpha/include/asm/signal.h
+++ b/arch/alpha/include/asm/signal.h
@@ -82,6 +82,7 @@ typedef unsigned long sigset_t;
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
  * Unix names RESETHAND and NODEFER respectively.
@@ -94,6 +95,7 @@ typedef unsigned long sigset_t;
 #define SA_RESETHAND	0x00000010
 #define SA_NOCLDWAIT	0x00000020
 #define SA_SIGINFO	0x00000040
+#define SA_CLDREBOOT    0x00000080
 
 #define SA_ONESHOT	SA_RESETHAND
 #define SA_NOMASK	SA_NODEFER
diff --git a/arch/arm/include/asm/signal.h b/arch/arm/include/asm/signal.h
index 43ba0fb..2b585ec 100644
--- a/arch/arm/include/asm/signal.h
+++ b/arch/arm/include/asm/signal.h
@@ -78,6 +78,7 @@ typedef unsigned long sigset_t;
  * SA_NOCLDSTOP		flag to turn off SIGCHLD when children stop.
  * SA_NOCLDWAIT		flag on SIGCHLD to inhibit zombies.
  * SA_SIGINFO		deliver the signal with SIGINFO structs
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_THIRTYTWO		delivers the signal in 32-bit mode, even if the task 
  *			is running in 26-bit.
  * SA_ONSTACK		allows alternate signal stacks (see sigaltstack(2)).
@@ -91,6 +92,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_THIRTYTWO	0x02000000
 #define SA_RESTORER	0x04000000
 #define SA_ONSTACK	0x08000000
diff --git a/arch/avr32/include/asm/signal.h b/arch/avr32/include/asm/signal.h
index 8790dfc..a025546 100644
--- a/arch/avr32/include/asm/signal.h
+++ b/arch/avr32/include/asm/signal.h
@@ -83,6 +83,7 @@ typedef unsigned long sigset_t;
  * SA_NOCLDSTOP		flag to turn off SIGCHLD when children stop.
  * SA_NOCLDWAIT		flag on SIGCHLD to inhibit zombies.
  * SA_SIGINFO		deliver the signal with SIGINFO structs
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_ONSTACK		indicates that a registered stack_t will be used.
  * SA_RESTART		flag to get restarting signals (which were the default long ago)
  * SA_NODEFER		prevents the current signal from being masked in the handler.
@@ -94,6 +95,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_RESTORER	0x04000000
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
diff --git a/arch/cris/include/asm/signal.h b/arch/cris/include/asm/signal.h
index ea6af9a..fa804f0 100644
--- a/arch/cris/include/asm/signal.h
+++ b/arch/cris/include/asm/signal.h
@@ -76,6 +76,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -87,6 +88,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001u
 #define SA_NOCLDWAIT	0x00000002u
 #define SA_SIGINFO	0x00000004u
+#define SA_CLDREBOOT    0x00000008u
 #define SA_ONSTACK	0x08000000u
 #define SA_RESTART	0x10000000u
 #define SA_NODEFER	0x40000000u
diff --git a/arch/h8300/include/asm/signal.h b/arch/h8300/include/asm/signal.h
index fd8b66e..13b4b5a 100644
--- a/arch/h8300/include/asm/signal.h
+++ b/arch/h8300/include/asm/signal.h
@@ -76,6 +76,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -86,6 +87,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002 /* not supported yet */
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/arch/ia64/include/asm/signal.h b/arch/ia64/include/asm/signal.h
index b166248..30245a1 100644
--- a/arch/ia64/include/asm/signal.h
+++ b/arch/ia64/include/asm/signal.h
@@ -58,6 +58,7 @@
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -68,6 +69,7 @@
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/arch/m32r/include/asm/signal.h b/arch/m32r/include/asm/signal.h
index b2eeb0d..c547212 100644
--- a/arch/m32r/include/asm/signal.h
+++ b/arch/m32r/include/asm/signal.h
@@ -78,6 +78,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -88,6 +89,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001u
 #define SA_NOCLDWAIT	0x00000002u
 #define SA_SIGINFO	0x00000004u
+#define SA_CLDREBOOT    0x00000008u
 #define SA_ONSTACK	0x08000000u
 #define SA_RESTART	0x10000000u
 #define SA_NODEFER	0x40000000u
diff --git a/arch/m68k/include/asm/signal.h b/arch/m68k/include/asm/signal.h
index 60e8866..14334b2 100644
--- a/arch/m68k/include/asm/signal.h
+++ b/arch/m68k/include/asm/signal.h
@@ -76,6 +76,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -86,6 +87,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/arch/mips/include/asm/signal.h b/arch/mips/include/asm/signal.h
index c783f36..acf7494 100644
--- a/arch/mips/include/asm/signal.h
+++ b/arch/mips/include/asm/signal.h
@@ -66,6 +66,7 @@ typedef unsigned long old_sigset_t;		/* at least 32 bits */
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -80,6 +81,7 @@ typedef unsigned long old_sigset_t;		/* at least 32 bits */
 #define SA_NODEFER	0x40000000
 #define SA_NOCLDWAIT	0x00010000
 #define SA_NOCLDSTOP	0x00000001
+#define SA_CLDREBOOT    0x00000002
 
 #define SA_NOMASK	SA_NODEFER
 #define SA_ONESHOT	SA_RESETHAND
diff --git a/arch/mn10300/include/asm/signal.h b/arch/mn10300/include/asm/signal.h
index 1865d72..ad1f454 100644
--- a/arch/mn10300/include/asm/signal.h
+++ b/arch/mn10300/include/asm/signal.h
@@ -86,6 +86,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -96,6 +97,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001U
 #define SA_NOCLDWAIT	0x00000002U
 #define SA_SIGINFO	0x00000004U
+#define SA_CLDREBOOT    0x00000008U
 #define SA_ONSTACK	0x08000000U
 #define SA_RESTART	0x10000000U
 #define SA_NODEFER	0x40000000U
diff --git a/arch/parisc/include/asm/signal.h b/arch/parisc/include/asm/signal.h
index c203563..85c2fec 100644
--- a/arch/parisc/include/asm/signal.h
+++ b/arch/parisc/include/asm/signal.h
@@ -50,6 +50,7 @@
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -58,6 +59,7 @@
  * Unix names RESETHAND and NODEFER respectively.
  */
 #define SA_ONSTACK	0x00000001
+#define SA_CLDREBOOT    0x00000002
 #define SA_RESETHAND	0x00000004
 #define SA_NOCLDSTOP	0x00000008
 #define SA_SIGINFO	0x00000010
diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
index 3eb13be..f96e2d5 100644
--- a/arch/powerpc/include/asm/signal.h
+++ b/arch/powerpc/include/asm/signal.h
@@ -65,6 +65,7 @@ typedef struct {
  * SA_ONSTACK is not currently supported, but will allow sigaltstack(2).
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -75,6 +76,7 @@ typedef struct {
 #define SA_NOCLDSTOP	0x00000001U
 #define SA_NOCLDWAIT	0x00000002U
 #define SA_SIGINFO	0x00000004U
+#define SA_CLDREBOOT    0x00000008U
 #define SA_ONSTACK	0x08000000U
 #define SA_RESTART	0x10000000U
 #define SA_NODEFER	0x40000000U
diff --git a/arch/s390/include/asm/signal.h b/arch/s390/include/asm/signal.h
index cdf5cb2..449a2dc 100644
--- a/arch/s390/include/asm/signal.h
+++ b/arch/s390/include/asm/signal.h
@@ -86,6 +86,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT  flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -96,6 +97,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP    0x00000001
 #define SA_NOCLDWAIT    0x00000002
 #define SA_SIGINFO      0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK      0x08000000
 #define SA_RESTART      0x10000000
 #define SA_NODEFER      0x40000000
diff --git a/arch/sparc/include/asm/signal.h b/arch/sparc/include/asm/signal.h
index e49b828..26cbd7e 100644
--- a/arch/sparc/include/asm/signal.h
+++ b/arch/sparc/include/asm/signal.h
@@ -146,7 +146,7 @@ struct sigstack {
 #define SA_NOMASK	0x20u
 #define SA_NOCLDWAIT    0x100u
 #define SA_SIGINFO      0x200u
-
+#define SA_CLDREBOOT    0x400u
 
 #define SIG_BLOCK          0x01	/* for blocking signals */
 #define SIG_UNBLOCK        0x02	/* for unblocking signals */
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 598457c..83696a5 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -90,6 +90,7 @@ typedef unsigned long sigset_t;
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_NODEFER prevents the current signal from being masked in the handler.
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
@@ -98,6 +99,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001u
 #define SA_NOCLDWAIT	0x00000002u
 #define SA_SIGINFO	0x00000004u
+#define SA_CLDREBOOT    0x00000008u
 #define SA_ONSTACK	0x08000000u
 #define SA_RESTART	0x10000000u
 #define SA_NODEFER	0x40000000u
diff --git a/arch/xtensa/include/asm/signal.h b/arch/xtensa/include/asm/signal.h
index 633ba73..5961f12 100644
--- a/arch/xtensa/include/asm/signal.h
+++ b/arch/xtensa/include/asm/signal.h
@@ -77,6 +77,7 @@ typedef struct {
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -87,6 +88,7 @@ typedef struct {
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002 /* not supported yet */
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/include/asm-generic/signal.h b/include/asm-generic/signal.h
index 555c0ae..504f910 100644
--- a/include/asm-generic/signal.h
+++ b/include/asm-generic/signal.h
@@ -57,6 +57,7 @@
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_REBOOT flag to turn on SIGCHLD when pid namespace reboots
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -67,6 +68,7 @@
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000004
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 1/2] add SA_CLDREBOOT flag
  2011-08-11 20:23 [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Daniel Lezcano
       [not found] ` <1313094241-3674-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
@ 2011-08-11 20:24 ` Daniel Lezcano
       [not found]   ` <1313094241-3674-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
  2011-08-14 16:15   ` Oleg Nesterov
  2011-08-11 20:24 ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
  2011-08-14 16:17 ` [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Oleg Nesterov
  3 siblings, 2 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 20:24 UTC (permalink / raw)
  To: akpm; +Cc: oleg, bonbons, containers, linux-kernel, serge

Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
when a process of a child pid namespace calls the reboot syscall.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
 arch/alpha/include/asm/signal.h   |    2 ++
 arch/arm/include/asm/signal.h     |    2 ++
 arch/avr32/include/asm/signal.h   |    2 ++
 arch/cris/include/asm/signal.h    |    2 ++
 arch/h8300/include/asm/signal.h   |    2 ++
 arch/ia64/include/asm/signal.h    |    2 ++
 arch/m32r/include/asm/signal.h    |    2 ++
 arch/m68k/include/asm/signal.h    |    2 ++
 arch/mips/include/asm/signal.h    |    2 ++
 arch/mn10300/include/asm/signal.h |    2 ++
 arch/parisc/include/asm/signal.h  |    2 ++
 arch/powerpc/include/asm/signal.h |    2 ++
 arch/s390/include/asm/signal.h    |    2 ++
 arch/sparc/include/asm/signal.h   |    2 +-
 arch/x86/include/asm/signal.h     |    2 ++
 arch/xtensa/include/asm/signal.h  |    2 ++
 include/asm-generic/signal.h      |    2 ++
 17 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/alpha/include/asm/signal.h b/arch/alpha/include/asm/signal.h
index a938830..40a2285 100644
--- a/arch/alpha/include/asm/signal.h
+++ b/arch/alpha/include/asm/signal.h
@@ -82,6 +82,7 @@ typedef unsigned long sigset_t;
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
  * Unix names RESETHAND and NODEFER respectively.
@@ -94,6 +95,7 @@ typedef unsigned long sigset_t;
 #define SA_RESETHAND	0x00000010
 #define SA_NOCLDWAIT	0x00000020
 #define SA_SIGINFO	0x00000040
+#define SA_CLDREBOOT    0x00000080
 
 #define SA_ONESHOT	SA_RESETHAND
 #define SA_NOMASK	SA_NODEFER
diff --git a/arch/arm/include/asm/signal.h b/arch/arm/include/asm/signal.h
index 43ba0fb..2b585ec 100644
--- a/arch/arm/include/asm/signal.h
+++ b/arch/arm/include/asm/signal.h
@@ -78,6 +78,7 @@ typedef unsigned long sigset_t;
  * SA_NOCLDSTOP		flag to turn off SIGCHLD when children stop.
  * SA_NOCLDWAIT		flag on SIGCHLD to inhibit zombies.
  * SA_SIGINFO		deliver the signal with SIGINFO structs
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_THIRTYTWO		delivers the signal in 32-bit mode, even if the task 
  *			is running in 26-bit.
  * SA_ONSTACK		allows alternate signal stacks (see sigaltstack(2)).
@@ -91,6 +92,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_THIRTYTWO	0x02000000
 #define SA_RESTORER	0x04000000
 #define SA_ONSTACK	0x08000000
diff --git a/arch/avr32/include/asm/signal.h b/arch/avr32/include/asm/signal.h
index 8790dfc..a025546 100644
--- a/arch/avr32/include/asm/signal.h
+++ b/arch/avr32/include/asm/signal.h
@@ -83,6 +83,7 @@ typedef unsigned long sigset_t;
  * SA_NOCLDSTOP		flag to turn off SIGCHLD when children stop.
  * SA_NOCLDWAIT		flag on SIGCHLD to inhibit zombies.
  * SA_SIGINFO		deliver the signal with SIGINFO structs
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_ONSTACK		indicates that a registered stack_t will be used.
  * SA_RESTART		flag to get restarting signals (which were the default long ago)
  * SA_NODEFER		prevents the current signal from being masked in the handler.
@@ -94,6 +95,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_RESTORER	0x04000000
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
diff --git a/arch/cris/include/asm/signal.h b/arch/cris/include/asm/signal.h
index ea6af9a..fa804f0 100644
--- a/arch/cris/include/asm/signal.h
+++ b/arch/cris/include/asm/signal.h
@@ -76,6 +76,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -87,6 +88,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001u
 #define SA_NOCLDWAIT	0x00000002u
 #define SA_SIGINFO	0x00000004u
+#define SA_CLDREBOOT    0x00000008u
 #define SA_ONSTACK	0x08000000u
 #define SA_RESTART	0x10000000u
 #define SA_NODEFER	0x40000000u
diff --git a/arch/h8300/include/asm/signal.h b/arch/h8300/include/asm/signal.h
index fd8b66e..13b4b5a 100644
--- a/arch/h8300/include/asm/signal.h
+++ b/arch/h8300/include/asm/signal.h
@@ -76,6 +76,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -86,6 +87,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002 /* not supported yet */
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/arch/ia64/include/asm/signal.h b/arch/ia64/include/asm/signal.h
index b166248..30245a1 100644
--- a/arch/ia64/include/asm/signal.h
+++ b/arch/ia64/include/asm/signal.h
@@ -58,6 +58,7 @@
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -68,6 +69,7 @@
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/arch/m32r/include/asm/signal.h b/arch/m32r/include/asm/signal.h
index b2eeb0d..c547212 100644
--- a/arch/m32r/include/asm/signal.h
+++ b/arch/m32r/include/asm/signal.h
@@ -78,6 +78,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -88,6 +89,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001u
 #define SA_NOCLDWAIT	0x00000002u
 #define SA_SIGINFO	0x00000004u
+#define SA_CLDREBOOT    0x00000008u
 #define SA_ONSTACK	0x08000000u
 #define SA_RESTART	0x10000000u
 #define SA_NODEFER	0x40000000u
diff --git a/arch/m68k/include/asm/signal.h b/arch/m68k/include/asm/signal.h
index 60e8866..14334b2 100644
--- a/arch/m68k/include/asm/signal.h
+++ b/arch/m68k/include/asm/signal.h
@@ -76,6 +76,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -86,6 +87,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/arch/mips/include/asm/signal.h b/arch/mips/include/asm/signal.h
index c783f36..acf7494 100644
--- a/arch/mips/include/asm/signal.h
+++ b/arch/mips/include/asm/signal.h
@@ -66,6 +66,7 @@ typedef unsigned long old_sigset_t;		/* at least 32 bits */
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -80,6 +81,7 @@ typedef unsigned long old_sigset_t;		/* at least 32 bits */
 #define SA_NODEFER	0x40000000
 #define SA_NOCLDWAIT	0x00010000
 #define SA_NOCLDSTOP	0x00000001
+#define SA_CLDREBOOT    0x00000002
 
 #define SA_NOMASK	SA_NODEFER
 #define SA_ONESHOT	SA_RESETHAND
diff --git a/arch/mn10300/include/asm/signal.h b/arch/mn10300/include/asm/signal.h
index 1865d72..ad1f454 100644
--- a/arch/mn10300/include/asm/signal.h
+++ b/arch/mn10300/include/asm/signal.h
@@ -86,6 +86,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -96,6 +97,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001U
 #define SA_NOCLDWAIT	0x00000002U
 #define SA_SIGINFO	0x00000004U
+#define SA_CLDREBOOT    0x00000008U
 #define SA_ONSTACK	0x08000000U
 #define SA_RESTART	0x10000000U
 #define SA_NODEFER	0x40000000U
diff --git a/arch/parisc/include/asm/signal.h b/arch/parisc/include/asm/signal.h
index c203563..85c2fec 100644
--- a/arch/parisc/include/asm/signal.h
+++ b/arch/parisc/include/asm/signal.h
@@ -50,6 +50,7 @@
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT         flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -58,6 +59,7 @@
  * Unix names RESETHAND and NODEFER respectively.
  */
 #define SA_ONSTACK	0x00000001
+#define SA_CLDREBOOT    0x00000002
 #define SA_RESETHAND	0x00000004
 #define SA_NOCLDSTOP	0x00000008
 #define SA_SIGINFO	0x00000010
diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
index 3eb13be..f96e2d5 100644
--- a/arch/powerpc/include/asm/signal.h
+++ b/arch/powerpc/include/asm/signal.h
@@ -65,6 +65,7 @@ typedef struct {
  * SA_ONSTACK is not currently supported, but will allow sigaltstack(2).
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -75,6 +76,7 @@ typedef struct {
 #define SA_NOCLDSTOP	0x00000001U
 #define SA_NOCLDWAIT	0x00000002U
 #define SA_SIGINFO	0x00000004U
+#define SA_CLDREBOOT    0x00000008U
 #define SA_ONSTACK	0x08000000U
 #define SA_RESTART	0x10000000U
 #define SA_NODEFER	0x40000000U
diff --git a/arch/s390/include/asm/signal.h b/arch/s390/include/asm/signal.h
index cdf5cb2..449a2dc 100644
--- a/arch/s390/include/asm/signal.h
+++ b/arch/s390/include/asm/signal.h
@@ -86,6 +86,7 @@ typedef unsigned long sigset_t;
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT  flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -96,6 +97,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP    0x00000001
 #define SA_NOCLDWAIT    0x00000002
 #define SA_SIGINFO      0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK      0x08000000
 #define SA_RESTART      0x10000000
 #define SA_NODEFER      0x40000000
diff --git a/arch/sparc/include/asm/signal.h b/arch/sparc/include/asm/signal.h
index e49b828..26cbd7e 100644
--- a/arch/sparc/include/asm/signal.h
+++ b/arch/sparc/include/asm/signal.h
@@ -146,7 +146,7 @@ struct sigstack {
 #define SA_NOMASK	0x20u
 #define SA_NOCLDWAIT    0x100u
 #define SA_SIGINFO      0x200u
-
+#define SA_CLDREBOOT    0x400u
 
 #define SIG_BLOCK          0x01	/* for blocking signals */
 #define SIG_UNBLOCK        0x02	/* for unblocking signals */
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 598457c..83696a5 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -90,6 +90,7 @@ typedef unsigned long sigset_t;
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_NODEFER prevents the current signal from being masked in the handler.
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
@@ -98,6 +99,7 @@ typedef unsigned long sigset_t;
 #define SA_NOCLDSTOP	0x00000001u
 #define SA_NOCLDWAIT	0x00000002u
 #define SA_SIGINFO	0x00000004u
+#define SA_CLDREBOOT    0x00000008u
 #define SA_ONSTACK	0x08000000u
 #define SA_RESTART	0x10000000u
 #define SA_NODEFER	0x40000000u
diff --git a/arch/xtensa/include/asm/signal.h b/arch/xtensa/include/asm/signal.h
index 633ba73..5961f12 100644
--- a/arch/xtensa/include/asm/signal.h
+++ b/arch/xtensa/include/asm/signal.h
@@ -77,6 +77,7 @@ typedef struct {
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_CLDREBOOT flag to turn on SIGCHLD when pid namespace calls reboot
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -87,6 +88,7 @@ typedef struct {
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002 /* not supported yet */
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000008
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
diff --git a/include/asm-generic/signal.h b/include/asm-generic/signal.h
index 555c0ae..504f910 100644
--- a/include/asm-generic/signal.h
+++ b/include/asm-generic/signal.h
@@ -57,6 +57,7 @@
  * SA_ONSTACK indicates that a registered stack_t will be used.
  * SA_RESTART flag to get restarting signals (which were the default long ago)
  * SA_NOCLDSTOP flag to turn off SIGCHLD when children stop.
+ * SA_REBOOT flag to turn on SIGCHLD when pid namespace reboots
  * SA_RESETHAND clears the handler when the signal is delivered.
  * SA_NOCLDWAIT flag on SIGCHLD to inhibit zombies.
  * SA_NODEFER prevents the current signal from being masked in the handler.
@@ -67,6 +68,7 @@
 #define SA_NOCLDSTOP	0x00000001
 #define SA_NOCLDWAIT	0x00000002
 #define SA_SIGINFO	0x00000004
+#define SA_CLDREBOOT    0x00000004
 #define SA_ONSTACK	0x08000000
 #define SA_RESTART	0x10000000
 #define SA_NODEFER	0x40000000
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 2/2] Notify container-init parent a 'reboot' occured
       [not found] ` <1313094241-3674-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
  2011-08-11 20:24   ` [PATCH 1/2] add SA_CLDREBOOT flag Daniel Lezcano
@ 2011-08-11 20:24   ` Daniel Lezcano
  2011-08-14 16:17   ` [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Oleg Nesterov
  2 siblings, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 20:24 UTC (permalink / raw)
  To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	oleg-6lXkIZvqkOAvJsYlp49lxw, linux-kernel-u79uwXL29TY76Z2rM5mHXA

When the reboot syscall is called and the pid namespace where the calling
process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
to the parent of this pid namespace.

Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
---
 include/asm-generic/siginfo.h |    3 ++-
 include/linux/sched.h         |    1 +
 kernel/signal.c               |   40 ++++++++++++++++++++++++++++++++++++++++
 kernel/sys.c                  |   20 ++++++++++++++++++--
 4 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 0dd4e87..9bff4a2 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -218,7 +218,8 @@ typedef struct siginfo {
 #define CLD_TRAPPED	(__SI_CHLD|4)	/* traced child has trapped */
 #define CLD_STOPPED	(__SI_CHLD|5)	/* child has stopped */
 #define CLD_CONTINUED	(__SI_CHLD|6)	/* stopped child has continued */
-#define NSIGCHLD	6
+#define CLD_REBOOTED    (__SI_CHLD|7)   /* process was killed by a reboot */
+#define NSIGCHLD	7
 
 /*
  * SIGPOLL si_codes
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 20b03bf..c62dc9e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2170,6 +2170,7 @@ extern int kill_pgrp(struct pid *pid, int sig, int priv);
 extern int kill_pid(struct pid *pid, int sig, int priv);
 extern int kill_proc_info(int, struct siginfo *, pid_t);
 extern __must_check bool do_notify_parent(struct task_struct *, int);
+extern void do_notify_parent_cldreboot(struct task_struct *, int, char *);
 extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
 extern void force_sig(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..7d3d44c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1668,6 +1668,46 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	return autoreap;
 }
 
+void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
+{
+	struct siginfo info = { };
+	struct task_struct *parent;
+	struct sighand_struct *sighand;
+	unsigned long flags;
+
+	if (tsk->ptrace)
+		parent = tsk->parent;
+	else {
+		tsk = tsk->group_leader;
+		parent = tsk->real_parent;
+	}
+
+	info.si_signo = SIGCHLD;
+	info.si_errno = 0;
+	info.si_status = why;
+
+	rcu_read_lock();
+	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
+	info.si_uid = __task_cred(tsk)->uid;
+	rcu_read_unlock();
+
+	info.si_utime = cputime_to_clock_t(tsk->utime);
+	info.si_stime = cputime_to_clock_t(tsk->stime);
+
+	info.si_code = CLD_REBOOTED;
+
+	sighand = parent->sighand;
+	spin_lock_irqsave(&sighand->siglock, flags);
+	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
+	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
+		__group_send_sig_info(SIGCHLD, &info, parent);
+	/*
+	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
+	 */
+	__wake_up_parent(tsk, parent);
+	spin_unlock_irqrestore(&sighand->siglock, flags);
+}
+
 /**
  * do_notify_parent_cldstop - notify parent of stopped/continued state change
  * @tsk: task reporting the state change
diff --git a/kernel/sys.c b/kernel/sys.c
index a101ba3..8f7a9ed 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -42,6 +42,7 @@
 #include <linux/syscalls.h>
 #include <linux/kprobes.h>
 #include <linux/user_namespace.h>
+#include <linux/sched.h>
 
 #include <linux/kmsg_dump.h>
 
@@ -411,6 +412,13 @@ void kernel_power_off(void)
 }
 EXPORT_SYMBOL_GPL(kernel_power_off);
 
+static void pid_namespace_reboot(struct pid_namespace *pid_ns,
+				int cmd, char *buffer)
+{
+	struct task_struct *tsk = pid_ns->child_reaper;
+	do_notify_parent_cldreboot(tsk, cmd, buffer);
+}
+
 static DEFINE_MUTEX(reboot_mutex);
 
 /*
@@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 {
 	char buffer[256];
 	int ret = 0;
+	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
+
+        /* We only trust the superuser with rebooting the system. */
+	if (!capable(CAP_SYS_BOOT)) {
+		/* If we are not in the initial pid namespace, we send a signal
+		 * to the parent of this init pid namespace, notifying a shutdown
+		 * occured */
+		if (pid_ns != &init_pid_ns)
+			pid_namespace_reboot(pid_ns, cmd, buffer);
 
-	/* We only trust the superuser with rebooting the system. */
-	if (!capable(CAP_SYS_BOOT))
 		return -EPERM;
+	}
 
 	/* For safety, we require "magic" arguments. */
 	if (magic1 != LINUX_REBOOT_MAGIC1 ||
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 20:23 [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Daniel Lezcano
       [not found] ` <1313094241-3674-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
  2011-08-11 20:24 ` [PATCH 1/2] add SA_CLDREBOOT flag Daniel Lezcano
@ 2011-08-11 20:24 ` Daniel Lezcano
  2011-08-11 21:09   ` Serge Hallyn
                     ` (3 more replies)
  2011-08-14 16:17 ` [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Oleg Nesterov
  3 siblings, 4 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 20:24 UTC (permalink / raw)
  To: akpm; +Cc: oleg, bonbons, containers, linux-kernel, serge

When the reboot syscall is called and the pid namespace where the calling
process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
to the parent of this pid namespace.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
 include/asm-generic/siginfo.h |    3 ++-
 include/linux/sched.h         |    1 +
 kernel/signal.c               |   40 ++++++++++++++++++++++++++++++++++++++++
 kernel/sys.c                  |   20 ++++++++++++++++++--
 4 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 0dd4e87..9bff4a2 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -218,7 +218,8 @@ typedef struct siginfo {
 #define CLD_TRAPPED	(__SI_CHLD|4)	/* traced child has trapped */
 #define CLD_STOPPED	(__SI_CHLD|5)	/* child has stopped */
 #define CLD_CONTINUED	(__SI_CHLD|6)	/* stopped child has continued */
-#define NSIGCHLD	6
+#define CLD_REBOOTED    (__SI_CHLD|7)   /* process was killed by a reboot */
+#define NSIGCHLD	7
 
 /*
  * SIGPOLL si_codes
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 20b03bf..c62dc9e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2170,6 +2170,7 @@ extern int kill_pgrp(struct pid *pid, int sig, int priv);
 extern int kill_pid(struct pid *pid, int sig, int priv);
 extern int kill_proc_info(int, struct siginfo *, pid_t);
 extern __must_check bool do_notify_parent(struct task_struct *, int);
+extern void do_notify_parent_cldreboot(struct task_struct *, int, char *);
 extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
 extern void force_sig(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..7d3d44c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1668,6 +1668,46 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	return autoreap;
 }
 
+void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
+{
+	struct siginfo info = { };
+	struct task_struct *parent;
+	struct sighand_struct *sighand;
+	unsigned long flags;
+
+	if (tsk->ptrace)
+		parent = tsk->parent;
+	else {
+		tsk = tsk->group_leader;
+		parent = tsk->real_parent;
+	}
+
+	info.si_signo = SIGCHLD;
+	info.si_errno = 0;
+	info.si_status = why;
+
+	rcu_read_lock();
+	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
+	info.si_uid = __task_cred(tsk)->uid;
+	rcu_read_unlock();
+
+	info.si_utime = cputime_to_clock_t(tsk->utime);
+	info.si_stime = cputime_to_clock_t(tsk->stime);
+
+	info.si_code = CLD_REBOOTED;
+
+	sighand = parent->sighand;
+	spin_lock_irqsave(&sighand->siglock, flags);
+	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
+	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
+		__group_send_sig_info(SIGCHLD, &info, parent);
+	/*
+	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
+	 */
+	__wake_up_parent(tsk, parent);
+	spin_unlock_irqrestore(&sighand->siglock, flags);
+}
+
 /**
  * do_notify_parent_cldstop - notify parent of stopped/continued state change
  * @tsk: task reporting the state change
diff --git a/kernel/sys.c b/kernel/sys.c
index a101ba3..8f7a9ed 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -42,6 +42,7 @@
 #include <linux/syscalls.h>
 #include <linux/kprobes.h>
 #include <linux/user_namespace.h>
+#include <linux/sched.h>
 
 #include <linux/kmsg_dump.h>
 
@@ -411,6 +412,13 @@ void kernel_power_off(void)
 }
 EXPORT_SYMBOL_GPL(kernel_power_off);
 
+static void pid_namespace_reboot(struct pid_namespace *pid_ns,
+				int cmd, char *buffer)
+{
+	struct task_struct *tsk = pid_ns->child_reaper;
+	do_notify_parent_cldreboot(tsk, cmd, buffer);
+}
+
 static DEFINE_MUTEX(reboot_mutex);
 
 /*
@@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 {
 	char buffer[256];
 	int ret = 0;
+	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
+
+        /* We only trust the superuser with rebooting the system. */
+	if (!capable(CAP_SYS_BOOT)) {
+		/* If we are not in the initial pid namespace, we send a signal
+		 * to the parent of this init pid namespace, notifying a shutdown
+		 * occured */
+		if (pid_ns != &init_pid_ns)
+			pid_namespace_reboot(pid_ns, cmd, buffer);
 
-	/* We only trust the superuser with rebooting the system. */
-	if (!capable(CAP_SYS_BOOT))
 		return -EPERM;
+	}
 
 	/* For safety, we require "magic" arguments. */
 	if (magic1 != LINUX_REBOOT_MAGIC1 ||
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
       [not found]   ` <1313094241-3674-3-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
@ 2011-08-11 21:09     ` Serge Hallyn
  2011-08-13  0:19     ` Matt Helsley
  2011-08-14 16:01     ` Oleg Nesterov
  2 siblings, 0 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-11 21:09 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> When the reboot syscall is called and the pid namespace where the calling
> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> to the parent of this pid namespace.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>

...

> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
> +{
> +	struct siginfo info = { };
> +	struct task_struct *parent;
> +	struct sighand_struct *sighand;
> +	unsigned long flags;
> +
> +	if (tsk->ptrace)
> +		parent = tsk->parent;
> +	else {
> +		tsk = tsk->group_leader;
> +		parent = tsk->real_parent;
> +	}
> +
> +	info.si_signo = SIGCHLD;
> +	info.si_errno = 0;
> +	info.si_status = why;
> +
> +	rcu_read_lock();
> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> +	info.si_uid = __task_cred(tsk)->uid;
	
	This eventually should become:

	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
	                              current_cred(), current_uid());

	I've got a first-stab patch at converting the rest of
	kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git

> +	rcu_read_unlock();
> +
> +	info.si_utime = cputime_to_clock_t(tsk->utime);
> +	info.si_stime = cputime_to_clock_t(tsk->stime);
> +
> +	info.si_code = CLD_REBOOTED;
> +
> +	sighand = parent->sighand;
> +	spin_lock_irqsave(&sighand->siglock, flags);
> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
> +		__group_send_sig_info(SIGCHLD, &info, parent);
> +	/*
> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
> +	 */
> +	__wake_up_parent(tsk, parent);
> +	spin_unlock_irqrestore(&sighand->siglock, flags);
> +}

...

> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  {
>  	char buffer[256];
>  	int ret = 0;
> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
> +
> +        /* We only trust the superuser with rebooting the system. */
> +	if (!capable(CAP_SYS_BOOT)) {

Doesn't this mean that an unprivileged task in a container can shut
down the container?

The pidns->user_ns patch I sent earlier today gives you what you need
so that you can add

		if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT)
			return -EPERM;

right here to prevent that.

> +		/* If we are not in the initial pid namespace, we send a signal
> +		 * to the parent of this init pid namespace, notifying a shutdown
> +		 * occured */
> +		if (pid_ns != &init_pid_ns)
> +			pid_namespace_reboot(pid_ns, cmd, buffer);
>  
> -	/* We only trust the superuser with rebooting the system. */
> -	if (!capable(CAP_SYS_BOOT))
>  		return -EPERM;
> +	}
>  
>  	/* For safety, we require "magic" arguments. */
>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
> -- 
> 1.7.4.1
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 20:24 ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
@ 2011-08-11 21:09   ` Serge Hallyn
  2011-08-11 21:30     ` Daniel Lezcano
  2011-08-11 21:30     ` Daniel Lezcano
       [not found]   ` <1313094241-3674-3-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-11 21:09 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, containers, bonbons, oleg, linux-kernel

Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> When the reboot syscall is called and the pid namespace where the calling
> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> to the parent of this pid namespace.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>

...

> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
> +{
> +	struct siginfo info = { };
> +	struct task_struct *parent;
> +	struct sighand_struct *sighand;
> +	unsigned long flags;
> +
> +	if (tsk->ptrace)
> +		parent = tsk->parent;
> +	else {
> +		tsk = tsk->group_leader;
> +		parent = tsk->real_parent;
> +	}
> +
> +	info.si_signo = SIGCHLD;
> +	info.si_errno = 0;
> +	info.si_status = why;
> +
> +	rcu_read_lock();
> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> +	info.si_uid = __task_cred(tsk)->uid;
	
	This eventually should become:

	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
	                              current_cred(), current_uid());

	I've got a first-stab patch at converting the rest of
	kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git

> +	rcu_read_unlock();
> +
> +	info.si_utime = cputime_to_clock_t(tsk->utime);
> +	info.si_stime = cputime_to_clock_t(tsk->stime);
> +
> +	info.si_code = CLD_REBOOTED;
> +
> +	sighand = parent->sighand;
> +	spin_lock_irqsave(&sighand->siglock, flags);
> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
> +		__group_send_sig_info(SIGCHLD, &info, parent);
> +	/*
> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
> +	 */
> +	__wake_up_parent(tsk, parent);
> +	spin_unlock_irqrestore(&sighand->siglock, flags);
> +}

...

> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  {
>  	char buffer[256];
>  	int ret = 0;
> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
> +
> +        /* We only trust the superuser with rebooting the system. */
> +	if (!capable(CAP_SYS_BOOT)) {

Doesn't this mean that an unprivileged task in a container can shut
down the container?

The pidns->user_ns patch I sent earlier today gives you what you need
so that you can add

		if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT)
			return -EPERM;

right here to prevent that.

> +		/* If we are not in the initial pid namespace, we send a signal
> +		 * to the parent of this init pid namespace, notifying a shutdown
> +		 * occured */
> +		if (pid_ns != &init_pid_ns)
> +			pid_namespace_reboot(pid_ns, cmd, buffer);
>  
> -	/* We only trust the superuser with rebooting the system. */
> -	if (!capable(CAP_SYS_BOOT))
>  		return -EPERM;
> +	}
>  
>  	/* For safety, we require "magic" arguments. */
>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
> -- 
> 1.7.4.1
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 21:09   ` Serge Hallyn
@ 2011-08-11 21:30     ` Daniel Lezcano
  2011-08-11 21:30     ` Daniel Lezcano
  1 sibling, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 21:30 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

On 08/11/2011 11:09 PM, Serge Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
>> When the reboot syscall is called and the pid namespace where the calling
>> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
>> to the parent of this pid namespace.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
> ...
>
>> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
>> +{
>> +	struct siginfo info = { };
>> +	struct task_struct *parent;
>> +	struct sighand_struct *sighand;
>> +	unsigned long flags;
>> +
>> +	if (tsk->ptrace)
>> +		parent = tsk->parent;
>> +	else {
>> +		tsk = tsk->group_leader;
>> +		parent = tsk->real_parent;
>> +	}
>> +
>> +	info.si_signo = SIGCHLD;
>> +	info.si_errno = 0;
>> +	info.si_status = why;
>> +
>> +	rcu_read_lock();
>> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
>> +	info.si_uid = __task_cred(tsk)->uid;
> 	
> 	This eventually should become:
>
> 	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> 	                              current_cred(), current_uid());
>
> 	I've got a first-stab patch at converting the rest of
> 	kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git

Ok, thanks.

>> +	rcu_read_unlock();
>> +
>> +	info.si_utime = cputime_to_clock_t(tsk->utime);
>> +	info.si_stime = cputime_to_clock_t(tsk->stime);
>> +
>> +	info.si_code = CLD_REBOOTED;
>> +
>> +	sighand = parent->sighand;
>> +	spin_lock_irqsave(&sighand->siglock, flags);
>> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
>> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
>> +		__group_send_sig_info(SIGCHLD, &info, parent);
>> +	/*
>> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
>> +	 */
>> +	__wake_up_parent(tsk, parent);
>> +	spin_unlock_irqrestore(&sighand->siglock, flags);
>> +}
> ...
>
>> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>>  {
>>  	char buffer[256];
>>  	int ret = 0;
>> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
>> +
>> +        /* We only trust the superuser with rebooting the system. */
>> +	if (!capable(CAP_SYS_BOOT)) {
> Doesn't this mean that an unprivileged task in a container can shut
> down the container?

Ha ha ! Right, good catch :)

Yes, rethinking about it, we can do what initially proposed Bruno by
just preventing to reboot when we are not in the init_pid_ns. Actually, 
the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
and "kill -1 SIGKILL", and would not make sense to do that in a child
pid namespace, except if we are in a container where we don't want to
reboot :)

So IMO, it is safe to do:

	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
 		return -EPERM;
	
	if (pid_ns != &init_pid_ns)
		return pid_namespace_reboot(pid_ns, cmd, buffer);


> The pidns->user_ns patch I sent earlier today gives you what you need
> so that you can add
>
> 		if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT)
> 			return -EPERM;
>
> right here to prevent that.
>
>> +		/* If we are not in the initial pid namespace, we send a signal
>> +		 * to the parent of this init pid namespace, notifying a shutdown
>> +		 * occured */
>> +		if (pid_ns != &init_pid_ns)
>> +			pid_namespace_reboot(pid_ns, cmd, buffer);
>>  
>> -	/* We only trust the superuser with rebooting the system. */
>> -	if (!capable(CAP_SYS_BOOT))
>>  		return -EPERM;
>> +	}
>>  
>>  	/* For safety, we require "magic" arguments. */
>>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
>> -- 
>> 1.7.4.1
>>
>> _______________________________________________
>> Containers mailing list
>> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 21:09   ` Serge Hallyn
  2011-08-11 21:30     ` Daniel Lezcano
@ 2011-08-11 21:30     ` Daniel Lezcano
       [not found]       ` <4E444A04.3070403-GANU6spQydw@public.gmane.org>
  2011-08-11 21:50       ` Serge Hallyn
  1 sibling, 2 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 21:30 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: akpm, containers, bonbons, oleg, linux-kernel

On 08/11/2011 11:09 PM, Serge Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano@free.fr):
>> When the reboot syscall is called and the pid namespace where the calling
>> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
>> to the parent of this pid namespace.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
> ...
>
>> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
>> +{
>> +	struct siginfo info = { };
>> +	struct task_struct *parent;
>> +	struct sighand_struct *sighand;
>> +	unsigned long flags;
>> +
>> +	if (tsk->ptrace)
>> +		parent = tsk->parent;
>> +	else {
>> +		tsk = tsk->group_leader;
>> +		parent = tsk->real_parent;
>> +	}
>> +
>> +	info.si_signo = SIGCHLD;
>> +	info.si_errno = 0;
>> +	info.si_status = why;
>> +
>> +	rcu_read_lock();
>> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
>> +	info.si_uid = __task_cred(tsk)->uid;
> 	
> 	This eventually should become:
>
> 	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> 	                              current_cred(), current_uid());
>
> 	I've got a first-stab patch at converting the rest of
> 	kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git

Ok, thanks.

>> +	rcu_read_unlock();
>> +
>> +	info.si_utime = cputime_to_clock_t(tsk->utime);
>> +	info.si_stime = cputime_to_clock_t(tsk->stime);
>> +
>> +	info.si_code = CLD_REBOOTED;
>> +
>> +	sighand = parent->sighand;
>> +	spin_lock_irqsave(&sighand->siglock, flags);
>> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
>> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
>> +		__group_send_sig_info(SIGCHLD, &info, parent);
>> +	/*
>> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
>> +	 */
>> +	__wake_up_parent(tsk, parent);
>> +	spin_unlock_irqrestore(&sighand->siglock, flags);
>> +}
> ...
>
>> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>>  {
>>  	char buffer[256];
>>  	int ret = 0;
>> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
>> +
>> +        /* We only trust the superuser with rebooting the system. */
>> +	if (!capable(CAP_SYS_BOOT)) {
> Doesn't this mean that an unprivileged task in a container can shut
> down the container?

Ha ha ! Right, good catch :)

Yes, rethinking about it, we can do what initially proposed Bruno by
just preventing to reboot when we are not in the init_pid_ns. Actually, 
the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
and "kill -1 SIGKILL", and would not make sense to do that in a child
pid namespace, except if we are in a container where we don't want to
reboot :)

So IMO, it is safe to do:

	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
 		return -EPERM;
	
	if (pid_ns != &init_pid_ns)
		return pid_namespace_reboot(pid_ns, cmd, buffer);


> The pidns->user_ns patch I sent earlier today gives you what you need
> so that you can add
>
> 		if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT)
> 			return -EPERM;
>
> right here to prevent that.
>
>> +		/* If we are not in the initial pid namespace, we send a signal
>> +		 * to the parent of this init pid namespace, notifying a shutdown
>> +		 * occured */
>> +		if (pid_ns != &init_pid_ns)
>> +			pid_namespace_reboot(pid_ns, cmd, buffer);
>>  
>> -	/* We only trust the superuser with rebooting the system. */
>> -	if (!capable(CAP_SYS_BOOT))
>>  		return -EPERM;
>> +	}
>>  
>>  	/* For safety, we require "magic" arguments. */
>>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
>> -- 
>> 1.7.4.1
>>
>> _______________________________________________
>> Containers mailing list
>> Containers@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
       [not found]       ` <4E444A04.3070403-GANU6spQydw@public.gmane.org>
@ 2011-08-11 21:50         ` Serge Hallyn
  2011-08-12 16:29           ` Serge Hallyn
  1 sibling, 0 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-11 21:50 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> On 08/11/2011 11:09 PM, Serge Hallyn wrote:
> > Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> >> When the reboot syscall is called and the pid namespace where the calling
> >> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> >> to the parent of this pid namespace.
> >>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
> > ...
> >
> >> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
> >> +{
> >> +	struct siginfo info = { };
> >> +	struct task_struct *parent;
> >> +	struct sighand_struct *sighand;
> >> +	unsigned long flags;
> >> +
> >> +	if (tsk->ptrace)
> >> +		parent = tsk->parent;
> >> +	else {
> >> +		tsk = tsk->group_leader;
> >> +		parent = tsk->real_parent;
> >> +	}
> >> +
> >> +	info.si_signo = SIGCHLD;
> >> +	info.si_errno = 0;
> >> +	info.si_status = why;
> >> +
> >> +	rcu_read_lock();
> >> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> >> +	info.si_uid = __task_cred(tsk)->uid;
> > 	
> > 	This eventually should become:
> >
> > 	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> > 	                              current_cred(), current_uid());
> >
> > 	I've got a first-stab patch at converting the rest of
> > 	kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git
> 
> Ok, thanks.
> 
> >> +	rcu_read_unlock();
> >> +
> >> +	info.si_utime = cputime_to_clock_t(tsk->utime);
> >> +	info.si_stime = cputime_to_clock_t(tsk->stime);
> >> +
> >> +	info.si_code = CLD_REBOOTED;
> >> +
> >> +	sighand = parent->sighand;
> >> +	spin_lock_irqsave(&sighand->siglock, flags);
> >> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
> >> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
> >> +		__group_send_sig_info(SIGCHLD, &info, parent);
> >> +	/*
> >> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
> >> +	 */
> >> +	__wake_up_parent(tsk, parent);
> >> +	spin_unlock_irqrestore(&sighand->siglock, flags);
> >> +}
> > ...
> >
> >> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
> >>  {
> >>  	char buffer[256];
> >>  	int ret = 0;
> >> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
> >> +
> >> +        /* We only trust the superuser with rebooting the system. */
> >> +	if (!capable(CAP_SYS_BOOT)) {
> > Doesn't this mean that an unprivileged task in a container can shut
> > down the container?
> 
> Ha ha ! Right, good catch :)
> 
> Yes, rethinking about it, we can do what initially proposed Bruno by
> just preventing to reboot when we are not in the init_pid_ns. Actually, 
> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
> and "kill -1 SIGKILL", and would not make sense to do that in a child
> pid namespace, except if we are in a container where we don't want to
> reboot :)
> 
> So IMO, it is safe to do:
> 
> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
>  		return -EPERM;

That sounds good.  Until the pid_ns->user_ns patch goes in, just
capable(CAP_SYS_BOOT) works too.

Actually, if this is the only thing CAP_SYS_BOOT grants you, and
if it is always fully namespaced, then I'm not sure there'll ever
be a reason to switch this to ns_capable().

thanks,
-serge

> 	if (pid_ns != &init_pid_ns)
> 		return pid_namespace_reboot(pid_ns, cmd, buffer);
> 
> 
> > The pidns->user_ns patch I sent earlier today gives you what you need
> > so that you can add
> >
> > 		if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT)
> > 			return -EPERM;
> >
> > right here to prevent that.
> >
> >> +		/* If we are not in the initial pid namespace, we send a signal
> >> +		 * to the parent of this init pid namespace, notifying a shutdown
> >> +		 * occured */
> >> +		if (pid_ns != &init_pid_ns)
> >> +			pid_namespace_reboot(pid_ns, cmd, buffer);
> >>  
> >> -	/* We only trust the superuser with rebooting the system. */
> >> -	if (!capable(CAP_SYS_BOOT))
> >>  		return -EPERM;
> >> +	}
> >>  
> >>  	/* For safety, we require "magic" arguments. */
> >>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
> >> -- 
> >> 1.7.4.1
> >>
> >> _______________________________________________
> >> Containers mailing list
> >> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> >> https://lists.linux-foundation.org/mailman/listinfo/containers
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 21:30     ` Daniel Lezcano
       [not found]       ` <4E444A04.3070403-GANU6spQydw@public.gmane.org>
@ 2011-08-11 21:50       ` Serge Hallyn
  1 sibling, 0 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-11 21:50 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, containers, bonbons, oleg, linux-kernel

Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> On 08/11/2011 11:09 PM, Serge Hallyn wrote:
> > Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> >> When the reboot syscall is called and the pid namespace where the calling
> >> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> >> to the parent of this pid namespace.
> >>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
> > ...
> >
> >> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
> >> +{
> >> +	struct siginfo info = { };
> >> +	struct task_struct *parent;
> >> +	struct sighand_struct *sighand;
> >> +	unsigned long flags;
> >> +
> >> +	if (tsk->ptrace)
> >> +		parent = tsk->parent;
> >> +	else {
> >> +		tsk = tsk->group_leader;
> >> +		parent = tsk->real_parent;
> >> +	}
> >> +
> >> +	info.si_signo = SIGCHLD;
> >> +	info.si_errno = 0;
> >> +	info.si_status = why;
> >> +
> >> +	rcu_read_lock();
> >> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> >> +	info.si_uid = __task_cred(tsk)->uid;
> > 	
> > 	This eventually should become:
> >
> > 	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> > 	                              current_cred(), current_uid());
> >
> > 	I've got a first-stab patch at converting the rest of
> > 	kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git
> 
> Ok, thanks.
> 
> >> +	rcu_read_unlock();
> >> +
> >> +	info.si_utime = cputime_to_clock_t(tsk->utime);
> >> +	info.si_stime = cputime_to_clock_t(tsk->stime);
> >> +
> >> +	info.si_code = CLD_REBOOTED;
> >> +
> >> +	sighand = parent->sighand;
> >> +	spin_lock_irqsave(&sighand->siglock, flags);
> >> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
> >> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
> >> +		__group_send_sig_info(SIGCHLD, &info, parent);
> >> +	/*
> >> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
> >> +	 */
> >> +	__wake_up_parent(tsk, parent);
> >> +	spin_unlock_irqrestore(&sighand->siglock, flags);
> >> +}
> > ...
> >
> >> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
> >>  {
> >>  	char buffer[256];
> >>  	int ret = 0;
> >> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
> >> +
> >> +        /* We only trust the superuser with rebooting the system. */
> >> +	if (!capable(CAP_SYS_BOOT)) {
> > Doesn't this mean that an unprivileged task in a container can shut
> > down the container?
> 
> Ha ha ! Right, good catch :)
> 
> Yes, rethinking about it, we can do what initially proposed Bruno by
> just preventing to reboot when we are not in the init_pid_ns. Actually, 
> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
> and "kill -1 SIGKILL", and would not make sense to do that in a child
> pid namespace, except if we are in a container where we don't want to
> reboot :)
> 
> So IMO, it is safe to do:
> 
> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
>  		return -EPERM;

That sounds good.  Until the pid_ns->user_ns patch goes in, just
capable(CAP_SYS_BOOT) works too.

Actually, if this is the only thing CAP_SYS_BOOT grants you, and
if it is always fully namespaced, then I'm not sure there'll ever
be a reason to switch this to ns_capable().

thanks,
-serge

> 	if (pid_ns != &init_pid_ns)
> 		return pid_namespace_reboot(pid_ns, cmd, buffer);
> 
> 
> > The pidns->user_ns patch I sent earlier today gives you what you need
> > so that you can add
> >
> > 		if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT)
> > 			return -EPERM;
> >
> > right here to prevent that.
> >
> >> +		/* If we are not in the initial pid namespace, we send a signal
> >> +		 * to the parent of this init pid namespace, notifying a shutdown
> >> +		 * occured */
> >> +		if (pid_ns != &init_pid_ns)
> >> +			pid_namespace_reboot(pid_ns, cmd, buffer);
> >>  
> >> -	/* We only trust the superuser with rebooting the system. */
> >> -	if (!capable(CAP_SYS_BOOT))
> >>  		return -EPERM;
> >> +	}
> >>  
> >>  	/* For safety, we require "magic" arguments. */
> >>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
> >> -- 
> >> 1.7.4.1
> >>
> >> _______________________________________________
> >> Containers mailing list
> >> Containers@lists.linux-foundation.org
> >> https://lists.linux-foundation.org/mailman/listinfo/containers
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 21:30     ` Daniel Lezcano
@ 2011-08-12 16:29           ` Serge Hallyn
  2011-08-11 21:50       ` Serge Hallyn
  1 sibling, 0 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-12 16:29 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
...
> > Doesn't this mean that an unprivileged task in a container can shut
> > down the container?
> 
> Ha ha ! Right, good catch :)
> 
> Yes, rethinking about it, we can do what initially proposed Bruno by
> just preventing to reboot when we are not in the init_pid_ns. Actually, 
> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
> and "kill -1 SIGKILL", and would not make sense to do that in a child
> pid namespace, except if we are in a container where we don't want to
> reboot :)
> 
> So IMO, it is safe to do:
> 
> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
>  		return -EPERM;
> 	
> 	if (pid_ns != &init_pid_ns)
> 		return pid_namespace_reboot(pid_ns, cmd, buffer);

So I don't know if you want to prepend
http://kernel.ubuntu.com/git?p=serge/linux-syslogns.git;a=commit;h=63556e9a39bcd75ec4a88333425800905013c73e
to your patchset, or just check nsown_capable(CAP_SYS_BOOT) for now,
but as soon as you resend with that I'll happily, nay,
ecstatically ack.

thanks,
-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
@ 2011-08-12 16:29           ` Serge Hallyn
  0 siblings, 0 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-12 16:29 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, containers, bonbons, oleg, linux-kernel

Quoting Daniel Lezcano (daniel.lezcano@free.fr):
...
> > Doesn't this mean that an unprivileged task in a container can shut
> > down the container?
> 
> Ha ha ! Right, good catch :)
> 
> Yes, rethinking about it, we can do what initially proposed Bruno by
> just preventing to reboot when we are not in the init_pid_ns. Actually, 
> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
> and "kill -1 SIGKILL", and would not make sense to do that in a child
> pid namespace, except if we are in a container where we don't want to
> reboot :)
> 
> So IMO, it is safe to do:
> 
> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
>  		return -EPERM;
> 	
> 	if (pid_ns != &init_pid_ns)
> 		return pid_namespace_reboot(pid_ns, cmd, buffer);

So I don't know if you want to prepend
http://kernel.ubuntu.com/git?p=serge/linux-syslogns.git;a=commit;h=63556e9a39bcd75ec4a88333425800905013c73e
to your patchset, or just check nsown_capable(CAP_SYS_BOOT) for now,
but as soon as you resend with that I'll happily, nay,
ecstatically ack.

thanks,
-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-12 16:29           ` Serge Hallyn
  (?)
  (?)
@ 2011-08-12 20:42           ` Daniel Lezcano
  -1 siblings, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-12 20:42 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

On 08/12/2011 06:29 PM, Serge Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> ...
>>> Doesn't this mean that an unprivileged task in a container can shut
>>> down the container?
>> Ha ha ! Right, good catch :)
>>
>> Yes, rethinking about it, we can do what initially proposed Bruno by
>> just preventing to reboot when we are not in the init_pid_ns. Actually, 
>> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
>> and "kill -1 SIGKILL", and would not make sense to do that in a child
>> pid namespace, except if we are in a container where we don't want to
>> reboot :)
>>
>> So IMO, it is safe to do:
>>
>> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
>>  		return -EPERM;
>> 	
>> 	if (pid_ns != &init_pid_ns)
>> 		return pid_namespace_reboot(pid_ns, cmd, buffer);
> So I don't know if you want to prepend
> http://kernel.ubuntu.com/git?p=serge/linux-syslogns.git;a=commit;h=63556e9a39bcd75ec4a88333425800905013c73e
> to your patchset, or just check nsown_capable(CAP_SYS_BOOT) for now,
> but as soon as you resend with that I'll happily, nay,
> ecstatically ack.

Ok, in order to not mix the functionnalities, I will send in a separate
patch the nsown_capable change.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-12 16:29           ` Serge Hallyn
  (?)
@ 2011-08-12 20:42           ` Daniel Lezcano
       [not found]             ` <4E45904F.60904-GANU6spQydw@public.gmane.org>
  2011-08-12 21:13             ` Serge Hallyn
  -1 siblings, 2 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-12 20:42 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: akpm, containers, bonbons, oleg, linux-kernel

On 08/12/2011 06:29 PM, Serge Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> ...
>>> Doesn't this mean that an unprivileged task in a container can shut
>>> down the container?
>> Ha ha ! Right, good catch :)
>>
>> Yes, rethinking about it, we can do what initially proposed Bruno by
>> just preventing to reboot when we are not in the init_pid_ns. Actually, 
>> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
>> and "kill -1 SIGKILL", and would not make sense to do that in a child
>> pid namespace, except if we are in a container where we don't want to
>> reboot :)
>>
>> So IMO, it is safe to do:
>>
>> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
>>  		return -EPERM;
>> 	
>> 	if (pid_ns != &init_pid_ns)
>> 		return pid_namespace_reboot(pid_ns, cmd, buffer);
> So I don't know if you want to prepend
> http://kernel.ubuntu.com/git?p=serge/linux-syslogns.git;a=commit;h=63556e9a39bcd75ec4a88333425800905013c73e
> to your patchset, or just check nsown_capable(CAP_SYS_BOOT) for now,
> but as soon as you resend with that I'll happily, nay,
> ecstatically ack.

Ok, in order to not mix the functionnalities, I will send in a separate
patch the nsown_capable change.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
       [not found]             ` <4E45904F.60904-GANU6spQydw@public.gmane.org>
@ 2011-08-12 21:13               ` Serge Hallyn
  0 siblings, 0 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-12 21:13 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> On 08/12/2011 06:29 PM, Serge Hallyn wrote:
> > Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> > ...
> >>> Doesn't this mean that an unprivileged task in a container can shut
> >>> down the container?
> >> Ha ha ! Right, good catch :)
> >>
> >> Yes, rethinking about it, we can do what initially proposed Bruno by
> >> just preventing to reboot when we are not in the init_pid_ns. Actually, 
> >> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
> >> and "kill -1 SIGKILL", and would not make sense to do that in a child
> >> pid namespace, except if we are in a container where we don't want to
> >> reboot :)
> >>
> >> So IMO, it is safe to do:
> >>
> >> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
> >>  		return -EPERM;
> >> 	
> >> 	if (pid_ns != &init_pid_ns)
> >> 		return pid_namespace_reboot(pid_ns, cmd, buffer);
> > So I don't know if you want to prepend
> > http://kernel.ubuntu.com/git?p=serge/linux-syslogns.git;a=commit;h=63556e9a39bcd75ec4a88333425800905013c73e
> > to your patchset, or just check nsown_capable(CAP_SYS_BOOT) for now,
> > but as soon as you resend with that I'll happily, nay,
> > ecstatically ack.
> 
> Ok, in order to not mix the functionnalities, I will send in a separate
> patch the nsown_capable change.

Actually the nsown_capable was a temp hack fix anyway, for a problem
we don't actually have.

I forgot you are not using user namespaces yet anyway.  So you can just
leave it as

	if (!capable(CAP_SYS_BOOT))
		return -EPERM;

for now.  Since you're root in the init user namespace, that'll pass
(as soon as you don't drop it from bounding set of course)

thanks,
-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-12 20:42           ` Daniel Lezcano
       [not found]             ` <4E45904F.60904-GANU6spQydw@public.gmane.org>
@ 2011-08-12 21:13             ` Serge Hallyn
  1 sibling, 0 replies; 64+ messages in thread
From: Serge Hallyn @ 2011-08-12 21:13 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, containers, bonbons, oleg, linux-kernel

Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> On 08/12/2011 06:29 PM, Serge Hallyn wrote:
> > Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> > ...
> >>> Doesn't this mean that an unprivileged task in a container can shut
> >>> down the container?
> >> Ha ha ! Right, good catch :)
> >>
> >> Yes, rethinking about it, we can do what initially proposed Bruno by
> >> just preventing to reboot when we are not in the init_pid_ns. Actually, 
> >> the sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
> >> and "kill -1 SIGKILL", and would not make sense to do that in a child
> >> pid namespace, except if we are in a container where we don't want to
> >> reboot :)
> >>
> >> So IMO, it is safe to do:
> >>
> >> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
> >>  		return -EPERM;
> >> 	
> >> 	if (pid_ns != &init_pid_ns)
> >> 		return pid_namespace_reboot(pid_ns, cmd, buffer);
> > So I don't know if you want to prepend
> > http://kernel.ubuntu.com/git?p=serge/linux-syslogns.git;a=commit;h=63556e9a39bcd75ec4a88333425800905013c73e
> > to your patchset, or just check nsown_capable(CAP_SYS_BOOT) for now,
> > but as soon as you resend with that I'll happily, nay,
> > ecstatically ack.
> 
> Ok, in order to not mix the functionnalities, I will send in a separate
> patch the nsown_capable change.

Actually the nsown_capable was a temp hack fix anyway, for a problem
we don't actually have.

I forgot you are not using user namespaces yet anyway.  So you can just
leave it as

	if (!capable(CAP_SYS_BOOT))
		return -EPERM;

for now.  Since you're root in the init user namespace, that'll pass
(as soon as you don't drop it from bounding set of course)

thanks,
-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
       [not found]   ` <1313094241-3674-3-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
  2011-08-11 21:09     ` Serge Hallyn
@ 2011-08-13  0:19     ` Matt Helsley
  2011-08-14 16:01     ` Oleg Nesterov
  2 siblings, 0 replies; 64+ messages in thread
From: Matt Helsley @ 2011-08-13  0:19 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

On Thu, Aug 11, 2011 at 10:24:01PM +0200, Daniel Lezcano wrote:
> When the reboot syscall is called and the pid namespace where the calling
> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> to the parent of this pid namespace.

Shouldn't we honor the exit_signal set via clone() here rather than
hardcode SIGCHLD? More below...

> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
> ---
>  include/asm-generic/siginfo.h |    3 ++-
>  include/linux/sched.h         |    1 +
>  kernel/signal.c               |   40 ++++++++++++++++++++++++++++++++++++++++
>  kernel/sys.c                  |   20 ++++++++++++++++++--
>  4 files changed, 61 insertions(+), 3 deletions(-)
> 
> diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
> index 0dd4e87..9bff4a2 100644
> --- a/include/asm-generic/siginfo.h
> +++ b/include/asm-generic/siginfo.h
> @@ -218,7 +218,8 @@ typedef struct siginfo {
>  #define CLD_TRAPPED	(__SI_CHLD|4)	/* traced child has trapped */
>  #define CLD_STOPPED	(__SI_CHLD|5)	/* child has stopped */
>  #define CLD_CONTINUED	(__SI_CHLD|6)	/* stopped child has continued */
> -#define NSIGCHLD	6
> +#define CLD_REBOOTED    (__SI_CHLD|7)   /* process was killed by a reboot */
> +#define NSIGCHLD	7
> 
>  /*
>   * SIGPOLL si_codes
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 20b03bf..c62dc9e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2170,6 +2170,7 @@ extern int kill_pgrp(struct pid *pid, int sig, int priv);
>  extern int kill_pid(struct pid *pid, int sig, int priv);
>  extern int kill_proc_info(int, struct siginfo *, pid_t);
>  extern __must_check bool do_notify_parent(struct task_struct *, int);
> +extern void do_notify_parent_cldreboot(struct task_struct *, int, char *);
>  extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
>  extern void force_sig(int, struct task_struct *);
>  extern int send_sig(int, struct task_struct *, int);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 291c970..7d3d44c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1668,6 +1668,46 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	return autoreap;
>  }
> 
> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
> +{
> +	struct siginfo info = { };
> +	struct task_struct *parent;
> +	struct sighand_struct *sighand;
> +	unsigned long flags;
> +
> +	if (tsk->ptrace)
> +		parent = tsk->parent;
> +	else {
> +		tsk = tsk->group_leader;
> +		parent = tsk->real_parent;
> +	}
> +
> +	info.si_signo = SIGCHLD;

should this be:

info.si_signo = tsk->exit_signal == -1 ? SIGCHLD : tsk->exit_signal;

?

> +	info.si_errno = 0;
> +	info.si_status = why;
> +
> +	rcu_read_lock();
> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> +	info.si_uid = __task_cred(tsk)->uid;
> +	rcu_read_unlock();
> +
> +	info.si_utime = cputime_to_clock_t(tsk->utime);
> +	info.si_stime = cputime_to_clock_t(tsk->stime);
> +
> +	info.si_code = CLD_REBOOTED;
> +
> +	sighand = parent->sighand;
> +	spin_lock_irqsave(&sighand->siglock, flags);
> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
> +		__group_send_sig_info(SIGCHLD, &info, parent);

(with corresponding changes above...)

> +	/*
> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
> +	 */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 20:24 ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
  2011-08-11 21:09   ` Serge Hallyn
       [not found]   ` <1313094241-3674-3-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
@ 2011-08-13  0:19   ` Matt Helsley
  2011-08-13 14:41     ` Daniel Lezcano
       [not found]     ` <20110813001959.GB5777-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
  2011-08-14 16:01   ` Oleg Nesterov
  3 siblings, 2 replies; 64+ messages in thread
From: Matt Helsley @ 2011-08-13  0:19 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, containers, bonbons, oleg, linux-kernel

On Thu, Aug 11, 2011 at 10:24:01PM +0200, Daniel Lezcano wrote:
> When the reboot syscall is called and the pid namespace where the calling
> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> to the parent of this pid namespace.

Shouldn't we honor the exit_signal set via clone() here rather than
hardcode SIGCHLD? More below...

> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
> ---
>  include/asm-generic/siginfo.h |    3 ++-
>  include/linux/sched.h         |    1 +
>  kernel/signal.c               |   40 ++++++++++++++++++++++++++++++++++++++++
>  kernel/sys.c                  |   20 ++++++++++++++++++--
>  4 files changed, 61 insertions(+), 3 deletions(-)
> 
> diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
> index 0dd4e87..9bff4a2 100644
> --- a/include/asm-generic/siginfo.h
> +++ b/include/asm-generic/siginfo.h
> @@ -218,7 +218,8 @@ typedef struct siginfo {
>  #define CLD_TRAPPED	(__SI_CHLD|4)	/* traced child has trapped */
>  #define CLD_STOPPED	(__SI_CHLD|5)	/* child has stopped */
>  #define CLD_CONTINUED	(__SI_CHLD|6)	/* stopped child has continued */
> -#define NSIGCHLD	6
> +#define CLD_REBOOTED    (__SI_CHLD|7)   /* process was killed by a reboot */
> +#define NSIGCHLD	7
> 
>  /*
>   * SIGPOLL si_codes
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 20b03bf..c62dc9e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2170,6 +2170,7 @@ extern int kill_pgrp(struct pid *pid, int sig, int priv);
>  extern int kill_pid(struct pid *pid, int sig, int priv);
>  extern int kill_proc_info(int, struct siginfo *, pid_t);
>  extern __must_check bool do_notify_parent(struct task_struct *, int);
> +extern void do_notify_parent_cldreboot(struct task_struct *, int, char *);
>  extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent);
>  extern void force_sig(int, struct task_struct *);
>  extern int send_sig(int, struct task_struct *, int);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 291c970..7d3d44c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1668,6 +1668,46 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	return autoreap;
>  }
> 
> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
> +{
> +	struct siginfo info = { };
> +	struct task_struct *parent;
> +	struct sighand_struct *sighand;
> +	unsigned long flags;
> +
> +	if (tsk->ptrace)
> +		parent = tsk->parent;
> +	else {
> +		tsk = tsk->group_leader;
> +		parent = tsk->real_parent;
> +	}
> +
> +	info.si_signo = SIGCHLD;

should this be:

info.si_signo = tsk->exit_signal == -1 ? SIGCHLD : tsk->exit_signal;

?

> +	info.si_errno = 0;
> +	info.si_status = why;
> +
> +	rcu_read_lock();
> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> +	info.si_uid = __task_cred(tsk)->uid;
> +	rcu_read_unlock();
> +
> +	info.si_utime = cputime_to_clock_t(tsk->utime);
> +	info.si_stime = cputime_to_clock_t(tsk->stime);
> +
> +	info.si_code = CLD_REBOOTED;
> +
> +	sighand = parent->sighand;
> +	spin_lock_irqsave(&sighand->siglock, flags);
> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
> +		__group_send_sig_info(SIGCHLD, &info, parent);

(with corresponding changes above...)

> +	/*
> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
> +	 */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
       [not found]     ` <20110813001959.GB5777-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
@ 2011-08-13 14:41       ` Daniel Lezcano
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-13 14:41 UTC (permalink / raw)
  To: Matt Helsley
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	oleg-6lXkIZvqkOAvJsYlp49lxw

On 08/13/2011 02:19 AM, Matt Helsley wrote:
> On Thu, Aug 11, 2011 at 10:24:01PM +0200, Daniel Lezcano wrote:
>> When the reboot syscall is called and the pid namespace where the calling
>> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
>> to the parent of this pid namespace.
> Shouldn't we honor the exit_signal set via clone() here rather than
> hardcode SIGCHLD? More below...

Hmm,  I don't think so, that does not really make sense.

[ ... ]

>> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
>> +{
>> +	struct siginfo info = { };
>> +	struct task_struct *parent;
>> +	struct sighand_struct *sighand;
>> +	unsigned long flags;
>> +
>> +	if (tsk->ptrace)
>> +		parent = tsk->parent;
>> +	else {
>> +		tsk = tsk->group_leader;
>> +		parent = tsk->real_parent;
>> +	}
>> +
>> +	info.si_signo = SIGCHLD;
> should this be:
>
> info.si_signo = tsk->exit_signal == -1 ? SIGCHLD : tsk->exit_signal;


No, because this code is always called for SIGHLD.
The userspace will receive this signal with CLD_REBOOT if it sets the
SA_CLDREBOOT flag option which is not set by default.

>> +	info.si_errno = 0;
>> +	info.si_status = why;
>> +
>> +	rcu_read_lock();
>> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
>> +	info.si_uid = __task_cred(tsk)->uid;
>> +	rcu_read_unlock();
>> +
>> +	info.si_utime = cputime_to_clock_t(tsk->utime);
>> +	info.si_stime = cputime_to_clock_t(tsk->stime);
>> +
>> +	info.si_code = CLD_REBOOTED;
>> +
>> +	sighand = parent->sighand;
>> +	spin_lock_irqsave(&sighand->siglock, flags);
>> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
>> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
>> +		__group_send_sig_info(SIGCHLD, &info, parent);
> (with corresponding changes above...)
>
>> +	/*
>> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
>> +	 */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-13  0:19   ` Matt Helsley
@ 2011-08-13 14:41     ` Daniel Lezcano
       [not found]     ` <20110813001959.GB5777-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
  1 sibling, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-13 14:41 UTC (permalink / raw)
  To: Matt Helsley; +Cc: akpm, containers, bonbons, oleg, linux-kernel

On 08/13/2011 02:19 AM, Matt Helsley wrote:
> On Thu, Aug 11, 2011 at 10:24:01PM +0200, Daniel Lezcano wrote:
>> When the reboot syscall is called and the pid namespace where the calling
>> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
>> to the parent of this pid namespace.
> Shouldn't we honor the exit_signal set via clone() here rather than
> hardcode SIGCHLD? More below...

Hmm,  I don't think so, that does not really make sense.

[ ... ]

>> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
>> +{
>> +	struct siginfo info = { };
>> +	struct task_struct *parent;
>> +	struct sighand_struct *sighand;
>> +	unsigned long flags;
>> +
>> +	if (tsk->ptrace)
>> +		parent = tsk->parent;
>> +	else {
>> +		tsk = tsk->group_leader;
>> +		parent = tsk->real_parent;
>> +	}
>> +
>> +	info.si_signo = SIGCHLD;
> should this be:
>
> info.si_signo = tsk->exit_signal == -1 ? SIGCHLD : tsk->exit_signal;


No, because this code is always called for SIGHLD.
The userspace will receive this signal with CLD_REBOOT if it sets the
SA_CLDREBOOT flag option which is not set by default.

>> +	info.si_errno = 0;
>> +	info.si_status = why;
>> +
>> +	rcu_read_lock();
>> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
>> +	info.si_uid = __task_cred(tsk)->uid;
>> +	rcu_read_unlock();
>> +
>> +	info.si_utime = cputime_to_clock_t(tsk->utime);
>> +	info.si_stime = cputime_to_clock_t(tsk->stime);
>> +
>> +	info.si_code = CLD_REBOOTED;
>> +
>> +	sighand = parent->sighand;
>> +	spin_lock_irqsave(&sighand->siglock, flags);
>> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
>> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
>> +		__group_send_sig_info(SIGCHLD, &info, parent);
> (with corresponding changes above...)
>
>> +	/*
>> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
>> +	 */


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
       [not found]   ` <1313094241-3674-3-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
  2011-08-11 21:09     ` Serge Hallyn
  2011-08-13  0:19     ` Matt Helsley
@ 2011-08-14 16:01     ` Oleg Nesterov
  2 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 16:01 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/11, Daniel Lezcano wrote:
>
> When the reboot syscall is called and the pid namespace where the calling
> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> to the parent of this pid namespace.

OK, but why you can't simply send the signal?

Why do you need the strange do_notify_parent_cldreboot() which tries
to mimic do_notify_parent() for (afaics) no reason ?

> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)

buffer is not used. Why?

> +	if (tsk->ptrace)
> +		parent = tsk->parent;
> +	else {
> +		tsk = tsk->group_leader;
> +		parent = tsk->real_parent;
> +	}

For what? I simply can't understand this...

> +     sighand = parent->sighand;
> +     spin_lock_irqsave(&sighand->siglock, flags);

this is unsafe, we can't trust ->sighand and parent.

> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.

but not in this case, afaics?

> +	__wake_up_parent(tsk, parent);

Why do you need __wake_up_parent()?

> +static void pid_namespace_reboot(struct pid_namespace *pid_ns,
> +				int cmd, char *buffer)
> +{
> +	struct task_struct *tsk = pid_ns->child_reaper;
> +	do_notify_parent_cldreboot(tsk, cmd, buffer);

nothing prevents ->child_reaper from exiting it it is multithreaded,
this can crash the kernel.

> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  {
>  	char buffer[256];

again, it is not used.

>  	int ret = 0;
> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
> +
> +        /* We only trust the superuser with rebooting the system. */
> +	if (!capable(CAP_SYS_BOOT)) {
> +		/* If we are not in the initial pid namespace, we send a signal
> +		 * to the parent of this init pid namespace, notifying a shutdown
> +		 * occured */
> +		if (pid_ns != &init_pid_ns)
> +			pid_namespace_reboot(pid_ns, cmd, buffer);

Hmm. Looks like pid_ns should be checked after CAP_SYS_BOOT?

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/2] Notify container-init parent a 'reboot' occured
  2011-08-11 20:24 ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
                     ` (2 preceding siblings ...)
  2011-08-13  0:19   ` Matt Helsley
@ 2011-08-14 16:01   ` Oleg Nesterov
  3 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 16:01 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, bonbons, containers, linux-kernel, serge

On 08/11, Daniel Lezcano wrote:
>
> When the reboot syscall is called and the pid namespace where the calling
> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
> to the parent of this pid namespace.

OK, but why you can't simply send the signal?

Why do you need the strange do_notify_parent_cldreboot() which tries
to mimic do_notify_parent() for (afaics) no reason ?

> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)

buffer is not used. Why?

> +	if (tsk->ptrace)
> +		parent = tsk->parent;
> +	else {
> +		tsk = tsk->group_leader;
> +		parent = tsk->real_parent;
> +	}

For what? I simply can't understand this...

> +     sighand = parent->sighand;
> +     spin_lock_irqsave(&sighand->siglock, flags);

this is unsafe, we can't trust ->sighand and parent.

> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.

but not in this case, afaics?

> +	__wake_up_parent(tsk, parent);

Why do you need __wake_up_parent()?

> +static void pid_namespace_reboot(struct pid_namespace *pid_ns,
> +				int cmd, char *buffer)
> +{
> +	struct task_struct *tsk = pid_ns->child_reaper;
> +	do_notify_parent_cldreboot(tsk, cmd, buffer);

nothing prevents ->child_reaper from exiting it it is multithreaded,
this can crash the kernel.

> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  {
>  	char buffer[256];

again, it is not used.

>  	int ret = 0;
> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
> +
> +        /* We only trust the superuser with rebooting the system. */
> +	if (!capable(CAP_SYS_BOOT)) {
> +		/* If we are not in the initial pid namespace, we send a signal
> +		 * to the parent of this init pid namespace, notifying a shutdown
> +		 * occured */
> +		if (pid_ns != &init_pid_ns)
> +			pid_namespace_reboot(pid_ns, cmd, buffer);

Hmm. Looks like pid_ns should be checked after CAP_SYS_BOOT?

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/2] add SA_CLDREBOOT flag
       [not found]   ` <1313094241-3674-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
@ 2011-08-14 16:15     ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 16:15 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/11, Daniel Lezcano wrote:
>
> Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
> when a process of a child pid namespace calls the reboot syscall.

I doubt this is really useful in general. SIGCHLD doesn't queue,
if the parent of init already has the pending SIGCHLD it won't see
CLD_REBOOTED anyway. Anyway, you do not need a special helper to
fill siginfo.

And personally I strongly dislike SA_CLDREBOOT. For what? Imho, this
is unnecessary complication. If we want SIGCHLD-on-reboot, just send
it unconditionally.

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/2] add SA_CLDREBOOT flag
  2011-08-11 20:24 ` [PATCH 1/2] add SA_CLDREBOOT flag Daniel Lezcano
       [not found]   ` <1313094241-3674-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
@ 2011-08-14 16:15   ` Oleg Nesterov
       [not found]     ` <20110814161532.GA30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-08-14 16:36     ` Bruno Prémont
  1 sibling, 2 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 16:15 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, bonbons, containers, linux-kernel, serge

On 08/11, Daniel Lezcano wrote:
>
> Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
> when a process of a child pid namespace calls the reboot syscall.

I doubt this is really useful in general. SIGCHLD doesn't queue,
if the parent of init already has the pending SIGCHLD it won't see
CLD_REBOOTED anyway. Anyway, you do not need a special helper to
fill siginfo.

And personally I strongly dislike SA_CLDREBOOT. For what? Imho, this
is unnecessary complication. If we want SIGCHLD-on-reboot, just send
it unconditionally.

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found] ` <1313094241-3674-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
  2011-08-11 20:24   ` [PATCH 1/2] add SA_CLDREBOOT flag Daniel Lezcano
  2011-08-11 20:24   ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
@ 2011-08-14 16:17   ` Oleg Nesterov
  2 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 16:17 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/11, Daniel Lezcano wrote:
>
> In the case of a VPS, when we shutdown/halt/reboot the container, the
> reboot utility will invoke the sys_reboot syscall which has the bad
> effect to reboot the host.

Stupid question. Can't sys_reboot() simply kill init (and thus the whole
pid_ns) in this case?

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-11 20:23 [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Daniel Lezcano
                   ` (2 preceding siblings ...)
  2011-08-11 20:24 ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
@ 2011-08-14 16:17 ` Oleg Nesterov
  2011-08-14 21:36   ` Serge E. Hallyn
       [not found]   ` <20110814161707.GB30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  3 siblings, 2 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 16:17 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, bonbons, containers, linux-kernel, serge

On 08/11, Daniel Lezcano wrote:
>
> In the case of a VPS, when we shutdown/halt/reboot the container, the
> reboot utility will invoke the sys_reboot syscall which has the bad
> effect to reboot the host.

Stupid question. Can't sys_reboot() simply kill init (and thus the whole
pid_ns) in this case?

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/2] add SA_CLDREBOOT flag
       [not found]     ` <20110814161532.GA30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-14 16:36       ` Bruno Prémont
  0 siblings, 0 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-14 16:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Sun, 14 August 2011 Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 08/11, Daniel Lezcano wrote:
> >
> > Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
> > when a process of a child pid namespace calls the reboot syscall.
> 
> I doubt this is really useful in general. SIGCHLD doesn't queue,
> if the parent of init already has the pending SIGCHLD it won't see
> CLD_REBOOTED anyway. Anyway, you do not need a special helper to
> fill siginfo.
> 
> And personally I strongly dislike SA_CLDREBOOT. For what? Imho, this
> is unnecessary complication. If we want SIGCHLD-on-reboot, just send
> it unconditionally.

The information that is needed is what options the container did pass
to sys_reboot().
Did the container want a reboot or a power-off (or maybe just touch the
LINUX_REBOOT_CMD_CAD_* status)?

As long as that information can reach parent/reaper of container's init
the needed functionality should be there.

Bruno

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/2] add SA_CLDREBOOT flag
  2011-08-14 16:15   ` Oleg Nesterov
       [not found]     ` <20110814161532.GA30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-14 16:36     ` Bruno Prémont
  2011-08-14 17:10       ` Oleg Nesterov
       [not found]       ` <20110814183611.05937c96-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-14 16:36 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Daniel Lezcano, containers, akpm, linux-kernel

On Sun, 14 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> On 08/11, Daniel Lezcano wrote:
> >
> > Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
> > when a process of a child pid namespace calls the reboot syscall.
> 
> I doubt this is really useful in general. SIGCHLD doesn't queue,
> if the parent of init already has the pending SIGCHLD it won't see
> CLD_REBOOTED anyway. Anyway, you do not need a special helper to
> fill siginfo.
> 
> And personally I strongly dislike SA_CLDREBOOT. For what? Imho, this
> is unnecessary complication. If we want SIGCHLD-on-reboot, just send
> it unconditionally.

The information that is needed is what options the container did pass
to sys_reboot().
Did the container want a reboot or a power-off (or maybe just touch the
LINUX_REBOOT_CMD_CAD_* status)?

As long as that information can reach parent/reaper of container's init
the needed functionality should be there.

Bruno

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/2] add SA_CLDREBOOT flag
       [not found]       ` <20110814183611.05937c96-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
@ 2011-08-14 17:10         ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 17:10 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/14, Bruno Prémont wrote:
>
> On Sun, 14 August 2011 Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On 08/11, Daniel Lezcano wrote:
> > >
> > > Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
> > > when a process of a child pid namespace calls the reboot syscall.
> >
> > I doubt this is really useful in general. SIGCHLD doesn't queue,
> > if the parent of init already has the pending SIGCHLD it won't see
> > CLD_REBOOTED anyway. Anyway, you do not need a special helper to
> > fill siginfo.
> >
> > And personally I strongly dislike SA_CLDREBOOT. For what? Imho, this
> > is unnecessary complication. If we want SIGCHLD-on-reboot, just send
> > it unconditionally.
>
> The information that is needed is what options the container did pass
> to sys_reboot().
> Did the container want a reboot or a power-off (or maybe just touch the
> LINUX_REBOOT_CMD_CAD_* status)?
>
> As long as that information can reach parent/reaper of container's init
> the needed functionality should be there.

Sorry, can't understand.

Once again, why do you want SA_CLDREBOOT?

OK, you may be need to pass some info via siginfo (but note that it can
be lost, and you do not need a special helper for that). But why do you
want this hack in sa_flags?

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/2] add SA_CLDREBOOT flag
  2011-08-14 16:36     ` Bruno Prémont
@ 2011-08-14 17:10       ` Oleg Nesterov
       [not found]       ` <20110814183611.05937c96-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
  1 sibling, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-14 17:10 UTC (permalink / raw)
  To: Bruno Prémont; +Cc: Daniel Lezcano, containers, akpm, linux-kernel

On 08/14, Bruno Prémont wrote:
>
> On Sun, 14 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> > On 08/11, Daniel Lezcano wrote:
> > >
> > > Add the SA_CLDREBOOT flag to enable the CLD_REBOOT on SIGCHLD
> > > when a process of a child pid namespace calls the reboot syscall.
> >
> > I doubt this is really useful in general. SIGCHLD doesn't queue,
> > if the parent of init already has the pending SIGCHLD it won't see
> > CLD_REBOOTED anyway. Anyway, you do not need a special helper to
> > fill siginfo.
> >
> > And personally I strongly dislike SA_CLDREBOOT. For what? Imho, this
> > is unnecessary complication. If we want SIGCHLD-on-reboot, just send
> > it unconditionally.
>
> The information that is needed is what options the container did pass
> to sys_reboot().
> Did the container want a reboot or a power-off (or maybe just touch the
> LINUX_REBOOT_CMD_CAD_* status)?
>
> As long as that information can reach parent/reaper of container's init
> the needed functionality should be there.

Sorry, can't understand.

Once again, why do you want SA_CLDREBOOT?

OK, you may be need to pass some info via siginfo (but note that it can
be lost, and you do not need a special helper for that). But why do you
want this hack in sa_flags?

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]   ` <20110814161707.GB30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-14 21:36     ` Serge E. Hallyn
  0 siblings, 0 replies; 64+ messages in thread
From: Serge E. Hallyn @ 2011-08-14 21:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Oleg Nesterov (oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> On 08/11, Daniel Lezcano wrote:
> >
> > In the case of a VPS, when we shutdown/halt/reboot the container, the
> > reboot utility will invoke the sys_reboot syscall which has the bad
> > effect to reboot the host.
> 
> Stupid question. Can't sys_reboot() simply kill init (and thus the whole
> pid_ns) in this case?

The goal is to be able to distinguish a request for reboot from shutdown.
If we just kill the init, then the parent of init (the container monitor)
cannot restart the container to emulate reboot.

[ sorry if I misunderstand what you're asking ]

-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-14 16:17 ` [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Oleg Nesterov
@ 2011-08-14 21:36   ` Serge E. Hallyn
  2011-08-15 14:47     ` Oleg Nesterov
       [not found]     ` <20110814213642.GB13799-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
       [not found]   ` <20110814161707.GB30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Serge E. Hallyn @ 2011-08-14 21:36 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Daniel Lezcano, akpm, bonbons, containers, linux-kernel

Quoting Oleg Nesterov (oleg@redhat.com):
> On 08/11, Daniel Lezcano wrote:
> >
> > In the case of a VPS, when we shutdown/halt/reboot the container, the
> > reboot utility will invoke the sys_reboot syscall which has the bad
> > effect to reboot the host.
> 
> Stupid question. Can't sys_reboot() simply kill init (and thus the whole
> pid_ns) in this case?

The goal is to be able to distinguish a request for reboot from shutdown.
If we just kill the init, then the parent of init (the container monitor)
cannot restart the container to emulate reboot.

[ sorry if I misunderstand what you're asking ]

-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]     ` <20110814213642.GB13799-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2011-08-15 14:47       ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-15 14:47 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/14, Serge E. Hallyn wrote:
>
> Quoting Oleg Nesterov (oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> > On 08/11, Daniel Lezcano wrote:
> > >
> > > In the case of a VPS, when we shutdown/halt/reboot the container, the
> > > reboot utility will invoke the sys_reboot syscall which has the bad
> > > effect to reboot the host.
> >
> > Stupid question. Can't sys_reboot() simply kill init (and thus the whole
> > pid_ns) in this case?
>
> The goal is to be able to distinguish a request for reboot from shutdown.
> If we just kill the init, then the parent of init (the container monitor)
> cannot restart the container to emulate reboot.

OK, thanks.

What if init reports the reason it was killed?

Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,

	- add "int reboot_cmd" into struct pid_namespace

	- sys_reboot(cmd) does

		if (!global_namespace) {
			task_active_pid_ns(current)->reboot_cmd = cmd;
			sigkill_my_init();
		}

	- change zap_pid_ns_processes() to do

		if (pid_ns->reboot_cmd)	// approximately
			current->exit_state = pid_ns->reboot_cmd;

Then its parent can look at status after wait(&status).


Not that I think this is very nice, but signals are not reliable.
And once again, SIGCHLD doesn't queue. And, perhaps this doesn't
matter, but sys_reboot() sends SIGCHLD and returns -EPERM, this
can confuse the container.


In any case. If you want to send a signal, please do not introduce
SA_CLDREBOOT. Please do not play with ptrace or __wake_up_parent, this
is meaningless. Just fill siginfo and send SIGCHLD unconditionally.

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-14 21:36   ` Serge E. Hallyn
@ 2011-08-15 14:47     ` Oleg Nesterov
  2011-08-15 17:39       ` Serge E. Hallyn
                         ` (2 more replies)
       [not found]     ` <20110814213642.GB13799-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
  1 sibling, 3 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-15 14:47 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Daniel Lezcano, akpm, bonbons, containers, linux-kernel

On 08/14, Serge E. Hallyn wrote:
>
> Quoting Oleg Nesterov (oleg@redhat.com):
> > On 08/11, Daniel Lezcano wrote:
> > >
> > > In the case of a VPS, when we shutdown/halt/reboot the container, the
> > > reboot utility will invoke the sys_reboot syscall which has the bad
> > > effect to reboot the host.
> >
> > Stupid question. Can't sys_reboot() simply kill init (and thus the whole
> > pid_ns) in this case?
>
> The goal is to be able to distinguish a request for reboot from shutdown.
> If we just kill the init, then the parent of init (the container monitor)
> cannot restart the container to emulate reboot.

OK, thanks.

What if init reports the reason it was killed?

Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,

	- add "int reboot_cmd" into struct pid_namespace

	- sys_reboot(cmd) does

		if (!global_namespace) {
			task_active_pid_ns(current)->reboot_cmd = cmd;
			sigkill_my_init();
		}

	- change zap_pid_ns_processes() to do

		if (pid_ns->reboot_cmd)	// approximately
			current->exit_state = pid_ns->reboot_cmd;

Then its parent can look at status after wait(&status).


Not that I think this is very nice, but signals are not reliable.
And once again, SIGCHLD doesn't queue. And, perhaps this doesn't
matter, but sys_reboot() sends SIGCHLD and returns -EPERM, this
can confuse the container.


In any case. If you want to send a signal, please do not introduce
SA_CLDREBOOT. Please do not play with ptrace or __wake_up_parent, this
is meaningless. Just fill siginfo and send SIGCHLD unconditionally.

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]       ` <20110815144744.GA9660-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-15 17:39         ` Serge E. Hallyn
  2011-08-18 23:46         ` Daniel Lezcano
  1 sibling, 0 replies; 64+ messages in thread
From: Serge E. Hallyn @ 2011-08-15 17:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Oleg Nesterov (oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> On 08/14, Serge E. Hallyn wrote:
> >
> > Quoting Oleg Nesterov (oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> > > On 08/11, Daniel Lezcano wrote:
> > > >
> > > > In the case of a VPS, when we shutdown/halt/reboot the container, the
> > > > reboot utility will invoke the sys_reboot syscall which has the bad
> > > > effect to reboot the host.
> > >
> > > Stupid question. Can't sys_reboot() simply kill init (and thus the whole
> > > pid_ns) in this case?
> >
> > The goal is to be able to distinguish a request for reboot from shutdown.
> > If we just kill the init, then the parent of init (the container monitor)
> > cannot restart the container to emulate reboot.
> 
> OK, thanks.
> 
> What if init reports the reason it was killed?
> 
> Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,
> 
> 	- add "int reboot_cmd" into struct pid_namespace
> 
> 	- sys_reboot(cmd) does
> 
> 		if (!global_namespace) {
> 			task_active_pid_ns(current)->reboot_cmd = cmd;
> 			sigkill_my_init();
> 		}
> 
> 	- change zap_pid_ns_processes() to do
> 
> 		if (pid_ns->reboot_cmd)	// approximately
> 			current->exit_state = pid_ns->reboot_cmd;
> 
> Then its parent can look at status after wait(&status).

That looks good to me.  Daniel, is there a reason this wouldn't work?

> Not that I think this is very nice, but signals are not reliable.
> And once again, SIGCHLD doesn't queue. And, perhaps this doesn't
> matter, but sys_reboot() sends SIGCHLD and returns -EPERM, this
> can confuse the container.

I think you're looking at an older (Aug 11) version.  Daniel's newer
patch should do the right thing (but I don't seem to have it in front
of me atm)

> In any case. If you want to send a signal, please do not introduce
> SA_CLDREBOOT. Please do not play with ptrace or __wake_up_parent, this
> is meaningless. Just fill siginfo and send SIGCHLD unconditionally.
> 
> Oleg.

Thanks for the help, Oleg.

-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-15 14:47     ` Oleg Nesterov
@ 2011-08-15 17:39       ` Serge E. Hallyn
  2011-08-15 17:50         ` Daniel Lezcano
       [not found]         ` <20110815173940.GA19620-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
       [not found]       ` <20110815144744.GA9660-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-08-18 23:46       ` Daniel Lezcano
  2 siblings, 2 replies; 64+ messages in thread
From: Serge E. Hallyn @ 2011-08-15 17:39 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Daniel Lezcano, akpm, bonbons, containers, linux-kernel

Quoting Oleg Nesterov (oleg@redhat.com):
> On 08/14, Serge E. Hallyn wrote:
> >
> > Quoting Oleg Nesterov (oleg@redhat.com):
> > > On 08/11, Daniel Lezcano wrote:
> > > >
> > > > In the case of a VPS, when we shutdown/halt/reboot the container, the
> > > > reboot utility will invoke the sys_reboot syscall which has the bad
> > > > effect to reboot the host.
> > >
> > > Stupid question. Can't sys_reboot() simply kill init (and thus the whole
> > > pid_ns) in this case?
> >
> > The goal is to be able to distinguish a request for reboot from shutdown.
> > If we just kill the init, then the parent of init (the container monitor)
> > cannot restart the container to emulate reboot.
> 
> OK, thanks.
> 
> What if init reports the reason it was killed?
> 
> Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,
> 
> 	- add "int reboot_cmd" into struct pid_namespace
> 
> 	- sys_reboot(cmd) does
> 
> 		if (!global_namespace) {
> 			task_active_pid_ns(current)->reboot_cmd = cmd;
> 			sigkill_my_init();
> 		}
> 
> 	- change zap_pid_ns_processes() to do
> 
> 		if (pid_ns->reboot_cmd)	// approximately
> 			current->exit_state = pid_ns->reboot_cmd;
> 
> Then its parent can look at status after wait(&status).

That looks good to me.  Daniel, is there a reason this wouldn't work?

> Not that I think this is very nice, but signals are not reliable.
> And once again, SIGCHLD doesn't queue. And, perhaps this doesn't
> matter, but sys_reboot() sends SIGCHLD and returns -EPERM, this
> can confuse the container.

I think you're looking at an older (Aug 11) version.  Daniel's newer
patch should do the right thing (but I don't seem to have it in front
of me atm)

> In any case. If you want to send a signal, please do not introduce
> SA_CLDREBOOT. Please do not play with ptrace or __wake_up_parent, this
> is meaningless. Just fill siginfo and send SIGCHLD unconditionally.
> 
> Oleg.

Thanks for the help, Oleg.

-serge

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]         ` <20110815173940.GA19620-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2011-08-15 17:50           ` Daniel Lezcano
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-15 17:50 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/15/2011 07:39 PM, Serge E. Hallyn wrote:
> Quoting Oleg Nesterov (oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
>> On 08/14, Serge E. Hallyn wrote:
>>> Quoting Oleg Nesterov (oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
>>>> On 08/11, Daniel Lezcano wrote:
>>>>> In the case of a VPS, when we shutdown/halt/reboot the container, the
>>>>> reboot utility will invoke the sys_reboot syscall which has the bad
>>>>> effect to reboot the host.
>>>> Stupid question. Can't sys_reboot() simply kill init (and thus the whole
>>>> pid_ns) in this case?
>>> The goal is to be able to distinguish a request for reboot from shutdown.
>>> If we just kill the init, then the parent of init (the container monitor)
>>> cannot restart the container to emulate reboot.
>> OK, thanks.
>>
>> What if init reports the reason it was killed?
>>
>> Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,
>>
>> 	- add "int reboot_cmd" into struct pid_namespace
>>
>> 	- sys_reboot(cmd) does
>>
>> 		if (!global_namespace) {
>> 			task_active_pid_ns(current)->reboot_cmd = cmd;
>> 			sigkill_my_init();
>> 		}
>>
>> 	- change zap_pid_ns_processes() to do
>>
>> 		if (pid_ns->reboot_cmd)	// approximately
>> 			current->exit_state = pid_ns->reboot_cmd;
>>
>> Then its parent can look at status after wait(&status).
> That looks good to me.  Daniel, is there a reason this wouldn't work?

No, it is an alternative we can try. Oleg raise a good point about the
SIGCHLD not being queue.
Let me write this patch and test it.

  -- Daniel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-15 17:39       ` Serge E. Hallyn
@ 2011-08-15 17:50         ` Daniel Lezcano
       [not found]         ` <20110815173940.GA19620-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
  1 sibling, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-15 17:50 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Oleg Nesterov, akpm, bonbons, containers, linux-kernel

On 08/15/2011 07:39 PM, Serge E. Hallyn wrote:
> Quoting Oleg Nesterov (oleg@redhat.com):
>> On 08/14, Serge E. Hallyn wrote:
>>> Quoting Oleg Nesterov (oleg@redhat.com):
>>>> On 08/11, Daniel Lezcano wrote:
>>>>> In the case of a VPS, when we shutdown/halt/reboot the container, the
>>>>> reboot utility will invoke the sys_reboot syscall which has the bad
>>>>> effect to reboot the host.
>>>> Stupid question. Can't sys_reboot() simply kill init (and thus the whole
>>>> pid_ns) in this case?
>>> The goal is to be able to distinguish a request for reboot from shutdown.
>>> If we just kill the init, then the parent of init (the container monitor)
>>> cannot restart the container to emulate reboot.
>> OK, thanks.
>>
>> What if init reports the reason it was killed?
>>
>> Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,
>>
>> 	- add "int reboot_cmd" into struct pid_namespace
>>
>> 	- sys_reboot(cmd) does
>>
>> 		if (!global_namespace) {
>> 			task_active_pid_ns(current)->reboot_cmd = cmd;
>> 			sigkill_my_init();
>> 		}
>>
>> 	- change zap_pid_ns_processes() to do
>>
>> 		if (pid_ns->reboot_cmd)	// approximately
>> 			current->exit_state = pid_ns->reboot_cmd;
>>
>> Then its parent can look at status after wait(&status).
> That looks good to me.  Daniel, is there a reason this wouldn't work?

No, it is an alternative we can try. Oleg raise a good point about the
SIGCHLD not being queue.
Let me write this patch and test it.

  -- Daniel



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]       ` <20110815144744.GA9660-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-08-15 17:39         ` Serge E. Hallyn
@ 2011-08-18 23:46         ` Daniel Lezcano
  1 sibling, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-18 23:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/15/2011 04:47 PM, Oleg Nesterov wrote:
> On 08/14, Serge E. Hallyn wrote:
>> Quoting Oleg Nesterov (oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
>>> On 08/11, Daniel Lezcano wrote:
>>>> In the case of a VPS, when we shutdown/halt/reboot the container, the
>>>> reboot utility will invoke the sys_reboot syscall which has the bad
>>>> effect to reboot the host.
>>> Stupid question. Can't sys_reboot() simply kill init (and thus the whole
>>> pid_ns) in this case?
>> The goal is to be able to distinguish a request for reboot from shutdown.
>> If we just kill the init, then the parent of init (the container monitor)
>> cannot restart the container to emulate reboot.
> OK, thanks.
>
> What if init reports the reason it was killed?
>
> Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,
>
> 	- add "int reboot_cmd" into struct pid_namespace
>
> 	- sys_reboot(cmd) does
>
> 		if (!global_namespace) {
> 			task_active_pid_ns(current)->reboot_cmd = cmd;
> 			sigkill_my_init();
> 		}

Hi Oleg,

what would be your advice to get rid of from_ancestor_ns which prevent
the signal to be delivered to the init process ?

Thanks
  -- Daniel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-15 14:47     ` Oleg Nesterov
  2011-08-15 17:39       ` Serge E. Hallyn
       [not found]       ` <20110815144744.GA9660-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-18 23:46       ` Daniel Lezcano
  2011-08-19 15:24         ` Oleg Nesterov
       [not found]         ` <4E4DA461.8030006-GANU6spQydw@public.gmane.org>
  2 siblings, 2 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-18 23:46 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Serge E. Hallyn, akpm, bonbons, containers, linux-kernel

On 08/15/2011 04:47 PM, Oleg Nesterov wrote:
> On 08/14, Serge E. Hallyn wrote:
>> Quoting Oleg Nesterov (oleg@redhat.com):
>>> On 08/11, Daniel Lezcano wrote:
>>>> In the case of a VPS, when we shutdown/halt/reboot the container, the
>>>> reboot utility will invoke the sys_reboot syscall which has the bad
>>>> effect to reboot the host.
>>> Stupid question. Can't sys_reboot() simply kill init (and thus the whole
>>> pid_ns) in this case?
>> The goal is to be able to distinguish a request for reboot from shutdown.
>> If we just kill the init, then the parent of init (the container monitor)
>> cannot restart the container to emulate reboot.
> OK, thanks.
>
> What if init reports the reason it was killed?
>
> Ignoring LINUX_REBOOT_CMD_CAD_/etc, I mean, roughly,
>
> 	- add "int reboot_cmd" into struct pid_namespace
>
> 	- sys_reboot(cmd) does
>
> 		if (!global_namespace) {
> 			task_active_pid_ns(current)->reboot_cmd = cmd;
> 			sigkill_my_init();
> 		}

Hi Oleg,

what would be your advice to get rid of from_ancestor_ns which prevent
the signal to be delivered to the init process ?

Thanks
  -- Daniel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]         ` <4E4DA461.8030006-GANU6spQydw@public.gmane.org>
@ 2011-08-19 15:24           ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-19 15:24 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/19, Daniel Lezcano wrote:
>
> On 08/15/2011 04:47 PM, Oleg Nesterov wrote:
> >
> > 	- sys_reboot(cmd) does
> >
> > 		if (!global_namespace) {
> > 			task_active_pid_ns(current)->reboot_cmd = cmd;
> > 			sigkill_my_init();
> > 		}
>
> Hi Oleg,
>
> what would be your advice to get rid of from_ancestor_ns which prevent
> the signal to be delivered to the init process ?

Sure, a plain kill can't work. You can do force_sig_info(), this clears
SIGNAL_UNKILLABLE.

Hmm. But now I seem to recall we have other reasons to make the new
sigkill_task() helper... We will see. Anyway, force_ should work afaics.

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-18 23:46       ` Daniel Lezcano
@ 2011-08-19 15:24         ` Oleg Nesterov
       [not found]           ` <20110819152416.GA17034-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-08-22 12:28           ` Daniel Lezcano
       [not found]         ` <4E4DA461.8030006-GANU6spQydw@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-19 15:24 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Serge E. Hallyn, akpm, bonbons, containers, linux-kernel

On 08/19, Daniel Lezcano wrote:
>
> On 08/15/2011 04:47 PM, Oleg Nesterov wrote:
> >
> > 	- sys_reboot(cmd) does
> >
> > 		if (!global_namespace) {
> > 			task_active_pid_ns(current)->reboot_cmd = cmd;
> > 			sigkill_my_init();
> > 		}
>
> Hi Oleg,
>
> what would be your advice to get rid of from_ancestor_ns which prevent
> the signal to be delivered to the init process ?

Sure, a plain kill can't work. You can do force_sig_info(), this clears
SIGNAL_UNKILLABLE.

Hmm. But now I seem to recall we have other reasons to make the new
sigkill_task() helper... We will see. Anyway, force_ should work afaics.

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]           ` <20110819152416.GA17034-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-22 12:28             ` Daniel Lezcano
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-22 12:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/19/2011 05:24 PM, Oleg Nesterov wrote:
> On 08/19, Daniel Lezcano wrote:
>> On 08/15/2011 04:47 PM, Oleg Nesterov wrote:
>>> 	- sys_reboot(cmd) does
>>>
>>> 		if (!global_namespace) {
>>> 			task_active_pid_ns(current)->reboot_cmd = cmd;
>>> 			sigkill_my_init();
>>> 		}
>> Hi Oleg,
>>
>> what would be your advice to get rid of from_ancestor_ns which prevent
>> the signal to be delivered to the init process ?
> Sure, a plain kill can't work. You can do force_sig_info(), this clears
> SIGNAL_UNKILLABLE.
>
> Hmm. But now I seem to recall we have other reasons to make the new
> sigkill_task() helper... We will see. Anyway, force_ should work afaics.

Thanks Oleg.

I wrote the patch by sending a signal to the init process of the pid
namespace using force_sig_info.
That works fine, thanks for the hint.

I am wondering what is the best way to transmit the reason of the reboot
to the parent of the container's init.

If we pass the reason to the exit_code of the init process, that will be
a bit weird as the process is signaled and did not exited  no ?
Furthermore, how to differentiate an application container (eg. a
script) exiting with an error with the same value of a reboot reason ?

Wouldn't make sense to let the user to specify a signal via prctl where
the si_code is filled with the reason ?

Without invoking the prctl, the init process is simply killed by the
kernel, otherwise we send the signal to the container's init.

From userspace:

void sigreboot(int sig, siginfo_t *si, void *private)
{
    switch(si->si_code) {
       case LINUX_REBOOT_CMD_RESTART:
        reboot_container();
        break;

      case LINUX_REBOOT_CMD_HALT:
        halt_container();
        break;

     ...
    }
}

  struct sigaction sa = {
    .sa_sigaction = sigreboot,
    .sa_flags =  SA_SIGINFO;  
 }
 
 sigaction(SIGUSR1, &sa, NULL);

 prctl(PR_SIGREBOOT, SIGUSR1);

 -----

 and from the kernel (called from sys_reboot):

 int kill_pid_ns(struct pid_namespace *pid_ns, int reason)
{
        struct task_struct *tsk;
        struct siginfo info;

        if (pid_ns->notifier) {

                         info.si_signo = SIGKILL;
                         info.si_errno = 0;
                         info.si_code = reason;
                         info.si_pid = 0;
                         info.si_uid = 0;

                         return force_sig_info(notifier->sig, &info,
notifier->tsk);
        }

        write_lock_irq(&tasklist_lock);
        tsk = pid_ns->child_reaper;
        write_unlock_irq(&tasklist_lock);

        info.si_signo = SIGKILL;
        info.si_errno = 0;
        info.si_code = SI_KERNEL;
        info.si_pid = 0;
        info.si_uid = 0;

        return force_sig_info(SIGKILL, &info, tsk);
}

Roughly, assuming pid_ns->notifier is reseted when we reparent to the
init_pid_ns.init.

What do you think ?

Thanks
  -- Daniel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-19 15:24         ` Oleg Nesterov
       [not found]           ` <20110819152416.GA17034-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-22 12:28           ` Daniel Lezcano
  2011-08-22 15:44             ` Oleg Nesterov
       [not found]             ` <4E524B73.3050704-GANU6spQydw@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-22 12:28 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Serge E. Hallyn, akpm, bonbons, containers, linux-kernel

On 08/19/2011 05:24 PM, Oleg Nesterov wrote:
> On 08/19, Daniel Lezcano wrote:
>> On 08/15/2011 04:47 PM, Oleg Nesterov wrote:
>>> 	- sys_reboot(cmd) does
>>>
>>> 		if (!global_namespace) {
>>> 			task_active_pid_ns(current)->reboot_cmd = cmd;
>>> 			sigkill_my_init();
>>> 		}
>> Hi Oleg,
>>
>> what would be your advice to get rid of from_ancestor_ns which prevent
>> the signal to be delivered to the init process ?
> Sure, a plain kill can't work. You can do force_sig_info(), this clears
> SIGNAL_UNKILLABLE.
>
> Hmm. But now I seem to recall we have other reasons to make the new
> sigkill_task() helper... We will see. Anyway, force_ should work afaics.

Thanks Oleg.

I wrote the patch by sending a signal to the init process of the pid
namespace using force_sig_info.
That works fine, thanks for the hint.

I am wondering what is the best way to transmit the reason of the reboot
to the parent of the container's init.

If we pass the reason to the exit_code of the init process, that will be
a bit weird as the process is signaled and did not exited  no ?
Furthermore, how to differentiate an application container (eg. a
script) exiting with an error with the same value of a reboot reason ?

Wouldn't make sense to let the user to specify a signal via prctl where
the si_code is filled with the reason ?

Without invoking the prctl, the init process is simply killed by the
kernel, otherwise we send the signal to the container's init.

>From userspace:

void sigreboot(int sig, siginfo_t *si, void *private)
{
    switch(si->si_code) {
       case LINUX_REBOOT_CMD_RESTART:
        reboot_container();
        break;

      case LINUX_REBOOT_CMD_HALT:
        halt_container();
        break;

     ...
    }
}

  struct sigaction sa = {
    .sa_sigaction = sigreboot,
    .sa_flags =  SA_SIGINFO;  
 }
 
 sigaction(SIGUSR1, &sa, NULL);

 prctl(PR_SIGREBOOT, SIGUSR1);

 -----

 and from the kernel (called from sys_reboot):

 int kill_pid_ns(struct pid_namespace *pid_ns, int reason)
{
        struct task_struct *tsk;
        struct siginfo info;

        if (pid_ns->notifier) {

                         info.si_signo = SIGKILL;
                         info.si_errno = 0;
                         info.si_code = reason;
                         info.si_pid = 0;
                         info.si_uid = 0;

                         return force_sig_info(notifier->sig, &info,
notifier->tsk);
        }

        write_lock_irq(&tasklist_lock);
        tsk = pid_ns->child_reaper;
        write_unlock_irq(&tasklist_lock);

        info.si_signo = SIGKILL;
        info.si_errno = 0;
        info.si_code = SI_KERNEL;
        info.si_pid = 0;
        info.si_uid = 0;

        return force_sig_info(SIGKILL, &info, tsk);
}

Roughly, assuming pid_ns->notifier is reseted when we reparent to the
init_pid_ns.init.

What do you think ?

Thanks
  -- Daniel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]             ` <4E524B73.3050704-GANU6spQydw@public.gmane.org>
@ 2011-08-22 15:44               ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-22 15:44 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/22, Daniel Lezcano wrote:
>
> If we pass the reason to the exit_code of the init process, that will be
> a bit weird as the process is signaled and did not exited  no ?

Just in case, you shouldn't change ->exit_code blindly. We should only
change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
In this case we can assume that it was killed by sys_reboot.

Now. I didn't really mean exit_state should be equal to sys_reboot's
cmd arg. I thought about something like

	swicth (reboot_cmd) {
	case LINUX_REBOOT_CMD_RESTART:
		code = SIGHUP;
		break;
	case LINUX_REBOOT_CMD_HALT:
		code = SIGINT;	// doesn't really matter what we report
		...
	}

we know that init can't be killed by SIGHUP/SIGINT, and this can't be
confused with the case when init does exit(exit_code).

But in fact I do not not think that WIFSIGNALED() is that important.
init shouldn't exit anyway.

> Furthermore, how to differentiate an application container (eg. a
> script) exiting with an error with the same value of a reboot reason ?

Well, I think it is better to fix the script than the kernel.


Daniel, I am not arguing. I agree that this looks like the hack anyway.
Just I think that other approaches are even worse imho. We should try
to make the kernel change as simple as possible.


> Wouldn't make sense to let the user to specify a signal via prctl where
> the si_code is filled with the reason ?

Sorry, I don't quite understand the idea...

And, iiuc, the point was to "fix" sys_reboot() so that we do not need
to mofify the distro/userspace?



In short. Please do what you like more. But I'd like you to know,
I'll argue with any complications which (afaics!) we can avoid,
I promise ;)

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-22 12:28           ` Daniel Lezcano
@ 2011-08-22 15:44             ` Oleg Nesterov
       [not found]               ` <20110822154448.GA8527-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-08-22 16:31               ` Bruno Prémont
       [not found]             ` <4E524B73.3050704-GANU6spQydw@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-22 15:44 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Serge E. Hallyn, akpm, bonbons, containers, linux-kernel

On 08/22, Daniel Lezcano wrote:
>
> If we pass the reason to the exit_code of the init process, that will be
> a bit weird as the process is signaled and did not exited  no ?

Just in case, you shouldn't change ->exit_code blindly. We should only
change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
In this case we can assume that it was killed by sys_reboot.

Now. I didn't really mean exit_state should be equal to sys_reboot's
cmd arg. I thought about something like

	swicth (reboot_cmd) {
	case LINUX_REBOOT_CMD_RESTART:
		code = SIGHUP;
		break;
	case LINUX_REBOOT_CMD_HALT:
		code = SIGINT;	// doesn't really matter what we report
		...
	}

we know that init can't be killed by SIGHUP/SIGINT, and this can't be
confused with the case when init does exit(exit_code).

But in fact I do not not think that WIFSIGNALED() is that important.
init shouldn't exit anyway.

> Furthermore, how to differentiate an application container (eg. a
> script) exiting with an error with the same value of a reboot reason ?

Well, I think it is better to fix the script than the kernel.


Daniel, I am not arguing. I agree that this looks like the hack anyway.
Just I think that other approaches are even worse imho. We should try
to make the kernel change as simple as possible.


> Wouldn't make sense to let the user to specify a signal via prctl where
> the si_code is filled with the reason ?

Sorry, I don't quite understand the idea...

And, iiuc, the point was to "fix" sys_reboot() so that we do not need
to mofify the distro/userspace?



In short. Please do what you like more. But I'd like you to know,
I'll argue with any complications which (afaics!) we can avoid,
I promise ;)

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]               ` <20110822154448.GA8527-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-22 16:31                 ` Bruno Prémont
  0 siblings, 0 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-22 16:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, 22 August 2011 Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 08/22, Daniel Lezcano wrote:
> >
> > If we pass the reason to the exit_code of the init process, that will be
> > a bit weird as the process is signaled and did not exited  no ?
> 
> Just in case, you shouldn't change ->exit_code blindly. We should only
> change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
> In this case we can assume that it was killed by sys_reboot.
> 
> Now. I didn't really mean exit_state should be equal to sys_reboot's
> cmd arg. I thought about something like
> 
> 	swicth (reboot_cmd) {
> 	case LINUX_REBOOT_CMD_RESTART:
> 		code = SIGHUP;
> 		break;
> 	case LINUX_REBOOT_CMD_HALT:
> 		code = SIGINT;	// doesn't really matter what we report
> 		...
> 	}

Isn't it possible to add the two cases to si_code possible values, e.g.
CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT to avoid possible
confusion with CDL_STOPPED)?

Then on sys_reboot() flag container init and kill it (this way sys_reboot()
preserves its "will not return on success for restart/halt" scematic)?
Then container init would see CLD_KILLED replaced with matching reboot
reason.

> we know that init can't be killed by SIGHUP/SIGINT, and this can't be
> confused with the case when init does exit(exit_code).
> 
> But in fact I do not not think that WIFSIGNALED() is that important.
> init shouldn't exit anyway.

Playing with the exit code is probably more problematic (maybe even more
once a process can unshare PID_NS as then any process spawning children
may see the exit codes with special meaning and mis-reporting the child
as having failed when return code is non-zero).

> > Furthermore, how to differentiate an application container (eg. a
> > script) exiting with an error with the same value of a reboot reason ?
> 
> Well, I think it is better to fix the script than the kernel.

IMHO this depends a lot on the point-of-view regarding pid_ns versus
container where technically both may be the same though user might have
completely different setup in mind.

For container init should never exit by itself, for simple segmentation
the basic script approach makes a lot of sense and gets invisible to parent
once unshare(PID_NS) gets possible.

> Daniel, I am not arguing. I agree that this looks like the hack anyway.
> Just I think that other approaches are even worse imho. We should try
> to make the kernel change as simple as possible.
> 
> 
> > Wouldn't make sense to let the user to specify a signal via prctl where
> > the si_code is filled with the reason ?
> 
> Sorry, I don't quite understand the idea...
> 
> And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> to mofify the distro/userspace?

That's definitely the goal (not modify distro/userspace running inside
container).

Bruno

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-22 15:44             ` Oleg Nesterov
       [not found]               ` <20110822154448.GA8527-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-22 16:31               ` Bruno Prémont
       [not found]                 ` <20110822183134.10390b46-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
  2011-08-22 17:39                 ` Oleg Nesterov
  1 sibling, 2 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-22 16:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Daniel Lezcano, Serge E. Hallyn, akpm, containers, linux-kernel

On Mon, 22 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> On 08/22, Daniel Lezcano wrote:
> >
> > If we pass the reason to the exit_code of the init process, that will be
> > a bit weird as the process is signaled and did not exited  no ?
> 
> Just in case, you shouldn't change ->exit_code blindly. We should only
> change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
> In this case we can assume that it was killed by sys_reboot.
> 
> Now. I didn't really mean exit_state should be equal to sys_reboot's
> cmd arg. I thought about something like
> 
> 	swicth (reboot_cmd) {
> 	case LINUX_REBOOT_CMD_RESTART:
> 		code = SIGHUP;
> 		break;
> 	case LINUX_REBOOT_CMD_HALT:
> 		code = SIGINT;	// doesn't really matter what we report
> 		...
> 	}

Isn't it possible to add the two cases to si_code possible values, e.g.
CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT to avoid possible
confusion with CDL_STOPPED)?

Then on sys_reboot() flag container init and kill it (this way sys_reboot()
preserves its "will not return on success for restart/halt" scematic)?
Then container init would see CLD_KILLED replaced with matching reboot
reason.

> we know that init can't be killed by SIGHUP/SIGINT, and this can't be
> confused with the case when init does exit(exit_code).
> 
> But in fact I do not not think that WIFSIGNALED() is that important.
> init shouldn't exit anyway.

Playing with the exit code is probably more problematic (maybe even more
once a process can unshare PID_NS as then any process spawning children
may see the exit codes with special meaning and mis-reporting the child
as having failed when return code is non-zero).

> > Furthermore, how to differentiate an application container (eg. a
> > script) exiting with an error with the same value of a reboot reason ?
> 
> Well, I think it is better to fix the script than the kernel.

IMHO this depends a lot on the point-of-view regarding pid_ns versus
container where technically both may be the same though user might have
completely different setup in mind.

For container init should never exit by itself, for simple segmentation
the basic script approach makes a lot of sense and gets invisible to parent
once unshare(PID_NS) gets possible.

> Daniel, I am not arguing. I agree that this looks like the hack anyway.
> Just I think that other approaches are even worse imho. We should try
> to make the kernel change as simple as possible.
> 
> 
> > Wouldn't make sense to let the user to specify a signal via prctl where
> > the si_code is filled with the reason ?
> 
> Sorry, I don't quite understand the idea...
> 
> And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> to mofify the distro/userspace?

That's definitely the goal (not modify distro/userspace running inside
container).

Bruno

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]                 ` <20110822183134.10390b46-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
@ 2011-08-22 17:39                   ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-22 17:39 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/22, Bruno Prémont wrote:
>
> On Mon, 22 August 2011 Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On 08/22, Daniel Lezcano wrote:
> > >
> > > If we pass the reason to the exit_code of the init process, that will be
> > > a bit weird as the process is signaled and did not exited  no ?
> >
> > Just in case, you shouldn't change ->exit_code blindly. We should only
> > change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
> > In this case we can assume that it was killed by sys_reboot.
> >
> > Now. I didn't really mean exit_state should be equal to sys_reboot's
> > cmd arg. I thought about something like
> >
> > 	swicth (reboot_cmd) {
> > 	case LINUX_REBOOT_CMD_RESTART:
> > 		code = SIGHUP;
> > 		break;
> > 	case LINUX_REBOOT_CMD_HALT:
> > 		code = SIGINT;	// doesn't really matter what we report
> > 		...
> > 	}
>
> Isn't it possible to add the two cases to si_code possible values, e.g.
> CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT

How? You should change do_wait() paths then. Even if we could, personally
I'd strongly object ;) Look, you have the very specific problem. The kernel
can't do everything to make everyone happy. There is tradeoff.

But if you really meant siginfo->si_code, I do not understand at all what
you actually mean. This info is not preserved when the task exits.

> to avoid possible
> confusion with CDL_STOPPED)?

How it is possible to confuse this with CDL_STOPPED?

> Then on sys_reboot() flag container init and kill it (this way sys_reboot()
> preserves its "will not return on success for restart/halt" scematic)?

This is what I suggested...

> Then container init would see CLD_KILLED replaced with matching reboot
> reason.

For what? its parent need this info, not container init. I guess I got
lost completely.

> Playing with the exit code is probably more problematic

OK, then please do something else. I do not pretend I really understand
what do you really need to solve your problem. But please do not forget
the kernel is already very complex and full of misc hacks ;)

> > And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> > to mofify the distro/userspace?
>
> That's definitely the goal (not modify distro/userspace running inside
> container).

In this case I do not understand how prctl() can help.


But please do not try to convince me, this is simply unnecessary ;)

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-22 16:31               ` Bruno Prémont
       [not found]                 ` <20110822183134.10390b46-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
@ 2011-08-22 17:39                 ` Oleg Nesterov
       [not found]                   ` <20110822173949.GA13242-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-08-22 19:17                   ` Bruno Prémont
  1 sibling, 2 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-22 17:39 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: Daniel Lezcano, Serge E. Hallyn, akpm, containers, linux-kernel

On 08/22, Bruno Prémont wrote:
>
> On Mon, 22 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> > On 08/22, Daniel Lezcano wrote:
> > >
> > > If we pass the reason to the exit_code of the init process, that will be
> > > a bit weird as the process is signaled and did not exited  no ?
> >
> > Just in case, you shouldn't change ->exit_code blindly. We should only
> > change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
> > In this case we can assume that it was killed by sys_reboot.
> >
> > Now. I didn't really mean exit_state should be equal to sys_reboot's
> > cmd arg. I thought about something like
> >
> > 	swicth (reboot_cmd) {
> > 	case LINUX_REBOOT_CMD_RESTART:
> > 		code = SIGHUP;
> > 		break;
> > 	case LINUX_REBOOT_CMD_HALT:
> > 		code = SIGINT;	// doesn't really matter what we report
> > 		...
> > 	}
>
> Isn't it possible to add the two cases to si_code possible values, e.g.
> CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT

How? You should change do_wait() paths then. Even if we could, personally
I'd strongly object ;) Look, you have the very specific problem. The kernel
can't do everything to make everyone happy. There is tradeoff.

But if you really meant siginfo->si_code, I do not understand at all what
you actually mean. This info is not preserved when the task exits.

> to avoid possible
> confusion with CDL_STOPPED)?

How it is possible to confuse this with CDL_STOPPED?

> Then on sys_reboot() flag container init and kill it (this way sys_reboot()
> preserves its "will not return on success for restart/halt" scematic)?

This is what I suggested...

> Then container init would see CLD_KILLED replaced with matching reboot
> reason.

For what? its parent need this info, not container init. I guess I got
lost completely.

> Playing with the exit code is probably more problematic

OK, then please do something else. I do not pretend I really understand
what do you really need to solve your problem. But please do not forget
the kernel is already very complex and full of misc hacks ;)

> > And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> > to mofify the distro/userspace?
>
> That's definitely the goal (not modify distro/userspace running inside
> container).

In this case I do not understand how prctl() can help.


But please do not try to convince me, this is simply unnecessary ;)

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]                   ` <20110822173949.GA13242-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-22 19:17                     ` Bruno Prémont
  0 siblings, 0 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-22 19:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, 22 August 2011 Oleg Nesterov wrote:
> On 08/22, Bruno Prémont wrote:
> >
> > On Mon, 22 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> > > On 08/22, Daniel Lezcano wrote:
> > > >
> > > > If we pass the reason to the exit_code of the init process, that will be
> > > > a bit weird as the process is signaled and did not exited  no ?
> > >
> > > Just in case, you shouldn't change ->exit_code blindly. We should only
> > > change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
> > > In this case we can assume that it was killed by sys_reboot.
> > >
> > > Now. I didn't really mean exit_state should be equal to sys_reboot's
> > > cmd arg. I thought about something like
> > >
> > > 	swicth (reboot_cmd) {
> > > 	case LINUX_REBOOT_CMD_RESTART:
> > > 		code = SIGHUP;
> > > 		break;
> > > 	case LINUX_REBOOT_CMD_HALT:
> > > 		code = SIGINT;	// doesn't really matter what we report
> > > 		...
> > > 	}
> >
> > Isn't it possible to add the two cases to si_code possible values, e.g.
> > CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT
> 
> How? You should change do_wait() paths then. Even if we could, personally
> I'd strongly object ;) Look, you have the very specific problem. The kernel
> can't do everything to make everyone happy. There is tradeoff.
> 
> But if you really meant siginfo->si_code, I do not understand at all what
> you actually mean. This info is not preserved when the task exits.

I've been reading do_wait() code a bit, it decides between CLD_KILLED and
CLD_DUMPED based on a bit of (struct task_struct).exit_code.
So struct_task IS still available (how about it's namespace references? If
the namespaces are not the reboot reason would need to be stored somewhere
inside of struct task which might be some overhead too much)

So as long as container init's task_struct exists the reboot reason could
be preserved and used to replace CLD_DUMPED/CLD_KILLED siginfo->si_code.

So yes, tradeoff then is where to adjust code in order to get the information
along.

> > to avoid possible
> > confusion with CDL_STOPPED)?
> 
> How it is possible to confuse this with CDL_STOPPED?

Confusion is usually reading too quickly! (e.g. skipping, overseeing,
not realizing the details)

> > Then on sys_reboot() flag container init and kill it (this way sys_reboot()
> > preserves its "will not return on success for restart/halt" scematic)?
> 
> This is what I suggested...
> 
> > Then container init would see CLD_KILLED replaced with matching reboot
> > reason.
> 
> For what? its parent need this info, not container init. I guess I got
> lost completely.

I meant CLD_KILLED as visible to parent in siginfo.
Sure, as container's init gets killed it doesn't see anything itself.

> > Playing with the exit code is probably more problematic
> 
> OK, then please do something else. I do not pretend I really understand
> what do you really need to solve your problem. But please do not forget
> the kernel is already very complex and full of misc hacks ;)
> 
> > > And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> > > to mofify the distro/userspace?
> >
> > That's definitely the goal (not modify distro/userspace running inside
> > container).
> 
> In this case I do not understand how prctl() can help.

I'm not talking about prctl() - as I understand Daniel the prctl() part is
for the process outside of the container, not the one inside.
So for container hypervisor to say if it wants to get informed or not.

> But please do not try to convince me, this is simply unnecessary ;)

No, trying to know what's reasonably possible without having to do a second
iteration when a new (foreseeable) features arrives.
And also hopefully having the same understanding of the issue.

Bruno
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-22 17:39                 ` Oleg Nesterov
       [not found]                   ` <20110822173949.GA13242-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-22 19:17                   ` Bruno Prémont
  2011-08-23 13:33                     ` Oleg Nesterov
       [not found]                     ` <20110822211716.7c141d5c-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-22 19:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Daniel Lezcano, Serge E. Hallyn, akpm, containers, linux-kernel

On Mon, 22 August 2011 Oleg Nesterov wrote:
> On 08/22, Bruno Prémont wrote:
> >
> > On Mon, 22 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> > > On 08/22, Daniel Lezcano wrote:
> > > >
> > > > If we pass the reason to the exit_code of the init process, that will be
> > > > a bit weird as the process is signaled and did not exited  no ?
> > >
> > > Just in case, you shouldn't change ->exit_code blindly. We should only
> > > change it if init was a) SIGKILL'ed and b) pid_ns->reboot_cmd is set.
> > > In this case we can assume that it was killed by sys_reboot.
> > >
> > > Now. I didn't really mean exit_state should be equal to sys_reboot's
> > > cmd arg. I thought about something like
> > >
> > > 	swicth (reboot_cmd) {
> > > 	case LINUX_REBOOT_CMD_RESTART:
> > > 		code = SIGHUP;
> > > 		break;
> > > 	case LINUX_REBOOT_CMD_HALT:
> > > 		code = SIGINT;	// doesn't really matter what we report
> > > 		...
> > > 	}
> >
> > Isn't it possible to add the two cases to si_code possible values, e.g.
> > CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT
> 
> How? You should change do_wait() paths then. Even if we could, personally
> I'd strongly object ;) Look, you have the very specific problem. The kernel
> can't do everything to make everyone happy. There is tradeoff.
> 
> But if you really meant siginfo->si_code, I do not understand at all what
> you actually mean. This info is not preserved when the task exits.

I've been reading do_wait() code a bit, it decides between CLD_KILLED and
CLD_DUMPED based on a bit of (struct task_struct).exit_code.
So struct_task IS still available (how about it's namespace references? If
the namespaces are not the reboot reason would need to be stored somewhere
inside of struct task which might be some overhead too much)

So as long as container init's task_struct exists the reboot reason could
be preserved and used to replace CLD_DUMPED/CLD_KILLED siginfo->si_code.

So yes, tradeoff then is where to adjust code in order to get the information
along.

> > to avoid possible
> > confusion with CDL_STOPPED)?
> 
> How it is possible to confuse this with CDL_STOPPED?

Confusion is usually reading too quickly! (e.g. skipping, overseeing,
not realizing the details)

> > Then on sys_reboot() flag container init and kill it (this way sys_reboot()
> > preserves its "will not return on success for restart/halt" scematic)?
> 
> This is what I suggested...
> 
> > Then container init would see CLD_KILLED replaced with matching reboot
> > reason.
> 
> For what? its parent need this info, not container init. I guess I got
> lost completely.

I meant CLD_KILLED as visible to parent in siginfo.
Sure, as container's init gets killed it doesn't see anything itself.

> > Playing with the exit code is probably more problematic
> 
> OK, then please do something else. I do not pretend I really understand
> what do you really need to solve your problem. But please do not forget
> the kernel is already very complex and full of misc hacks ;)
> 
> > > And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> > > to mofify the distro/userspace?
> >
> > That's definitely the goal (not modify distro/userspace running inside
> > container).
> 
> In this case I do not understand how prctl() can help.

I'm not talking about prctl() - as I understand Daniel the prctl() part is
for the process outside of the container, not the one inside.
So for container hypervisor to say if it wants to get informed or not.

> But please do not try to convince me, this is simply unnecessary ;)

No, trying to know what's reasonably possible without having to do a second
iteration when a new (foreseeable) features arrives.
And also hopefully having the same understanding of the issue.

Bruno

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]                     ` <20110822211716.7c141d5c-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
@ 2011-08-23 13:33                       ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-23 13:33 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/22, Bruno Prémont wrote:
>
> On Mon, 22 August 2011 Oleg Nesterov wrote:
> > On 08/22, Bruno Prémont wrote:
> > >
> > > Isn't it possible to add the two cases to si_code possible values, e.g.
> > > CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT
> >
> > How? You should change do_wait() paths then. Even if we could, personally
> > I'd strongly object ;) Look, you have the very specific problem. The kernel
> > can't do everything to make everyone happy. There is tradeoff.
> >
> > But if you really meant siginfo->si_code, I do not understand at all what
> > you actually mean. This info is not preserved when the task exits.
>
> I've been reading do_wait() code a bit, it decides between CLD_KILLED and
> CLD_DUMPED based on a bit of (struct task_struct).exit_code.

Yes. But we were talking about the CLD_EXITED/CLD_KILLED difference.
And in fact this CLD_ doesn't matter at all. sys_waitid(info) can see
it, but you can simply look at "status". There is no additional info.

> So struct_task IS still available

Sure. But I do not understand why do you mention this... And, I think,
in this discussion we can pretend that only task->exit_code is still
available.

> (how about it's namespace references? If
> the namespaces are not the reboot reason would need to be stored somewhere
> inside of struct task which might be some overhead too much)
>
> So as long as container init's task_struct exists the reboot reason could
> be preserved and used to replace CLD_DUMPED/CLD_KILLED siginfo->si_code.

At least now I understand why did you mention si_code/CLD before. You
meant waitid(). I thought you were talking about the death-notifications
which can't report CLD_ you need.

I strongly object. We shouldn't uglify wait_task_zombie() to solve the
very specific problem.

And once again. sub_init->parent does wiat(&status) and sees
WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
killed by SIGHUP, it must be CMD_RESTART.

Why this can't work? Why do you want the additional complications?

> > > > And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> > > > to mofify the distro/userspace?
> > >
> > > That's definitely the goal (not modify distro/userspace running inside
> > > container).
> >
> > In this case I do not understand how prctl() can help.
>
> I'm not talking about prctl() - as I understand Daniel the prctl() part is
> for the process outside of the container, not the one inside.
> So for container hypervisor to say if it wants to get informed or not.

OK, I certainly misunderstood him. And still can't understand what
exactly was suggested ;)

> > But please do not try to convince me, this is simply unnecessary ;)
>
> No, trying to know what's reasonably possible

Can't resist... IMHO, imho, imho, but in this case I believe
"reasonably possible" == "simplest" ;)

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-22 19:17                   ` Bruno Prémont
@ 2011-08-23 13:33                     ` Oleg Nesterov
  2011-08-23 14:09                       ` Greg Kurz
       [not found]                       ` <20110823133302.GA19582-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
       [not found]                     ` <20110822211716.7c141d5c-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-23 13:33 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: Daniel Lezcano, Serge E. Hallyn, akpm, containers, linux-kernel

On 08/22, Bruno Prémont wrote:
>
> On Mon, 22 August 2011 Oleg Nesterov wrote:
> > On 08/22, Bruno Prémont wrote:
> > >
> > > Isn't it possible to add the two cases to si_code possible values, e.g.
> > > CDL_RESTART, CDL_HALT (or CDL_SYS_RESTART, CDL_SYS_HALT
> >
> > How? You should change do_wait() paths then. Even if we could, personally
> > I'd strongly object ;) Look, you have the very specific problem. The kernel
> > can't do everything to make everyone happy. There is tradeoff.
> >
> > But if you really meant siginfo->si_code, I do not understand at all what
> > you actually mean. This info is not preserved when the task exits.
>
> I've been reading do_wait() code a bit, it decides between CLD_KILLED and
> CLD_DUMPED based on a bit of (struct task_struct).exit_code.

Yes. But we were talking about the CLD_EXITED/CLD_KILLED difference.
And in fact this CLD_ doesn't matter at all. sys_waitid(info) can see
it, but you can simply look at "status". There is no additional info.

> So struct_task IS still available

Sure. But I do not understand why do you mention this... And, I think,
in this discussion we can pretend that only task->exit_code is still
available.

> (how about it's namespace references? If
> the namespaces are not the reboot reason would need to be stored somewhere
> inside of struct task which might be some overhead too much)
>
> So as long as container init's task_struct exists the reboot reason could
> be preserved and used to replace CLD_DUMPED/CLD_KILLED siginfo->si_code.

At least now I understand why did you mention si_code/CLD before. You
meant waitid(). I thought you were talking about the death-notifications
which can't report CLD_ you need.

I strongly object. We shouldn't uglify wait_task_zombie() to solve the
very specific problem.

And once again. sub_init->parent does wiat(&status) and sees
WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
killed by SIGHUP, it must be CMD_RESTART.

Why this can't work? Why do you want the additional complications?

> > > > And, iiuc, the point was to "fix" sys_reboot() so that we do not need
> > > > to mofify the distro/userspace?
> > >
> > > That's definitely the goal (not modify distro/userspace running inside
> > > container).
> >
> > In this case I do not understand how prctl() can help.
>
> I'm not talking about prctl() - as I understand Daniel the prctl() part is
> for the process outside of the container, not the one inside.
> So for container hypervisor to say if it wants to get informed or not.

OK, I certainly misunderstood him. And still can't understand what
exactly was suggested ;)

> > But please do not try to convince me, this is simply unnecessary ;)
>
> No, trying to know what's reasonably possible

Can't resist... IMHO, imho, imho, but in this case I believe
"reasonably possible" == "simplest" ;)

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]                       ` <20110823133302.GA19582-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-23 14:09                         ` Greg Kurz
  0 siblings, 0 replies; 64+ messages in thread
From: Greg Kurz @ 2011-08-23 14:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Bruno Prémont,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, 2011-08-23 at 15:33 +0200, Oleg Nesterov wrote:
> [...]
> At least now I understand why did you mention si_code/CLD before. You
> meant waitid(). I thought you were talking about the death-notifications
> which can't report CLD_ you need.
> 
> I strongly object. We shouldn't uglify wait_task_zombie() to solve the
> very specific problem.
> 
> And once again. sub_init->parent does wiat(&status) and sees
> WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
> sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
> killed by SIGHUP, it must be CMD_RESTART.
> 
> Why this can't work? Why do you want the additional complications?
> 

I don't see either what could go wrong with you approach. It doesn't
mess with critical wait() or signal paths. It's definitely the way to
go.

Thanks.

-- 
Gregory Kurz                                     gkurz-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org
Software Engineer @ IBM/Meiosys                  http://www.ibm.com
Tel +33 (0)534 638 479                           Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
        Alan Moore.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-23 13:33                     ` Oleg Nesterov
@ 2011-08-23 14:09                       ` Greg Kurz
       [not found]                         ` <1314108566.3465.29.camel-GiB8zCg7hOfDOqzlkpFKJg@public.gmane.org>
  2011-08-23 14:29                         ` Oleg Nesterov
       [not found]                       ` <20110823133302.GA19582-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Greg Kurz @ 2011-08-23 14:09 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Bruno Prémont, containers, akpm, linux-kernel

On Tue, 2011-08-23 at 15:33 +0200, Oleg Nesterov wrote:
> [...]
> At least now I understand why did you mention si_code/CLD before. You
> meant waitid(). I thought you were talking about the death-notifications
> which can't report CLD_ you need.
> 
> I strongly object. We shouldn't uglify wait_task_zombie() to solve the
> very specific problem.
> 
> And once again. sub_init->parent does wiat(&status) and sees
> WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
> sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
> killed by SIGHUP, it must be CMD_RESTART.
> 
> Why this can't work? Why do you want the additional complications?
> 

I don't see either what could go wrong with you approach. It doesn't
mess with critical wait() or signal paths. It's definitely the way to
go.

Thanks.

-- 
Gregory Kurz                                     gkurz@fr.ibm.com
Software Engineer @ IBM/Meiosys                  http://www.ibm.com
Tel +33 (0)534 638 479                           Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
        Alan Moore.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]                         ` <1314108566.3465.29.camel-GiB8zCg7hOfDOqzlkpFKJg@public.gmane.org>
@ 2011-08-23 14:29                           ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-23 14:29 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Bruno Prémont,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/23, Greg Kurz wrote:
>
> On Tue, 2011-08-23 at 15:33 +0200, Oleg Nesterov wrote:
> > [...]
> > At least now I understand why did you mention si_code/CLD before. You
> > meant waitid(). I thought you were talking about the death-notifications
> > which can't report CLD_ you need.
> >
> > I strongly object. We shouldn't uglify wait_task_zombie() to solve the
> > very specific problem.
> >
> > And once again. sub_init->parent does wiat(&status) and sees
> > WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
> > sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
> > killed by SIGHUP, it must be CMD_RESTART.
> >
> > Why this can't work? Why do you want the additional complications?
> >
>
> I don't see either what could go wrong with you approach.

Thanks ;)

Just in case... instead of WIFSIGNALED/WTERMSIG we can also report
the exit code in the upper bits. I mean,

	switch (reboot_cmd) {
	case LINUX_REBOOT_CMD_RESTART:
		code = 1 << 16;
		break;
	case LINUX_REBOOT_CMD_HALT:
		code = 2 << 16;
		break;
	}

this can't be confused with the normal exit(code), just the parent
should be careful, I am not sure this can't confuse WIFEXITED/WEXITSTATUS
user-space macroses.

wait(&status) takes "int *", we have a room for additional info, and
wait_task_zombie() simply copies exit_code.

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-23 14:09                       ` Greg Kurz
       [not found]                         ` <1314108566.3465.29.camel-GiB8zCg7hOfDOqzlkpFKJg@public.gmane.org>
@ 2011-08-23 14:29                         ` Oleg Nesterov
       [not found]                           ` <20110823142914.GA22593-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-08-24 19:44                           ` Bruno Prémont
  1 sibling, 2 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-23 14:29 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Bruno Prémont, containers, akpm, linux-kernel

On 08/23, Greg Kurz wrote:
>
> On Tue, 2011-08-23 at 15:33 +0200, Oleg Nesterov wrote:
> > [...]
> > At least now I understand why did you mention si_code/CLD before. You
> > meant waitid(). I thought you were talking about the death-notifications
> > which can't report CLD_ you need.
> >
> > I strongly object. We shouldn't uglify wait_task_zombie() to solve the
> > very specific problem.
> >
> > And once again. sub_init->parent does wiat(&status) and sees
> > WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
> > sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
> > killed by SIGHUP, it must be CMD_RESTART.
> >
> > Why this can't work? Why do you want the additional complications?
> >
>
> I don't see either what could go wrong with you approach.

Thanks ;)

Just in case... instead of WIFSIGNALED/WTERMSIG we can also report
the exit code in the upper bits. I mean,

	switch (reboot_cmd) {
	case LINUX_REBOOT_CMD_RESTART:
		code = 1 << 16;
		break;
	case LINUX_REBOOT_CMD_HALT:
		code = 2 << 16;
		break;
	}

this can't be confused with the normal exit(code), just the parent
should be careful, I am not sure this can't confuse WIFEXITED/WEXITSTATUS
user-space macroses.

wait(&status) takes "int *", we have a room for additional info, and
wait_task_zombie() simply copies exit_code.

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]                           ` <20110823142914.GA22593-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-24 19:44                             ` Bruno Prémont
  0 siblings, 0 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-24 19:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, 23 August 2011 Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 08/23, Greg Kurz wrote:
> >
> > On Tue, 2011-08-23 at 15:33 +0200, Oleg Nesterov wrote:
> > > [...]
> > > At least now I understand why did you mention si_code/CLD before. You
> > > meant waitid(). I thought you were talking about the death-notifications
> > > which can't report CLD_ you need.
> > >
> > > I strongly object. We shouldn't uglify wait_task_zombie() to solve the
> > > very specific problem.
> > >
> > > And once again. sub_init->parent does wiat(&status) and sees
> > > WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
> > > sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
> > > killed by SIGHUP, it must be CMD_RESTART.
> > >
> > > Why this can't work? Why do you want the additional complications?
> > >
> >
> > I don't see either what could go wrong with you approach.
> 
> Thanks ;)
> 
> Just in case... instead of WIFSIGNALED/WTERMSIG we can also report
> the exit code in the upper bits. I mean,
> 
> 	switch (reboot_cmd) {
> 	case LINUX_REBOOT_CMD_RESTART:
> 		code = 1 << 16;
> 		break;
> 	case LINUX_REBOOT_CMD_HALT:
> 		code = 2 << 16;
> 		break;
> 	}

That looks nice and simple!

> this can't be confused with the normal exit(code), just the parent
> should be careful, I am not sure this can't confuse WIFEXITED/WEXITSTATUS
> user-space macroses.

For recent glibc (2.12.2):
sys/wait.h and stdlib.h:
# define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status))
# define WEXITSTATUS(status)       __WEXITSTATUS (__WAIT_INT (status))
bits/waitstatus.h:
#define     __WIFEXITED(status)     (__WTERMSIG(status) == 0)
#define     __WTERMSIG(status)      ((status) & 0x7f)
#define     __WEXITSTATUS(status)   (((status) & 0xff00) >> 8)


For this version of glibc on x86 and amd64 it looks fine.
Would have to look at other versions and other libcs to make sure there
are no surprises.

Bruno

> wait(&status) takes "int *", we have a room for additional info, and
> wait_task_zombie() simply copies exit_code.
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-23 14:29                         ` Oleg Nesterov
       [not found]                           ` <20110823142914.GA22593-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-08-24 19:44                           ` Bruno Prémont
  2011-08-25 15:37                             ` Oleg Nesterov
       [not found]                             ` <20110824214418.474b24c6-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
  1 sibling, 2 replies; 64+ messages in thread
From: Bruno Prémont @ 2011-08-24 19:44 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Greg Kurz, containers, akpm, linux-kernel

On Tue, 23 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> On 08/23, Greg Kurz wrote:
> >
> > On Tue, 2011-08-23 at 15:33 +0200, Oleg Nesterov wrote:
> > > [...]
> > > At least now I understand why did you mention si_code/CLD before. You
> > > meant waitid(). I thought you were talking about the death-notifications
> > > which can't report CLD_ you need.
> > >
> > > I strongly object. We shouldn't uglify wait_task_zombie() to solve the
> > > very specific problem.
> > >
> > > And once again. sub_init->parent does wiat(&status) and sees
> > > WIFSIGNALED() && WTERMSIG(status) == SIGHUP. This can only mean that
> > > sys_reboot(LINUX_REBOOT_CMD_RESTART) was called. It _can not_ be really
> > > killed by SIGHUP, it must be CMD_RESTART.
> > >
> > > Why this can't work? Why do you want the additional complications?
> > >
> >
> > I don't see either what could go wrong with you approach.
> 
> Thanks ;)
> 
> Just in case... instead of WIFSIGNALED/WTERMSIG we can also report
> the exit code in the upper bits. I mean,
> 
> 	switch (reboot_cmd) {
> 	case LINUX_REBOOT_CMD_RESTART:
> 		code = 1 << 16;
> 		break;
> 	case LINUX_REBOOT_CMD_HALT:
> 		code = 2 << 16;
> 		break;
> 	}

That looks nice and simple!

> this can't be confused with the normal exit(code), just the parent
> should be careful, I am not sure this can't confuse WIFEXITED/WEXITSTATUS
> user-space macroses.

For recent glibc (2.12.2):
sys/wait.h and stdlib.h:
# define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status))
# define WEXITSTATUS(status)       __WEXITSTATUS (__WAIT_INT (status))
bits/waitstatus.h:
#define     __WIFEXITED(status)     (__WTERMSIG(status) == 0)
#define     __WTERMSIG(status)      ((status) & 0x7f)
#define     __WEXITSTATUS(status)   (((status) & 0xff00) >> 8)


For this version of glibc on x86 and amd64 it looks fine.
Would have to look at other versions and other libcs to make sure there
are no surprises.

Bruno

> wait(&status) takes "int *", we have a room for additional info, and
> wait_task_zombie() simply copies exit_code.
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
       [not found]                             ` <20110824214418.474b24c6-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
@ 2011-08-25 15:37                               ` Oleg Nesterov
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-25 15:37 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 08/24, Bruno Prémont wrote:
>
> On Tue, 23 August 2011 Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > Just in case... instead of WIFSIGNALED/WTERMSIG we can also report
> > the exit code in the upper bits. I mean,
> >
> > 	switch (reboot_cmd) {
> > 	case LINUX_REBOOT_CMD_RESTART:
> > 		code = 1 << 16;
> > 		break;
> > 	case LINUX_REBOOT_CMD_HALT:
> > 		code = 2 << 16;
> > 		break;
> > 	}
>
> That looks nice and simple!

Great. To me, WIFSIGNALED() looks better, but this is subjective and
in any case this is up to you.

> > this can't be confused with the normal exit(code), just the parent
> > should be careful, I am not sure this can't confuse WIFEXITED/WEXITSTATUS
> > user-space macroses.
>
> For recent glibc (2.12.2):
> sys/wait.h and stdlib.h:
> # define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status))
> # define WEXITSTATUS(status)       __WEXITSTATUS (__WAIT_INT (status))
> bits/waitstatus.h:
> #define     __WIFEXITED(status)     (__WTERMSIG(status) == 0)
> #define     __WTERMSIG(status)      ((status) & 0x7f)
> #define     __WEXITSTATUS(status)   (((status) & 0xff00) >> 8)

... and note that __WEXITSTATUS() does "& 0xff00", it uses the
lower 16 bits.

So the parent should read status "by hand". Not a problem, I think.
Say, a traced task reports the additional info this way.

Oleg.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
  2011-08-24 19:44                           ` Bruno Prémont
@ 2011-08-25 15:37                             ` Oleg Nesterov
       [not found]                             ` <20110824214418.474b24c6-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
  1 sibling, 0 replies; 64+ messages in thread
From: Oleg Nesterov @ 2011-08-25 15:37 UTC (permalink / raw)
  To: Bruno Prémont; +Cc: Greg Kurz, containers, akpm, linux-kernel

On 08/24, Bruno Prémont wrote:
>
> On Tue, 23 August 2011 Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Just in case... instead of WIFSIGNALED/WTERMSIG we can also report
> > the exit code in the upper bits. I mean,
> >
> > 	switch (reboot_cmd) {
> > 	case LINUX_REBOOT_CMD_RESTART:
> > 		code = 1 << 16;
> > 		break;
> > 	case LINUX_REBOOT_CMD_HALT:
> > 		code = 2 << 16;
> > 		break;
> > 	}
>
> That looks nice and simple!

Great. To me, WIFSIGNALED() looks better, but this is subjective and
in any case this is up to you.

> > this can't be confused with the normal exit(code), just the parent
> > should be careful, I am not sure this can't confuse WIFEXITED/WEXITSTATUS
> > user-space macroses.
>
> For recent glibc (2.12.2):
> sys/wait.h and stdlib.h:
> # define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status))
> # define WEXITSTATUS(status)       __WEXITSTATUS (__WAIT_INT (status))
> bits/waitstatus.h:
> #define     __WIFEXITED(status)     (__WTERMSIG(status) == 0)
> #define     __WTERMSIG(status)      ((status) & 0x7f)
> #define     __WEXITSTATUS(status)   (((status) & 0xff00) >> 8)

... and note that __WEXITSTATUS() does "& 0xff00", it uses the
lower 16 bits.

So the parent should read status "by hand". Not a problem, I think.
Say, a traced task reports the additional info this way.

Oleg.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot
@ 2011-08-11 20:23 Daniel Lezcano
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Lezcano @ 2011-08-11 20:23 UTC (permalink / raw)
  To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bonbons-ud5FBsm0p/xEiooADzr8i9i2O/JbrIOy,
	oleg-6lXkIZvqkOAvJsYlp49lxw, linux-kernel-u79uwXL29TY76Z2rM5mHXA

From: Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>

In the case of a VPS, when we shutdown/halt/reboot the container, the
reboot utility will invoke the sys_reboot syscall which has the bad
effect to reboot the host. The way to fix that is to drop the
CAP_SYS_REBOOT capability in the container.

In this case, the container shutdowns correctly but, at the end, the
init process is waiting indefinitely and we have the containers stuck
with one process (the init process).

In order to fix that, we used a hypervisor process, parent of the
container's init process, watching for the container's utmp file and
detecting when the runlevel changes. When this runlevel change is
detected we wait for the container to have one process left and then we
kill the container's init.

That works well if we modify the distro configuration files, we make
/var/run to not be a tmpfs and we remove all the files inside this
directory when the container boots. *But* as soon as we upgrade the
container distro, all the tweaks are lost. So this method works but at
the cost of tweaking the containers configuration files again and again,
each time there is an update, which is not tolerable in a production
environment.

This patchset solves the problem by send a SIGCHLD signal to the process
parent of the init process of the child pid namespace. By this way, we know
when a pid namespace wanted to reboot/halt/shutdown and we can take advantage
of that to kill, restart or suspend the container.

Daniel Lezcano (2):
  add SA_CLDREBOOT flag
  Notify container-init parent a 'reboot' occured

 arch/alpha/include/asm/signal.h   |    2 +
 arch/arm/include/asm/signal.h     |    2 +
 arch/avr32/include/asm/signal.h   |    2 +
 arch/cris/include/asm/signal.h    |    2 +
 arch/h8300/include/asm/signal.h   |    2 +
 arch/ia64/include/asm/signal.h    |    2 +
 arch/m32r/include/asm/signal.h    |    2 +
 arch/m68k/include/asm/signal.h    |    2 +
 arch/mips/include/asm/signal.h    |    2 +
 arch/mn10300/include/asm/signal.h |    2 +
 arch/parisc/include/asm/signal.h  |    2 +
 arch/powerpc/include/asm/signal.h |    2 +
 arch/s390/include/asm/signal.h    |    2 +
 arch/sparc/include/asm/signal.h   |    2 +-
 arch/x86/include/asm/signal.h     |    2 +
 arch/xtensa/include/asm/signal.h  |    2 +
 include/asm-generic/siginfo.h     |    3 +-
 include/asm-generic/signal.h      |    2 +
 include/linux/sched.h             |    1 +
 kernel/signal.c                   |   40 +++++++++++++++++++++++++++++++++++++
 kernel/sys.c                      |   20 ++++++++++++++++-
 21 files changed, 94 insertions(+), 4 deletions(-)

-- 
1.7.4.1

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2011-08-25 15:41 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-11 20:23 [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Daniel Lezcano
     [not found] ` <1313094241-3674-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
2011-08-11 20:24   ` [PATCH 1/2] add SA_CLDREBOOT flag Daniel Lezcano
2011-08-11 20:24   ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
2011-08-14 16:17   ` [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Oleg Nesterov
2011-08-11 20:24 ` [PATCH 1/2] add SA_CLDREBOOT flag Daniel Lezcano
     [not found]   ` <1313094241-3674-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
2011-08-14 16:15     ` Oleg Nesterov
2011-08-14 16:15   ` Oleg Nesterov
     [not found]     ` <20110814161532.GA30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-14 16:36       ` Bruno Prémont
2011-08-14 16:36     ` Bruno Prémont
2011-08-14 17:10       ` Oleg Nesterov
     [not found]       ` <20110814183611.05937c96-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
2011-08-14 17:10         ` Oleg Nesterov
2011-08-11 20:24 ` [PATCH 2/2] Notify container-init parent a 'reboot' occured Daniel Lezcano
2011-08-11 21:09   ` Serge Hallyn
2011-08-11 21:30     ` Daniel Lezcano
2011-08-11 21:30     ` Daniel Lezcano
     [not found]       ` <4E444A04.3070403-GANU6spQydw@public.gmane.org>
2011-08-11 21:50         ` Serge Hallyn
2011-08-12 16:29         ` Serge Hallyn
2011-08-12 16:29           ` Serge Hallyn
2011-08-12 20:42           ` Daniel Lezcano
     [not found]             ` <4E45904F.60904-GANU6spQydw@public.gmane.org>
2011-08-12 21:13               ` Serge Hallyn
2011-08-12 21:13             ` Serge Hallyn
2011-08-12 20:42           ` Daniel Lezcano
2011-08-11 21:50       ` Serge Hallyn
     [not found]   ` <1313094241-3674-3-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
2011-08-11 21:09     ` Serge Hallyn
2011-08-13  0:19     ` Matt Helsley
2011-08-14 16:01     ` Oleg Nesterov
2011-08-13  0:19   ` Matt Helsley
2011-08-13 14:41     ` Daniel Lezcano
     [not found]     ` <20110813001959.GB5777-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2011-08-13 14:41       ` Daniel Lezcano
2011-08-14 16:01   ` Oleg Nesterov
2011-08-14 16:17 ` [PATCH 0/2] Send a SIGCHLD to the init's pid namespace parent when reboot Oleg Nesterov
2011-08-14 21:36   ` Serge E. Hallyn
2011-08-15 14:47     ` Oleg Nesterov
2011-08-15 17:39       ` Serge E. Hallyn
2011-08-15 17:50         ` Daniel Lezcano
     [not found]         ` <20110815173940.GA19620-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2011-08-15 17:50           ` Daniel Lezcano
     [not found]       ` <20110815144744.GA9660-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-15 17:39         ` Serge E. Hallyn
2011-08-18 23:46         ` Daniel Lezcano
2011-08-18 23:46       ` Daniel Lezcano
2011-08-19 15:24         ` Oleg Nesterov
     [not found]           ` <20110819152416.GA17034-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-22 12:28             ` Daniel Lezcano
2011-08-22 12:28           ` Daniel Lezcano
2011-08-22 15:44             ` Oleg Nesterov
     [not found]               ` <20110822154448.GA8527-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-22 16:31                 ` Bruno Prémont
2011-08-22 16:31               ` Bruno Prémont
     [not found]                 ` <20110822183134.10390b46-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
2011-08-22 17:39                   ` Oleg Nesterov
2011-08-22 17:39                 ` Oleg Nesterov
     [not found]                   ` <20110822173949.GA13242-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-22 19:17                     ` Bruno Prémont
2011-08-22 19:17                   ` Bruno Prémont
2011-08-23 13:33                     ` Oleg Nesterov
2011-08-23 14:09                       ` Greg Kurz
     [not found]                         ` <1314108566.3465.29.camel-GiB8zCg7hOfDOqzlkpFKJg@public.gmane.org>
2011-08-23 14:29                           ` Oleg Nesterov
2011-08-23 14:29                         ` Oleg Nesterov
     [not found]                           ` <20110823142914.GA22593-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-24 19:44                             ` Bruno Prémont
2011-08-24 19:44                           ` Bruno Prémont
2011-08-25 15:37                             ` Oleg Nesterov
     [not found]                             ` <20110824214418.474b24c6-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
2011-08-25 15:37                               ` Oleg Nesterov
     [not found]                       ` <20110823133302.GA19582-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-23 14:09                         ` Greg Kurz
     [not found]                     ` <20110822211716.7c141d5c-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
2011-08-23 13:33                       ` Oleg Nesterov
     [not found]             ` <4E524B73.3050704-GANU6spQydw@public.gmane.org>
2011-08-22 15:44               ` Oleg Nesterov
     [not found]         ` <4E4DA461.8030006-GANU6spQydw@public.gmane.org>
2011-08-19 15:24           ` Oleg Nesterov
     [not found]     ` <20110814213642.GB13799-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2011-08-15 14:47       ` Oleg Nesterov
     [not found]   ` <20110814161707.GB30846-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-08-14 21:36     ` Serge E. Hallyn
  -- strict thread matches above, loose matches on Subject: below --
2011-08-11 20:23 Daniel Lezcano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.