linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1][V5] Handle reboot in a child pid namespace
@ 2012-01-05  9:06 Daniel Lezcano
  2012-01-05  9:06 ` [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall Daniel Lezcano
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Lezcano @ 2012-01-05  9:06 UTC (permalink / raw)
  To: akpm; +Cc: serge.hallyn, oleg, containers, gkurz, linux-kernel, mtk.manpages


ChangeLog:
==========

 * V5
    - make static inline function for reboot_pid_ns to return 0 when
      CONFIG_PID_NS is off and do the pid_namespace pointer comparison
      inside the function. That makes the compiler to remove this portion
      of code when CONFIG_PID_NS is not enabled.

 * V4
   - store the signal number the child pid namespace init should
     exit from. It is simpler, cleaner, and does not add more encoding
     bits to the exit code of the process.

 * V3
   - removed lock and serialization of pid_ns_reboot

 * V2
   - added a lock for the pid namespace to prevent racy call
     to the 'reboot' syscall
   - Moved 'reboot' command assigned in zap_pid_ns_processes
     instead of wait_task_zombie
   - added tasklist lock around force_sig
   - added do_exit in pid_ns_reboot
   - used task_active_pid_ns instead of declaring a new variable in sys_reboot
   - moved code up before POWER_OFF changed to HALT in sys_reboot


Test case:
==========

#include <alloca.h>
#include <stdio.h>
#include <sched.h>
#include <unistd.h>
#include <signal.h>
#include <sys/reboot.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <linux/reboot.h>

static int do_reboot(void *arg)
{
        int *cmd = arg;

        if (reboot(*cmd))
                printf("failed to reboot(%d): %m\n", *cmd);
}

int test_reboot(int cmd, int sig)
{
        long stack_size = 4096;
        void *stack = alloca(stack_size) + stack_size;
        int status;
        pid_t ret;

        ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd);
        if (ret < 0) {
                printf("failed to clone: %m\n");
                return -1;
        }

        if (wait(&status) < 0) {
                printf("unexpected wait error: %m\n");
                return -1;
        }

        if (!WIFSIGNALED(status)) {
                printf("child process exited but was not signaled\n");
                return -1;
        }

        if (WTERMSIG(status) != sig) {
                printf("signal termination is not the one expected\n");
                return -1;
        }

        return 0;
}

int main(int argc, char *argv[])
{
        int status;

        status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1);
        if (status >= 0) {
                printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n");
                return 1;
        }
        printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n");

        return 0;
}

Daniel Lezcano (1):
  Add reboot_pid_ns to handle the reboot syscall

 include/linux/pid_namespace.h |    8 +++++++-
 kernel/pid_namespace.c        |   33 +++++++++++++++++++++++++++++++++
 kernel/sys.c                  |    3 +++
 3 files changed, 43 insertions(+), 1 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-01-05  9:06 [PATCH 0/1][V5] Handle reboot in a child pid namespace Daniel Lezcano
@ 2012-01-05  9:06 ` Daniel Lezcano
  2012-01-05 19:29   ` Serge Hallyn
  2012-02-03  0:10   ` Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: Daniel Lezcano @ 2012-01-05  9:06 UTC (permalink / raw)
  To: akpm; +Cc: serge.hallyn, oleg, containers, gkurz, linux-kernel, mtk.manpages

In the case of a child pid namespace, rebooting the system does not
really makes sense. When the pid namespace is used in conjunction
with the other namespaces in order to create a linux container, the
reboot syscall leads to some problems.

A container can reboot the host. That can be fixed by dropping
the sys_reboot capability but we are unable to correctly to poweroff/
halt/reboot a container and the container stays stuck at the shutdown
time with the container's init process waiting indefinitively.

After several attempts, no solution from userspace was found to reliabily
handle the shutdown from a container.

This patch propose to make the init process of the child pid namespace to
exit with a signal status set to : SIGINT if the child pid namespace called
"halt/poweroff" and SIGHUP if the child pid namespace called "reboot".
When the reboot syscall is called and we are not in the initial
pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART",
and "RESTART2". Otherwise we return EINVAL.

Returning EINVAL is also an easy way to check if this feature is supported
by the kernel when invoking another 'reboot' option like CAD.

By this way the parent process of the child pid namespace knows if
it rebooted or not and can take the right decision.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/pid_namespace.h |    8 +++++++-
 kernel/pid_namespace.c        |   33 +++++++++++++++++++++++++++++++++
 kernel/sys.c                  |    8 ++++++++
 3 files changed, 48 insertions(+), 1 deletions(-)

diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index e7cf666..b90c798 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -32,6 +32,7 @@ struct pid_namespace {
 #endif
 	gid_t pid_gid;
 	int hide_pid;
+	int reboot;
 };
 
 extern struct pid_namespace init_pid_ns;
@@ -47,6 +48,7 @@ static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
 extern struct pid_namespace *copy_pid_ns(unsigned long flags, struct pid_namespace *ns);
 extern void free_pid_ns(struct kref *kref);
 extern void zap_pid_ns_processes(struct pid_namespace *pid_ns);
+extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd);
 
 static inline void put_pid_ns(struct pid_namespace *ns)
 {
@@ -74,11 +76,15 @@ static inline void put_pid_ns(struct pid_namespace *ns)
 {
 }
 
-
 static inline void zap_pid_ns_processes(struct pid_namespace *ns)
 {
 	BUG();
 }
+
+static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
+{
+	return 0;
+}
 #endif /* CONFIG_PID_NS */
 
 extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a896839..0d355e8 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -15,6 +15,7 @@
 #include <linux/acct.h>
 #include <linux/slab.h>
 #include <linux/proc_fs.h>
+#include <linux/reboot.h>
 
 #define BITS_PER_PAGE		(PAGE_SIZE*8)
 
@@ -187,6 +188,9 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 		rc = sys_wait4(-1, NULL, __WALL, NULL);
 	} while (rc != -ECHILD);
 
+	if (pid_ns->reboot)
+		current->signal->group_exit_code = pid_ns->reboot;
+
 	acct_exit_ns(pid_ns);
 	return;
 }
@@ -221,6 +225,35 @@ static struct ctl_table pid_ns_ctl_table[] = {
 
 static struct ctl_path kern_path[] = { { .procname = "kernel", }, { } };
 
+int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
+{
+	if (pid_ns == &init_pid_ns)
+		return 0;
+
+	switch(cmd) {
+	case LINUX_REBOOT_CMD_RESTART2:
+	case LINUX_REBOOT_CMD_RESTART:
+		pid_ns->reboot = SIGHUP;
+		break;
+
+	case LINUX_REBOOT_CMD_POWER_OFF:
+	case LINUX_REBOOT_CMD_HALT:
+		pid_ns->reboot = SIGINT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	read_lock(&tasklist_lock);
+	force_sig(SIGKILL, pid_ns->child_reaper);
+	read_unlock(&tasklist_lock);
+
+	do_exit(0);
+
+	/* Not reached */
+	return 0;
+}
+
 static __init int pid_namespaces_init(void)
 {
 	pid_ns_cachep = KMEM_CACHE(pid_namespace, SLAB_PANIC);
diff --git a/kernel/sys.c b/kernel/sys.c
index 4070153..bd924fa 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -444,6 +444,14 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
 	                magic2 != LINUX_REBOOT_MAGIC2C))
 		return -EINVAL;
 
+	/* In case the pid namespaces are enabled, the current task is in a
+	 * child pid_namespace and the command is handled by 'reboot_pid_ns',
+	 * this one will invoke 'do_exit'.
+	 */
+	ret = reboot_pid_ns(task_active_pid_ns(current), cmd);
+	if (ret)
+		return ret;
+
 	/* Instead of trying to make the power_off code look like
 	 * halt when pm_power_off is not set do it the easy way.
 	 */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-01-05  9:06 ` [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall Daniel Lezcano
@ 2012-01-05 19:29   ` Serge Hallyn
  2012-01-11 10:23     ` Serge Hallyn
  2012-02-03  0:10   ` Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Serge Hallyn @ 2012-01-05 19:29 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, oleg, containers, gkurz, linux-kernel, mtk.manpages

Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> In the case of a child pid namespace, rebooting the system does not
> really makes sense. When the pid namespace is used in conjunction
> with the other namespaces in order to create a linux container, the
> reboot syscall leads to some problems.
> 
> A container can reboot the host. That can be fixed by dropping
> the sys_reboot capability but we are unable to correctly to poweroff/
> halt/reboot a container and the container stays stuck at the shutdown
> time with the container's init process waiting indefinitively.
> 
> After several attempts, no solution from userspace was found to reliabily
> handle the shutdown from a container.
> 
> This patch propose to make the init process of the child pid namespace to
> exit with a signal status set to : SIGINT if the child pid namespace called
> "halt/poweroff" and SIGHUP if the child pid namespace called "reboot".
> When the reboot syscall is called and we are not in the initial
> pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART",
> and "RESTART2". Otherwise we return EINVAL.
> 
> Returning EINVAL is also an easy way to check if this feature is supported
> by the kernel when invoking another 'reboot' option like CAD.
> 
> By this way the parent process of the child pid namespace knows if
> it rebooted or not and can take the right decision.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>

The testcase in [PATCH 0/1] passed for me, and reboot from init_pid_ns
works as usual.

Tested-by: Serge Hallyn <serge.hallyn@canonical.com>

thanks,
-serge

> ---
>  include/linux/pid_namespace.h |    8 +++++++-
>  kernel/pid_namespace.c        |   33 +++++++++++++++++++++++++++++++++
>  kernel/sys.c                  |    8 ++++++++
>  3 files changed, 48 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index e7cf666..b90c798 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -32,6 +32,7 @@ struct pid_namespace {
>  #endif
>  	gid_t pid_gid;
>  	int hide_pid;
> +	int reboot;
>  };
>  
>  extern struct pid_namespace init_pid_ns;
> @@ -47,6 +48,7 @@ static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
>  extern struct pid_namespace *copy_pid_ns(unsigned long flags, struct pid_namespace *ns);
>  extern void free_pid_ns(struct kref *kref);
>  extern void zap_pid_ns_processes(struct pid_namespace *pid_ns);
> +extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd);
>  
>  static inline void put_pid_ns(struct pid_namespace *ns)
>  {
> @@ -74,11 +76,15 @@ static inline void put_pid_ns(struct pid_namespace *ns)
>  {
>  }
>  
> -
>  static inline void zap_pid_ns_processes(struct pid_namespace *ns)
>  {
>  	BUG();
>  }
> +
> +static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_PID_NS */
>  
>  extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index a896839..0d355e8 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -15,6 +15,7 @@
>  #include <linux/acct.h>
>  #include <linux/slab.h>
>  #include <linux/proc_fs.h>
> +#include <linux/reboot.h>
>  
>  #define BITS_PER_PAGE		(PAGE_SIZE*8)
>  
> @@ -187,6 +188,9 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  		rc = sys_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>  
> +	if (pid_ns->reboot)
> +		current->signal->group_exit_code = pid_ns->reboot;
> +
>  	acct_exit_ns(pid_ns);
>  	return;
>  }
> @@ -221,6 +225,35 @@ static struct ctl_table pid_ns_ctl_table[] = {
>  
>  static struct ctl_path kern_path[] = { { .procname = "kernel", }, { } };
>  
> +int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
> +{
> +	if (pid_ns == &init_pid_ns)
> +		return 0;
> +
> +	switch(cmd) {
> +	case LINUX_REBOOT_CMD_RESTART2:
> +	case LINUX_REBOOT_CMD_RESTART:
> +		pid_ns->reboot = SIGHUP;
> +		break;
> +
> +	case LINUX_REBOOT_CMD_POWER_OFF:
> +	case LINUX_REBOOT_CMD_HALT:
> +		pid_ns->reboot = SIGINT;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	read_lock(&tasklist_lock);
> +	force_sig(SIGKILL, pid_ns->child_reaper);
> +	read_unlock(&tasklist_lock);
> +
> +	do_exit(0);
> +
> +	/* Not reached */
> +	return 0;
> +}
> +
>  static __init int pid_namespaces_init(void)
>  {
>  	pid_ns_cachep = KMEM_CACHE(pid_namespace, SLAB_PANIC);
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 4070153..bd924fa 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -444,6 +444,14 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  	                magic2 != LINUX_REBOOT_MAGIC2C))
>  		return -EINVAL;
>  
> +	/* In case the pid namespaces are enabled, the current task is in a
> +	 * child pid_namespace and the command is handled by 'reboot_pid_ns',
> +	 * this one will invoke 'do_exit'.
> +	 */
> +	ret = reboot_pid_ns(task_active_pid_ns(current), cmd);
> +	if (ret)
> +		return ret;
> +
>  	/* Instead of trying to make the power_off code look like
>  	 * halt when pm_power_off is not set do it the easy way.
>  	 */
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-01-05 19:29   ` Serge Hallyn
@ 2012-01-11 10:23     ` Serge Hallyn
  2012-01-11 10:45       ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Serge Hallyn @ 2012-01-11 10:23 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: containers, oleg, linux-kernel, mtk.manpages, akpm

Quoting Serge Hallyn (serge.hallyn@canonical.com):
> Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> > In the case of a child pid namespace, rebooting the system does not
> > really makes sense. When the pid namespace is used in conjunction
> > with the other namespaces in order to create a linux container, the
> > reboot syscall leads to some problems.
> > 
> > A container can reboot the host. That can be fixed by dropping
> > the sys_reboot capability but we are unable to correctly to poweroff/
> > halt/reboot a container and the container stays stuck at the shutdown
> > time with the container's init process waiting indefinitively.
> > 
> > After several attempts, no solution from userspace was found to reliabily
> > handle the shutdown from a container.
> > 
> > This patch propose to make the init process of the child pid namespace to
> > exit with a signal status set to : SIGINT if the child pid namespace called
> > "halt/poweroff" and SIGHUP if the child pid namespace called "reboot".
> > When the reboot syscall is called and we are not in the initial
> > pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART",
> > and "RESTART2". Otherwise we return EINVAL.
> > 
> > Returning EINVAL is also an easy way to check if this feature is supported
> > by the kernel when invoking another 'reboot' option like CAD.
> > 
> > By this way the parent process of the child pid namespace knows if
> > it rebooted or not and can take the right decision.
> > 
> > Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
> > Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> > Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> 
> The testcase in [PATCH 0/1] passed for me, and reboot from init_pid_ns
> works as usual.
> 
> Tested-by: Serge Hallyn <serge.hallyn@canonical.com>

Hi Andrew,

Are you considering taking this patch into -mm?

thanks,
-serge

> 
> thanks,
> -serge
> 
> > ---
> >  include/linux/pid_namespace.h |    8 +++++++-
> >  kernel/pid_namespace.c        |   33 +++++++++++++++++++++++++++++++++
> >  kernel/sys.c                  |    8 ++++++++
> >  3 files changed, 48 insertions(+), 1 deletions(-)
> > 
> > diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> > index e7cf666..b90c798 100644
> > --- a/include/linux/pid_namespace.h
> > +++ b/include/linux/pid_namespace.h
> > @@ -32,6 +32,7 @@ struct pid_namespace {
> >  #endif
> >  	gid_t pid_gid;
> >  	int hide_pid;
> > +	int reboot;
> >  };
> >  
> >  extern struct pid_namespace init_pid_ns;
> > @@ -47,6 +48,7 @@ static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
> >  extern struct pid_namespace *copy_pid_ns(unsigned long flags, struct pid_namespace *ns);
> >  extern void free_pid_ns(struct kref *kref);
> >  extern void zap_pid_ns_processes(struct pid_namespace *pid_ns);
> > +extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd);
> >  
> >  static inline void put_pid_ns(struct pid_namespace *ns)
> >  {
> > @@ -74,11 +76,15 @@ static inline void put_pid_ns(struct pid_namespace *ns)
> >  {
> >  }
> >  
> > -
> >  static inline void zap_pid_ns_processes(struct pid_namespace *ns)
> >  {
> >  	BUG();
> >  }
> > +
> > +static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
> > +{
> > +	return 0;
> > +}
> >  #endif /* CONFIG_PID_NS */
> >  
> >  extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
> > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> > index a896839..0d355e8 100644
> > --- a/kernel/pid_namespace.c
> > +++ b/kernel/pid_namespace.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/acct.h>
> >  #include <linux/slab.h>
> >  #include <linux/proc_fs.h>
> > +#include <linux/reboot.h>
> >  
> >  #define BITS_PER_PAGE		(PAGE_SIZE*8)
> >  
> > @@ -187,6 +188,9 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> >  		rc = sys_wait4(-1, NULL, __WALL, NULL);
> >  	} while (rc != -ECHILD);
> >  
> > +	if (pid_ns->reboot)
> > +		current->signal->group_exit_code = pid_ns->reboot;
> > +
> >  	acct_exit_ns(pid_ns);
> >  	return;
> >  }
> > @@ -221,6 +225,35 @@ static struct ctl_table pid_ns_ctl_table[] = {
> >  
> >  static struct ctl_path kern_path[] = { { .procname = "kernel", }, { } };
> >  
> > +int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
> > +{
> > +	if (pid_ns == &init_pid_ns)
> > +		return 0;
> > +
> > +	switch(cmd) {
> > +	case LINUX_REBOOT_CMD_RESTART2:
> > +	case LINUX_REBOOT_CMD_RESTART:
> > +		pid_ns->reboot = SIGHUP;
> > +		break;
> > +
> > +	case LINUX_REBOOT_CMD_POWER_OFF:
> > +	case LINUX_REBOOT_CMD_HALT:
> > +		pid_ns->reboot = SIGINT;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	read_lock(&tasklist_lock);
> > +	force_sig(SIGKILL, pid_ns->child_reaper);
> > +	read_unlock(&tasklist_lock);
> > +
> > +	do_exit(0);
> > +
> > +	/* Not reached */
> > +	return 0;
> > +}
> > +
> >  static __init int pid_namespaces_init(void)
> >  {
> >  	pid_ns_cachep = KMEM_CACHE(pid_namespace, SLAB_PANIC);
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 4070153..bd924fa 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -444,6 +444,14 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
> >  	                magic2 != LINUX_REBOOT_MAGIC2C))
> >  		return -EINVAL;
> >  
> > +	/* In case the pid namespaces are enabled, the current task is in a
> > +	 * child pid_namespace and the command is handled by 'reboot_pid_ns',
> > +	 * this one will invoke 'do_exit'.
> > +	 */
> > +	ret = reboot_pid_ns(task_active_pid_ns(current), cmd);
> > +	if (ret)
> > +		return ret;
> > +
> >  	/* Instead of trying to make the power_off code look like
> >  	 * halt when pm_power_off is not set do it the easy way.
> >  	 */
> > -- 
> > 1.7.5.4
> > 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-01-11 10:23     ` Serge Hallyn
@ 2012-01-11 10:45       ` Andrew Morton
  2012-01-11 10:49         ` Serge E. Hallyn
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2012-01-11 10:45 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: Daniel Lezcano, containers, oleg, linux-kernel, mtk.manpages

On Wed, 11 Jan 2012 11:23:46 +0100 Serge Hallyn <serge.hallyn@canonical.com> wrote:

> > 
> > The testcase in [PATCH 0/1] passed for me, and reboot from init_pid_ns
> > works as usual.
> > 
> > Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
> 
> Hi Andrew,
> 
> Are you considering taking this patch into -mm?

I have it queued up to look at.  I'll do that I've merged the current
lot into rc1 - if I were to merge new stuff before then I'd only
confuse myself ;)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-01-11 10:45       ` Andrew Morton
@ 2012-01-11 10:49         ` Serge E. Hallyn
  0 siblings, 0 replies; 9+ messages in thread
From: Serge E. Hallyn @ 2012-01-11 10:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Serge Hallyn, linux-kernel, containers, mtk.manpages, oleg

Quoting Andrew Morton (akpm@linux-foundation.org):
> On Wed, 11 Jan 2012 11:23:46 +0100 Serge Hallyn <serge.hallyn@canonical.com> wrote:
> 
> > > 
> > > The testcase in [PATCH 0/1] passed for me, and reboot from init_pid_ns
> > > works as usual.
> > > 
> > > Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
> > 
> > Hi Andrew,
> > 
> > Are you considering taking this patch into -mm?
> 
> I have it queued up to look at.  I'll do that I've merged the current
> lot into rc1 - if I were to merge new stuff before then I'd only
> confuse myself ;)

Thanks!

-serge

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-01-05  9:06 ` [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall Daniel Lezcano
  2012-01-05 19:29   ` Serge Hallyn
@ 2012-02-03  0:10   ` Andrew Morton
  2012-02-03  8:59     ` Daniel Lezcano
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2012-02-03  0:10 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: serge.hallyn, oleg, containers, gkurz, linux-kernel, mtk.manpages

On Thu,  5 Jan 2012 10:06:50 +0100
Daniel Lezcano <daniel.lezcano@free.fr> wrote:

> In the case of a child pid namespace, rebooting the system does not
> really makes sense. When the pid namespace is used in conjunction
> with the other namespaces in order to create a linux container, the
> reboot syscall leads to some problems.
> 
> A container can reboot the host. That can be fixed by dropping
> the sys_reboot capability but we are unable to correctly to poweroff/
> halt/reboot a container and the container stays stuck at the shutdown
> time with the container's init process waiting indefinitively.
> 
> After several attempts, no solution from userspace was found to reliabily
> handle the shutdown from a container.
> 
> This patch propose to make the init process of the child pid namespace to
> exit with a signal status set to : SIGINT if the child pid namespace called
> "halt/poweroff" and SIGHUP if the child pid namespace called "reboot".
> When the reboot syscall is called and we are not in the initial
> pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART",
> and "RESTART2". Otherwise we return EINVAL.
> 
> Returning EINVAL is also an easy way to check if this feature is supported
> by the kernel when invoking another 'reboot' option like CAD.
> 
> By this way the parent process of the child pid namespace knows if
> it rebooted or not and can take the right decision.

Looks OK, although the comments need help.  Is the below still true?

Do you think it would be feasible to put your testcase into
tools/testing/selftests?  I'm thinking "no", because running the test
needs elevated permissions and might reboot the user's machine(!).


--- a/include/linux/pid_namespace.h~pidns-add-reboot_pid_ns-to-handle-the-reboot-syscall-fix
+++ a/include/linux/pid_namespace.h
@@ -32,7 +32,7 @@ struct pid_namespace {
 #endif
 	gid_t pid_gid;
 	int hide_pid;
-	int reboot;
+	int reboot;	/* group exit code if this pidns was rebooted */
 };
 
 extern struct pid_namespace init_pid_ns;
--- a/kernel/sys.c~pidns-add-reboot_pid_ns-to-handle-the-reboot-syscall-fix
+++ a/kernel/sys.c
@@ -444,9 +444,10 @@ SYSCALL_DEFINE4(reboot, int, magic1, int
 	                magic2 != LINUX_REBOOT_MAGIC2C))
 		return -EINVAL;
 
-	/* In case the pid namespaces are enabled, the current task is in a
-	 * child pid_namespace and the command is handled by 'reboot_pid_ns',
-	 * this one will invoke 'do_exit'.
+	/*
+	 * If pid namespaces are enabled and the current task is in a child
+	 * pid_namespace, the command is handled by reboot_pid_ns() which will
+	 * call do_exit().
 	 */
 	ret = reboot_pid_ns(task_active_pid_ns(current), cmd);
 	if (ret)

> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -32,6 +32,7 @@ struct pid_namespace {
>  #endif
>  	gid_t pid_gid;
>  	int hide_pid;
> +	int reboot;
>  };

This was particuarly distressing.  The field was poorly named and other
people forgotting to document their data structures doesn't mean that
we should continue to do this!



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-02-03  0:10   ` Andrew Morton
@ 2012-02-03  8:59     ` Daniel Lezcano
  2012-02-03 15:47       ` Serge Hallyn
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Lezcano @ 2012-02-03  8:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: serge.hallyn, oleg, containers, gkurz, linux-kernel, mtk.manpages

On 02/03/2012 01:10 AM, Andrew Morton wrote:
> On Thu,  5 Jan 2012 10:06:50 +0100
> Daniel Lezcano<daniel.lezcano@free.fr>  wrote:
>
>> In the case of a child pid namespace, rebooting the system does not
>> really makes sense. When the pid namespace is used in conjunction
>> with the other namespaces in order to create a linux container, the
>> reboot syscall leads to some problems.
>>
>> A container can reboot the host. That can be fixed by dropping
>> the sys_reboot capability but we are unable to correctly to poweroff/
>> halt/reboot a container and the container stays stuck at the shutdown
>> time with the container's init process waiting indefinitively.
>>
>> After several attempts, no solution from userspace was found to reliabily
>> handle the shutdown from a container.
>>
>> This patch propose to make the init process of the child pid namespace to
>> exit with a signal status set to : SIGINT if the child pid namespace called
>> "halt/poweroff" and SIGHUP if the child pid namespace called "reboot".
>> When the reboot syscall is called and we are not in the initial
>> pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART",
>> and "RESTART2". Otherwise we return EINVAL.
>>
>> Returning EINVAL is also an easy way to check if this feature is supported
>> by the kernel when invoking another 'reboot' option like CAD.
>>
>> By this way the parent process of the child pid namespace knows if
>> it rebooted or not and can take the right decision.
> Looks OK, although the comments need help.  Is the below still true?

Yes, thanks for fixing this.

>
> Do you think it would be feasible to put your testcase into
> tools/testing/selftests?  I'm thinking "no", because running the test
> needs elevated permissions and might reboot the user's machine(!).

Yes, right. I don't think the user will be happy with that. 
Unfortunately, I don't see how to test this feature without falling into 
a reboot on failure. On the other side, this very specific feature is 
used in the container environment and if it fails that will be spotted 
immediately and fixed. So I don't think that does make sense to add this 
test in tools/testing/selftests.

[ ... ]

>   	gid_t pid_gid;
>   	int hide_pid;
> +	int reboot;
>   };
> This was particuarly distressing.  The field was poorly named and other
> people forgotting to document their data structures doesn't mean that
> we should continue to do this!

Thanks again for adding the description. I will take care next time to 
add a simple description when the field name is not self-explicit or 
ambiguous.

   -- Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall
  2012-02-03  8:59     ` Daniel Lezcano
@ 2012-02-03 15:47       ` Serge Hallyn
  0 siblings, 0 replies; 9+ messages in thread
From: Serge Hallyn @ 2012-02-03 15:47 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Andrew Morton, oleg, containers, gkurz, linux-kernel, mtk.manpages

Quoting Daniel Lezcano (daniel.lezcano@free.fr):
> On 02/03/2012 01:10 AM, Andrew Morton wrote:
> >On Thu,  5 Jan 2012 10:06:50 +0100
> >Daniel Lezcano<daniel.lezcano@free.fr>  wrote:
> >
> >>In the case of a child pid namespace, rebooting the system does not
> >>really makes sense. When the pid namespace is used in conjunction
> >>with the other namespaces in order to create a linux container, the
> >>reboot syscall leads to some problems.
> >>
> >>A container can reboot the host. That can be fixed by dropping
> >>the sys_reboot capability but we are unable to correctly to poweroff/
> >>halt/reboot a container and the container stays stuck at the shutdown
> >>time with the container's init process waiting indefinitively.
> >>
> >>After several attempts, no solution from userspace was found to reliabily
> >>handle the shutdown from a container.
> >>
> >>This patch propose to make the init process of the child pid namespace to
> >>exit with a signal status set to : SIGINT if the child pid namespace called
> >>"halt/poweroff" and SIGHUP if the child pid namespace called "reboot".
> >>When the reboot syscall is called and we are not in the initial
> >>pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART",
> >>and "RESTART2". Otherwise we return EINVAL.
> >>
> >>Returning EINVAL is also an easy way to check if this feature is supported
> >>by the kernel when invoking another 'reboot' option like CAD.
> >>
> >>By this way the parent process of the child pid namespace knows if
> >>it rebooted or not and can take the right decision.
> >Looks OK, although the comments need help.  Is the below still true?
> 
> Yes, thanks for fixing this.
> 
> >
> >Do you think it would be feasible to put your testcase into
> >tools/testing/selftests?  I'm thinking "no", because running the test
> >needs elevated permissions and might reboot the user's machine(!).
> 
> Yes, right. I don't think the user will be happy with that.
> Unfortunately, I don't see how to test this feature without falling
> into a reboot on failure. On the other side, this very specific
> feature is used in the container environment and if it fails that
> will be spotted immediately and fixed. So I don't think that does
> make sense to add this test in tools/testing/selftests.

In fact I can add Daniel's testcase to my lxc testsuite to run
separately, apart from lxc.  I haven't yet hooked it up into our
jenkins qa rig, but that's planned.

> [ ... ]
> 
> >  	gid_t pid_gid;
> >  	int hide_pid;
> >+	int reboot;
> >  };
> >This was particuarly distressing.  The field was poorly named and other
> >people forgotting to document their data structures doesn't mean that
> >we should continue to do this!
> 
> Thanks again for adding the description. I will take care next time
> to add a simple description when the field name is not self-explicit
> or ambiguous.
> 
>   -- Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-02-03 15:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-05  9:06 [PATCH 0/1][V5] Handle reboot in a child pid namespace Daniel Lezcano
2012-01-05  9:06 ` [PATCH 1/1][V5] Add reboot_pid_ns to handle the reboot syscall Daniel Lezcano
2012-01-05 19:29   ` Serge Hallyn
2012-01-11 10:23     ` Serge Hallyn
2012-01-11 10:45       ` Andrew Morton
2012-01-11 10:49         ` Serge E. Hallyn
2012-02-03  0:10   ` Andrew Morton
2012-02-03  8:59     ` Daniel Lezcano
2012-02-03 15:47       ` Serge Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).