linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
@ 2023-09-29 17:44 Brian Geffon
  2023-09-29 20:09 ` Kees Cook
  2023-10-12  9:48 ` Rafael J. Wysocki
  0 siblings, 2 replies; 7+ messages in thread
From: Brian Geffon @ 2023-09-29 17:44 UTC (permalink / raw)
  To: Christian Brauner, Kees Cook
  Cc: Rafael J . Wysocki, Matthias Kaehlcke, Luis Chamberlain,
	Frederic Weisbecker, linux-kernel, Brian Geffon

When the system has a frozen userspace, for example, during hibernation
the child reaper task will also be frozen. Attmepting to deliver a
signal to it to handle the reboot(2) will ultimately lead to the system
hanging unless userspace is thawed.

This change checks if the current task is the suspending task and if so
it will allow it to proceed with a reboot from the non-init pid ns.

Signed-off-by: Brian Geffon <bgeffon@google.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
---
 kernel/pid_namespace.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 0bf44afe04dd..4a93a5063eda 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -321,6 +321,15 @@ int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
 	if (pid_ns == &init_pid_ns)
 		return 0;
 
+	if (current->flags & PF_SUSPEND_TASK) {
+		/*
+		 * Attempting to signal the child_reaper won't work if it's
+		 * frozen. In this case we shutdown the system as if we were in
+		 * the init_pid_ns.
+		 */
+		return 0;
+	}
+
 	switch (cmd) {
 	case LINUX_REBOOT_CMD_RESTART2:
 	case LINUX_REBOOT_CMD_RESTART:
-- 
2.42.0.582.g8ccd20d70d-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
  2023-09-29 17:44 [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns Brian Geffon
@ 2023-09-29 20:09 ` Kees Cook
  2023-09-30  0:25   ` Brian Geffon
  2023-10-12  9:48 ` Rafael J. Wysocki
  1 sibling, 1 reply; 7+ messages in thread
From: Kees Cook @ 2023-09-29 20:09 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Christian Brauner, Rafael J . Wysocki, Matthias Kaehlcke,
	Luis Chamberlain, Frederic Weisbecker, linux-kernel

On Fri, Sep 29, 2023 at 01:44:42PM -0400, Brian Geffon wrote:
> When the system has a frozen userspace, for example, during hibernation
> the child reaper task will also be frozen. Attmepting to deliver a
> signal to it to handle the reboot(2) will ultimately lead to the system
> hanging unless userspace is thawed.
> 
> This change checks if the current task is the suspending task and if so
> it will allow it to proceed with a reboot from the non-init pid ns.

I don't know the code flow too well here, but shouldn't init_pid_ns
always be doing the reboot regardless of anything else?

Also how is this syscall running if current is frozen? This feels weird
to me... shouldn't the frozen test be against pid_ns->child_reaper
instead of current?

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
  2023-09-29 20:09 ` Kees Cook
@ 2023-09-30  0:25   ` Brian Geffon
  2023-10-09 20:05     ` Kees Cook
  2023-10-12  3:53     ` Eric W. Biederman
  0 siblings, 2 replies; 7+ messages in thread
From: Brian Geffon @ 2023-09-30  0:25 UTC (permalink / raw)
  To: Kees Cook
  Cc: Christian Brauner, Rafael J . Wysocki, Matthias Kaehlcke,
	Luis Chamberlain, Frederic Weisbecker, linux-kernel

On Fri, Sep 29, 2023 at 4:09 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Sep 29, 2023 at 01:44:42PM -0400, Brian Geffon wrote:
> > When the system has a frozen userspace, for example, during hibernation
> > the child reaper task will also be frozen. Attmepting to deliver a
> > signal to it to handle the reboot(2) will ultimately lead to the system
> > hanging unless userspace is thawed.
> >
> > This change checks if the current task is the suspending task and if so
> > it will allow it to proceed with a reboot from the non-init pid ns.
>
> I don't know the code flow too well here, but shouldn't init_pid_ns
> always be doing the reboot regardless of anything else?

I think the point of this is, normally the reaper is runnable and so
an appropriate signal will be delivered allowing them to also clean up
[2]. In our case, they won't be runnable and doing this wouldn't make
sense.

>
> Also how is this syscall running if current is frozen? This feels weird
> to me... shouldn't the frozen test be against pid_ns->child_reaper
> instead of current?

The task which froze the system won't be frozen to make sure this
happens it will have the flag PF_SUSPEND_TASK added, so we know if we
have this flag we're the only running user space task [1].

>
> -Kees

I hope my understanding is correct and it makes sense. Thanks for
taking the time to review.

Brian

1. https://elixir.bootlin.com/linux/latest/source/kernel/power/process.c#L130
2. https://elixir.bootlin.com/linux/latest/source/kernel/pid_namespace.c#L327


>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
  2023-09-30  0:25   ` Brian Geffon
@ 2023-10-09 20:05     ` Kees Cook
  2023-10-12  3:53     ` Eric W. Biederman
  1 sibling, 0 replies; 7+ messages in thread
From: Kees Cook @ 2023-10-09 20:05 UTC (permalink / raw)
  To: Christian Brauner, Brian Geffon
  Cc: Rafael J . Wysocki, Matthias Kaehlcke, Luis Chamberlain,
	Frederic Weisbecker, linux-kernel

On Fri, Sep 29, 2023 at 08:25:42PM -0400, Brian Geffon wrote:
> On Fri, Sep 29, 2023 at 4:09 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Sep 29, 2023 at 01:44:42PM -0400, Brian Geffon wrote:
> > > When the system has a frozen userspace, for example, during hibernation
> > > the child reaper task will also be frozen. Attmepting to deliver a
> > > signal to it to handle the reboot(2) will ultimately lead to the system
> > > hanging unless userspace is thawed.
> > >
> > > This change checks if the current task is the suspending task and if so
> > > it will allow it to proceed with a reboot from the non-init pid ns.
> >
> > I don't know the code flow too well here, but shouldn't init_pid_ns
> > always be doing the reboot regardless of anything else?
> 
> I think the point of this is, normally the reaper is runnable and so
> an appropriate signal will be delivered allowing them to also clean up
> [2]. In our case, they won't be runnable and doing this wouldn't make
> sense.
> 
> >
> > Also how is this syscall running if current is frozen? This feels weird
> > to me... shouldn't the frozen test be against pid_ns->child_reaper
> > instead of current?
> 
> The task which froze the system won't be frozen to make sure this
> happens it will have the flag PF_SUSPEND_TASK added, so we know if we
> have this flag we're the only running user space task [1].
> 
> >
> > -Kees
> 
> I hope my understanding is correct and it makes sense. Thanks for
> taking the time to review.

Christian, is this something you can take a look at? I'm much less
familiar with this area of the process list logic.

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
  2023-09-30  0:25   ` Brian Geffon
  2023-10-09 20:05     ` Kees Cook
@ 2023-10-12  3:53     ` Eric W. Biederman
  1 sibling, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2023-10-12  3:53 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Kees Cook, Christian Brauner, Rafael J . Wysocki,
	Matthias Kaehlcke, Luis Chamberlain, Frederic Weisbecker,
	linux-kernel

Brian Geffon <bgeffon@google.com> writes:

> On Fri, Sep 29, 2023 at 4:09 PM Kees Cook <keescook@chromium.org> wrote:
>>
>> On Fri, Sep 29, 2023 at 01:44:42PM -0400, Brian Geffon wrote:
>> > When the system has a frozen userspace, for example, during hibernation
>> > the child reaper task will also be frozen. Attmepting to deliver a
>> > signal to it to handle the reboot(2) will ultimately lead to the system
>> > hanging unless userspace is thawed.
>> >
>> > This change checks if the current task is the suspending task and if so
>> > it will allow it to proceed with a reboot from the non-init pid ns.
>>
>> I don't know the code flow too well here, but shouldn't init_pid_ns
>> always be doing the reboot regardless of anything else?
>
> I think the point of this is, normally the reaper is runnable and so
> an appropriate signal will be delivered allowing them to also clean up
> [2]. In our case, they won't be runnable and doing this wouldn't make
> sense.

The entire reboot_pid_ns thing is just a polite way of keeping
applications like /sbin/reboot working inside a pid namespace.

Ordinarily the process calling reboot (inside the container) won't
have the privileges to request an entire system reboot.  So I don't
see anything making sense to promote that reboot into a system-wide
reboot.

Which leads me to the question.  What is actually happening with
hibernation that we want something inside a pid namespace to somehow
have the permissions to reboot the entire machine?

>> Also how is this syscall running if current is frozen? This feels weird
>> to me... shouldn't the frozen test be against pid_ns->child_reaper
>> instead of current?
>
> The task which froze the system won't be frozen to make sure this
> happens it will have the flag PF_SUSPEND_TASK added, so we know if we
> have this flag we're the only running user space task [1].

Someone has a task inside a container that is successfully suspending
the entire system?

I don't see how that makes sense.

But on the level that it somehow does I would put a test in
kernel/reboot.c something like:

/*
 * If the caller can't perform a normal reboot call
 * reboot_pid_ns
 */
if ((pid_ns != &init_pid_ns) &&
    !((current->flags & PF_SUSPEND_TASK) && capable(CAP_SYS_BOOT))) {
	return reboot_pid_ns(pid_ns, cmd);
}

Making reboot_pid_ns responsible for the logic that should be bypassing
it is quite confusing.

> I hope my understanding is correct and it makes sense. Thanks for
> taking the time to review.
>
> Brian
>
> 1. https://elixir.bootlin.com/linux/latest/source/kernel/power/process.c#L130
> 2. https://elixir.bootlin.com/linux/latest/source/kernel/pid_namespace.c#L327


I really don't know if allowing PF_SUSPEND_TASK so that hibernation and
the like can work from inside a container makes any sense at all.

But the above is roughly how I would make it work.

Eric


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
  2023-09-29 17:44 [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns Brian Geffon
  2023-09-29 20:09 ` Kees Cook
@ 2023-10-12  9:48 ` Rafael J. Wysocki
  2023-10-17 19:00   ` Brian Geffon
  1 sibling, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2023-10-12  9:48 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Christian Brauner, Kees Cook, Rafael J . Wysocki,
	Matthias Kaehlcke, Luis Chamberlain, Frederic Weisbecker,
	linux-kernel

On Fri, Sep 29, 2023 at 7:45 PM Brian Geffon <bgeffon@google.com> wrote:
>
> When the system has a frozen userspace, for example, during hibernation
> the child reaper task will also be frozen. Attmepting to deliver a
> signal to it to handle the reboot(2) will ultimately lead to the system
> hanging unless userspace is thawed.
>
> This change checks if the current task is the suspending task and if so
> it will allow it to proceed with a reboot from the non-init pid ns.
>
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> Tested-by: Matthias Kaehlcke <mka@chromium.org>

If the report is public, which I think is the case, having a Link: tag
pointing to it here would be nice.

> ---
>  kernel/pid_namespace.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 0bf44afe04dd..4a93a5063eda 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -321,6 +321,15 @@ int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
>         if (pid_ns == &init_pid_ns)
>                 return 0;
>
> +       if (current->flags & PF_SUSPEND_TASK) {
> +               /*
> +                * Attempting to signal the child_reaper won't work if it's
> +                * frozen. In this case we shutdown the system as if we were in
> +                * the init_pid_ns.
> +                */

Is the system guaranteed to be in the right state for a shutdown at this point?

There is a system-wide suspend-resume or hibernation in progress, so
system_transition_mutex should be held and that should cause reboot()
to block anyway.  Do you know why it doesn't block and why the suspend
task has any reason to call it?

> +               return 0;
> +       }
> +
>         switch (cmd) {
>         case LINUX_REBOOT_CMD_RESTART2:
>         case LINUX_REBOOT_CMD_RESTART:
> --
> 2.42.0.582.g8ccd20d70d-goog
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
  2023-10-12  9:48 ` Rafael J. Wysocki
@ 2023-10-17 19:00   ` Brian Geffon
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Geffon @ 2023-10-17 19:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christian Brauner, Kees Cook, Matthias Kaehlcke,
	Luis Chamberlain, Frederic Weisbecker, linux-kernel

On Thu, Oct 12, 2023 at 5:48 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Sep 29, 2023 at 7:45 PM Brian Geffon <bgeffon@google.com> wrote:
> >
> > When the system has a frozen userspace, for example, during hibernation
> > the child reaper task will also be frozen. Attmepting to deliver a
> > signal to it to handle the reboot(2) will ultimately lead to the system
> > hanging unless userspace is thawed.
> >
> > This change checks if the current task is the suspending task and if so
> > it will allow it to proceed with a reboot from the non-init pid ns.
> >
> > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > Tested-by: Matthias Kaehlcke <mka@chromium.org>
>
> If the report is public, which I think is the case, having a Link: tag
> pointing to it here would be nice.
>
> > ---
> >  kernel/pid_namespace.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> > index 0bf44afe04dd..4a93a5063eda 100644
> > --- a/kernel/pid_namespace.c
> > +++ b/kernel/pid_namespace.c
> > @@ -321,6 +321,15 @@ int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
> >         if (pid_ns == &init_pid_ns)
> >                 return 0;
> >
> > +       if (current->flags & PF_SUSPEND_TASK) {
> > +               /*
> > +                * Attempting to signal the child_reaper won't work if it's
> > +                * frozen. In this case we shutdown the system as if we were in
> > +                * the init_pid_ns.
> > +                */
>
> Is the system guaranteed to be in the right state for a shutdown at this point?
>
> There is a system-wide suspend-resume or hibernation in progress, so
> system_transition_mutex should be held and that should cause reboot()
> to block anyway.  Do you know why it doesn't block and why the suspend
> task has any reason to call it?
>

Sorry for the delay in responding to these questions, I'm going to do
another pass through this code and respond with a more detailed
explanation in the next few days.

> > +               return 0;
> > +       }
> > +
> >         switch (cmd) {
> >         case LINUX_REBOOT_CMD_RESTART2:
> >         case LINUX_REBOOT_CMD_RESTART:
> > --
> > 2.42.0.582.g8ccd20d70d-goog
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-10-17 19:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-29 17:44 [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns Brian Geffon
2023-09-29 20:09 ` Kees Cook
2023-09-30  0:25   ` Brian Geffon
2023-10-09 20:05     ` Kees Cook
2023-10-12  3:53     ` Eric W. Biederman
2023-10-12  9:48 ` Rafael J. Wysocki
2023-10-17 19:00   ` Brian Geffon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).