linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5)
@ 2020-04-19 20:13 Christof Meerwald
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
  2020-04-21 14:59 ` SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Eric W. Biederman
  0 siblings, 2 replies; 22+ messages in thread
From: Christof Meerwald @ 2020-04-19 20:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Eric W. Biederman

Hi,

this is probably related to commit
7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
fixups of si_pid and si_uid).

With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
properly set si_pid field - this seems to happen for multi-threaded
child processes.

A simple test program (based on the sample from the signalfd man page):

#include <sys/signalfd.h>
#include <signal.h>
#include <unistd.h>
#include <spawn.h>
#include <stdlib.h>
#include <stdio.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

int main(int argc, char *argv[])
{
  sigset_t mask;
  int sfd;
  struct signalfd_siginfo fdsi;
  ssize_t s;

  sigemptyset(&mask);
  sigaddset(&mask, SIGCHLD);

  if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
    handle_error("sigprocmask");

  pid_t chldpid;
  char *chldargv[] = { "./sfdclient", NULL };
  posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);

  sfd = signalfd(-1, &mask, 0);
  if (sfd == -1)
    handle_error("signalfd");

  for (;;) {
    s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
    if (s != sizeof(struct signalfd_siginfo))
      handle_error("read");

    if (fdsi.ssi_signo == SIGCHLD) {
      printf("Got SIGCHLD %d %d %d %d\n",
          fdsi.ssi_status, fdsi.ssi_code,
          fdsi.ssi_uid, fdsi.ssi_pid);
      return 0;
    } else {
      printf("Read unexpected signal\n");
    }
  }
}


and a multi-threaded client to test with:

#include <unistd.h>
#include <pthread.h>

void *f(void *arg)
{
  sleep(100);
}

int main()
{
  pthread_t t[8];

  for (int i = 0; i != 8; ++i)
  {
    pthread_create(&t[i], NULL, f, NULL);
  }
}


I tried to do a bit of debugging and what seems to be happening is
that

		/* From an ancestor pid namespace? */
		if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {

fails inside task_pid_nr_ns because the check for "pid_alive" fails.

This code seems to be called from do_notify_parent and there we
actually have "tsk != current" (I am assuming both are threads of the
current process?)


Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-19 20:13 SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Christof Meerwald
@ 2020-04-20 17:05 ` Eric W. Biederman
  2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:04   ` Oleg Nesterov
  2020-04-21 14:59 ` SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Eric W. Biederman
  1 sibling, 2 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-20 17:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christof Meerwald, Linux Containers, Oleg Nesterov


Christof Meerwald <cmeerw@cmeerw.org> writes:
> Hi,
>
> this is probably related to commit
> 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.
>
> A simple test program (based on the sample from the signalfd man page):
>
> #include <sys/signalfd.h>
> #include <signal.h>
> #include <unistd.h>
> #include <spawn.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #define handle_error(msg) \
>     do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int main(int argc, char *argv[])
> {
>   sigset_t mask;
>   int sfd;
>   struct signalfd_siginfo fdsi;
>   ssize_t s;
>
>   sigemptyset(&mask);
>   sigaddset(&mask, SIGCHLD);
>
>   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
>     handle_error("sigprocmask");
>
>   pid_t chldpid;
>   char *chldargv[] = { "./sfdclient", NULL };
>   posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
>
>   sfd = signalfd(-1, &mask, 0);
>   if (sfd == -1)
>     handle_error("signalfd");
>
>   for (;;) {
>     s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
>     if (s != sizeof(struct signalfd_siginfo))
>       handle_error("read");
>
>     if (fdsi.ssi_signo == SIGCHLD) {
>       printf("Got SIGCHLD %d %d %d %d\n",
>           fdsi.ssi_status, fdsi.ssi_code,
>           fdsi.ssi_uid, fdsi.ssi_pid);
>       return 0;
>     } else {
>       printf("Read unexpected signal\n");
>     }
>   }
> }
>
>
> and a multi-threaded client to test with:
>
> #include <unistd.h>
> #include <pthread.h>
>
> void *f(void *arg)
> {
>   sleep(100);
> }
>
> int main()
> {
>   pthread_t t[8];
>
>   for (int i = 0; i != 8; ++i)
>   {
>     pthread_create(&t[i], NULL, f, NULL);
>   }
> }
>
> I tried to do a bit of debugging and what seems to be happening is
> that
>
>   /* From an ancestor pid namespace? */
>   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
>
> fails inside task_pid_nr_ns because the check for "pid_alive" fails.
>
> This code seems to be called from do_notify_parent and there we
> actually have "tsk != current" (I am assuming both are threads of the
> current process?)

I instrumented the code with a warning and received the following backtrace:
> WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> Modules linked in:
> CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> Call Trace:
>  send_signal+0x1c8/0x310
>  do_notify_parent+0x50f/0x550
>  release_task.part.21+0x4fd/0x620
>  do_exit+0x6f6/0xaf0
>  do_group_exit+0x42/0xb0
>  get_signal+0x13b/0xbb0
>  do_signal+0x2b/0x670
>  ? __audit_syscall_exit+0x24d/0x2b0
>  ? rcu_read_lock_sched_held+0x4d/0x60
>  ? kfree+0x24c/0x2b0
>  do_syscall_64+0x176/0x640
>  ? trace_hardirqs_off_thunk+0x1a/0x1c
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3

The immediate problem is as Christof noticed that "pid_alive(current) == false".
This happens because do_notify_parent is called from the last thread to exit
in a process after that thread has been reaped.

The bigger issue is that do_notify_parent can be called from any
process that manages to wait on a thread of a multi-threaded process
from wait_task_zombie.  So any logic based upon current for
do_notify_parent is just nonsense, as current can be pretty much
anything.

So change do_notify_parent to call __send_signal directly.

Inspecting the code it appears this problem has existed since the pid
namespace support started handling this case in 2.6.30.  This fix only
backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
where the problem logic was moved out of __send_signal and into send_signal.

Cc: stable@vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary
Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Unless someone has an objection I will apply this one and send it to
Linus.

 kernel/signal.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 9899c5f91ee1..a88a89422227 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
 			sig = 0;
 	}
+	/*
+	 * Bypass send_signal as the si_pid and si_uid values have
+	 * been generated in the parent's namespaces.
+	 */
 	if (valid_signal(sig) && sig)
-		__group_send_sig_info(sig, &info, tsk->parent);
+		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
@ 2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:28     ` Oleg Nesterov
                       ` (2 more replies)
  2020-04-21  9:04   ` Oleg Nesterov
  1 sibling, 3 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21  8:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Christof Meerwald, Oleg Nesterov

On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
> 
> Christof Meerwald <cmeerw@cmeerw.org> writes:
> > Hi,
> >
> > this is probably related to commit
> > 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> > fixups of si_pid and si_uid).
> >
> > With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> > properly set si_pid field - this seems to happen for multi-threaded
> > child processes.
> >
> > A simple test program (based on the sample from the signalfd man page):
> >
> > #include <sys/signalfd.h>
> > #include <signal.h>
> > #include <unistd.h>
> > #include <spawn.h>
> > #include <stdlib.h>
> > #include <stdio.h>
> >
> > #define handle_error(msg) \
> >     do { perror(msg); exit(EXIT_FAILURE); } while (0)
> >
> > int main(int argc, char *argv[])
> > {
> >   sigset_t mask;
> >   int sfd;
> >   struct signalfd_siginfo fdsi;
> >   ssize_t s;
> >
> >   sigemptyset(&mask);
> >   sigaddset(&mask, SIGCHLD);
> >
> >   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
> >     handle_error("sigprocmask");
> >
> >   pid_t chldpid;
> >   char *chldargv[] = { "./sfdclient", NULL };
> >   posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
> >
> >   sfd = signalfd(-1, &mask, 0);
> >   if (sfd == -1)
> >     handle_error("signalfd");
> >
> >   for (;;) {
> >     s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
> >     if (s != sizeof(struct signalfd_siginfo))
> >       handle_error("read");
> >
> >     if (fdsi.ssi_signo == SIGCHLD) {
> >       printf("Got SIGCHLD %d %d %d %d\n",
> >           fdsi.ssi_status, fdsi.ssi_code,
> >           fdsi.ssi_uid, fdsi.ssi_pid);
> >       return 0;
> >     } else {
> >       printf("Read unexpected signal\n");
> >     }
> >   }
> > }
> >
> >
> > and a multi-threaded client to test with:
> >
> > #include <unistd.h>
> > #include <pthread.h>
> >
> > void *f(void *arg)
> > {
> >   sleep(100);
> > }
> >
> > int main()
> > {
> >   pthread_t t[8];
> >
> >   for (int i = 0; i != 8; ++i)
> >   {
> >     pthread_create(&t[i], NULL, f, NULL);
> >   }
> > }
> >
> > I tried to do a bit of debugging and what seems to be happening is
> > that
> >
> >   /* From an ancestor pid namespace? */
> >   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
> >
> > fails inside task_pid_nr_ns because the check for "pid_alive" fails.
> >
> > This code seems to be called from do_notify_parent and there we
> > actually have "tsk != current" (I am assuming both are threads of the
> > current process?)
> 
> I instrumented the code with a warning and received the following backtrace:
> > WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> > Modules linked in:
> > CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> > Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> > RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> > RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> > RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> > R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> > FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> > Call Trace:
> >  send_signal+0x1c8/0x310
> >  do_notify_parent+0x50f/0x550
> >  release_task.part.21+0x4fd/0x620
> >  do_exit+0x6f6/0xaf0
> >  do_group_exit+0x42/0xb0
> >  get_signal+0x13b/0xbb0
> >  do_signal+0x2b/0x670
> >  ? __audit_syscall_exit+0x24d/0x2b0
> >  ? rcu_read_lock_sched_held+0x4d/0x60
> >  ? kfree+0x24c/0x2b0
> >  do_syscall_64+0x176/0x640
> >  ? trace_hardirqs_off_thunk+0x1a/0x1c
> >  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> 
> The immediate problem is as Christof noticed that "pid_alive(current) == false".
> This happens because do_notify_parent is called from the last thread to exit
> in a process after that thread has been reaped.
> 
> The bigger issue is that do_notify_parent can be called from any
> process that manages to wait on a thread of a multi-threaded process
> from wait_task_zombie.  So any logic based upon current for
> do_notify_parent is just nonsense, as current can be pretty much
> anything.
> 
> So change do_notify_parent to call __send_signal directly.
> 
> Inspecting the code it appears this problem has existed since the pid
> namespace support started handling this case in 2.6.30.  This fix only
> backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
> where the problem logic was moved out of __send_signal and into send_signal.
> 
> Cc: stable@vger.kernel.org
> Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary
> Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
> Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> Unless someone has an objection I will apply this one and send it to
> Linus.
> 
>  kernel/signal.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9899c5f91ee1..a88a89422227 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
>  			sig = 0;
>  	}
> +	/*
> +	 * Bypass send_signal as the si_pid and si_uid values have
> +	 * been generated in the parent's namespaces.
> +	 */

At first I misread that comment as saying that we're skipping sending a
signal not that it relates to a specific function (and I won't admit that
I wrote a whole long paragraph on why I'm confused we're skipping
sending signals on invalid si_pid and si_uid...).

I think it would be worth to say something that simply states the facts
such as:
"If we're not autoreaping, send a signal with si_pid and si_uid set
 as generated in the parent's namespaces."
which imho is way clearer then pointing to out that we're skipping
send_signal(). The logic here and the comment in its current form are
hard to correlate, especially since send_signal() was never called
directly here but is rather called by __group_send_sig_info().
The details of why we switched from __group_send_sign_info() to
__send_signal() could just go into the commit message. It's more
confusing in the code.

>  	if (valid_signal(sig) && sig)

(Not related to you patch but odd ordering of the if-branch here. I
would've expected this to read

if (sig && valid_signal(sig))

especially since "sig" can be set to 0 right before.

> -		__group_send_sig_info(sig, &info, tsk->parent);
> +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);

So below you switch to __send_signal() but set the "force" argument to
to "false". Before that, if the signal was generated from another pid
namespace and we fixed up si_pid and si_uid the "force" argument was set
to "true", i.e.  __send_signal(..., ..., ..., ..., true) Even though
from reading through the code I think that change is safe to make let me
verify that this was an intentional change?

Thanks!
Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
  2020-04-21  8:30   ` Christian Brauner
@ 2020-04-21  9:04   ` Oleg Nesterov
  2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21  9:04 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Christof Meerwald, Linux Containers

On 04/20, Eric W. Biederman wrote:
>
> The immediate problem is as Christof noticed that "pid_alive(current) == false".

this is slightly offtopic, but we can probably remove this "pid_alive" check,
pid_nr_ns() checks pid != NULL anyway.

> Inspecting the code it appears this problem has existed since the pid
> namespace support started handling this case

Agreed...

> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
>  			sig = 0;
>  	}
> +	/*
> +	 * Bypass send_signal as the si_pid and si_uid values have
> +	 * been generated in the parent's namespaces.
> +	 */
>  	if (valid_signal(sig) && sig)
> -		__group_send_sig_info(sig, &info, tsk->parent);
> +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  8:30   ` Christian Brauner
@ 2020-04-21  9:28     ` Oleg Nesterov
  2020-04-21 10:21       ` Christian Brauner
  2020-04-21 10:28     ` Christian Brauner
  2020-04-21 14:57     ` Eric W. Biederman
  2 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21  9:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On 04/21, Christian Brauner wrote:
>
> > -		__group_send_sig_info(sig, &info, tsk->parent);
> > +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
>
> So below you switch to __send_signal() but set the "force" argument to
> to "false".

it must be false, the signal is generated from the parent's namespace or
its descendant

> Before that, if the signal was generated from another pid
> namespace and we fixed up si_pid and si_uid the "force" argument was set
> to "true",

before that the "force" argument could be falsely true by the same reason,
task_pid_nr_ns(tsk, tsk->parent) can return 0 because "tsk" no longer have
pids after __unhash_process().

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21  9:04   ` Oleg Nesterov
@ 2020-04-21 10:19     ` Oleg Nesterov
  2020-04-21 10:50       ` Christian Brauner
  2020-04-21 15:05       ` Eric W. Biederman
  0 siblings, 2 replies; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 10:19 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Christof Meerwald, Linux Containers

Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
task->group_leader, we can remove the pid_alive() check.

pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
unnecessary confusion.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/pid.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index bc21c0fb26d8..47221db038e2 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -475,8 +475,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
 	rcu_read_lock();
 	if (!ns)
 		ns = task_active_pid_ns(current);
-	if (likely(pid_alive(task)))
-		nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
+	nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
 	rcu_read_unlock();
 
 	return nr;
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  9:28     ` Oleg Nesterov
@ 2020-04-21 10:21       ` Christian Brauner
  2020-04-21 11:11         ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 10:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On Tue, Apr 21, 2020 at 11:28:47AM +0200, Oleg Nesterov wrote:
> On 04/21, Christian Brauner wrote:
> >
> > > -		__group_send_sig_info(sig, &info, tsk->parent);
> > > +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
> >
> > So below you switch to __send_signal() but set the "force" argument to
> > to "false".
> 
> it must be false, the signal is generated from the parent's namespace or
> its descendant
> 
> > Before that, if the signal was generated from another pid
> > namespace and we fixed up si_pid and si_uid the "force" argument was set
> > to "true",
> 
> before that the "force" argument could be falsely true by the same reason,
> task_pid_nr_ns(tsk, tsk->parent) can return 0 because "tsk" no longer have
> pids after __unhash_process().

As I said in my mail, looking at the codepath it seems safe. My question
was whether it is _always_ incorrectly false which I do think it is
because child subreapers can't come from outside the pid namespace. If
they could you could create a scenario where the signal is generated
from a sibling pid namespace in which case it would be correctly set to
true.

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:28     ` Oleg Nesterov
@ 2020-04-21 10:28     ` Christian Brauner
  2020-04-21 14:57     ` Eric W. Biederman
  2 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 10:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Christof Meerwald, linux-kernel, Oleg Nesterov

On Tue, Apr 21, 2020 at 10:30:31AM +0200, Christian Brauner wrote:
> On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
> > 
> > Christof Meerwald <cmeerw@cmeerw.org> writes:
> > > Hi,
> > >
> > > this is probably related to commit
> > > 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> > > fixups of si_pid and si_uid).
> > >
> > > With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> > > properly set si_pid field - this seems to happen for multi-threaded
> > > child processes.
> > >
> > > A simple test program (based on the sample from the signalfd man page):
> > >
> > > #include <sys/signalfd.h>
> > > #include <signal.h>
> > > #include <unistd.h>
> > > #include <spawn.h>
> > > #include <stdlib.h>
> > > #include <stdio.h>
> > >
> > > #define handle_error(msg) \
> > >     do { perror(msg); exit(EXIT_FAILURE); } while (0)
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >   sigset_t mask;
> > >   int sfd;
> > >   struct signalfd_siginfo fdsi;
> > >   ssize_t s;
> > >
> > >   sigemptyset(&mask);
> > >   sigaddset(&mask, SIGCHLD);
> > >
> > >   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
> > >     handle_error("sigprocmask");
> > >
> > >   pid_t chldpid;
> > >   char *chldargv[] = { "./sfdclient", NULL };
> > >   posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
> > >
> > >   sfd = signalfd(-1, &mask, 0);
> > >   if (sfd == -1)
> > >     handle_error("signalfd");
> > >
> > >   for (;;) {
> > >     s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
> > >     if (s != sizeof(struct signalfd_siginfo))
> > >       handle_error("read");
> > >
> > >     if (fdsi.ssi_signo == SIGCHLD) {
> > >       printf("Got SIGCHLD %d %d %d %d\n",
> > >           fdsi.ssi_status, fdsi.ssi_code,
> > >           fdsi.ssi_uid, fdsi.ssi_pid);
> > >       return 0;
> > >     } else {
> > >       printf("Read unexpected signal\n");
> > >     }
> > >   }
> > > }
> > >
> > >
> > > and a multi-threaded client to test with:
> > >
> > > #include <unistd.h>
> > > #include <pthread.h>
> > >
> > > void *f(void *arg)
> > > {
> > >   sleep(100);
> > > }
> > >
> > > int main()
> > > {
> > >   pthread_t t[8];
> > >
> > >   for (int i = 0; i != 8; ++i)
> > >   {
> > >     pthread_create(&t[i], NULL, f, NULL);
> > >   }
> > > }
> > >
> > > I tried to do a bit of debugging and what seems to be happening is
> > > that
> > >
> > >   /* From an ancestor pid namespace? */
> > >   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
> > >
> > > fails inside task_pid_nr_ns because the check for "pid_alive" fails.
> > >
> > > This code seems to be called from do_notify_parent and there we
> > > actually have "tsk != current" (I am assuming both are threads of the
> > > current process?)
> > 
> > I instrumented the code with a warning and received the following backtrace:
> > > WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> > > Modules linked in:
> > > CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> > > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> > > Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> > > RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> > > RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> > > RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> > > R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> > > FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> > > Call Trace:
> > >  send_signal+0x1c8/0x310
> > >  do_notify_parent+0x50f/0x550
> > >  release_task.part.21+0x4fd/0x620
> > >  do_exit+0x6f6/0xaf0
> > >  do_group_exit+0x42/0xb0
> > >  get_signal+0x13b/0xbb0
> > >  do_signal+0x2b/0x670
> > >  ? __audit_syscall_exit+0x24d/0x2b0
> > >  ? rcu_read_lock_sched_held+0x4d/0x60
> > >  ? kfree+0x24c/0x2b0
> > >  do_syscall_64+0x176/0x640
> > >  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > 
> > The immediate problem is as Christof noticed that "pid_alive(current) == false".
> > This happens because do_notify_parent is called from the last thread to exit
> > in a process after that thread has been reaped.
> > 
> > The bigger issue is that do_notify_parent can be called from any
> > process that manages to wait on a thread of a multi-threaded process
> > from wait_task_zombie.  So any logic based upon current for
> > do_notify_parent is just nonsense, as current can be pretty much
> > anything.
> > 
> > So change do_notify_parent to call __send_signal directly.
> > 
> > Inspecting the code it appears this problem has existed since the pid
> > namespace support started handling this case in 2.6.30.  This fix only
> > backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
> > where the problem logic was moved out of __send_signal and into send_signal.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary
> > Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
> > Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
> > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > ---
> > 
> > Unless someone has an objection I will apply this one and send it to
> > Linus.
> > 
> >  kernel/signal.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 9899c5f91ee1..a88a89422227 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
> >  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
> >  			sig = 0;
> >  	}
> > +	/*
> > +	 * Bypass send_signal as the si_pid and si_uid values have
> > +	 * been generated in the parent's namespaces.
> > +	 */
> 
> At first I misread that comment as saying that we're skipping sending a
> signal not that it relates to a specific function (and I won't admit that
> I wrote a whole long paragraph on why I'm confused we're skipping
> sending signals on invalid si_pid and si_uid...).
> 
> I think it would be worth to say something that simply states the facts
> such as:
> "If we're not autoreaping, send a signal with si_pid and si_uid set
>  as generated in the parent's namespaces."
> which imho is way clearer then pointing to out that we're skipping
> send_signal(). The logic here and the comment in its current form are
> hard to correlate, especially since send_signal() was never called
> directly here but is rather called by __group_send_sig_info().
> The details of why we switched from __group_send_sign_info() to
> __send_signal() could just go into the commit message. It's more
> confusing in the code.

Forgot before, otherwise:
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Thanks!
Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
@ 2020-04-21 10:50       ` Christian Brauner
  2020-04-21 15:05       ` Eric W. Biederman
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 10:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Linux Containers, Christof Meerwald, linux-kernel

On Tue, Apr 21, 2020 at 12:19:04PM +0200, Oleg Nesterov wrote:
> Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
> from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
> task->group_leader, we can remove the pid_alive() check.
> 
> pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
> unnecessary confusion.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 10:21       ` Christian Brauner
@ 2020-04-21 11:11         ` Oleg Nesterov
  2020-04-21 11:26           ` Christian Brauner
  2020-04-21 11:28           ` Oleg Nesterov
  0 siblings, 2 replies; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 11:11 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

Sorry Christian, I don't understand...

On 04/21, Christian Brauner wrote:
>
> On Tue, Apr 21, 2020 at 11:28:47AM +0200, Oleg Nesterov wrote:
> > On 04/21, Christian Brauner wrote:
> > >
> > > > -		__group_send_sig_info(sig, &info, tsk->parent);
> > > > +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
> > >
> > > So below you switch to __send_signal() but set the "force" argument to
> > > to "false".
> >
> > it must be false, the signal is generated from the parent's namespace or
> > its descendant
> >
> > > Before that, if the signal was generated from another pid
> > > namespace and we fixed up si_pid and si_uid the "force" argument was set
> > > to "true",
> >
> > before that the "force" argument could be falsely true by the same reason,
> > task_pid_nr_ns(tsk, tsk->parent) can return 0 because "tsk" no longer have
> > pids after __unhash_process().
>
> As I said in my mail, looking at the codepath it seems safe. My question
> was whether it is _always_ incorrectly false which I do think it is

Again, it must be always "false", it can be incorrectly "true" and this
too is fixed by Eric's patch.

> because child subreapers can't come from outside the pid namespace. If
> they could you could create a scenario where the signal is generated
> from a sibling pid namespace in which case it would be correctly set to
> true.

not sure I understand, but probably the answer is "yes"...

task and task->parent either live in the same namespace or the child's
namespace is the descendant of task->parent's namespace. In both cases
task_pid_nr_ns(tsk, tsk->parent) should return the valid pid_nr and
"force" must be false.

The corner case is release_task() when the last exiting sub-thread sends
a signal on behalf of its ->group_leader, and at this point all the tsk's
pid pointers are NULL, that is why "force" can be falsely "true".

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:11         ` Oleg Nesterov
@ 2020-04-21 11:26           ` Christian Brauner
  2020-04-21 12:17             ` Oleg Nesterov
  2020-04-21 11:28           ` Oleg Nesterov
  1 sibling, 1 reply; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 11:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linux Containers, Christof Meerwald, Eric W. Biederman, linux-kernel

On Tue, Apr 21, 2020 at 01:11:40PM +0200, Oleg Nesterov wrote:
> Sorry Christian, I don't understand...

In my original mail, it was really just a clarification question. I
said the patch is correct from looking at the codepaths. :) I was just
trying to see whether there was a potential corner-case we're missing
where "force" could be _validly_ true.

> 
> > because child subreapers can't come from outside the pid namespace. If
> > they could you could create a scenario where the signal is generated
> > from a sibling pid namespace in which case it would be correctly set to
> > true.
> 
> not sure I understand, but probably the answer is "yes"...

(This is really purely academic now since it isn't possible, but for
pure amusement assume that a child subreaper could cross namespace
boundaries (which they don't). A marks itself as a subreaper and creates
a new process B in a new pid namespace <pidnsB>, process B setnses into
<pidnsC> which is a sibling pid namespace, B clones a new proces in
<pidnsC> which is now a full member of <pidnsC>, B dies and C is
reparented to A, B exits and then you'd be getting a sigchld from a pid
in a pid namespace in which you have no pid nr.)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:11         ` Oleg Nesterov
  2020-04-21 11:26           ` Christian Brauner
@ 2020-04-21 11:28           ` Oleg Nesterov
  2020-04-21 11:38             ` Christian Brauner
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 11:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On 04/21, Oleg Nesterov wrote:
>
> The corner case is release_task() when the last exiting sub-thread sends
> a signal on behalf of its ->group_leader, and at this point all the tsk's
> pid pointers are NULL, that is why "force" can be falsely "true".

Or do_notify_parent() can be called by debugger from the parent namespace,
in this case "force" can be falsely "true" too.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:28           ` Oleg Nesterov
@ 2020-04-21 11:38             ` Christian Brauner
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 11:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On Tue, Apr 21, 2020 at 01:28:31PM +0200, Oleg Nesterov wrote:
> On 04/21, Oleg Nesterov wrote:
> >
> > The corner case is release_task() when the last exiting sub-thread sends
> > a signal on behalf of its ->group_leader, and at this point all the tsk's
> > pid pointers are NULL, that is why "force" can be falsely "true".
> 
> Or do_notify_parent() can be called by debugger from the parent namespace,
> in this case "force" can be falsely "true" too.

That's an interesting scenario to think about as well. Cross-pid-namespace
interactions are fun... That's why the cross-pid-namespace-signal
sending aspects we discussed a while back on-list though pretty nice to
have at some point are somewhat scary.

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:26           ` Christian Brauner
@ 2020-04-21 12:17             ` Oleg Nesterov
  2020-04-21 12:59               ` Christian Brauner
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 12:17 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linux Containers, Christof Meerwald, Eric W. Biederman, linux-kernel

On 04/21, Christian Brauner wrote:
>
> process B setnses into
> <pidnsC> which is a sibling pid namespace,

please see pidns_install(), it verifies that

	* Only allow entering the current active pid namespace
	* or a child of the current active pid namespace.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 12:17             ` Oleg Nesterov
@ 2020-04-21 12:59               ` Christian Brauner
  2020-04-21 13:42                 ` Eric W. Biederman
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 12:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linux Containers, Christof Meerwald, Eric W. Biederman, linux-kernel

On Tue, Apr 21, 2020 at 02:17:22PM +0200, Oleg Nesterov wrote:
> On 04/21, Christian Brauner wrote:
> >
> > process B setnses into
> > <pidnsC> which is a sibling pid namespace,
> 
> please see pidns_install(), it verifies that
> 
> 	* Only allow entering the current active pid namespace
> 	* or a child of the current active pid namespace.

I forgot about that.

Though, don't we have the same problem in:

static void do_notify_parent_cldstop(struct task_struct *tsk,
				     bool for_ptracer, int why)

at least for the for_ptrace is false case?

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 12:59               ` Christian Brauner
@ 2020-04-21 13:42                 ` Eric W. Biederman
  0 siblings, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 13:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Oleg Nesterov, Linux Containers, Christof Meerwald, linux-kernel

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Tue, Apr 21, 2020 at 02:17:22PM +0200, Oleg Nesterov wrote:
>> On 04/21, Christian Brauner wrote:
>> >
>> > process B setnses into
>> > <pidnsC> which is a sibling pid namespace,
>> 
>> please see pidns_install(), it verifies that
>> 
>> 	* Only allow entering the current active pid namespace
>> 	* or a child of the current active pid namespace.
>
> I forgot about that.
>
> Though, don't we have the same problem in:
>
> static void do_notify_parent_cldstop(struct task_struct *tsk,
> 				     bool for_ptracer, int why)
>
> at least for the for_ptrace is false case?


The same problem does not exist with do_notify_parent_cldstop because
do_notify_parent_cldstop is always called from current (there is
one case of current->group_leader but that is close enough calculations
made against current are true).

However because do_notify_parent_cldstop calculates si_pid and si_uid of
the target parent process I think we can still get the wrong si_uid.

So it probably makes sense to generalize the fixup code in send_signal
and make do_notify_parent_cldstop just generate ids in the current
namespace and let the fixup code do it's job.

Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:28     ` Oleg Nesterov
  2020-04-21 10:28     ` Christian Brauner
@ 2020-04-21 14:57     ` Eric W. Biederman
  2020-04-21 15:08       ` Christian Brauner
  2 siblings, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 14:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, Linux Containers, Christof Meerwald, Oleg Nesterov

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 9899c5f91ee1..a88a89422227 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
>>  			sig = 0;
>>  	}
>> +	/*
>> +	 * Bypass send_signal as the si_pid and si_uid values have
>> +	 * been generated in the parent's namespaces.
>> +	 */
>
> At first I misread that comment as saying that we're skipping sending a
> signal not that it relates to a specific function (and I won't admit that
> I wrote a whole long paragraph on why I'm confused we're skipping
> sending signals on invalid si_pid and si_uid...).

I have updated the comment to read:
+       /*
+        * Send with __send_signal as si_pid and si_uid are in the
+        * parent's namespaces.
+        */

That should be enough of a hint for someone to read the code and figure
out what is going on.

Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5)
  2020-04-19 20:13 SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Christof Meerwald
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
@ 2020-04-21 14:59 ` Eric W. Biederman
  1 sibling, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 14:59 UTC (permalink / raw)
  To: Christof Meerwald; +Cc: linux-kernel

Christof Meerwald <cmeerw@cmeerw.org> writes:

> Hi,
>
> this is probably related to commit
> 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.

Christof I want to say very good spotting and reporting of this issue.

Thank you.
Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
  2020-04-21 10:50       ` Christian Brauner
@ 2020-04-21 15:05       ` Eric W. Biederman
  2020-04-24 18:05         ` Oleg Nesterov
  1 sibling, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 15:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Christof Meerwald, Linux Containers

Oleg Nesterov <oleg@redhat.com> writes:

> Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
> from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
> task->group_leader, we can remove the pid_alive() check.
>
> pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
> unnecessary confusion.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

> ---

Good catch that does simplify things.

Eric

>  kernel/pid.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index bc21c0fb26d8..47221db038e2 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -475,8 +475,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>  	rcu_read_lock();
>  	if (!ns)
>  		ns = task_active_pid_ns(current);
> -	if (likely(pid_alive(task)))
> -		nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
> +	nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
>  	rcu_read_unlock();
>  
>  	return nr;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 14:57     ` Eric W. Biederman
@ 2020-04-21 15:08       ` Christian Brauner
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 15:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Christof Meerwald, linux-kernel, Oleg Nesterov

On Tue, Apr 21, 2020 at 09:57:09AM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index 9899c5f91ee1..a88a89422227 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
> >>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
> >>  			sig = 0;
> >>  	}
> >> +	/*
> >> +	 * Bypass send_signal as the si_pid and si_uid values have
> >> +	 * been generated in the parent's namespaces.
> >> +	 */
> >
> > At first I misread that comment as saying that we're skipping sending a
> > signal not that it relates to a specific function (and I won't admit that
> > I wrote a whole long paragraph on why I'm confused we're skipping
> > sending signals on invalid si_pid and si_uid...).
> 
> I have updated the comment to read:
> +       /*
> +        * Send with __send_signal as si_pid and si_uid are in the
> +        * parent's namespaces.
> +        */
> 
> That should be enough of a hint for someone to read the code and figure
> out what is going on.

Perfect, thanks!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21 15:05       ` Eric W. Biederman
@ 2020-04-24 18:05         ` Oleg Nesterov
  2020-04-24 19:54           ` Eric W. Biederman
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-24 18:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Christof Meerwald, Linux Containers

On 04/21, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
> > from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
> > task->group_leader, we can remove the pid_alive() check.
> >
> > pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
> > unnecessary confusion.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

Thanks, can you take this patch?

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-24 18:05         ` Oleg Nesterov
@ 2020-04-24 19:54           ` Eric W. Biederman
  0 siblings, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-24 19:54 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Christof Meerwald, Linux Containers

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/21, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
>> > from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
>> > task->group_leader, we can remove the pid_alive() check.
>> >
>> > pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
>> > unnecessary confusion.
>> >
>> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Thanks, can you take this patch?

I plan to.  Probably on a topic branch for signals.  I am sorting
that out now.

Eric


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-04-24 19:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-19 20:13 SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Christof Meerwald
2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
2020-04-21  8:30   ` Christian Brauner
2020-04-21  9:28     ` Oleg Nesterov
2020-04-21 10:21       ` Christian Brauner
2020-04-21 11:11         ` Oleg Nesterov
2020-04-21 11:26           ` Christian Brauner
2020-04-21 12:17             ` Oleg Nesterov
2020-04-21 12:59               ` Christian Brauner
2020-04-21 13:42                 ` Eric W. Biederman
2020-04-21 11:28           ` Oleg Nesterov
2020-04-21 11:38             ` Christian Brauner
2020-04-21 10:28     ` Christian Brauner
2020-04-21 14:57     ` Eric W. Biederman
2020-04-21 15:08       ` Christian Brauner
2020-04-21  9:04   ` Oleg Nesterov
2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
2020-04-21 10:50       ` Christian Brauner
2020-04-21 15:05       ` Eric W. Biederman
2020-04-24 18:05         ` Oleg Nesterov
2020-04-24 19:54           ` Eric W. Biederman
2020-04-21 14:59 ` SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).