All of lore.kernel.org
 help / color / mirror / Atom feed
* SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5)
@ 2020-04-19 20:13 Christof Meerwald
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
  2020-04-21 14:59 ` SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Eric W. Biederman
  0 siblings, 2 replies; 22+ messages in thread
From: Christof Meerwald @ 2020-04-19 20:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Eric W. Biederman

Hi,

this is probably related to commit
7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
fixups of si_pid and si_uid).

With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
properly set si_pid field - this seems to happen for multi-threaded
child processes.

A simple test program (based on the sample from the signalfd man page):

#include <sys/signalfd.h>
#include <signal.h>
#include <unistd.h>
#include <spawn.h>
#include <stdlib.h>
#include <stdio.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

int main(int argc, char *argv[])
{
  sigset_t mask;
  int sfd;
  struct signalfd_siginfo fdsi;
  ssize_t s;

  sigemptyset(&mask);
  sigaddset(&mask, SIGCHLD);

  if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
    handle_error("sigprocmask");

  pid_t chldpid;
  char *chldargv[] = { "./sfdclient", NULL };
  posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);

  sfd = signalfd(-1, &mask, 0);
  if (sfd == -1)
    handle_error("signalfd");

  for (;;) {
    s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
    if (s != sizeof(struct signalfd_siginfo))
      handle_error("read");

    if (fdsi.ssi_signo == SIGCHLD) {
      printf("Got SIGCHLD %d %d %d %d\n",
          fdsi.ssi_status, fdsi.ssi_code,
          fdsi.ssi_uid, fdsi.ssi_pid);
      return 0;
    } else {
      printf("Read unexpected signal\n");
    }
  }
}


and a multi-threaded client to test with:

#include <unistd.h>
#include <pthread.h>

void *f(void *arg)
{
  sleep(100);
}

int main()
{
  pthread_t t[8];

  for (int i = 0; i != 8; ++i)
  {
    pthread_create(&t[i], NULL, f, NULL);
  }
}


I tried to do a bit of debugging and what seems to be happening is
that

		/* From an ancestor pid namespace? */
		if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {

fails inside task_pid_nr_ns because the check for "pid_alive" fails.

This code seems to be called from do_notify_parent and there we
actually have "tsk != current" (I am assuming both are threads of the
current process?)


Christof

-- 

http://cmeerw.org                              sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-19 20:13 SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Christof Meerwald
@ 2020-04-20 17:05 ` Eric W. Biederman
  2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:04   ` Oleg Nesterov
  2020-04-21 14:59 ` SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Eric W. Biederman
  1 sibling, 2 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-20 17:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christof Meerwald, Linux Containers, Oleg Nesterov


Christof Meerwald <cmeerw@cmeerw.org> writes:
> Hi,
>
> this is probably related to commit
> 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.
>
> A simple test program (based on the sample from the signalfd man page):
>
> #include <sys/signalfd.h>
> #include <signal.h>
> #include <unistd.h>
> #include <spawn.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #define handle_error(msg) \
>     do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int main(int argc, char *argv[])
> {
>   sigset_t mask;
>   int sfd;
>   struct signalfd_siginfo fdsi;
>   ssize_t s;
>
>   sigemptyset(&mask);
>   sigaddset(&mask, SIGCHLD);
>
>   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
>     handle_error("sigprocmask");
>
>   pid_t chldpid;
>   char *chldargv[] = { "./sfdclient", NULL };
>   posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
>
>   sfd = signalfd(-1, &mask, 0);
>   if (sfd == -1)
>     handle_error("signalfd");
>
>   for (;;) {
>     s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
>     if (s != sizeof(struct signalfd_siginfo))
>       handle_error("read");
>
>     if (fdsi.ssi_signo == SIGCHLD) {
>       printf("Got SIGCHLD %d %d %d %d\n",
>           fdsi.ssi_status, fdsi.ssi_code,
>           fdsi.ssi_uid, fdsi.ssi_pid);
>       return 0;
>     } else {
>       printf("Read unexpected signal\n");
>     }
>   }
> }
>
>
> and a multi-threaded client to test with:
>
> #include <unistd.h>
> #include <pthread.h>
>
> void *f(void *arg)
> {
>   sleep(100);
> }
>
> int main()
> {
>   pthread_t t[8];
>
>   for (int i = 0; i != 8; ++i)
>   {
>     pthread_create(&t[i], NULL, f, NULL);
>   }
> }
>
> I tried to do a bit of debugging and what seems to be happening is
> that
>
>   /* From an ancestor pid namespace? */
>   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
>
> fails inside task_pid_nr_ns because the check for "pid_alive" fails.
>
> This code seems to be called from do_notify_parent and there we
> actually have "tsk != current" (I am assuming both are threads of the
> current process?)

I instrumented the code with a warning and received the following backtrace:
> WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> Modules linked in:
> CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> Call Trace:
>  send_signal+0x1c8/0x310
>  do_notify_parent+0x50f/0x550
>  release_task.part.21+0x4fd/0x620
>  do_exit+0x6f6/0xaf0
>  do_group_exit+0x42/0xb0
>  get_signal+0x13b/0xbb0
>  do_signal+0x2b/0x670
>  ? __audit_syscall_exit+0x24d/0x2b0
>  ? rcu_read_lock_sched_held+0x4d/0x60
>  ? kfree+0x24c/0x2b0
>  do_syscall_64+0x176/0x640
>  ? trace_hardirqs_off_thunk+0x1a/0x1c
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3

The immediate problem is as Christof noticed that "pid_alive(current) == false".
This happens because do_notify_parent is called from the last thread to exit
in a process after that thread has been reaped.

The bigger issue is that do_notify_parent can be called from any
process that manages to wait on a thread of a multi-threaded process
from wait_task_zombie.  So any logic based upon current for
do_notify_parent is just nonsense, as current can be pretty much
anything.

So change do_notify_parent to call __send_signal directly.

Inspecting the code it appears this problem has existed since the pid
namespace support started handling this case in 2.6.30.  This fix only
backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
where the problem logic was moved out of __send_signal and into send_signal.

Cc: stable@vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary
Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Unless someone has an objection I will apply this one and send it to
Linus.

 kernel/signal.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 9899c5f91ee1..a88a89422227 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
 			sig = 0;
 	}
+	/*
+	 * Bypass send_signal as the si_pid and si_uid values have
+	 * been generated in the parent's namespaces.
+	 */
 	if (valid_signal(sig) && sig)
-		__group_send_sig_info(sig, &info, tsk->parent);
+		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
@ 2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:28     ` Oleg Nesterov
                       ` (2 more replies)
  2020-04-21  9:04   ` Oleg Nesterov
  1 sibling, 3 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21  8:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Christof Meerwald, Oleg Nesterov

On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
> 
> Christof Meerwald <cmeerw@cmeerw.org> writes:
> > Hi,
> >
> > this is probably related to commit
> > 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> > fixups of si_pid and si_uid).
> >
> > With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> > properly set si_pid field - this seems to happen for multi-threaded
> > child processes.
> >
> > A simple test program (based on the sample from the signalfd man page):
> >
> > #include <sys/signalfd.h>
> > #include <signal.h>
> > #include <unistd.h>
> > #include <spawn.h>
> > #include <stdlib.h>
> > #include <stdio.h>
> >
> > #define handle_error(msg) \
> >     do { perror(msg); exit(EXIT_FAILURE); } while (0)
> >
> > int main(int argc, char *argv[])
> > {
> >   sigset_t mask;
> >   int sfd;
> >   struct signalfd_siginfo fdsi;
> >   ssize_t s;
> >
> >   sigemptyset(&mask);
> >   sigaddset(&mask, SIGCHLD);
> >
> >   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
> >     handle_error("sigprocmask");
> >
> >   pid_t chldpid;
> >   char *chldargv[] = { "./sfdclient", NULL };
> >   posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
> >
> >   sfd = signalfd(-1, &mask, 0);
> >   if (sfd == -1)
> >     handle_error("signalfd");
> >
> >   for (;;) {
> >     s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
> >     if (s != sizeof(struct signalfd_siginfo))
> >       handle_error("read");
> >
> >     if (fdsi.ssi_signo == SIGCHLD) {
> >       printf("Got SIGCHLD %d %d %d %d\n",
> >           fdsi.ssi_status, fdsi.ssi_code,
> >           fdsi.ssi_uid, fdsi.ssi_pid);
> >       return 0;
> >     } else {
> >       printf("Read unexpected signal\n");
> >     }
> >   }
> > }
> >
> >
> > and a multi-threaded client to test with:
> >
> > #include <unistd.h>
> > #include <pthread.h>
> >
> > void *f(void *arg)
> > {
> >   sleep(100);
> > }
> >
> > int main()
> > {
> >   pthread_t t[8];
> >
> >   for (int i = 0; i != 8; ++i)
> >   {
> >     pthread_create(&t[i], NULL, f, NULL);
> >   }
> > }
> >
> > I tried to do a bit of debugging and what seems to be happening is
> > that
> >
> >   /* From an ancestor pid namespace? */
> >   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
> >
> > fails inside task_pid_nr_ns because the check for "pid_alive" fails.
> >
> > This code seems to be called from do_notify_parent and there we
> > actually have "tsk != current" (I am assuming both are threads of the
> > current process?)
> 
> I instrumented the code with a warning and received the following backtrace:
> > WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> > Modules linked in:
> > CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> > Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> > RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> > RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> > RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> > R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> > FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> > Call Trace:
> >  send_signal+0x1c8/0x310
> >  do_notify_parent+0x50f/0x550
> >  release_task.part.21+0x4fd/0x620
> >  do_exit+0x6f6/0xaf0
> >  do_group_exit+0x42/0xb0
> >  get_signal+0x13b/0xbb0
> >  do_signal+0x2b/0x670
> >  ? __audit_syscall_exit+0x24d/0x2b0
> >  ? rcu_read_lock_sched_held+0x4d/0x60
> >  ? kfree+0x24c/0x2b0
> >  do_syscall_64+0x176/0x640
> >  ? trace_hardirqs_off_thunk+0x1a/0x1c
> >  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> 
> The immediate problem is as Christof noticed that "pid_alive(current) == false".
> This happens because do_notify_parent is called from the last thread to exit
> in a process after that thread has been reaped.
> 
> The bigger issue is that do_notify_parent can be called from any
> process that manages to wait on a thread of a multi-threaded process
> from wait_task_zombie.  So any logic based upon current for
> do_notify_parent is just nonsense, as current can be pretty much
> anything.
> 
> So change do_notify_parent to call __send_signal directly.
> 
> Inspecting the code it appears this problem has existed since the pid
> namespace support started handling this case in 2.6.30.  This fix only
> backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
> where the problem logic was moved out of __send_signal and into send_signal.
> 
> Cc: stable@vger.kernel.org
> Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary
> Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
> Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> Unless someone has an objection I will apply this one and send it to
> Linus.
> 
>  kernel/signal.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9899c5f91ee1..a88a89422227 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
>  			sig = 0;
>  	}
> +	/*
> +	 * Bypass send_signal as the si_pid and si_uid values have
> +	 * been generated in the parent's namespaces.
> +	 */

At first I misread that comment as saying that we're skipping sending a
signal not that it relates to a specific function (and I won't admit that
I wrote a whole long paragraph on why I'm confused we're skipping
sending signals on invalid si_pid and si_uid...).

I think it would be worth to say something that simply states the facts
such as:
"If we're not autoreaping, send a signal with si_pid and si_uid set
 as generated in the parent's namespaces."
which imho is way clearer then pointing to out that we're skipping
send_signal(). The logic here and the comment in its current form are
hard to correlate, especially since send_signal() was never called
directly here but is rather called by __group_send_sig_info().
The details of why we switched from __group_send_sign_info() to
__send_signal() could just go into the commit message. It's more
confusing in the code.

>  	if (valid_signal(sig) && sig)

(Not related to you patch but odd ordering of the if-branch here. I
would've expected this to read

if (sig && valid_signal(sig))

especially since "sig" can be set to 0 right before.

> -		__group_send_sig_info(sig, &info, tsk->parent);
> +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);

So below you switch to __send_signal() but set the "force" argument to
to "false". Before that, if the signal was generated from another pid
namespace and we fixed up si_pid and si_uid the "force" argument was set
to "true", i.e.  __send_signal(..., ..., ..., ..., true) Even though
from reading through the code I think that change is safe to make let me
verify that this was an intentional change?

Thanks!
Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
  2020-04-21  8:30   ` Christian Brauner
@ 2020-04-21  9:04   ` Oleg Nesterov
  2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21  9:04 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Christof Meerwald, Linux Containers

On 04/20, Eric W. Biederman wrote:
>
> The immediate problem is as Christof noticed that "pid_alive(current) == false".

this is slightly offtopic, but we can probably remove this "pid_alive" check,
pid_nr_ns() checks pid != NULL anyway.

> Inspecting the code it appears this problem has existed since the pid
> namespace support started handling this case

Agreed...

> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
>  			sig = 0;
>  	}
> +	/*
> +	 * Bypass send_signal as the si_pid and si_uid values have
> +	 * been generated in the parent's namespaces.
> +	 */
>  	if (valid_signal(sig) && sig)
> -		__group_send_sig_info(sig, &info, tsk->parent);
> +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  8:30   ` Christian Brauner
@ 2020-04-21  9:28     ` Oleg Nesterov
  2020-04-21 10:21       ` Christian Brauner
  2020-04-21 10:28     ` Christian Brauner
  2020-04-21 14:57     ` Eric W. Biederman
  2 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21  9:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On 04/21, Christian Brauner wrote:
>
> > -		__group_send_sig_info(sig, &info, tsk->parent);
> > +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
>
> So below you switch to __send_signal() but set the "force" argument to
> to "false".

it must be false, the signal is generated from the parent's namespace or
its descendant

> Before that, if the signal was generated from another pid
> namespace and we fixed up si_pid and si_uid the "force" argument was set
> to "true",

before that the "force" argument could be falsely true by the same reason,
task_pid_nr_ns(tsk, tsk->parent) can return 0 because "tsk" no longer have
pids after __unhash_process().

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21  9:04   ` Oleg Nesterov
@ 2020-04-21 10:19     ` Oleg Nesterov
  2020-04-21 10:50       ` Christian Brauner
  2020-04-21 15:05       ` Eric W. Biederman
  0 siblings, 2 replies; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 10:19 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Christof Meerwald, Linux Containers

Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
task->group_leader, we can remove the pid_alive() check.

pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
unnecessary confusion.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/pid.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index bc21c0fb26d8..47221db038e2 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -475,8 +475,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
 	rcu_read_lock();
 	if (!ns)
 		ns = task_active_pid_ns(current);
-	if (likely(pid_alive(task)))
-		nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
+	nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
 	rcu_read_unlock();
 
 	return nr;
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  9:28     ` Oleg Nesterov
@ 2020-04-21 10:21       ` Christian Brauner
  2020-04-21 11:11         ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 10:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On Tue, Apr 21, 2020 at 11:28:47AM +0200, Oleg Nesterov wrote:
> On 04/21, Christian Brauner wrote:
> >
> > > -		__group_send_sig_info(sig, &info, tsk->parent);
> > > +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
> >
> > So below you switch to __send_signal() but set the "force" argument to
> > to "false".
> 
> it must be false, the signal is generated from the parent's namespace or
> its descendant
> 
> > Before that, if the signal was generated from another pid
> > namespace and we fixed up si_pid and si_uid the "force" argument was set
> > to "true",
> 
> before that the "force" argument could be falsely true by the same reason,
> task_pid_nr_ns(tsk, tsk->parent) can return 0 because "tsk" no longer have
> pids after __unhash_process().

As I said in my mail, looking at the codepath it seems safe. My question
was whether it is _always_ incorrectly false which I do think it is
because child subreapers can't come from outside the pid namespace. If
they could you could create a scenario where the signal is generated
from a sibling pid namespace in which case it would be correctly set to
true.

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:28     ` Oleg Nesterov
@ 2020-04-21 10:28     ` Christian Brauner
  2020-04-21 14:57     ` Eric W. Biederman
  2 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 10:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Christof Meerwald, linux-kernel, Oleg Nesterov

On Tue, Apr 21, 2020 at 10:30:31AM +0200, Christian Brauner wrote:
> On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
> > 
> > Christof Meerwald <cmeerw@cmeerw.org> writes:
> > > Hi,
> > >
> > > this is probably related to commit
> > > 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> > > fixups of si_pid and si_uid).
> > >
> > > With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> > > properly set si_pid field - this seems to happen for multi-threaded
> > > child processes.
> > >
> > > A simple test program (based on the sample from the signalfd man page):
> > >
> > > #include <sys/signalfd.h>
> > > #include <signal.h>
> > > #include <unistd.h>
> > > #include <spawn.h>
> > > #include <stdlib.h>
> > > #include <stdio.h>
> > >
> > > #define handle_error(msg) \
> > >     do { perror(msg); exit(EXIT_FAILURE); } while (0)
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >   sigset_t mask;
> > >   int sfd;
> > >   struct signalfd_siginfo fdsi;
> > >   ssize_t s;
> > >
> > >   sigemptyset(&mask);
> > >   sigaddset(&mask, SIGCHLD);
> > >
> > >   if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
> > >     handle_error("sigprocmask");
> > >
> > >   pid_t chldpid;
> > >   char *chldargv[] = { "./sfdclient", NULL };
> > >   posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
> > >
> > >   sfd = signalfd(-1, &mask, 0);
> > >   if (sfd == -1)
> > >     handle_error("signalfd");
> > >
> > >   for (;;) {
> > >     s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
> > >     if (s != sizeof(struct signalfd_siginfo))
> > >       handle_error("read");
> > >
> > >     if (fdsi.ssi_signo == SIGCHLD) {
> > >       printf("Got SIGCHLD %d %d %d %d\n",
> > >           fdsi.ssi_status, fdsi.ssi_code,
> > >           fdsi.ssi_uid, fdsi.ssi_pid);
> > >       return 0;
> > >     } else {
> > >       printf("Read unexpected signal\n");
> > >     }
> > >   }
> > > }
> > >
> > >
> > > and a multi-threaded client to test with:
> > >
> > > #include <unistd.h>
> > > #include <pthread.h>
> > >
> > > void *f(void *arg)
> > > {
> > >   sleep(100);
> > > }
> > >
> > > int main()
> > > {
> > >   pthread_t t[8];
> > >
> > >   for (int i = 0; i != 8; ++i)
> > >   {
> > >     pthread_create(&t[i], NULL, f, NULL);
> > >   }
> > > }
> > >
> > > I tried to do a bit of debugging and what seems to be happening is
> > > that
> > >
> > >   /* From an ancestor pid namespace? */
> > >   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
> > >
> > > fails inside task_pid_nr_ns because the check for "pid_alive" fails.
> > >
> > > This code seems to be called from do_notify_parent and there we
> > > actually have "tsk != current" (I am assuming both are threads of the
> > > current process?)
> > 
> > I instrumented the code with a warning and received the following backtrace:
> > > WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> > > Modules linked in:
> > > CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> > > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> > > Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> > > RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> > > RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> > > RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> > > R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> > > FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> > > Call Trace:
> > >  send_signal+0x1c8/0x310
> > >  do_notify_parent+0x50f/0x550
> > >  release_task.part.21+0x4fd/0x620
> > >  do_exit+0x6f6/0xaf0
> > >  do_group_exit+0x42/0xb0
> > >  get_signal+0x13b/0xbb0
> > >  do_signal+0x2b/0x670
> > >  ? __audit_syscall_exit+0x24d/0x2b0
> > >  ? rcu_read_lock_sched_held+0x4d/0x60
> > >  ? kfree+0x24c/0x2b0
> > >  do_syscall_64+0x176/0x640
> > >  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > 
> > The immediate problem is as Christof noticed that "pid_alive(current) == false".
> > This happens because do_notify_parent is called from the last thread to exit
> > in a process after that thread has been reaped.
> > 
> > The bigger issue is that do_notify_parent can be called from any
> > process that manages to wait on a thread of a multi-threaded process
> > from wait_task_zombie.  So any logic based upon current for
> > do_notify_parent is just nonsense, as current can be pretty much
> > anything.
> > 
> > So change do_notify_parent to call __send_signal directly.
> > 
> > Inspecting the code it appears this problem has existed since the pid
> > namespace support started handling this case in 2.6.30.  This fix only
> > backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
> > where the problem logic was moved out of __send_signal and into send_signal.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary
> > Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
> > Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
> > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > ---
> > 
> > Unless someone has an objection I will apply this one and send it to
> > Linus.
> > 
> >  kernel/signal.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 9899c5f91ee1..a88a89422227 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
> >  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
> >  			sig = 0;
> >  	}
> > +	/*
> > +	 * Bypass send_signal as the si_pid and si_uid values have
> > +	 * been generated in the parent's namespaces.
> > +	 */
> 
> At first I misread that comment as saying that we're skipping sending a
> signal not that it relates to a specific function (and I won't admit that
> I wrote a whole long paragraph on why I'm confused we're skipping
> sending signals on invalid si_pid and si_uid...).
> 
> I think it would be worth to say something that simply states the facts
> such as:
> "If we're not autoreaping, send a signal with si_pid and si_uid set
>  as generated in the parent's namespaces."
> which imho is way clearer then pointing to out that we're skipping
> send_signal(). The logic here and the comment in its current form are
> hard to correlate, especially since send_signal() was never called
> directly here but is rather called by __group_send_sig_info().
> The details of why we switched from __group_send_sign_info() to
> __send_signal() could just go into the commit message. It's more
> confusing in the code.

Forgot before, otherwise:
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Thanks!
Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
@ 2020-04-21 10:50       ` Christian Brauner
  2020-04-21 15:05       ` Eric W. Biederman
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 10:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Linux Containers, Christof Meerwald, linux-kernel

On Tue, Apr 21, 2020 at 12:19:04PM +0200, Oleg Nesterov wrote:
> Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
> from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
> task->group_leader, we can remove the pid_alive() check.
> 
> pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
> unnecessary confusion.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 10:21       ` Christian Brauner
@ 2020-04-21 11:11         ` Oleg Nesterov
  2020-04-21 11:26           ` Christian Brauner
  2020-04-21 11:28           ` Oleg Nesterov
  0 siblings, 2 replies; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 11:11 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

Sorry Christian, I don't understand...

On 04/21, Christian Brauner wrote:
>
> On Tue, Apr 21, 2020 at 11:28:47AM +0200, Oleg Nesterov wrote:
> > On 04/21, Christian Brauner wrote:
> > >
> > > > -		__group_send_sig_info(sig, &info, tsk->parent);
> > > > +		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
> > >
> > > So below you switch to __send_signal() but set the "force" argument to
> > > to "false".
> >
> > it must be false, the signal is generated from the parent's namespace or
> > its descendant
> >
> > > Before that, if the signal was generated from another pid
> > > namespace and we fixed up si_pid and si_uid the "force" argument was set
> > > to "true",
> >
> > before that the "force" argument could be falsely true by the same reason,
> > task_pid_nr_ns(tsk, tsk->parent) can return 0 because "tsk" no longer have
> > pids after __unhash_process().
>
> As I said in my mail, looking at the codepath it seems safe. My question
> was whether it is _always_ incorrectly false which I do think it is

Again, it must be always "false", it can be incorrectly "true" and this
too is fixed by Eric's patch.

> because child subreapers can't come from outside the pid namespace. If
> they could you could create a scenario where the signal is generated
> from a sibling pid namespace in which case it would be correctly set to
> true.

not sure I understand, but probably the answer is "yes"...

task and task->parent either live in the same namespace or the child's
namespace is the descendant of task->parent's namespace. In both cases
task_pid_nr_ns(tsk, tsk->parent) should return the valid pid_nr and
"force" must be false.

The corner case is release_task() when the last exiting sub-thread sends
a signal on behalf of its ->group_leader, and at this point all the tsk's
pid pointers are NULL, that is why "force" can be falsely "true".

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:11         ` Oleg Nesterov
@ 2020-04-21 11:26           ` Christian Brauner
  2020-04-21 12:17             ` Oleg Nesterov
  2020-04-21 11:28           ` Oleg Nesterov
  1 sibling, 1 reply; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 11:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linux Containers, Christof Meerwald, Eric W. Biederman, linux-kernel

On Tue, Apr 21, 2020 at 01:11:40PM +0200, Oleg Nesterov wrote:
> Sorry Christian, I don't understand...

In my original mail, it was really just a clarification question. I
said the patch is correct from looking at the codepaths. :) I was just
trying to see whether there was a potential corner-case we're missing
where "force" could be _validly_ true.

> 
> > because child subreapers can't come from outside the pid namespace. If
> > they could you could create a scenario where the signal is generated
> > from a sibling pid namespace in which case it would be correctly set to
> > true.
> 
> not sure I understand, but probably the answer is "yes"...

(This is really purely academic now since it isn't possible, but for
pure amusement assume that a child subreaper could cross namespace
boundaries (which they don't). A marks itself as a subreaper and creates
a new process B in a new pid namespace <pidnsB>, process B setnses into
<pidnsC> which is a sibling pid namespace, B clones a new proces in
<pidnsC> which is now a full member of <pidnsC>, B dies and C is
reparented to A, B exits and then you'd be getting a sigchld from a pid
in a pid namespace in which you have no pid nr.)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:11         ` Oleg Nesterov
  2020-04-21 11:26           ` Christian Brauner
@ 2020-04-21 11:28           ` Oleg Nesterov
  2020-04-21 11:38             ` Christian Brauner
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 11:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On 04/21, Oleg Nesterov wrote:
>
> The corner case is release_task() when the last exiting sub-thread sends
> a signal on behalf of its ->group_leader, and at this point all the tsk's
> pid pointers are NULL, that is why "force" can be falsely "true".

Or do_notify_parent() can be called by debugger from the parent namespace,
in this case "force" can be falsely "true" too.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:28           ` Oleg Nesterov
@ 2020-04-21 11:38             ` Christian Brauner
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 11:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, Linux Containers, Christof Meerwald

On Tue, Apr 21, 2020 at 01:28:31PM +0200, Oleg Nesterov wrote:
> On 04/21, Oleg Nesterov wrote:
> >
> > The corner case is release_task() when the last exiting sub-thread sends
> > a signal on behalf of its ->group_leader, and at this point all the tsk's
> > pid pointers are NULL, that is why "force" can be falsely "true".
> 
> Or do_notify_parent() can be called by debugger from the parent namespace,
> in this case "force" can be falsely "true" too.

That's an interesting scenario to think about as well. Cross-pid-namespace
interactions are fun... That's why the cross-pid-namespace-signal
sending aspects we discussed a while back on-list though pretty nice to
have at some point are somewhat scary.

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 11:26           ` Christian Brauner
@ 2020-04-21 12:17             ` Oleg Nesterov
  2020-04-21 12:59               ` Christian Brauner
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-21 12:17 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linux Containers, Christof Meerwald, Eric W. Biederman, linux-kernel

On 04/21, Christian Brauner wrote:
>
> process B setnses into
> <pidnsC> which is a sibling pid namespace,

please see pidns_install(), it verifies that

	* Only allow entering the current active pid namespace
	* or a child of the current active pid namespace.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 12:17             ` Oleg Nesterov
@ 2020-04-21 12:59               ` Christian Brauner
  2020-04-21 13:42                 ` Eric W. Biederman
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 12:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linux Containers, Christof Meerwald, Eric W. Biederman, linux-kernel

On Tue, Apr 21, 2020 at 02:17:22PM +0200, Oleg Nesterov wrote:
> On 04/21, Christian Brauner wrote:
> >
> > process B setnses into
> > <pidnsC> which is a sibling pid namespace,
> 
> please see pidns_install(), it verifies that
> 
> 	* Only allow entering the current active pid namespace
> 	* or a child of the current active pid namespace.

I forgot about that.

Though, don't we have the same problem in:

static void do_notify_parent_cldstop(struct task_struct *tsk,
				     bool for_ptracer, int why)

at least for the for_ptrace is false case?

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 12:59               ` Christian Brauner
@ 2020-04-21 13:42                 ` Eric W. Biederman
  0 siblings, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 13:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Oleg Nesterov, Linux Containers, Christof Meerwald, linux-kernel

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Tue, Apr 21, 2020 at 02:17:22PM +0200, Oleg Nesterov wrote:
>> On 04/21, Christian Brauner wrote:
>> >
>> > process B setnses into
>> > <pidnsC> which is a sibling pid namespace,
>> 
>> please see pidns_install(), it verifies that
>> 
>> 	* Only allow entering the current active pid namespace
>> 	* or a child of the current active pid namespace.
>
> I forgot about that.
>
> Though, don't we have the same problem in:
>
> static void do_notify_parent_cldstop(struct task_struct *tsk,
> 				     bool for_ptracer, int why)
>
> at least for the for_ptrace is false case?


The same problem does not exist with do_notify_parent_cldstop because
do_notify_parent_cldstop is always called from current (there is
one case of current->group_leader but that is close enough calculations
made against current are true).

However because do_notify_parent_cldstop calculates si_pid and si_uid of
the target parent process I think we can still get the wrong si_uid.

So it probably makes sense to generalize the fixup code in send_signal
and make do_notify_parent_cldstop just generate ids in the current
namespace and let the fixup code do it's job.

Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21  8:30   ` Christian Brauner
  2020-04-21  9:28     ` Oleg Nesterov
  2020-04-21 10:28     ` Christian Brauner
@ 2020-04-21 14:57     ` Eric W. Biederman
  2020-04-21 15:08       ` Christian Brauner
  2 siblings, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 14:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-kernel, Linux Containers, Christof Meerwald, Oleg Nesterov

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 9899c5f91ee1..a88a89422227 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
>>  			sig = 0;
>>  	}
>> +	/*
>> +	 * Bypass send_signal as the si_pid and si_uid values have
>> +	 * been generated in the parent's namespaces.
>> +	 */
>
> At first I misread that comment as saying that we're skipping sending a
> signal not that it relates to a specific function (and I won't admit that
> I wrote a whole long paragraph on why I'm confused we're skipping
> sending signals on invalid si_pid and si_uid...).

I have updated the comment to read:
+       /*
+        * Send with __send_signal as si_pid and si_uid are in the
+        * parent's namespaces.
+        */

That should be enough of a hint for someone to read the code and figure
out what is going on.

Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5)
  2020-04-19 20:13 SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Christof Meerwald
  2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
@ 2020-04-21 14:59 ` Eric W. Biederman
  1 sibling, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 14:59 UTC (permalink / raw)
  To: Christof Meerwald; +Cc: linux-kernel

Christof Meerwald <cmeerw@cmeerw.org> writes:

> Hi,
>
> this is probably related to commit
> 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.

Christof I want to say very good spotting and reporting of this issue.

Thank you.
Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
  2020-04-21 10:50       ` Christian Brauner
@ 2020-04-21 15:05       ` Eric W. Biederman
  2020-04-24 18:05         ` Oleg Nesterov
  1 sibling, 1 reply; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-21 15:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Christof Meerwald, Linux Containers

Oleg Nesterov <oleg@redhat.com> writes:

> Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
> from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
> task->group_leader, we can remove the pid_alive() check.
>
> pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
> unnecessary confusion.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

> ---

Good catch that does simplify things.

Eric

>  kernel/pid.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index bc21c0fb26d8..47221db038e2 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -475,8 +475,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>  	rcu_read_lock();
>  	if (!ns)
>  		ns = task_active_pid_ns(current);
> -	if (likely(pid_alive(task)))
> -		nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
> +	nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
>  	rcu_read_unlock();
>  
>  	return nr;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent
  2020-04-21 14:57     ` Eric W. Biederman
@ 2020-04-21 15:08       ` Christian Brauner
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-04-21 15:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Christof Meerwald, linux-kernel, Oleg Nesterov

On Tue, Apr 21, 2020 at 09:57:09AM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > On Mon, Apr 20, 2020 at 12:05:38PM -0500, Eric W. Biederman wrote:
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index 9899c5f91ee1..a88a89422227 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -1993,8 +1993,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
> >>  		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
> >>  			sig = 0;
> >>  	}
> >> +	/*
> >> +	 * Bypass send_signal as the si_pid and si_uid values have
> >> +	 * been generated in the parent's namespaces.
> >> +	 */
> >
> > At first I misread that comment as saying that we're skipping sending a
> > signal not that it relates to a specific function (and I won't admit that
> > I wrote a whole long paragraph on why I'm confused we're skipping
> > sending signals on invalid si_pid and si_uid...).
> 
> I have updated the comment to read:
> +       /*
> +        * Send with __send_signal as si_pid and si_uid are in the
> +        * parent's namespaces.
> +        */
> 
> That should be enough of a hint for someone to read the code and figure
> out what is going on.

Perfect, thanks!
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-21 15:05       ` Eric W. Biederman
@ 2020-04-24 18:05         ` Oleg Nesterov
  2020-04-24 19:54           ` Eric W. Biederman
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2020-04-24 18:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Christof Meerwald, Linux Containers

On 04/21, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
> > from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
> > task->group_leader, we can remove the pid_alive() check.
> >
> > pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
> > unnecessary confusion.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

Thanks, can you take this patch?

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns()
  2020-04-24 18:05         ` Oleg Nesterov
@ 2020-04-24 19:54           ` Eric W. Biederman
  0 siblings, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2020-04-24 19:54 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, Christof Meerwald, Linux Containers

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/21, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > Starting from 2c4704756cab ("pids: Move the pgrp and session pid pointers
>> > from task_struct to signal_struct") __task_pid_nr_ns() doesn't dereference
>> > task->group_leader, we can remove the pid_alive() check.
>> >
>> > pid_nr_ns() has to check pid != NULL anyway, pid_alive() just adds the
>> > unnecessary confusion.
>> >
>> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Thanks, can you take this patch?

I plan to.  Probably on a topic branch for signals.  I am sorting
that out now.

Eric


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-04-24 19:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-19 20:13 SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Christof Meerwald
2020-04-20 17:05 ` [PATCH] signal: Avoid corrupting si_pid and si_uid in do_notify_parent Eric W. Biederman
2020-04-21  8:30   ` Christian Brauner
2020-04-21  9:28     ` Oleg Nesterov
2020-04-21 10:21       ` Christian Brauner
2020-04-21 11:11         ` Oleg Nesterov
2020-04-21 11:26           ` Christian Brauner
2020-04-21 12:17             ` Oleg Nesterov
2020-04-21 12:59               ` Christian Brauner
2020-04-21 13:42                 ` Eric W. Biederman
2020-04-21 11:28           ` Oleg Nesterov
2020-04-21 11:38             ` Christian Brauner
2020-04-21 10:28     ` Christian Brauner
2020-04-21 14:57     ` Eric W. Biederman
2020-04-21 15:08       ` Christian Brauner
2020-04-21  9:04   ` Oleg Nesterov
2020-04-21 10:19     ` [PATCH] remove the no longer needed pid_alive() check in __task_pid_nr_ns() Oleg Nesterov
2020-04-21 10:50       ` Christian Brauner
2020-04-21 15:05       ` Eric W. Biederman
2020-04-24 18:05         ` Oleg Nesterov
2020-04-24 19:54           ` Eric W. Biederman
2020-04-21 14:59 ` SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5) Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.