linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Tycho Andersen <tycho@tycho.pizza>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Christian Brauner <brauner@kernel.org>,
	fuse-devel <fuse-devel@lists.sourceforge.net>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: strange interaction between fuse + pidns
Date: Tue, 12 Jul 2022 09:34:50 -0500	[thread overview]
Message-ID: <87sfn62yd1.fsf@email.froward.int.ebiederm.org> (raw)
In-Reply-To: <Ys16l6+iotX2JE33@netflix> (Tycho Andersen's message of "Tue, 12 Jul 2022 07:43:51 -0600")

Tycho Andersen <tycho@tycho.pizza> writes:

> On Mon, Jul 11, 2022 at 06:06:21PM -0500, Eric W. Biederman wrote:
>> Tycho Andersen <tycho@tycho.pizza> writes:
>> It is not different enough to change the semantics.  What I am aiming
>> for is having a dedicated flag indicating a task will exit, that
>> fatal_signal_pending can check.  And I intend to make that flag one way
>> so that once it is set it will never be cleared.
>
> Ok - how far out is that? I'd like to try to convince Miklos to land
> the fuse part of this fix now, but without the "look at shared signals
> too" patch, that fix is useless. I'm not married to my patch, but I
> would like to get this fixed somehow soon.

My point is that we need to figure out why you need the look at shared
signals.

If I can get everything reviewed my changes will be in the next merge
window (it unfortunately always takes longer to get the code reviewed
than I would like).

However when my changes land does not matter.  What you are trying to
solve is orthogonal of my on-going work.

The problem is that looking at shared signals is fundamentally broken.
A case in point is that kernel threads can have a pending SIGKILL that
is not a fatal signal.  As kernel threads are allowed to ignore or even
handle SIGKILL.

If you want to change fatal_signal_pending to include PF_EXITING I would
need to double check the implications but I think that would work, and
would not have the problems including the shared pending state of
SIGKILL.

>> The other thing I have played with that might be relevant was removing
>> the explicit wait in zap_pid_ns_processes and simply not allowing wait
>> to reap the pid namespace init until all it's children had been reaped.
>> Essentially how we deal with the thread group leader for ordinary
>> processes.  Does that sound like it might help in the fuse case?
>
> No, the problem is that the wait code doesn't know to look in the
> right place, so waiting later still won't help.

I was suggesting to modify the kernel so that zap_pid_ns_processes would
not wait for the zapped processes.  Instead I was proposing that
delay_group_leader called from wait_consider_task would simply refuse to
allow the init process of a pid namespace to be reaped until every other
process of that pid namespace had exited.

You can prototype how that would affect the deadlock by simply removing
the waiting from zap_pid_ns_processes.

I suggest that simply because that has the potential to remove some of
the strange pid namespace cases.

I don't understand the problematic interaction between pid namespace
shutdown and the fuse daemon, so I am merely suggesting a possibility
that I know can simplify pid namespace shutdown.

Something like:

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index f4f8cb0435b4..d22a30b0b0cf 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -207,47 +207,6 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
-	/*
-	 * Reap the EXIT_ZOMBIE children we had before we ignored SIGCHLD.
-	 * kernel_wait4() will also block until our children traced from the
-	 * parent namespace are detached and become EXIT_DEAD.
-	 */
-	do {
-		clear_thread_flag(TIF_SIGPENDING);
-		rc = kernel_wait4(-1, NULL, __WALL, NULL);
-	} while (rc != -ECHILD);
-
-	/*
-	 * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE
-	 * process whose parents processes are outside of the pid
-	 * namespace.  Such processes are created with setns()+fork().
-	 *
-	 * If those EXIT_ZOMBIE processes are not reaped by their
-	 * parents before their parents exit, they will be reparented
-	 * to pid_ns->child_reaper.  Thus pidns->child_reaper needs to
-	 * stay valid until they all go away.
-	 *
-	 * The code relies on the pid_ns->child_reaper ignoring
-	 * SIGCHILD to cause those EXIT_ZOMBIE processes to be
-	 * autoreaped if reparented.
-	 *
-	 * Semantically it is also desirable to wait for EXIT_ZOMBIE
-	 * processes before allowing the child_reaper to be reaped, as
-	 * that gives the invariant that when the init process of a
-	 * pid namespace is reaped all of the processes in the pid
-	 * namespace are gone.
-	 *
-	 * Once all of the other tasks are gone from the pid_namespace
-	 * free_pid() will awaken this task.
-	 */
-	for (;;) {
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (pid_ns->pid_allocated == init_pids)
-			break;
-		schedule();
-	}
-	__set_current_state(TASK_RUNNING);
-
 	if (pid_ns->reboot)
 		current->signal->group_exit_code = pid_ns->reboot;
 

Eric

  reply	other threads:[~2022-07-12 14:35 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-23 17:21 strange interaction between fuse + pidns Tycho Andersen
2022-06-23 21:55 ` Vivek Goyal
2022-06-23 23:41   ` Tycho Andersen
2022-06-24 17:36     ` Vivek Goyal
2022-07-11 10:35 ` Miklos Szeredi
2022-07-11 13:59   ` Miklos Szeredi
2022-07-11 20:25     ` Tycho Andersen
2022-07-11 21:37       ` Eric W. Biederman
2022-07-11 22:53         ` Tycho Andersen
2022-07-11 23:06           ` Eric W. Biederman
2022-07-12 13:43             ` Tycho Andersen
2022-07-12 14:34               ` Eric W. Biederman [this message]
2022-07-12 15:14                 ` Tycho Andersen
2022-07-13 17:53                   ` [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING Tycho Andersen
2022-07-20 15:03                     ` Serge E. Hallyn
2022-07-20 20:58                       ` Tycho Andersen
2022-07-21  1:54                         ` Serge E. Hallyn
2022-07-27 15:44                           ` Tycho Andersen
2022-07-27 16:32                             ` Eric W. Biederman
2022-07-27 17:55                               ` Tycho Andersen
2022-07-28 18:48                                 ` Eric W. Biederman
2022-07-27 17:55                             ` Oleg Nesterov
2022-07-27 18:18                               ` Tycho Andersen
2022-07-27 19:19                                 ` Oleg Nesterov
2022-07-27 19:40                                   ` Tycho Andersen
2022-07-28  9:12                                     ` Oleg Nesterov
2022-07-28 21:20                                       ` Tycho Andersen
2022-07-29  5:04                                         ` Eric W. Biederman
2022-07-29 13:50                                           ` Tycho Andersen
2022-07-29 16:15                                             ` Eric W. Biederman
2022-07-29 16:48                                               ` Tycho Andersen
2022-07-29 17:40                                                 ` [RFC][PATCH] fuse: In fuse_flush only wait if someone wants the return code Eric W. Biederman
2022-07-29 20:47                                                   ` Oleg Nesterov
2022-07-30  0:15                                                     ` Al Viro
2022-07-30  5:10                                                       ` [RFC][PATCH v2] " Eric W. Biederman
2022-08-01 15:16                                                         ` Tycho Andersen
2022-08-02 12:50                                                         ` Miklos Szeredi
2022-08-15 13:59                                                         ` Tycho Andersen
2022-08-15 17:55                                                           ` Serge E. Hallyn
2022-09-01 14:06                                                           ` [PATCH] " Tycho Andersen
2022-09-19 15:03                                                             ` Tycho Andersen
2022-09-20 18:02                                                               ` Serge E. Hallyn
2022-09-26 14:17                                                               ` Tycho Andersen
2022-09-27  9:46                                                             ` Miklos Szeredi
2022-09-29 14:05                                                               ` [fuse-devel] " Stef Bon
2022-09-29 16:39                                                               ` [PATCH v2] " Tycho Andersen
2022-09-30 13:35                                                                 ` Miklos Szeredi
2022-09-30 14:01                                                                   ` Tycho Andersen
2022-09-30 14:41                                                                     ` Miklos Szeredi
2022-09-30 16:09                                                                       ` Tycho Andersen
2022-10-26  9:01                                                                         ` Miklos Szeredi
2022-11-14 16:02                                                                           ` [PATCH v3] " Tycho Andersen
2022-11-28 15:00                                                                             ` Tycho Andersen
2022-12-08 14:26                                                                               ` Miklos Szeredi
2022-12-08 17:49                                                                                 ` Tycho Andersen
2022-12-19 19:16                                                                                   ` Tycho Andersen
2023-01-03 14:51                                                                                     ` Tycho Andersen
2023-01-05 15:15                                                                                       ` Serge E. Hallyn
2023-01-26 14:12                                                                                       ` Miklos Szeredi
2022-09-30 19:47                                                               ` [PATCH] " Serge E. Hallyn
2022-09-19 15:46                                                           ` [RFC][PATCH v2] " Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sfn62yd1.fsf@email.froward.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=brauner@kernel.org \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=tycho@tycho.pizza \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).