linux-kernel.vger.kernel.org archive mirror
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Tycho Andersen <tycho@tycho.pizza>
Cc: Oleg Nesterov <oleg@redhat.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING
Date: Fri, 29 Jul 2022 00:04:17 -0500
Message-ID: <87pmhofr1q.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <YuL9uc8WfiYlb2Hw@tycho.pizza> (Tycho Andersen's message of "Thu, 28 Jul 2022 15:20:57 -0600")

Tycho Andersen <tycho@tycho.pizza> writes:

> On Thu, Jul 28, 2022 at 11:12:20AM +0200, Oleg Nesterov wrote:
>> This is clear, but it seems you do not understand me. Let me try again
>> to explain and please correct me if I am wrong.
>> 
>> To simplify, let's suppose we have a single-threaded task T which simply
>> does
>> 	__set_current_state(TASK_KILLABLE);
>> 	schedule();
>> 
>> in the do_exit() paths after exit_signals() which sets PF_EXITING. Btw,
>> note that it even documents that this thread is not "visible" for the
>> group-wide signals, see below.
>> 
>> Now, suppose that this task is running and you send SIGKILL. T will
>> dequeue SIGKILL from T->pending and call do_exit(). However, it won't
>> remove SIGKILL from T->signal->shared_pending, and this means that
>> signal_pending(T) is still true.
>> 
>> Now. If we add a PF_EXITING or sigismember(shared_pending, SIGKILL) check
>> into __fatal_signal_pending(), then yes, T won't block in schedule(),
>> schedule()->signal_pending_state() will return true.
>> 
>> But what if T exits on its own? It will block in schedule() forever.
>> schedule()->signal_pending_state() will not even check __fatal_signal_pending(),
>> signal_pending() == F.
>> 
>> Now if you send SIGKILL to this task, SIGKILL won't wake it up or even
>> set TIF_SIGPENDING, complete_signal() will do nothing.
>> 
>> See?
>> 
>> I agree, we should probably cleanup this logic and define how exactly
>> the exiting task should react to signals (not only fatal signals). But
>> your patch certainly doesn't look good to me and it is not enough.
>> Maybe we can change get_signal() to not remove SIGKILL from t->pending
>> for a start... not sure, this needs another discussion.
>
> Thank you for this! Between that and Eric's line about:
>
>> Frankly that there are some left over SIGKILL bits in the pending mask
>> is a misfeature, and it is definitely not something you should count on.
>
> I think I finally maybe understand the objections.
>
> Is it fair to say that a task with PF_EXITING should never wait? I'm
> wondering if a solution would be to patch the wait code to look for
> PF_EXITING, in addition to checking the signal state.

That will at a minimum change zap_pid_ns_processes to busy wait
instead of sleeping while it waits for children to die.

So we would need to survey the waits that can happen when closing file
descriptors and any other place on the exit path to see how much impact
such a change would have.
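
For reference while surveying those waits, here is a small userspace
model of the two cases Oleg describes in the quoted text above. It is
only a sketch: the struct, the MODEL_* constants and the model_* helpers
are made up for illustration, and the logic assumes that
signal_pending_state() first requires signal_pending() and only then
consults __fatal_signal_pending() for a killable sleep.

```c
#include <stdbool.h>

/* Illustrative bit values for this model; do not rely on them matching the kernel. */
#define MODEL_TASK_INTERRUPTIBLE   0x0001u
#define MODEL_TASK_UNINTERRUPTIBLE 0x0002u
#define MODEL_TASK_WAKEKILL        0x0100u
#define MODEL_TASK_KILLABLE (MODEL_TASK_WAKEKILL | MODEL_TASK_UNINTERRUPTIBLE)
#define MODEL_PF_EXITING           0x0004u

/* Hypothetical stand-in for the few task_struct bits that matter here. */
struct model_task {
	unsigned int flags;   /* MODEL_PF_EXITING or 0 */
	bool sigpending;      /* stands in for TIF_SIGPENDING */
	bool sigkill_queued;  /* SIGKILL still in the task's private pending set */
};

/* Fatal-signal check; "patched" adds the PF_EXITING test from the proposed patch. */
static bool model_fatal_pending(const struct model_task *t, bool patched)
{
	if (t->sigkill_queued)
		return true;
	return patched && (t->flags & MODEL_PF_EXITING) != 0;
}

/* Returns true when a sleep in the given state would NOT block. */
static bool model_signal_pending_state(unsigned int state,
				       const struct model_task *t, bool patched)
{
	if (!(state & (MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_WAKEKILL)))
		return false;
	if (!t->sigpending)
		return false;	/* the fatal-signal check is never reached */
	return (state & MODEL_TASK_INTERRUPTIBLE) != 0 ||
	       model_fatal_pending(t, patched);
}
```

With a task that was killed (sigpending still set, SIGKILL already
dequeued) the patched check stops a MODEL_TASK_KILLABLE sleep from
blocking. With a task that exited on its own (sigpending clear) the
sleep blocks with or without the patch, which is the hang Oleg points
out.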


It might be possible to allow an extra SIGKILL to terminate such waits.
We do something like that for coredumps.  But that is incredibly subtle
and a pain to maintain so I want to avoid that if we can.


>> Finally. if fuse_flush() wants __fatal_signal_pending() == T when the
>> caller exits, perhaps it can do it itself? Something like
>> 
>> 	if (current->flags & PF_EXITING) {
>> 		spin_lock_irq(siglock);
>> 		set_thread_flag(TIF_SIGPENDING);
>> 		sigaddset(&current->pending.signal, SIGKILL);
>> 		spin_unlock_irq(siglock);
>> 	}
>> 
>> Sure, this is ugly as hell. But perhaps this can serve as a workaround?
>
> or even just
>
>     if (current->flags & PF_EXITING)
>         return;
>
> since we don't have anyone to send the result of the flush to anyway.
> If we don't end up converging on a fix here, I'll just send that
> patch. Thanks for the suggestion.

If that was limited to the case you care about that would be reasonable.

That will have an effect any time a process that has files open on a
fuse filesystem exits and depends upon the exit path to close its file
descriptors to the fuse filesystem.


I do see a plausible solution along those lines.

In fuse_flush, instead of using fuse_simple_request, call an equivalent
function that skips calling request_wait_answer when PF_EXITING is set.
Or perhaps, when PF_EXITING is set, use schedule_work to call
request_wait_answer.

That will allow everything to work as it does today.  It will optimize
fuse when file descriptors are closed on the exit path.  It will
avoid the hang by removing an indefinite wait on userspace.
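
A rough userspace sketch of that idea, to show the control flow only.
Nothing here is the real fuse code: struct model_req and the model_*
functions are hypothetical stand-ins, and the task_exiting argument
stands in for testing PF_EXITING on current.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a FUSE request; not the kernel's struct fuse_req. */
struct model_req {
	bool queued;     /* request was handed to the userspace server */
	bool waited;     /* caller waited for the server's answer */
	int  reply_err;  /* error the server would eventually report */
};

/* Stand-in for queueing the flush request to the server. */
static void model_queue_request(struct model_req *req)
{
	req->queued = true;
}

/* Stand-in for request_wait_answer(); the real one can sleep indefinitely. */
static int model_wait_answer(struct model_req *req)
{
	req->waited = true;
	return req->reply_err;
}

/*
 * Always send the flush, but only wait for the answer when there is a
 * caller that can receive the result.
 */
static int model_flush(struct model_req *req, bool task_exiting)
{
	model_queue_request(req);
	if (task_exiting)
		return 0;	/* exit path: no one to report an error to */
	return model_wait_answer(req);
}
```

An ordinary close() still waits and sees the server's error, while the
exit path sends the request and moves on without sleeping on userspace.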

This should even generalize into the vfs.  I looked and nfs also looks
like it has the potential to optimize out the wait for the result of the
flush.  A correctly implemented flush method looks to flush any
write-back data when the file is closed and to return any errors from
that flush to the caller of close.  For .flush called from the exit path
aka exit_files aka close_files there is no place to return an
error status to, so there is no need to wait for the flush to complete.

That said, I think it makes sense to solve the problem for fuse
first, and then we can figure out support for other filesystems.

Eric

