linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: process fput task_work with TWA_SIGNAL
@ 2021-01-05 18:29 Jens Axboe
  2021-01-07 22:17 ` Doug Anderson
  2021-01-08  5:26 ` Al Viro
  0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2021-01-05 18:29 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: Alexander Viro, Oleg Nesterov, Song Liu

Song reported a boot regression in a kvm image with 5.11-rc, and bisected
it down to the below patch. Debugging this issue, turns out that the boot
stalled when a task is waiting on a pipe being released. As we no longer
run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
task goes idle without running the task_work. This prevents ->release()
from being called on the pipe, which another boot task is waiting on.

Use TWA_SIGNAL for the file fput work to ensure it's run before the task
goes idle.

Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

The other alternative here is obviously to re-instate the:

if (unlikely(current->task_works))
	task_work_run();

in get_signal() that we had before this change. Might be safer in case
there are other cases that need to ensure the work is run in a timely
fashion, though I do think it's cleaner to long term to correctly mark
task_work with the needed notification type. Comments welcome...

diff --git a/fs/file_table.c b/fs/file_table.c
index 45437f8e1003..7c76b611c95b 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -338,7 +338,13 @@ void fput_many(struct file *file, unsigned int refs)
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
 			init_task_work(&file->f_u.fu_rcuhead, ____fput);
-			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))
+			/*
+			 * We could be dependent on the fput task_work running,
+			 * eg for pipes where someone is waiting on release
+			 * being called. Use TWA_SIGNAL to ensure it's run
+			 * before the task goes idle.
+			 */
+			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_SIGNAL))
 				return;
 			/*
 			 * After this task has run exit_task_work(),

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-05 18:29 [PATCH] fs: process fput task_work with TWA_SIGNAL Jens Axboe
@ 2021-01-07 22:17 ` Doug Anderson
  2021-01-08  3:52   ` Jens Axboe
  2021-01-08  5:26 ` Al Viro
  1 sibling, 1 reply; 14+ messages in thread
From: Doug Anderson @ 2021-01-07 22:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-fsdevel, linux-kernel, Alexander Viro, Oleg Nesterov, Song Liu

Hi,

On Tue, Jan 5, 2021 at 10:30 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> it down to the below patch. Debugging this issue, turns out that the boot
> stalled when a task is waiting on a pipe being released. As we no longer
> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> task goes idle without running the task_work. This prevents ->release()
> from being called on the pipe, which another boot task is waiting on.
>
> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> goes idle.
>
> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> Reported-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

I just spend a bit of time bisecting and landed on commit 98b89b649fce
("signal: kill JOBCTL_TASK_WORK") causing my failure to bootup
mainline.  Your patch fixes my problem.  I haven't done any analysis
of the code--just testing, thus:

Tested-by: Douglas Anderson <dianders@chromium.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-07 22:17 ` Doug Anderson
@ 2021-01-08  3:52   ` Jens Axboe
  2021-01-08  6:20     ` Sedat Dilek
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-01-08  3:52 UTC (permalink / raw)
  To: Doug Anderson
  Cc: linux-fsdevel, linux-kernel, Alexander Viro, Oleg Nesterov, Song Liu

On 1/7/21 3:17 PM, Doug Anderson wrote:
> Hi,
> 
> On Tue, Jan 5, 2021 at 10:30 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
>> it down to the below patch. Debugging this issue, turns out that the boot
>> stalled when a task is waiting on a pipe being released. As we no longer
>> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
>> task goes idle without running the task_work. This prevents ->release()
>> from being called on the pipe, which another boot task is waiting on.
>>
>> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
>> goes idle.
>>
>> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
>> Reported-by: Song Liu <songliubraving@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> I just spend a bit of time bisecting and landed on commit 98b89b649fce
> ("signal: kill JOBCTL_TASK_WORK") causing my failure to bootup
> mainline.  Your patch fixes my problem.  I haven't done any analysis
> of the code--just testing, thus:
> 
> Tested-by: Douglas Anderson <dianders@chromium.org>

Thanks, adding your Tested-by.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-05 18:29 [PATCH] fs: process fput task_work with TWA_SIGNAL Jens Axboe
  2021-01-07 22:17 ` Doug Anderson
@ 2021-01-08  5:26 ` Al Viro
  2021-01-08  6:21   ` Sedat Dilek
  2021-01-08  6:46   ` Al Viro
  1 sibling, 2 replies; 14+ messages in thread
From: Al Viro @ 2021-01-08  5:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> it down to the below patch. Debugging this issue, turns out that the boot
> stalled when a task is waiting on a pipe being released. As we no longer
> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> task goes idle without running the task_work. This prevents ->release()
> from being called on the pipe, which another boot task is waiting on.
> 
> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> goes idle.
> 
> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> Reported-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> The other alternative here is obviously to re-instate the:
> 
> if (unlikely(current->task_works))
> 	task_work_run();
> 
> in get_signal() that we had before this change. Might be safer in case
> there are other cases that need to ensure the work is run in a timely
> fashion, though I do think it's cleaner to long term to correctly mark
> task_work with the needed notification type. Comments welcome...

Interesting...  I think I've missed the discussion of that thing; could
you forward the relevant thread my way or give an archive link to it?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08  3:52   ` Jens Axboe
@ 2021-01-08  6:20     ` Sedat Dilek
  0 siblings, 0 replies; 14+ messages in thread
From: Sedat Dilek @ 2021-01-08  6:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Doug Anderson, linux-fsdevel, linux-kernel, Alexander Viro,
	Oleg Nesterov, Song Liu

On Fri, Jan 8, 2021 at 4:56 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 1/7/21 3:17 PM, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jan 5, 2021 at 10:30 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> >> it down to the below patch. Debugging this issue, turns out that the boot
> >> stalled when a task is waiting on a pipe being released. As we no longer
> >> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> >> task goes idle without running the task_work. This prevents ->release()
> >> from being called on the pipe, which another boot task is waiting on.
> >>
> >> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> >> goes idle.
> >>
> >> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> >> Reported-by: Song Liu <songliubraving@fb.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > I just spend a bit of time bisecting and landed on commit 98b89b649fce
> > ("signal: kill JOBCTL_TASK_WORK") causing my failure to bootup
> > mainline.  Your patch fixes my problem.  I haven't done any analysis
> > of the code--just testing, thus:
> >
> > Tested-by: Douglas Anderson <dianders@chromium.org>
>
> Thanks, adding your Tested-by.
>

I have this in my patch-series since it appeared in [1].

Feel free to add my:

   Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang version 11.0.1

- Sedat -

[1] https://git.kernel.dk/cgit/linux-block/log/?h=task_work

- Sedat -

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08  5:26 ` Al Viro
@ 2021-01-08  6:21   ` Sedat Dilek
  2021-01-08  6:47     ` Al Viro
  2021-01-08  6:52     ` Al Viro
  2021-01-08  6:46   ` Al Viro
  1 sibling, 2 replies; 14+ messages in thread
From: Sedat Dilek @ 2021-01-08  6:21 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On Fri, Jan 8, 2021 at 6:30 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > it down to the below patch. Debugging this issue, turns out that the boot
> > stalled when a task is waiting on a pipe being released. As we no longer
> > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > task goes idle without running the task_work. This prevents ->release()
> > from being called on the pipe, which another boot task is waiting on.
> >
> > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > goes idle.
> >
> > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > Reported-by: Song Liu <songliubraving@fb.com>
> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > ---
> >
> > The other alternative here is obviously to re-instate the:
> >
> > if (unlikely(current->task_works))
> >       task_work_run();
> >
> > in get_signal() that we had before this change. Might be safer in case
> > there are other cases that need to ensure the work is run in a timely
> > fashion, though I do think it's cleaner to long term to correctly mark
> > task_work with the needed notification type. Comments welcome...
>
> Interesting...  I think I've missed the discussion of that thing; could
> you forward the relevant thread my way or give an archive link to it?

See [1].

- Sedat -

[1] https://marc.info/?t=160987156600001&r=1&w=2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08  5:26 ` Al Viro
  2021-01-08  6:21   ` Sedat Dilek
@ 2021-01-08  6:46   ` Al Viro
  2021-01-08 15:13     ` Jens Axboe
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2021-01-08  6:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On Fri, Jan 08, 2021 at 05:26:51AM +0000, Al Viro wrote:
> On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > it down to the below patch. Debugging this issue, turns out that the boot
> > stalled when a task is waiting on a pipe being released. As we no longer
> > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > task goes idle without running the task_work. This prevents ->release()
> > from being called on the pipe, which another boot task is waiting on.
> > 
> > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > goes idle.
> > 
> > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > Reported-by: Song Liu <songliubraving@fb.com>
> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > ---
> > 
> > The other alternative here is obviously to re-instate the:
> > 
> > if (unlikely(current->task_works))
> > 	task_work_run();
> > 
> > in get_signal() that we had before this change. Might be safer in case
> > there are other cases that need to ensure the work is run in a timely
> > fashion, though I do think it's cleaner to long term to correctly mark
> > task_work with the needed notification type. Comments welcome...
> 
> Interesting...  I think I've missed the discussion of that thing; could
> you forward the relevant thread my way or give an archive link to it?

Actually, why do we need TWA_RESUME at all?  OK, a while ago you've added
a way for task_work_add() to do wake_up_signal().  Fine, so if the sucker
had been asleep in get_signal(), it gets woken up and the work gets run
fast.  Irrelevant for those who did task_work_add() for themselves.
With that commit, though, you've suddenly changed the default behaviour -
now if you do that task_work_add() for current *and* get asleep in
get_signal(), task_work_add() gets delayed - potentially for a very
long time.

Now the default (TWA_RESUME) has changed semantics; matter of fact,
TWA_SIGNAL seems to be a lot closer than what we used to have.  I'm
too sleepy right now to check if there are valid usecases for your
new TWA_RESUME behaviour, but I very much doubt that old callers
(before the TWA_RESUME/TWA_SIGNAL split) want that.

In particular, for mntput_no_expire() we definitely do *not* want
that, same as with fput().  Same, AFAICS, for YAMA report_access().
And for binder_deferred_fd_close().  And task_tick_numa() looks
that way as well...

Anyway, bedtime for me; right now it looks like at least for task == current
we always want TWA_SIGNAL.  I'll look into that more tomorrow when I get
up, but so far it smells like switching everything to TWA_SIGNAL would
be the right thing to do, if not going back to bool notify for
task_work_add()...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08  6:21   ` Sedat Dilek
@ 2021-01-08  6:47     ` Al Viro
  2021-01-08  6:52     ` Al Viro
  1 sibling, 0 replies; 14+ messages in thread
From: Al Viro @ 2021-01-08  6:47 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Jens Axboe, linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On Fri, Jan 08, 2021 at 07:21:52AM +0100, Sedat Dilek wrote:
> On Fri, Jan 8, 2021 at 6:30 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > > it down to the below patch. Debugging this issue, turns out that the boot
> > > stalled when a task is waiting on a pipe being released. As we no longer
> > > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > > task goes idle without running the task_work. This prevents ->release()
> > > from being called on the pipe, which another boot task is waiting on.
> > >
> > > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > > goes idle.
> > >
> > > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > > Reported-by: Song Liu <songliubraving@fb.com>
> > > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > >
> > > ---
> > >
> > > The other alternative here is obviously to re-instate the:
> > >
> > > if (unlikely(current->task_works))
> > >       task_work_run();
> > >
> > > in get_signal() that we had before this change. Might be safer in case
> > > there are other cases that need to ensure the work is run in a timely
> > > fashion, though I do think it's cleaner to long term to correctly mark
> > > task_work with the needed notification type. Comments welcome...
> >
> > Interesting...  I think I've missed the discussion of that thing; could
> > you forward the relevant thread my way or give an archive link to it?
> 
> See [1].
> 
> - Sedat -
> 
> [1] https://marc.info/?t=160987156600001&r=1&w=2

Thanks; will check tomorrow.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08  6:21   ` Sedat Dilek
  2021-01-08  6:47     ` Al Viro
@ 2021-01-08  6:52     ` Al Viro
  1 sibling, 0 replies; 14+ messages in thread
From: Al Viro @ 2021-01-08  6:52 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Jens Axboe, linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On Fri, Jan 08, 2021 at 07:21:52AM +0100, Sedat Dilek wrote:
> On Fri, Jan 8, 2021 at 6:30 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > > it down to the below patch. Debugging this issue, turns out that the boot
> > > stalled when a task is waiting on a pipe being released. As we no longer
> > > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > > task goes idle without running the task_work. This prevents ->release()
> > > from being called on the pipe, which another boot task is waiting on.
> > >
> > > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > > goes idle.
> > >
> > > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > > Reported-by: Song Liu <songliubraving@fb.com>
> > > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > >
> > > ---
> > >
> > > The other alternative here is obviously to re-instate the:
> > >
> > > if (unlikely(current->task_works))
> > >       task_work_run();
> > >
> > > in get_signal() that we had before this change. Might be safer in case
> > > there are other cases that need to ensure the work is run in a timely
> > > fashion, though I do think it's cleaner to long term to correctly mark
> > > task_work with the needed notification type. Comments welcome...
> >
> > Interesting...  I think I've missed the discussion of that thing; could
> > you forward the relevant thread my way or give an archive link to it?
> 
> See [1].
> 
> - Sedat -
> 
> [1] https://marc.info/?t=160987156600001&r=1&w=2

Wait, that's this very thread, starting with the posting I'd been replying
to.  Really confused now...  Was that a private bug report and an equally
private discussion?  That's what I wanted to see...

Anyway, I'm more than half-asleep right now; will get back to that in the
morning.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08  6:46   ` Al Viro
@ 2021-01-08 15:13     ` Jens Axboe
  2021-01-08 15:58       ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-01-08 15:13 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On 1/7/21 11:46 PM, Al Viro wrote:
> On Fri, Jan 08, 2021 at 05:26:51AM +0000, Al Viro wrote:
>> On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
>>> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
>>> it down to the below patch. Debugging this issue, turns out that the boot
>>> stalled when a task is waiting on a pipe being released. As we no longer
>>> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
>>> task goes idle without running the task_work. This prevents ->release()
>>> from being called on the pipe, which another boot task is waiting on.
>>>
>>> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
>>> goes idle.
>>>
>>> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
>>> Reported-by: Song Liu <songliubraving@fb.com>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>
>>> ---
>>>
>>> The other alternative here is obviously to re-instate the:
>>>
>>> if (unlikely(current->task_works))
>>> 	task_work_run();
>>>
>>> in get_signal() that we had before this change. Might be safer in case
>>> there are other cases that need to ensure the work is run in a timely
>>> fashion, though I do think it's cleaner to long term to correctly mark
>>> task_work with the needed notification type. Comments welcome...
>>
>> Interesting...  I think I've missed the discussion of that thing; could
>> you forward the relevant thread my way or give an archive link to it?

The initial report from Song was off list, and I just worked from that
to get to understanding the issue. Most of it is in the commit message,
but the debugging basically involved figuring out what the stuck task
was doing (it was in idle), and that it still had pending task_work. The
task_work was 5 entries of ____fput, with 4 being ext4 files, and 1
being a pipe. So that lead to the theory of the pipe not being released,
and hence why we were stuck.

> Actually, why do we need TWA_RESUME at all?  OK, a while ago you've added
> a way for task_work_add() to do wake_up_signal().  Fine, so if the sucker
> had been asleep in get_signal(), it gets woken up and the work gets run
> fast.  Irrelevant for those who did task_work_add() for themselves.
> With that commit, though, you've suddenly changed the default behaviour -
> now if you do that task_work_add() for current *and* get asleep in
> get_signal(), task_work_add() gets delayed - potentially for a very
> long time.

Right, this is why I brought up that we can re-instate the get_signal()
running task_work unconditionally as another way of fixing it, because
that change was inadvertently done as part of the commit that killed off
JOBCTL_TASK_WORK.

> Now the default (TWA_RESUME) has changed semantics; matter of fact,
> TWA_SIGNAL seems to be a lot closer than what we used to have.  I'm
> too sleepy right now to check if there are valid usecases for your new
> TWA_RESUME behaviour, but I very much doubt that old callers (before
> the TWA_RESUME/TWA_SIGNAL split) want that.
> 
> In particular, for mntput_no_expire() we definitely do *not* want
> that, same as with fput().  Same, AFAICS, for YAMA report_access().
> And for binder_deferred_fd_close().  And task_tick_numa() looks that
> way as well...
> 
> Anyway, bedtime for me; right now it looks like at least for task ==
> current we always want TWA_SIGNAL.  I'll look into that more tomorrow
> when I get up, but so far it smells like switching everything to
> TWA_SIGNAL would be the right thing to do, if not going back to bool
> notify for task_work_add()...

Before the change, the fact that we ran task_work off get_signal() and
thus processed even non-notify work in that path was a bit of a mess,
imho. If you have work that needs processing now, in the same manner as
signals, then you really should be using TWA_SIGNAL. For this pipe case,
and I'd need to setup and reproduce it again, the task must have a
signal pending and that would have previously caused the task_work to
run, and now it does not. TWA_RESUME technically didn't change its
behavior, it's still the same notification type, we just don't run
task_work unconditionally (regardless of notification type) from
get_signal().

I think the main question here is if we want to re-instate the behavior
of running task_work off get_signal(). I'm leaning towards not doing
that and ensuring that callers that DO need that are using TWA_SIGNAL.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08 15:13     ` Jens Axboe
@ 2021-01-08 15:58       ` Al Viro
  2021-01-08 16:10         ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2021-01-08 15:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On Fri, Jan 08, 2021 at 08:13:25AM -0700, Jens Axboe wrote:
> > Anyway, bedtime for me; right now it looks like at least for task ==
> > current we always want TWA_SIGNAL.  I'll look into that more tomorrow
> > when I get up, but so far it smells like switching everything to
> > TWA_SIGNAL would be the right thing to do, if not going back to bool
> > notify for task_work_add()...
> 
> Before the change, the fact that we ran task_work off get_signal() and
> thus processed even non-notify work in that path was a bit of a mess,
> imho. If you have work that needs processing now, in the same manner as
> signals, then you really should be using TWA_SIGNAL. For this pipe case,
> and I'd need to setup and reproduce it again, the task must have a
> signal pending and that would have previously caused the task_work to
> run, and now it does not. TWA_RESUME technically didn't change its
> behavior, it's still the same notification type, we just don't run
> task_work unconditionally (regardless of notification type) from
> get_signal().

It sure as hell did change behaviour.  Think of the effect of getting
hit with SIGSTOP.  That's what that "bit of a mess" had been about.
Work done now vs. possibly several days later when SIGCONT finally
gets sent.

> I think the main question here is if we want to re-instate the behavior
> of running task_work off get_signal(). I'm leaning towards not doing
> that and ensuring that callers that DO need that are using TWA_SIGNAL.

Can you show the callers that DO NOT need it?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08 15:58       ` Al Viro
@ 2021-01-08 16:10         ` Jens Axboe
  2021-01-08 16:26           ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-01-08 16:10 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On 1/8/21 8:58 AM, Al Viro wrote:
> On Fri, Jan 08, 2021 at 08:13:25AM -0700, Jens Axboe wrote:
>>> Anyway, bedtime for me; right now it looks like at least for task ==
>>> current we always want TWA_SIGNAL.  I'll look into that more tomorrow
>>> when I get up, but so far it smells like switching everything to
>>> TWA_SIGNAL would be the right thing to do, if not going back to bool
>>> notify for task_work_add()...
>>
>> Before the change, the fact that we ran task_work off get_signal() and
>> thus processed even non-notify work in that path was a bit of a mess,
>> imho. If you have work that needs processing now, in the same manner as
>> signals, then you really should be using TWA_SIGNAL. For this pipe case,
>> and I'd need to setup and reproduce it again, the task must have a
>> signal pending and that would have previously caused the task_work to
>> run, and now it does not. TWA_RESUME technically didn't change its
>> behavior, it's still the same notification type, we just don't run
>> task_work unconditionally (regardless of notification type) from
>> get_signal().
> 
> It sure as hell did change behaviour.  Think of the effect of getting
> hit with SIGSTOP.  That's what that "bit of a mess" had been about.
> Work done now vs. possibly several days later when SIGCONT finally
> gets sent.
> 
>> I think the main question here is if we want to re-instate the behavior
>> of running task_work off get_signal(). I'm leaning towards not doing
>> that and ensuring that callers that DO need that are using TWA_SIGNAL.
> 
> Can you show the callers that DO NOT need it?

OK, so here's my suggestion:

1) For 5.11, we just re-instate the task_work run in get_signal(). This
   will make TWA_RESUME have the exact same behavior as before.

2) For 5.12, I'll prepare a patch that collapses TWA_RESUME and TWA_SIGNAL,
   turning it into a bool again (notify or no notify).

How does that sound?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08 16:10         ` Jens Axboe
@ 2021-01-08 16:26           ` Jens Axboe
  2021-01-08 18:05             ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-01-08 16:26 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

[-- Attachment #1: Type: text/plain, Size: 2239 bytes --]

On 1/8/21 9:10 AM, Jens Axboe wrote:
> On 1/8/21 8:58 AM, Al Viro wrote:
>> On Fri, Jan 08, 2021 at 08:13:25AM -0700, Jens Axboe wrote:
>>>> Anyway, bedtime for me; right now it looks like at least for task ==
>>>> current we always want TWA_SIGNAL.  I'll look into that more tomorrow
>>>> when I get up, but so far it smells like switching everything to
>>>> TWA_SIGNAL would be the right thing to do, if not going back to bool
>>>> notify for task_work_add()...
>>>
>>> Before the change, the fact that we ran task_work off get_signal() and
>>> thus processed even non-notify work in that path was a bit of a mess,
>>> imho. If you have work that needs processing now, in the same manner as
>>> signals, then you really should be using TWA_SIGNAL. For this pipe case,
>>> and I'd need to setup and reproduce it again, the task must have a
>>> signal pending and that would have previously caused the task_work to
>>> run, and now it does not. TWA_RESUME technically didn't change its
>>> behavior, it's still the same notification type, we just don't run
>>> task_work unconditionally (regardless of notification type) from
>>> get_signal().
>>
>> It sure as hell did change behaviour.  Think of the effect of getting
>> hit with SIGSTOP.  That's what that "bit of a mess" had been about.
>> Work done now vs. possibly several days later when SIGCONT finally
>> gets sent.
>>
>>> I think the main question here is if we want to re-instate the behavior
>>> of running task_work off get_signal(). I'm leaning towards not doing
>>> that and ensuring that callers that DO need that are using TWA_SIGNAL.
>>
>> Can you show the callers that DO NOT need it?
> 
> OK, so here's my suggestion:
> 
> 1) For 5.11, we just re-instate the task_work run in get_signal(). This
>    will make TWA_RESUME have the exact same behavior as before.
> 
> 2) For 5.12, I'll prepare a patch that collapses TWA_RESUME and TWA_SIGNAL,
>    turning it into a bool again (notify or no notify).
> 
> How does that sound?

Attached the patches - #1 is proposed for 5.11 to fix the current issue,
and then 2-4 can get queued for 5.12 to totally remove the difference
between TWA_RESUME and TWA_SIGNAL.

Totally untested, but pretty straight forward.

-- 
Jens Axboe


[-- Attachment #2: 0003-task_work-use-true-false-for-task_work_add-notificat.patch --]
[-- Type: text/x-patch, Size: 9298 bytes --]

From 069af1e44d70b6d3dd746e41a5f9d65505fb5490 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Fri, 8 Jan 2021 09:22:04 -0700
Subject: [PATCH 3/4] task_work: use true/false for task_work_add notification
 type

There's no difference between TWA_SIGNAL and TWA_RESUME anymore, change
all callers to simply specify whether they need notification or not.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 arch/x86/kernel/cpu/mce/core.c         |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
 drivers/acpi/apei/ghes.c               |  2 +-
 drivers/android/binder.c               |  2 +-
 fs/file_table.c                        |  2 +-
 fs/io_uring.c                          | 19 +++++++++----------
 fs/namespace.c                         |  2 +-
 kernel/events/uprobes.c                |  2 +-
 kernel/irq/manage.c                    |  2 +-
 kernel/sched/fair.c                    |  2 +-
 kernel/time/posix-cpu-timers.c         |  2 +-
 security/keys/keyctl.c                 |  2 +-
 security/yama/yama_lsm.c               |  2 +-
 13 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 13d3f1cbda17..9f315b4c022d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1278,7 +1278,7 @@ static void queue_task_work(struct mce *m, int kill_current_task)
 	else
 		current->mce_kill_me.func = kill_me_maybe;
 
-	task_work_add(current, &current->mce_kill_me, TWA_RESUME);
+	task_work_add(current, &current->mce_kill_me, true);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 29ffb95b25ff..109dd4fe72da 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -579,7 +579,7 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
 	 * callback has been invoked.
 	 */
 	atomic_inc(&rdtgrp->waitcount);
-	ret = task_work_add(tsk, &callback->work, TWA_RESUME);
+	ret = task_work_add(tsk, &callback->work, true);
 	if (ret) {
 		/*
 		 * Task is exiting. Drop the refcount and free the callback.
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fce7ade2aba9..99df00f64306 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -942,7 +942,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 			estatus_node->task_work.func = ghes_kick_task_work;
 			estatus_node->task_work_cpu = smp_processor_id();
 			ret = task_work_add(current, &estatus_node->task_work,
-					    TWA_RESUME);
+					    true);
 			if (ret)
 				estatus_node->task_work.func = NULL;
 		}
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index c119736ca56a..5b1b2ed7c020 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1839,7 +1839,7 @@ static void binder_deferred_fd_close(int fd)
 	close_fd_get_file(fd, &twcb->file);
 	if (twcb->file) {
 		filp_close(twcb->file, current->files);
-		task_work_add(current, &twcb->twork, TWA_RESUME);
+		task_work_add(current, &twcb->twork, true);
 	} else {
 		kfree(twcb);
 	}
diff --git a/fs/file_table.c b/fs/file_table.c
index 45437f8e1003..f2bb37fd0905 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -338,7 +338,7 @@ void fput_many(struct file *file, unsigned int refs)
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
 			init_task_work(&file->f_u.fu_rcuhead, ____fput);
-			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))
+			if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
 				return;
 			/*
 			 * After this task has run exit_task_work(),
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca46f314640b..70a555b17bac 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2077,7 +2077,7 @@ static int io_req_task_work_add(struct io_kiocb *req)
 {
 	struct task_struct *tsk = req->task;
 	struct io_ring_ctx *ctx = req->ctx;
-	enum task_work_notify_mode notify;
+	bool notify;
 	int ret;
 
 	if (tsk->flags & PF_EXITING)
@@ -2085,13 +2085,12 @@ static int io_req_task_work_add(struct io_kiocb *req)
 
 	/*
 	 * SQPOLL kernel thread doesn't need notification, just a wakeup. For
-	 * all other cases, use TWA_SIGNAL unconditionally to ensure we're
-	 * processing task_work. There's no reliable way to tell if TWA_RESUME
-	 * will do the job.
+	 * all other cases, use notification unconditionally to ensure we're
+	 * processing task_work.
 	 */
-	notify = TWA_NONE;
+	notify = false;
 	if (!(ctx->flags & IORING_SETUP_SQPOLL))
-		notify = TWA_SIGNAL;
+		notify = true;
 
 	ret = task_work_add(tsk, &req->task_work, notify);
 	if (!ret)
@@ -2159,7 +2158,7 @@ static void io_req_task_queue(struct io_kiocb *req)
 
 		init_task_work(&req->task_work, io_req_task_cancel);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, TWA_NONE);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 }
@@ -2279,7 +2278,7 @@ static void io_free_req_deferred(struct io_kiocb *req)
 		struct task_struct *tsk;
 
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, TWA_NONE);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 }
@@ -3375,7 +3374,7 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
 		/* queue just for cancelation */
 		init_task_work(&req->task_work, io_req_task_cancel);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, TWA_NONE);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 	return 1;
@@ -5092,7 +5091,7 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 
 		WRITE_ONCE(poll->canceled, true);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, TWA_NONE);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 	return 1;
diff --git a/fs/namespace.c b/fs/namespace.c
index d2db7dfe232b..f6b661ad8bbd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1194,7 +1194,7 @@ static void mntput_no_expire(struct mount *mnt)
 		struct task_struct *task = current;
 		if (likely(!(task->flags & PF_KTHREAD))) {
 			init_task_work(&mnt->mnt_rcu, __cleanup_mnt);
-			if (!task_work_add(task, &mnt->mnt_rcu, TWA_RESUME))
+			if (!task_work_add(task, &mnt->mnt_rcu, true))
 				return;
 		}
 		if (llist_add(&mnt->mnt_llist, &delayed_mntput_list))
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bf9edd8d75be..8bb26a338e06 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1823,7 +1823,7 @@ void uprobe_copy_process(struct task_struct *t, unsigned long flags)
 
 	t->utask->dup_xol_addr = area->vaddr;
 	init_task_work(&t->utask->dup_xol_work, dup_xol_work);
-	task_work_add(t, &t->utask->dup_xol_work, TWA_RESUME);
+	task_work_add(t, &t->utask->dup_xol_work, true);
 }
 
 /*
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ab8567f32501..d4e2f484e4af 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1232,7 +1232,7 @@ static int irq_thread(void *data)
 		handler_fn = irq_thread_fn;
 
 	init_task_work(&on_exit_work, irq_thread_dtor);
-	task_work_add(current, &on_exit_work, TWA_NONE);
+	task_work_add(current, &on_exit_work, false);
 
 	irq_thread_check_affinity(desc, action);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce20da67..476c564f0f8a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2939,7 +2939,7 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
 		curr->node_stamp += period;
 
 		if (!time_before(jiffies, curr->mm->numa_next_scan))
-			task_work_add(curr, work, TWA_RESUME);
+			task_work_add(curr, work, true);
 	}
 }
 
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index a71758e34e45..51080a1ed11f 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1128,7 +1128,7 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
 
 	/* Schedule task work to actually expire the timers */
 	tsk->posix_cputimers_work.scheduled = true;
-	task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
+	task_work_add(tsk, &tsk->posix_cputimers_work.work, true);
 }
 
 static inline bool posix_cpu_timers_enable_work(struct task_struct *tsk,
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 61a614c21b9b..e26bbccda7cc 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1693,7 +1693,7 @@ long keyctl_session_to_parent(void)
 
 	/* the replacement session keyring is applied just prior to userspace
 	 * restarting */
-	ret = task_work_add(parent, newwork, TWA_RESUME);
+	ret = task_work_add(parent, newwork, true);
 	if (!ret)
 		newwork = NULL;
 unlock:
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 06e226166aab..536c99646f6a 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -99,7 +99,7 @@ static void report_access(const char *access, struct task_struct *target,
 	info->access = access;
 	info->target = target;
 	info->agent = agent;
-	if (task_work_add(current, &info->work, TWA_RESUME) == 0)
+	if (task_work_add(current, &info->work, true) == 0)
 		return; /* success */
 
 	WARN(1, "report_access called from exiting task");
-- 
2.30.0


[-- Attachment #3: 0002-task_work-always-use-signal-work-if-notification-is-.patch --]
[-- Type: text/x-patch, Size: 4038 bytes --]

From 232a5d9bf181b05d199ff3c3be7ab7c2b0b1898c Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Fri, 8 Jan 2021 09:17:45 -0700
Subject: [PATCH 2/4] task_work: always use signal work if notification is
 selected

Since we run any task_work from get_signal(), there's no real distinction
between TWA_RESUME and TWA_SIGNAL. Turn the notification method for
task_work_add() into a boolean, in preparation for getting rid of the
difference between the two.

With TWA_SIGNAL always being used, we no longer need to check and run
task_work in get_signal() unconditionally. Get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/task_work.h |  2 +-
 kernel/signal.c           |  3 ---
 kernel/task_work.c        | 29 ++++++++---------------------
 3 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 0d848a1e9e62..8df8d539fad8 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -20,7 +20,7 @@ enum task_work_notify_mode {
 };
 
 int task_work_add(struct task_struct *task, struct callback_head *twork,
-			enum task_work_notify_mode mode);
+			bool notify);
 
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
 void task_work_run(void);
diff --git a/kernel/signal.c b/kernel/signal.c
index 6b9c431da08f..5736c55aaa1a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2550,9 +2550,6 @@ bool get_signal(struct ksignal *ksig)
 	struct signal_struct *signal = current->signal;
 	int signr;
 
-	if (unlikely(current->task_works))
-		task_work_run();
-
 	/*
 	 * For non-generic architectures, check for TIF_NOTIFY_SIGNAL so
 	 * that the arch handlers don't all have to do it. If we get here
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 9cde961875c0..7a4850669033 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -9,15 +9,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
  * @work: the callback to run
- * @notify: how to notify the targeted task
+ * @notify: whether to notify the targeted task or not
  *
- * Queue @work for task_work_run() below and notify the @task if @notify
- * is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL works like signals, in that the
- * it will interrupt the targeted task and run the task_work. @TWA_RESUME
- * work is run only when the task exits the kernel and returns to user mode,
- * or before entering guest mode. Fails if the @task is exiting/exited and thus
- * it can't process this @work. Otherwise @work->func() will be called when the
- * @task goes through one of the aforementioned transitions, or exits.
+ * Queue @work for task_work_run() below and notify the @task if @notify is
+ * true. If notification is selected, the targeted task is interrupted so it
+ * can process the work. Fails if the @task is exiting/exited and thus it can't
+ * process this @work. Otherwise @work->func() will be called when the @task
+ * goes through one of the aforementioned transitions, or exits.
  *
  * If the targeted task is exiting, then an error is returned and the work item
  * is not queued. It's up to the caller to arrange for an alternative mechanism
@@ -30,7 +28,7 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
  * 0 if succeeds or -ESRCH.
  */
 int task_work_add(struct task_struct *task, struct callback_head *work,
-		  enum task_work_notify_mode notify)
+		  bool notify)
 {
 	struct callback_head *head;
 
@@ -41,19 +39,8 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		work->next = head;
 	} while (cmpxchg(&task->task_works, head, work) != head);
 
-	switch (notify) {
-	case TWA_NONE:
-		break;
-	case TWA_RESUME:
-		set_notify_resume(task);
-		break;
-	case TWA_SIGNAL:
+	if (notify)
 		set_notify_signal(task);
-		break;
-	default:
-		WARN_ON_ONCE(1);
-		break;
-	}
 
 	return 0;
 }
-- 
2.30.0


[-- Attachment #4: 0001-task_work-unconditionally-run-task_work-from-get_sig.patch --]
[-- Type: text/x-patch, Size: 1788 bytes --]

From 35d0b389f3b23439ad15b610d6e43fc72fc75779 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Tue, 5 Jan 2021 11:32:43 -0700
Subject: [PATCH 1/4] task_work: unconditionally run task_work from
 get_signal()

Song reported a boot regression in a kvm image with 5.11-rc, and bisected
it down to the below patch. Debugging this issue, turns out that the boot
stalled when a task is waiting on a pipe being released. As we no longer
run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
task goes idle without running the task_work. This prevents ->release()
from being called on the pipe, which another boot task is waiting on.

For now, re-instate the unconditional task_work run from get_signal().
For 5.12, we'll collapse TWA_RESUME and TWA_SIGNAL, as it no longer
makes sense to have a distinction between the two. This will turn
task_work notification into a simple boolean, whether to notify or not.

Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
Reported-by: Song Liu <songliubraving@fb.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang version 11.0.1
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 kernel/signal.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index 5736c55aaa1a..6b9c431da08f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2550,6 +2550,9 @@ bool get_signal(struct ksignal *ksig)
 	struct signal_struct *signal = current->signal;
 	int signr;
 
+	if (unlikely(current->task_works))
+		task_work_run();
+
 	/*
 	 * For non-generic architectures, check for TIF_NOTIFY_SIGNAL so
 	 * that the arch handlers don't all have to do it. If we get here
-- 
2.30.0


[-- Attachment #5: 0004-task_work-kill-enum-task_work_notify_mode.patch --]
[-- Type: text/x-patch, Size: 826 bytes --]

From 4906058b2573b7de9c220dd8566c20d0e78a7c4b Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Fri, 8 Jan 2021 09:23:16 -0700
Subject: [PATCH 4/4] task_work: kill enum task_work_notify_mode

It's now unused, get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/task_work.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 8df8d539fad8..7b987260f4d7 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -13,12 +13,6 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 	twork->func = func;
 }
 
-enum task_work_notify_mode {
-	TWA_NONE,
-	TWA_RESUME,
-	TWA_SIGNAL,
-};
-
 int task_work_add(struct task_struct *task, struct callback_head *twork,
 			bool notify);
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: process fput task_work with TWA_SIGNAL
  2021-01-08 16:26           ` Jens Axboe
@ 2021-01-08 18:05             ` Al Viro
  0 siblings, 0 replies; 14+ messages in thread
From: Al Viro @ 2021-01-08 18:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel, Oleg Nesterov, Song Liu

On Fri, Jan 08, 2021 at 09:26:40AM -0700, Jens Axboe wrote:
> >> Can you show the callers that DO NOT need it?
> > 
> > OK, so here's my suggestion:
> > 
> > 1) For 5.11, we just re-instate the task_work run in get_signal(). This
> >    will make TWA_RESUME have the exact same behavior as before.
> > 
> > 2) For 5.12, I'll prepare a patch that collapses TWA_RESUME and TWA_SIGNAL,
> >    turning it into a bool again (notify or no notify).
> > 
> > How does that sound?
> 
> Attached the patches - #1 is proposed for 5.11 to fix the current issue,
> and then 2-4 can get queued for 5.12 to totally remove the difference
> between TWA_RESUME and TWA_SIGNAL.
> 
> Totally untested, but pretty straight forward.

	Umm...  I'm looking at the callers of get_signal() and I really wonder
how your support for TIF_NOTIFY_SIGNAL interacts with saved sigmask handling
by various do_signal() (calls of restore_saved_sigmask()).  Could you give
pointers to relevant discussion or a braindump on the same?  I realize that
it had been months ago, but...

	Do we even need restore_saved_sigmask_unless() now?  Could
set_user_sigmask() simply set TIF_NOTIFY_SIGNAL?  Oleg, could you comment
on that?

	Another fun question is how does that thing interact with
single-stepping logics; it's been about 8 years since I looked into
those horrors, but they used to be bloody awful...

	What I'm trying to figure out is how costly TIF_NOTIFY_SIGNAL is
on the work execution side; task_work_add() side is cheap enough, it's
delivery that is interesting.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-01-08 18:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-05 18:29 [PATCH] fs: process fput task_work with TWA_SIGNAL Jens Axboe
2021-01-07 22:17 ` Doug Anderson
2021-01-08  3:52   ` Jens Axboe
2021-01-08  6:20     ` Sedat Dilek
2021-01-08  5:26 ` Al Viro
2021-01-08  6:21   ` Sedat Dilek
2021-01-08  6:47     ` Al Viro
2021-01-08  6:52     ` Al Viro
2021-01-08  6:46   ` Al Viro
2021-01-08 15:13     ` Jens Axboe
2021-01-08 15:58       ` Al Viro
2021-01-08 16:10         ` Jens Axboe
2021-01-08 16:26           ` Jens Axboe
2021-01-08 18:05             ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).