All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: jack@suse.cz, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, axboe@kernel.dk, tj@kernel.org,
	david@fromorbit.com
Subject: Re: [PATCH] bdi: Fix oops in wb_workfn()
Date: Mon, 21 May 2018 11:38:23 +0200	[thread overview]
Message-ID: <20180521093823.kjj5tk7ko244jv4d@quack2.suse.cz> (raw)
In-Reply-To: <201805192327.JIF05779.OQFJFStOOMLFVH@I-love.SAKURA.ne.jp>

On Sat 19-05-18 23:27:09, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Jan Kara wrote:
> > > Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> > > the necessary precautions against racing with bdi unregistration.
> > 
> > Yes, this patch will solve NULL pointer dereference bug. But is it OK to leave
> > list_empty(&wb->work_list) == false situation? Who takes over the role of making
> > list_empty(&wb->work_list) == true?
> 
> syzbot is again reporting the same NULL pointer dereference.
> 
>   general protection fault in wb_workfn (2)
>   https://syzkaller.appspot.com/bug?id=e0818ccb7e46190b3f1038b0c794299208ed4206

Gaah... So we are still missing something.

> Didn't we overlook something obvious in commit b8b784958eccbf8f ("bdi:
> Fix oops in wb_workfn()") ?
> 
> At first, I thought that that commit will solve NULL pointer dereference bug.
> But what does
> 
>  	if (!list_empty(&wb->work_list))
> -		mod_delayed_work(bdi_wq, &wb->dwork, 0);
> +		wb_wakeup(wb);
>  	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>  		wb_wakeup_delayed(wb);
> 
> mean?
> 
> static void wb_wakeup(struct bdi_writeback *wb)
> {
> 	spin_lock_bh(&wb->work_lock);
> 	if (test_bit(WB_registered, &wb->state))
> 		mod_delayed_work(bdi_wq, &wb->dwork, 0);
> 	spin_unlock_bh(&wb->work_lock);
> }
> 
> It means nothing but "we don't call mod_delayed_work() if WB_registered
> bit was already cleared".

Exactly.

> But if WB_registered bit is not yet cleared when we hit
> wb_wakeup_delayed() path?
> 
> void wb_wakeup_delayed(struct bdi_writeback *wb)
> {
> 	unsigned long timeout;
> 
> 	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
> 	spin_lock_bh(&wb->work_lock);
> 	if (test_bit(WB_registered, &wb->state))
> 		queue_delayed_work(bdi_wq, &wb->dwork, timeout);
> 	spin_unlock_bh(&wb->work_lock);
> }
> 
> add_timer() is called because (presumably) timeout > 0. And after that
> timeout expires, __queue_work() is called even if WB_registered bit is
> already cleared before that timeout expires, isn't it?

Yes.

> void delayed_work_timer_fn(struct timer_list *t)
> {
> 	struct delayed_work *dwork = from_timer(dwork, t, timer);
> 
> 	/* should have been called from irqsafe timer with irq already off */
> 	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
> }
> 
> Then, wb_workfn() is after all scheduled even if we check for
> WB_registered bit, isn't it?

It can be queued after WB_registered bit is cleared but it cannot be queued
after mod_delayed_work(bdi_wq, &wb->dwork, 0) has finished. That function
deletes the pending timer (the timer cannot be armed again because
WB_registered is cleared) and queues what should be the last round of
wb_workfn().

> Then, don't we need to check that
> 
> 	mod_delayed_work(bdi_wq, &wb->dwork, 0);
> 	flush_delayed_work(&wb->dwork);
> 
> is really waiting for completion? At least, shouldn't we try below debug
> output (not only for debugging this report but also generally desirable)?
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7441bd9..ccec8cd 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -376,8 +376,10 @@ static void wb_shutdown(struct bdi_writeback *wb)
>  	 * tells wb_workfn() that @wb is dying and its work_list needs to
>  	 * be drained no matter what.
>  	 */
> -	mod_delayed_work(bdi_wq, &wb->dwork, 0);
> -	flush_delayed_work(&wb->dwork);
> +	if (!mod_delayed_work(bdi_wq, &wb->dwork, 0))
> +		printk(KERN_WARNING "wb_shutdown: mod_delayed_work() failed\n");

false return from mod_delayed_work() just means that there was no timer
armed. That is a valid situation if there are no dirty data.

> +	if (!flush_delayed_work(&wb->dwork))
> +		printk(KERN_WARNING "wb_shutdown: flush_delayed_work() failed\n");

And this is valid as well (although unlikely) if the work managed to
complete on another CPU before flush_delayed_work() was called.

So I don't think your warnings will help us much. But yes, we need to debug
this somehow. For now I have no idea what could be still going wrong.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2018-05-21  9:38 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-03 16:26 [PATCH] bdi: Fix oops in wb_workfn() Jan Kara
2018-05-03 21:55 ` Dave Chinner
2018-05-03 21:57   ` Jens Axboe
2018-05-09  9:48   ` Jan Kara
2018-05-03 22:35 ` Tetsuo Handa
2018-05-09  9:47   ` Jan Kara
2018-05-19 14:27   ` Tetsuo Handa
2018-05-21  9:38     ` Jan Kara [this message]
2018-05-25 10:15       ` Tetsuo Handa
2018-05-09 10:31 ` Jan Kara
2018-05-09 14:42   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180521093823.kjj5tk7ko244jv4d@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.