All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>, Jan Kara <jack@suse.cz>,
	syzbot <syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Theodore Ts'o <tytso@mit.edu>, Linux-MM <linux-mm@kvack.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Nicholas Piggin <npiggin@gmail.com>,
	Alex Shi <alex.shi@linux.alibaba.com>, Qian Cai <cai@lca.pw>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Matthew Wilcox <willy@infradead.org>,
	William Kucharski <william.kucharski@oracle.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: kernel BUG at fs/ext4/inode.c:LINE!
Date: Mon, 23 Nov 2020 22:34:12 -0800 (PST)	[thread overview]
Message-ID: <alpine.LSU.2.11.2011232209540.5235@eggly.anvils> (raw)
In-Reply-To: <CAHk-=whYO5v09E8oHoYQDn7qqV0hBu713AjF+zxJ9DCr1+WOtQ@mail.gmail.com>

On Mon, 23 Nov 2020, Linus Torvalds wrote:
> On Mon, Nov 23, 2020 at 8:07 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > The problem is that PageWriteback is not accompanied by a page reference
> > (as the NOTE at the end of test_clear_page_writeback() acknowledges): as
> > soon as TestClearPageWriteback has been done, that page could be removed
> > from page cache, freed, and reused for something else by the time that
> > wake_up_page() is reached.
> 
> Ugh.
> 
> Would it be possible to instead just make PageWriteback take the ref?
> 
> I don't hate your patch per se, but looking at that long explanation,
> and looking at the gyrations end_page_writeback() does, I go "why
> don't we do that?"
> 
> IOW, why couldn't we just make the __test_set_page_writeback()
> increment the page count if the writeback flag wasn't already set, and
> then make the end_page_writeback() do a put_page() after it all?

Right, that should be a lot simpler, and will not require any of the
cleanup (much as I liked that).  If you're reasonably confident that
adding the extra get_page+put_page to every writeback (instead of
just to the waited case, which I presume significantly less common)
will get lost in the noise - I was not confident of that, nor
confident of devising realistic tests to decide it.

What I did look into before sending, was whether in the filesystems
there was a pattern of doing a put_page() after *set_page_writeback(),
when it would just be a matter of deleting that put_page() and doing
it instead at the end of end_page_writeback().  But no: there were a
few cases like that, but in general no such pattern.

Though, what I think I'll try is not quite what you suggest there,
but instead do both get_page() and put_page() in end_page_writeback().
The reason being, there are a number of places (in mm at least) where
we judge what to do by the expected refcount: places that know to add
1 on when PagePrivate is set (for buffers), but do not expect to add
1 on when PageWriteback is set.  Now, all of those places probably
have to have their own wait_on_page_writeback() too, but I'd rather
narrow the window when the refcount is raised, than work through
what if any change would be needed in those places.

> >
> > Then on crashing a second time, realized there's a stronger reason against
> > that approach.  If my testing just occasionally crashes on that check,
> > when the page is reused for part of a compound page, wouldn't it be much
> > more common for the page to get reused as an order-0 page before reaching
> > wake_up_page()?  And on rare occasions, might that reused page already be
> > marked PageWriteback by its new user, and already be waited upon?  What
> > would that look like?
> >
> > It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
> > in write_cache_pages() (though I have never seen that crash myself).
> 
> So looking more at the patch, I started looking at this part:
> 
> > +       writeback = TestClearPageWriteback(page);
> > +       /* No need for smp_mb__after_atomic() after TestClear */
> > +       waiters = PageWaiters(page);
> > +       if (waiters) {
> > +               /*
> > +                * Writeback doesn't hold a page reference on its own, relying
> > +                * on truncation to wait for the clearing of PG_writeback.
> > +                * We could safely wake_up_page_bit(page, PG_writeback) here,
> > +                * while holding i_pages lock: but that would be a poor choice
> > +                * if the page is on a long hash chain; so instead choose to
> > +                * get_page+put_page - though atomics will add some overhead.
> > +                */
> > +               get_page(page);
> > +       }
> 
> and thinking more about this, my first reaction was "but that has the
> same race, just a smaller window".
> 
> And then reading the comment more, I realize you relied on the i_pages
> lock, and that this odd ordering was to avoid the possible latency.

Yes.  I decided to send the get_page+put_page variant, rather than the
wake_up_page_bit while holding i_pages variant (also tested), in part
because it's easier to edit the get_page+put_page one to the other.

> 
> But what about the non-mapping case? I'm not sure how that happens,
> but this does seem very fragile.

I don't see how the non-mapping case would ever occur: I think it
probably comes from a general pattern of caution about NULL mapping
when akpm (I think) originally wrote these functions.

> 
> I'm wondering why you didn't want to just do the get_page()
> unconditionally and early. Is avoiding the refcount really such a big
> optimization?

I don't know: I trust your judgement more than mine.

Hugh

  reply	other threads:[~2020-11-24  6:34 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-27  2:48 kernel BUG at fs/ext4/inode.c:LINE! syzbot
2020-08-28 10:07 ` Jan Kara
2020-08-31 10:03   ` Jan Kara
2020-08-31 18:21     ` Linus Torvalds
2020-08-31 18:21       ` Linus Torvalds
2020-11-24  4:07       ` Hugh Dickins
2020-11-24  4:07         ` Hugh Dickins
2020-11-24  4:26         ` Linus Torvalds
2020-11-24  4:26           ` Linus Torvalds
2020-11-24  4:53         ` Linus Torvalds
2020-11-24  4:53           ` Linus Torvalds
2020-11-24  6:34           ` Hugh Dickins [this message]
2020-11-24  6:34             ` Hugh Dickins
2020-11-24 16:46             ` Hugh Dickins
2020-11-24 16:46               ` Hugh Dickins
2020-11-24 12:19         ` Matthew Wilcox
2020-11-24 16:28           ` Hugh Dickins
2020-11-24 16:28             ` Hugh Dickins
2020-11-24 18:33             ` Matthew Wilcox
2020-11-24 19:00               ` Linus Torvalds
2020-11-24 19:00                 ` Linus Torvalds
2020-11-24 20:15                 ` Matthew Wilcox
2020-11-24 20:34                   ` Linus Torvalds
2020-11-24 20:34                     ` Linus Torvalds
2020-11-24 21:46                     ` Hugh Dickins
2020-11-24 21:46                       ` Hugh Dickins
2020-11-24 23:24                       ` Linus Torvalds
2020-11-24 23:24                         ` Linus Torvalds
2020-11-25 21:30                         ` Linus Torvalds
2020-11-25 21:30                           ` Linus Torvalds
2020-11-25 22:01                           ` Linus Torvalds
2020-11-25 22:01                             ` Linus Torvalds
2020-11-25  9:20           ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2011232209540.5235@eggly.anvils \
    --to=hughd@google.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=axboe@kernel.dk \
    --cc=cai@lca.pw \
    --cc=darrick.wong@oracle.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kirill@shutemov.name \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=npiggin@gmail.com \
    --cc=oleg@redhat.com \
    --cc=syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.