linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Christoph Hellwig <hch@infradead.org>,
	dsterba@suse.cz
Subject: Re: [PATCH] iomap: Return zero in case of unsuccessful pagecache invalidation before DIO
Date: Fri, 29 May 2020 11:55:33 +0100	[thread overview]
Message-ID: <CAL3q7H4Mit=P9yHsZXKsVxMN0=7m7gS2ZqiGTbyFz5Aid9t3hQ@mail.gmail.com> (raw)
In-Reply-To: <20200529002319.GQ252930@magnolia>

On Fri, May 29, 2020 at 1:23 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Thu, May 28, 2020 at 02:21:03PM -0500, Goldwyn Rodrigues wrote:
> >
> > Filesystems such as btrfs are unable to guarantee page invalidation
> > because pages could be locked as a part of the extent. Return zero
>
> Locked for what?  filemap_write_and_wait_range should have just cleaned
> them off.

Yes, it will be confusing even for someone more familiar with btrfs.
The changelog could be more detailed to make it clear what's happening and why.

So what happens:

1) iomap_dio_rw() calls filemap_write_and_wait_range().
    That starts delalloc for all dirty pages in the range and then
waits for writeback to complete.
    This is enough for most filesystems at least (if not all except btrfs).

2) However, in btrfs once writeback finishes, a job is queued to run
on a dedicated workqueue, to execute the function
btrfs_finish_ordered_io().
    So that job will be run after filemap_write_and_wait_range() returns.
    That function locks the file range (using a btrfs specific data
structure), does a bunch of things (updating several btrees), and then
unlocks the file range.

3) While iomap calls invalidate_inode_pages2_range(), which ends up
calling the btrfs callback btfs_releasepage(),
    btrfs_finish_ordered_io() is running and has the file range locked
(this is what Goldwyn means by "pages could be locked", which is
confusing because it's not about any locked struct page).

4) Because the file range is locked, btrfs_releasepage() returns 0
(page can't be released), this happens in the helper function
try_release_extent_state().
    Any page in that range is not dirty nor under writeback anymore
and, in fact, btrfs_finished_ordered_io() doesn't do anything with the
pages, it's only updating metadata.

    btrfs_releasepage() in this case could release the pages, but
there are other contextes where the file range is locked, the pages
are still not dirty and not under writeback, where this would not be
safe to do.

5) So because of that invalidate_inode_pages2_range() returns
non-zero, the iomap code prints that warning message and then proceeds
with doing a direct IO write anyway.

What happens currently in btrfs, before Goldwyn's patchset:

1) generic_file_direct_write() also calls filemap_write_and_wait_range().
2) After that it calls invalidate_inode_pages2_range() too, but if
that returns non-zero, it doesn't print any warning and falls back to
a buffered write.

So Goldwyn here is effectively adding that behaviour from
generic_file_direct_write() to iomap.

Thanks.

>
> > in case a page cache invalidation is unsuccessful so filesystems can
> > fallback to buffered I/O. This is similar to
> > generic_file_direct_write().
> >
> > This takes care of the following invalidation warning during btrfs
> > mixed buffered and direct I/O using iomap_dio_rw():
> >
> > Page cache invalidation failure on direct I/O.  Possible data
> > corruption due to collision with buffered I/O!
> >
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index e4addfc58107..215315be6233 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -483,9 +483,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >        */
> >       ret = invalidate_inode_pages2_range(mapping,
> >                       pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
> > -     if (ret)
> > -             dio_warn_stale_pagecache(iocb->ki_filp);
> > -     ret = 0;
> > +     /*
> > +      * If a page can not be invalidated, return 0 to fall back
> > +      * to buffered write.
> > +      */
> > +     if (ret) {
> > +             if (ret == -EBUSY)
> > +                     ret = 0;
> > +             goto out_free_dio;
>
> XFS doesn't fall back to buffered io when directio fails, which means
> this will cause a regression there.
>
> Granted mixing write types is bogus...
>
> --D
>
> > +     }
> >
> >       if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
> >           !inode->i_sb->s_dio_done_wq) {
> >
> > --
> > Goldwyn



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

  reply	other threads:[~2020-05-29 10:55 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-28 19:21 [PATCH] iomap: Return zero in case of unsuccessful pagecache invalidation before DIO Goldwyn Rodrigues
2020-05-29  0:23 ` Darrick J. Wong
2020-05-29 10:55   ` Filipe Manana [this message]
2020-05-29 11:31     ` Matthew Wilcox
2020-05-29 11:50       ` Filipe Manana
2020-05-29 12:45         ` Goldwyn Rodrigues
2020-06-01 15:16   ` Goldwyn Rodrigues
2020-06-03 11:23     ` Filipe Manana
2020-06-03 11:32       ` Filipe Manana
2020-06-03 19:02         ` Darrick J. Wong
2020-06-03 19:10           ` Filipe Manana
2020-06-03 19:18             ` Matthew Wilcox
2020-06-03 21:07           ` Goldwyn Rodrigues
2020-06-04 13:55           ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL3q7H4Mit=P9yHsZXKsVxMN0=7m7gS2ZqiGTbyFz5Aid9t3hQ@mail.gmail.com' \
    --to=fdmanana@gmail.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=darrick.wong@oracle.com \
    --cc=dsterba@suse.cz \
    --cc=hch@infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).