All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Transient errors in Direct I/O
@ 2020-06-05 20:48 Goldwyn Rodrigues
  2020-06-05 20:48 ` [PATCH 1/3] iomap: dio Return zero in case of unsuccessful pagecache invalidation Goldwyn Rodrigues
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2020-06-05 20:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-btrfs, fdmanana, linux-fsdevel, hch

In current scenarios, for XFS, it would mean that a page invalidation
would end up being a writeback error. So, if iomap returns zero, fall
back to biffered I/O. XFS has never supported fallback to buffered I/O.
I hope it is not "never will" ;)

With mixed buffered and direct writes in btrfs, the pages may not be
released the extent may be locked in the ordered extents cleanup thread,
which must make changes to the btrfs trees. In case of btrfs, if it is
possible to wait, depending on the memory flags passed, wait for extent
bit to be cleared so direct I/O is executed so there is no need to
fallback to buffered I/O.

-- 
Goldwyn



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] iomap: dio Return zero in case of unsuccessful pagecache invalidation
  2020-06-05 20:48 [PATCH 0/3] Transient errors in Direct I/O Goldwyn Rodrigues
@ 2020-06-05 20:48 ` Goldwyn Rodrigues
  2020-06-06  3:13   ` Matthew Wilcox
  2020-06-05 20:48 ` [PATCH 2/3] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Goldwyn Rodrigues @ 2020-06-05 20:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-btrfs, fdmanana, linux-fsdevel, hch, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Filesystems such as btrfs are unable to guarantee page invalidation
because extents could be locked because of an ongoing I/O. This happens
even though a filemap_write_and_wait() has been called because
btrfs locks extents in a separate cleanup thread until all ordered
extents in range have performed the tree changes.

Return zero in case a page cache invalidation is unsuccessful so
filesystems can fallback to buffered I/O.

This takes care of the following invalidation warning during btrfs
mixed buffered and direct I/O using iomap_dio_rw():

Page cache invalidation failure on direct I/O.  Possible data
corruption due to collision with buffered I/O!

This is similar to the behavior of generic_file_direct_write().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/iomap/direct-io.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index e4addfc58107..215315be6233 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -483,9 +483,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	 */
 	ret = invalidate_inode_pages2_range(mapping,
 			pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
-	if (ret)
-		dio_warn_stale_pagecache(iocb->ki_filp);
-	ret = 0;
+	/*
+	 * If a page can not be invalidated, return 0 to fall back
+	 * to buffered write.
+	 */
+	if (ret) {
+		if (ret == -EBUSY)
+			ret = 0;
+		goto out_free_dio;
+	}
 
 	if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] btrfs: Wait for extent bits to release page
  2020-06-05 20:48 [PATCH 0/3] Transient errors in Direct I/O Goldwyn Rodrigues
  2020-06-05 20:48 ` [PATCH 1/3] iomap: dio Return zero in case of unsuccessful pagecache invalidation Goldwyn Rodrigues
@ 2020-06-05 20:48 ` Goldwyn Rodrigues
  2020-06-08 10:20   ` Filipe Manana
  2020-06-05 20:48 ` [PATCH 3/3] xfs: fallback to buffered I/O if direct I/O is short Goldwyn Rodrigues
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Goldwyn Rodrigues @ 2020-06-05 20:48 UTC (permalink / raw)
  To: darrick.wong
  Cc: linux-btrfs, fdmanana, linux-fsdevel, hch, Goldwyn Rodrigues,
	Johannes Thumshirn, Nikolay Borisov

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

While trying to release a page, the extent containing the page may be locked
which would stop the page from being released. Wait for the
extent lock to be cleared, if blocking is allowed and then clear
the bits.

While we are at it, clean the code of try_release_extent_state() to make
it simpler.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c | 37 ++++++++++++++++---------------------
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     |  4 ++--
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c59e07360083..0ab444d2028d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4466,33 +4466,28 @@ int extent_invalidatepage(struct extent_io_tree *tree,
  * are locked or under IO and drops the related state bits if it is safe
  * to drop the page.
  */
-static int try_release_extent_state(struct extent_io_tree *tree,
+static bool try_release_extent_state(struct extent_io_tree *tree,
 				    struct page *page, gfp_t mask)
 {
 	u64 start = page_offset(page);
 	u64 end = start + PAGE_SIZE - 1;
-	int ret = 1;
 
 	if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) {
-		ret = 0;
-	} else {
-		/*
-		 * at this point we can safely clear everything except the
-		 * locked bit and the nodatasum bit
-		 */
-		ret = __clear_extent_bit(tree, start, end,
-				 ~(EXTENT_LOCKED | EXTENT_NODATASUM),
-				 0, 0, NULL, mask, NULL);
-
-		/* if clear_extent_bit failed for enomem reasons,
-		 * we can't allow the release to continue.
-		 */
-		if (ret < 0)
-			ret = 0;
-		else
-			ret = 1;
+		if (!gfpflags_allow_blocking(mask))
+			return false;
+		wait_extent_bit(tree, start, end, EXTENT_LOCKED);
 	}
-	return ret;
+	/*
+	 * At this point we can safely clear everything except the locked and
+	 * nodatasum bits. If clear_extent_bit failed due to -ENOMEM,
+	 * don't allow release.
+	 */
+	if (__clear_extent_bit(tree, start, end,
+				~(EXTENT_LOCKED | EXTENT_NODATASUM), 0, 0,
+				NULL, mask, NULL) < 0)
+		return false;
+
+	return true;
 }
 
 /*
@@ -4500,7 +4495,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
  * in the range corresponding to the page, both state records and extent
  * map records are removed
  */
-int try_release_extent_mapping(struct page *page, gfp_t mask)
+bool try_release_extent_mapping(struct page *page, gfp_t mask)
 {
 	struct extent_map *em;
 	u64 start = page_offset(page);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 9a10681b12bf..6cba4ad6ebc1 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -189,7 +189,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
 					  struct page *page, size_t pg_offset,
 					  u64 start, u64 len);
 
-int try_release_extent_mapping(struct page *page, gfp_t mask);
+bool try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 
 int extent_read_full_page(struct page *page, get_extent_t *get_extent,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1242d0aa108d..8cb44c49c1d2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7887,8 +7887,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping,
 
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
-	int ret = try_release_extent_mapping(page, gfp_flags);
-	if (ret == 1) {
+	bool ret = try_release_extent_mapping(page, gfp_flags);
+	if (ret) {
 		ClearPagePrivate(page);
 		set_page_private(page, 0);
 		put_page(page);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] xfs: fallback to buffered I/O if direct I/O is short
  2020-06-05 20:48 [PATCH 0/3] Transient errors in Direct I/O Goldwyn Rodrigues
  2020-06-05 20:48 ` [PATCH 1/3] iomap: dio Return zero in case of unsuccessful pagecache invalidation Goldwyn Rodrigues
  2020-06-05 20:48 ` [PATCH 2/3] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
@ 2020-06-05 20:48 ` Goldwyn Rodrigues
  2020-06-10  2:59 ` [PATCH 0/3] Transient errors in Direct I/O Dave Chinner
  2020-06-10  5:31 ` Qu Wenruo
  4 siblings, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2020-06-05 20:48 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-btrfs, fdmanana, linux-fsdevel, hch, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Most filesystems such as ext4 and btrfs fallback to buffered I/O in case
direct write's fail. In case direct I/O is short, fallback to buffered
write to complete the I/O. Make sure the data is on disk by performing a
filemap_write_and_wait_range() and invalidating the pages in the range.

For reads, call xfs_file_buffered_aio_read() in case of short I/O.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/xfs/xfs_file.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 4b8bdecc3863..786391228dea 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -31,6 +31,10 @@
 #include <linux/fadvise.h>
 
 static const struct vm_operations_struct xfs_file_vm_ops;
+STATIC ssize_t xfs_file_buffered_aio_write(struct kiocb *iocb,
+		struct iov_iter	*from);
+STATIC ssize_t xfs_file_buffered_aio_read(struct kiocb *iocb,
+		struct iov_iter	*to);
 
 int
 xfs_update_prealloc_flags(
@@ -169,6 +173,7 @@ xfs_file_dio_aio_read(
 	struct xfs_inode	*ip = XFS_I(file_inode(iocb->ki_filp));
 	size_t			count = iov_iter_count(to);
 	ssize_t			ret;
+	ssize_t			buffered_read = 0;
 
 	trace_xfs_file_direct_read(ip, count, iocb->ki_pos);
 
@@ -187,7 +192,13 @@ xfs_file_dio_aio_read(
 			is_sync_kiocb(iocb));
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
-	return ret;
+	if (ret < 0 || ret == count)
+		return ret;
+
+	iocb->ki_flags &= ~IOCB_DIRECT;
+	buffered_read = xfs_file_buffered_aio_read(iocb, to);
+
+	return ret + buffered_read;
 }
 
 static noinline ssize_t
@@ -483,6 +494,9 @@ xfs_file_dio_aio_write(
 	int			iolock;
 	size_t			count = iov_iter_count(from);
 	struct xfs_buftarg      *target = xfs_inode_buftarg(ip);
+	loff_t			offset, end;
+	ssize_t			buffered_write = 0;
+	int			err;
 
 	/* DIO must be aligned to device logical sector size */
 	if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
@@ -552,12 +566,25 @@ xfs_file_dio_aio_write(
 out:
 	xfs_iunlock(ip, iolock);
 
-	/*
-	 * No fallback to buffered IO on errors for XFS, direct IO will either
-	 * complete fully or fail.
-	 */
-	ASSERT(ret < 0 || ret == count);
-	return ret;
+	if (ret < 0 || ret == count)
+		return ret;
+
+	/* Fallback to buffered write */
+
+	offset = iocb->ki_pos;
+
+	buffered_write = xfs_file_buffered_aio_write(iocb, from);
+	if (buffered_write < 0)
+		return ret;
+
+	end = offset + buffered_write - 1;
+
+	err = filemap_write_and_wait_range(mapping, offset, end);
+	if (!err)
+		invalidate_mapping_pages(mapping, offset >> PAGE_SHIFT,
+				end >> PAGE_SHIFT);
+
+	return ret + buffered_write;
 }
 
 static noinline ssize_t
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] iomap: dio Return zero in case of unsuccessful pagecache invalidation
  2020-06-05 20:48 ` [PATCH 1/3] iomap: dio Return zero in case of unsuccessful pagecache invalidation Goldwyn Rodrigues
@ 2020-06-06  3:13   ` Matthew Wilcox
  0 siblings, 0 replies; 13+ messages in thread
From: Matthew Wilcox @ 2020-06-06  3:13 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: darrick.wong, linux-btrfs, fdmanana, linux-fsdevel, hch,
	Goldwyn Rodrigues

On Fri, Jun 05, 2020 at 03:48:36PM -0500, Goldwyn Rodrigues wrote:
> Return zero in case a page cache invalidation is unsuccessful so
> filesystems can fallback to buffered I/O.

I don't think it's acceptable to change common code to fix a btrfs bug
at this point in the release cycle.  This change needs to be agreed to
before the -rc5 stage, not during the pre-rc1 window.

If you can't fix this in btrfs alone, then back out the btrfs changes
for now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: Wait for extent bits to release page
  2020-06-05 20:48 ` [PATCH 2/3] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
@ 2020-06-08 10:20   ` Filipe Manana
  2020-06-08 12:13     ` Goldwyn Rodrigues
  0 siblings, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2020-06-08 10:20 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: Darrick J. Wong, linux-btrfs, linux-fsdevel, Christoph Hellwig,
	Goldwyn Rodrigues, Johannes Thumshirn, Nikolay Borisov

On Fri, Jun 5, 2020 at 9:48 PM Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
>
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>
> While trying to release a page, the extent containing the page may be locked
> which would stop the page from being released. Wait for the
> extent lock to be cleared, if blocking is allowed and then clear
> the bits.
>
> While we are at it, clean the code of try_release_extent_state() to make
> it simpler.
>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

I'm confused Goldwyn.

Previously in another thread [1] you mentioned you dropped this patch
from a previous patchset because
it was causing locking issues (iirc you mentioned a deadlock in
another different thread).

Now you send exactly the same patch (unless I missed some very subtle
change, in which case keeping the reviewed-by tags is not correct).
Are the locking issues gone? What fixed them?
And how did you trigger those issues, some specific fstest (which?),
some other test (which/how)?

And if this patch is now working for some reason, then why are patches
1/3 and 3/3 needed?
Wasn't patch 1/3 motivated exactly because this patch (2/3) was
causing the locking issues.

Thanks.

[1] https://lore.kernel.org/linux-btrfs/20200526164428.sirhx6yjsghxpnqt@fiona/

> ---
>  fs/btrfs/extent_io.c | 37 ++++++++++++++++---------------------
>  fs/btrfs/extent_io.h |  2 +-
>  fs/btrfs/inode.c     |  4 ++--
>  3 files changed, 19 insertions(+), 24 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c59e07360083..0ab444d2028d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4466,33 +4466,28 @@ int extent_invalidatepage(struct extent_io_tree *tree,
>   * are locked or under IO and drops the related state bits if it is safe
>   * to drop the page.
>   */
> -static int try_release_extent_state(struct extent_io_tree *tree,
> +static bool try_release_extent_state(struct extent_io_tree *tree,
>                                     struct page *page, gfp_t mask)
>  {
>         u64 start = page_offset(page);
>         u64 end = start + PAGE_SIZE - 1;
> -       int ret = 1;
>
>         if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) {
> -               ret = 0;
> -       } else {
> -               /*
> -                * at this point we can safely clear everything except the
> -                * locked bit and the nodatasum bit
> -                */
> -               ret = __clear_extent_bit(tree, start, end,
> -                                ~(EXTENT_LOCKED | EXTENT_NODATASUM),
> -                                0, 0, NULL, mask, NULL);
> -
> -               /* if clear_extent_bit failed for enomem reasons,
> -                * we can't allow the release to continue.
> -                */
> -               if (ret < 0)
> -                       ret = 0;
> -               else
> -                       ret = 1;
> +               if (!gfpflags_allow_blocking(mask))
> +                       return false;
> +               wait_extent_bit(tree, start, end, EXTENT_LOCKED);
>         }
> -       return ret;
> +       /*
> +        * At this point we can safely clear everything except the locked and
> +        * nodatasum bits. If clear_extent_bit failed due to -ENOMEM,
> +        * don't allow release.
> +        */
> +       if (__clear_extent_bit(tree, start, end,
> +                               ~(EXTENT_LOCKED | EXTENT_NODATASUM), 0, 0,
> +                               NULL, mask, NULL) < 0)
> +               return false;
> +
> +       return true;
>  }
>
>  /*
> @@ -4500,7 +4495,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
>   * in the range corresponding to the page, both state records and extent
>   * map records are removed
>   */
> -int try_release_extent_mapping(struct page *page, gfp_t mask)
> +bool try_release_extent_mapping(struct page *page, gfp_t mask)
>  {
>         struct extent_map *em;
>         u64 start = page_offset(page);
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 9a10681b12bf..6cba4ad6ebc1 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -189,7 +189,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
>                                           struct page *page, size_t pg_offset,
>                                           u64 start, u64 len);
>
> -int try_release_extent_mapping(struct page *page, gfp_t mask);
> +bool try_release_extent_mapping(struct page *page, gfp_t mask);
>  int try_release_extent_buffer(struct page *page);
>
>  int extent_read_full_page(struct page *page, get_extent_t *get_extent,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 1242d0aa108d..8cb44c49c1d2 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7887,8 +7887,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping,
>
>  static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
>  {
> -       int ret = try_release_extent_mapping(page, gfp_flags);
> -       if (ret == 1) {
> +       bool ret = try_release_extent_mapping(page, gfp_flags);
> +       if (ret) {
>                 ClearPagePrivate(page);
>                 set_page_private(page, 0);
>                 put_page(page);
> --
> 2.25.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: Wait for extent bits to release page
  2020-06-08 10:20   ` Filipe Manana
@ 2020-06-08 12:13     ` Goldwyn Rodrigues
  0 siblings, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2020-06-08 12:13 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Darrick J. Wong, linux-btrfs, linux-fsdevel, Christoph Hellwig,
	Johannes Thumshirn, Nikolay Borisov

On 11:20 08/06, Filipe Manana wrote:
> On Fri, Jun 5, 2020 at 9:48 PM Goldwyn Rodrigues <rgoldwyn@suse.de> wrote:
> >
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >
> > While trying to release a page, the extent containing the page may be locked
> > which would stop the page from being released. Wait for the
> > extent lock to be cleared, if blocking is allowed and then clear
> > the bits.
> >
> > While we are at it, clean the code of try_release_extent_state() to make
> > it simpler.
> >
> > Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> > Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> I'm confused Goldwyn.
> 
> Previously in another thread [1] you mentioned you dropped this patch
> from a previous patchset because
> it was causing locking issues (iirc you mentioned a deadlock in
> another different thread).
> 
> Now you send exactly the same patch (unless I missed some very subtle
> change, in which case keeping the reviewed-by tags is not correct).
> Are the locking issues gone? What fixed them?
> And how did you trigger those issues, some specific fstest (which?),
> some other test (which/how)?

I ran xfstests and did not find the lockups I was finding earlier.
Unfortunately, I don't have the lockups, but the process would wait for
the extent_bits to get unlocked.

> 
> And if this patch is now working for some reason, then why are patches
> 1/3 and 3/3 needed?
> Wasn't patch 1/3 motivated exactly because this patch (2/3) was
> causing the locking issues.

This patch reduces the amount of time btrfs has to fallback to the
buffered path. Probably I should point this out in the patch header.
The other two patches are required to make sure we don't err during
transient release page errors, while this one reduces the amount of
these transient errors in the first place.


> 
> Thanks.
> 
> [1] https://lore.kernel.org/linux-btrfs/20200526164428.sirhx6yjsghxpnqt@fiona/
> 
> > ---
> >  fs/btrfs/extent_io.c | 37 ++++++++++++++++---------------------
> >  fs/btrfs/extent_io.h |  2 +-
> >  fs/btrfs/inode.c     |  4 ++--
> >  3 files changed, 19 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index c59e07360083..0ab444d2028d 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -4466,33 +4466,28 @@ int extent_invalidatepage(struct extent_io_tree *tree,
> >   * are locked or under IO and drops the related state bits if it is safe
> >   * to drop the page.
> >   */
> > -static int try_release_extent_state(struct extent_io_tree *tree,
> > +static bool try_release_extent_state(struct extent_io_tree *tree,
> >                                     struct page *page, gfp_t mask)
> >  {
> >         u64 start = page_offset(page);
> >         u64 end = start + PAGE_SIZE - 1;
> > -       int ret = 1;
> >
> >         if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) {
> > -               ret = 0;
> > -       } else {
> > -               /*
> > -                * at this point we can safely clear everything except the
> > -                * locked bit and the nodatasum bit
> > -                */
> > -               ret = __clear_extent_bit(tree, start, end,
> > -                                ~(EXTENT_LOCKED | EXTENT_NODATASUM),
> > -                                0, 0, NULL, mask, NULL);
> > -
> > -               /* if clear_extent_bit failed for enomem reasons,
> > -                * we can't allow the release to continue.
> > -                */
> > -               if (ret < 0)
> > -                       ret = 0;
> > -               else
> > -                       ret = 1;
> > +               if (!gfpflags_allow_blocking(mask))
> > +                       return false;
> > +               wait_extent_bit(tree, start, end, EXTENT_LOCKED);
> >         }
> > -       return ret;
> > +       /*
> > +        * At this point we can safely clear everything except the locked and
> > +        * nodatasum bits. If clear_extent_bit failed due to -ENOMEM,
> > +        * don't allow release.
> > +        */
> > +       if (__clear_extent_bit(tree, start, end,
> > +                               ~(EXTENT_LOCKED | EXTENT_NODATASUM), 0, 0,
> > +                               NULL, mask, NULL) < 0)
> > +               return false;
> > +
> > +       return true;
> >  }
> >
> >  /*
> > @@ -4500,7 +4495,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
> >   * in the range corresponding to the page, both state records and extent
> >   * map records are removed
> >   */
> > -int try_release_extent_mapping(struct page *page, gfp_t mask)
> > +bool try_release_extent_mapping(struct page *page, gfp_t mask)
> >  {
> >         struct extent_map *em;
> >         u64 start = page_offset(page);
> > diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> > index 9a10681b12bf..6cba4ad6ebc1 100644
> > --- a/fs/btrfs/extent_io.h
> > +++ b/fs/btrfs/extent_io.h
> > @@ -189,7 +189,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
> >                                           struct page *page, size_t pg_offset,
> >                                           u64 start, u64 len);
> >
> > -int try_release_extent_mapping(struct page *page, gfp_t mask);
> > +bool try_release_extent_mapping(struct page *page, gfp_t mask);
> >  int try_release_extent_buffer(struct page *page);
> >
> >  int extent_read_full_page(struct page *page, get_extent_t *get_extent,
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 1242d0aa108d..8cb44c49c1d2 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7887,8 +7887,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping,
> >
> >  static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
> >  {
> > -       int ret = try_release_extent_mapping(page, gfp_flags);
> > -       if (ret == 1) {
> > +       bool ret = try_release_extent_mapping(page, gfp_flags);
> > +       if (ret) {
> >                 ClearPagePrivate(page);
> >                 set_page_private(page, 0);
> >                 put_page(page);
> > --
> > 2.25.0
> >
> 
> 
> -- 
> Filipe David Manana,
> 
> “Whether you think you can, or you think you can't — you're right.”

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Transient errors in Direct I/O
  2020-06-05 20:48 [PATCH 0/3] Transient errors in Direct I/O Goldwyn Rodrigues
                   ` (2 preceding siblings ...)
  2020-06-05 20:48 ` [PATCH 3/3] xfs: fallback to buffered I/O if direct I/O is short Goldwyn Rodrigues
@ 2020-06-10  2:59 ` Dave Chinner
  2020-06-10  5:05   ` Dave Chinner
  2020-06-10  5:31 ` Qu Wenruo
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2020-06-10  2:59 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: darrick.wong, linux-btrfs, fdmanana, linux-fsdevel, hch, linux-xfs

[ Please cc the XFS list on XFS and iomap infrastructure changes.]

On Fri, Jun 05, 2020 at 03:48:35PM -0500, Goldwyn Rodrigues wrote:
> In current scenarios, for XFS, it would mean that a page invalidation
> would end up being a writeback error. So, if iomap returns zero, fall
> back to biffered I/O. XFS has never supported fallback to buffered I/O.
> I hope it is not "never will" ;)

I wouldn't say "never", but we are not going to change XFS behaviour
because btrfs has a page invalidation vs DIO bug in it...

If you want to make a specific filesystem fall back to buffered IO
in this case, pass a new flag into iomap_dio_rw() to conditionally
abort the DIO on invalidation failure and let filesystem
implementations opt-in to use that flag.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Transient errors in Direct I/O
  2020-06-10  2:59 ` [PATCH 0/3] Transient errors in Direct I/O Dave Chinner
@ 2020-06-10  5:05   ` Dave Chinner
  2020-06-11 14:13     ` Goldwyn Rodrigues
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2020-06-10  5:05 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: darrick.wong, linux-btrfs, fdmanana, linux-fsdevel, hch, linux-xfs

On Wed, Jun 10, 2020 at 12:59:00PM +1000, Dave Chinner wrote:
> [ Please cc the XFS list on XFS and iomap infrastructure changes.]
> 
> On Fri, Jun 05, 2020 at 03:48:35PM -0500, Goldwyn Rodrigues wrote:
> > In current scenarios, for XFS, it would mean that a page invalidation
> > would end up being a writeback error. So, if iomap returns zero, fall
> > back to biffered I/O. XFS has never supported fallback to buffered I/O.
> > I hope it is not "never will" ;)
> 
> I wouldn't say "never", but we are not going to change XFS behaviour
> because btrfs has a page invalidation vs DIO bug in it...

Let me point out a specific "oh shit, I didn't think of that" sort
of problem that your blind fallback to buffered IO causes. Do this
via direct IO:

	pwritev2(RWF_NOWAIT)

now have it fail invalidation in the direct IO path and fallback to
buffered write. What does buffered write do with it?


	if (iocb->ki_flags & IOCB_NOWAIT)
		return -EOPNOTSUPP;

Yup, userspace gets a completely spurious and bogus -EOPNOTSUPP
error to pwritev2() because some 3rd party is accessing the same
file via mmap or buffered IO.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Transient errors in Direct I/O
  2020-06-05 20:48 [PATCH 0/3] Transient errors in Direct I/O Goldwyn Rodrigues
                   ` (3 preceding siblings ...)
  2020-06-10  2:59 ` [PATCH 0/3] Transient errors in Direct I/O Dave Chinner
@ 2020-06-10  5:31 ` Qu Wenruo
  2020-06-11 14:11   ` Goldwyn Rodrigues
  4 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2020-06-10  5:31 UTC (permalink / raw)
  To: Goldwyn Rodrigues, darrick.wong; +Cc: linux-btrfs, fdmanana, linux-fsdevel, hch


[-- Attachment #1.1: Type: text/plain, Size: 1289 bytes --]



On 2020/6/6 上午4:48, Goldwyn Rodrigues wrote:
> In current scenarios, for XFS, it would mean that a page invalidation
> would end up being a writeback error. So, if iomap returns zero, fall
> back to biffered I/O. XFS has never supported fallback to buffered I/O.
> I hope it is not "never will" ;)
> 
> With mixed buffered and direct writes in btrfs, the pages may not be
> released the extent may be locked in the ordered extents cleanup thread,

I'm wondering can we handle this case in a different way.

In fact btrfs has its own special handling for invalidating pages.
Btrfs will first look for any ordered extents covering the page, finish
the ordered extent manually, then invalidate the page.

I'm not sure why invalidate_inode_pages2_range() used in dio iomap code
does not use the fs specific invalidatepage(), but only do_lander_page()
then releasepage().

Shouldn'y we btrfs implement the lander_page() to handle ordered extents
properly?
Or is there any special requirement?

Thanks,
Qu

> which must make changes to the btrfs trees. In case of btrfs, if it is
> possible to wait, depending on the memory flags passed, wait for extent
> bit to be cleared so direct I/O is executed so there is no need to
> fallback to buffered I/O.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Transient errors in Direct I/O
  2020-06-10  5:31 ` Qu Wenruo
@ 2020-06-11 14:11   ` Goldwyn Rodrigues
  2020-06-12 12:56     ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Goldwyn Rodrigues @ 2020-06-11 14:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: darrick.wong, linux-btrfs, fdmanana, linux-fsdevel, hch

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

On 13:31 10/06, Qu Wenruo wrote:
> 
> 
> On 2020/6/6 上午4:48, Goldwyn Rodrigues wrote:
> > In current scenarios, for XFS, it would mean that a page invalidation
> > would end up being a writeback error. So, if iomap returns zero, fall
> > back to biffered I/O. XFS has never supported fallback to buffered I/O.
> > I hope it is not "never will" ;)
> > 
> > With mixed buffered and direct writes in btrfs, the pages may not be
> > released the extent may be locked in the ordered extents cleanup thread,
> 
> I'm wondering can we handle this case in a different way.
> 
> In fact btrfs has its own special handling for invalidating pages.
> Btrfs will first look for any ordered extents covering the page, finish
> the ordered extent manually, then invalidate the page.
> 
> I'm not sure why invalidate_inode_pages2_range() used in dio iomap code
> does not use the fs specific invalidatepage(), but only do_lander_page()
> then releasepage().
> 
> Shouldn'y we btrfs implement the lander_page() to handle ordered extents
> properly?
> Or is there any special requirement?
> 

The problem is aops->launder_page() is called only if PG_Dirty is
set. In this case it is not because we just performed a writeback.

Also, it is not just ordered ordered extent writeback which may lock
the extent, a buffered read can lock as well.

-- 
Goldwyn

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Transient errors in Direct I/O
  2020-06-10  5:05   ` Dave Chinner
@ 2020-06-11 14:13     ` Goldwyn Rodrigues
  0 siblings, 0 replies; 13+ messages in thread
From: Goldwyn Rodrigues @ 2020-06-11 14:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: darrick.wong, linux-btrfs, fdmanana, linux-fsdevel, hch, linux-xfs

On 15:05 10/06, Dave Chinner wrote:
> On Wed, Jun 10, 2020 at 12:59:00PM +1000, Dave Chinner wrote:
> > [ Please cc the XFS list on XFS and iomap infrastructure changes.]
> > 
> > On Fri, Jun 05, 2020 at 03:48:35PM -0500, Goldwyn Rodrigues wrote:
> > > In current scenarios, for XFS, it would mean that a page invalidation
> > > would end up being a writeback error. So, if iomap returns zero, fall
> > > back to biffered I/O. XFS has never supported fallback to buffered I/O.
> > > I hope it is not "never will" ;)
> > 
> > I wouldn't say "never", but we are not going to change XFS behaviour
> > because btrfs has a page invalidation vs DIO bug in it...
> 
> Let me point out a specific "oh shit, I didn't think of that" sort
> of problem that your blind fallback to buffered IO causes. Do this
> via direct IO:
> 
> 	pwritev2(RWF_NOWAIT)
> 
> now have it fail invalidation in the direct IO path and fallback to
> buffered write. What does buffered write do with it?
> 
> 
> 	if (iocb->ki_flags & IOCB_NOWAIT)
> 		return -EOPNOTSUPP;
> 
> Yup, userspace gets a completely spurious and bogus -EOPNOTSUPP
> error to pwritev2() because some 3rd party is accessing the same
> file via mmap or buffered IO.
> 

Oh shit, I didn't think about that!

I think adding a flag to iomap_dio_rw() to return in case of page
invalidation failure is the best option for now.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Transient errors in Direct I/O
  2020-06-11 14:11   ` Goldwyn Rodrigues
@ 2020-06-12 12:56     ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-12 12:56 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: darrick.wong, linux-btrfs, fdmanana, linux-fsdevel, hch


[-- Attachment #1.1: Type: text/plain, Size: 1766 bytes --]



On 2020/6/11 下午10:11, Goldwyn Rodrigues wrote:
> On 13:31 10/06, Qu Wenruo wrote:
>>
>>
>> On 2020/6/6 上午4:48, Goldwyn Rodrigues wrote:
>>> In current scenarios, for XFS, it would mean that a page invalidation
>>> would end up being a writeback error. So, if iomap returns zero, fall
>>> back to biffered I/O. XFS has never supported fallback to buffered I/O.
>>> I hope it is not "never will" ;)
>>>
>>> With mixed buffered and direct writes in btrfs, the pages may not be
>>> released the extent may be locked in the ordered extents cleanup thread,
>>
>> I'm wondering can we handle this case in a different way.
>>
>> In fact btrfs has its own special handling for invalidating pages.
>> Btrfs will first look for any ordered extents covering the page, finish
>> the ordered extent manually, then invalidate the page.
>>
>> I'm not sure why invalidate_inode_pages2_range() used in dio iomap code
>> does not use the fs specific invalidatepage(), but only do_lander_page()
>> then releasepage().
>>
>> Shouldn'y we btrfs implement the lander_page() to handle ordered extents
>> properly?
>> Or is there any special requirement?
>>
> 
> The problem is aops->launder_page() is called only if PG_Dirty is
> set. In this case it is not because we just performed a writeback.

For the dio iomap code, before btrfs_finish_ordered_io(), the pages in
ordered ranges are still dirty.
So launder_page() here can still get triggered to finish the ordered extent.

> 
> Also, it is not just ordered ordered extent writeback which may lock
> the extent, a buffered read can lock as well.
> 
That's right.
But at least that would greatly reduce the possibility for btrfs to fall
back to buffered IO, right?

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-06-12 12:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-05 20:48 [PATCH 0/3] Transient errors in Direct I/O Goldwyn Rodrigues
2020-06-05 20:48 ` [PATCH 1/3] iomap: dio Return zero in case of unsuccessful pagecache invalidation Goldwyn Rodrigues
2020-06-06  3:13   ` Matthew Wilcox
2020-06-05 20:48 ` [PATCH 2/3] btrfs: Wait for extent bits to release page Goldwyn Rodrigues
2020-06-08 10:20   ` Filipe Manana
2020-06-08 12:13     ` Goldwyn Rodrigues
2020-06-05 20:48 ` [PATCH 3/3] xfs: fallback to buffered I/O if direct I/O is short Goldwyn Rodrigues
2020-06-10  2:59 ` [PATCH 0/3] Transient errors in Direct I/O Dave Chinner
2020-06-10  5:05   ` Dave Chinner
2020-06-11 14:13     ` Goldwyn Rodrigues
2020-06-10  5:31 ` Qu Wenruo
2020-06-11 14:11   ` Goldwyn Rodrigues
2020-06-12 12:56     ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.