* [PATCH] iomap: don't invalidate folios after writeback errors
@ 2022-05-16 3:37 Darrick J. Wong
2022-05-16 12:27 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Darrick J. Wong @ 2022-05-16 3:37 UTC (permalink / raw)
To: Dave Chinner, Matthew Wilcox; +Cc: linux-fsdevel, xfs, Christoph Hellwig
From: Darrick J. Wong <djwong@kernel.org>
XFS has the unique behavior (as compared to the other Linux filesystems)
that on writeback errors it will completely invalidate the affected
folio and force the page cache to reread the contents from disk. All
other filesystems leave the page mapped and up to date.
This is a rude awakening for user programs, since (in the case where
write fails but reread doesn't) file contents will appear to revert to
old disk contents with no notification other than an EIO on fsync. This
might have been annoying back in the days when iomap dealt with one page
at a time, but with multipage folios, we can now throw away *megabytes*
worth of data for a single write error.
On *most* Linux filesystems, a program can respond to an EIO on write by
redirtying the entire file and scheduling it for writeback. This isn't
foolproof, since the page that failed writeback is no longer dirty and
could be evicted, but programs that want to recover properly *also*
have to detect XFS and regenerate every write they've made to the file.
When running xfs/314 on arm64, I noticed a UAF bug when xfs_discard_folio
invalidates multipage folios that could be undergoing writeback. If,
say, we have a 256K folio caching a mix of written and unwritten
extents, it's possible that we could start writeback of the first (say)
64K of the folio and then hit a writeback error on the next 64K. We
then free the iop attached to the folio, which is really bad because
writeback completion on the first 64k will trip over the "blocks per
folio > 1 && !iop" assertion.
This can't be fixed by only invalidating the folio if writeback fails at
the start of the folio, since the folio is marked !uptodate, which trips
other assertions elsewhere. Get rid of the whole behavior entirely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 1 -
fs/xfs/xfs_aops.c | 4 +---
2 files changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8fb9b2797fc5..94b53cbdefad 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1387,7 +1387,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
if (wpc->ops->discard_folio)
wpc->ops->discard_folio(folio, pos);
if (!count) {
- folio_clear_uptodate(folio);
folio_unlock(folio);
goto done;
}
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 90b7f4d127de..f6216d0fb0c2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -464,7 +464,7 @@ xfs_discard_folio(
int error;
if (xfs_is_shutdown(mp))
- goto out_invalidate;
+ return;
xfs_alert_ratelimited(mp,
"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
@@ -474,8 +474,6 @@ xfs_discard_folio(
i_blocks_per_folio(inode, folio) - pageoff_fsb);
if (error && !xfs_is_shutdown(mp))
xfs_alert(mp, "page discard unable to remove delalloc mapping.");
-out_invalidate:
- iomap_invalidate_folio(folio, offset, folio_size(folio) - offset);
}
static const struct iomap_writeback_ops xfs_writeback_ops = {
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] iomap: don't invalidate folios after writeback errors
2022-05-16 3:37 [PATCH] iomap: don't invalidate folios after writeback errors Darrick J. Wong
@ 2022-05-16 12:27 ` Matthew Wilcox
2022-05-16 12:53 ` Jeff Layton
2022-05-16 13:43 ` Christoph Hellwig
2 siblings, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2022-05-16 12:27 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Dave Chinner, linux-fsdevel, xfs, Christoph Hellwig
On Sun, May 15, 2022 at 08:37:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> XFS has the unique behavior (as compared to the other Linux filesystems)
> that on writeback errors it will completely invalidate the affected
> folio and force the page cache to reread the contents from disk. All
> other filesystems leave the page mapped and up to date.
>
> This is a rude awakening for user programs, since (in the case where
> write fails but reread doesn't) file contents will appear to revert to
> old disk contents with no notification other than an EIO on fsync. This
> might have been annoying back in the days when iomap dealt with one page
> at a time, but with multipage folios, we can now throw away *megabytes*
> worth of data for a single write error.
>
> On *most* Linux filesystems, a program can respond to an EIO on write by
> redirtying the entire file and scheduling it for writeback. This isn't
> foolproof, since the page that failed writeback is no longer dirty and
> could be evicted, but programs that want to recover properly *also*
> have to detect XFS and regenerate every write they've made to the file.
>
> When running xfs/314 on arm64, I noticed a UAF bug when xfs_discard_folio
> invalidates multipage folios that could be undergoing writeback. If,
> say, we have a 256K folio caching a mix of written and unwritten
> extents, it's possible that we could start writeback of the first (say)
> 64K of the folio and then hit a writeback error on the next 64K. We
> then free the iop attached to the folio, which is really bad because
> writeback completion on the first 64k will trip over the "blocks per
> folio > 1 && !iop" assertion.
>
> This can't be fixed by only invalidating the folio if writeback fails at
> the start of the folio, since the folio is marked !uptodate, which trips
> other assertions elsewhere. Get rid of the whole behavior entirely.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] iomap: don't invalidate folios after writeback errors
2022-05-16 3:37 [PATCH] iomap: don't invalidate folios after writeback errors Darrick J. Wong
2022-05-16 12:27 ` Matthew Wilcox
@ 2022-05-16 12:53 ` Jeff Layton
2022-05-16 13:43 ` Christoph Hellwig
2 siblings, 0 replies; 4+ messages in thread
From: Jeff Layton @ 2022-05-16 12:53 UTC (permalink / raw)
To: Darrick J. Wong, Dave Chinner, Matthew Wilcox
Cc: linux-fsdevel, xfs, Christoph Hellwig
On Sun, 2022-05-15 at 20:37 -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> XFS has the unique behavior (as compared to the other Linux filesystems)
> that on writeback errors it will completely invalidate the affected
> folio and force the page cache to reread the contents from disk. All
> other filesystems leave the page mapped and up to date.
>
> This is a rude awakening for user programs, since (in the case where
> write fails but reread doesn't) file contents will appear to revert to
> old disk contents with no notification other than an EIO on fsync. This
> might have been annoying back in the days when iomap dealt with one page
> at a time, but with multipage folios, we can now throw away *megabytes*
> worth of data for a single write error.
>
> On *most* Linux filesystems, a program can respond to an EIO on write by
> redirtying the entire file and scheduling it for writeback. This isn't
> foolproof, since the page that failed writeback is no longer dirty and
> could be evicted, but programs that want to recover properly *also*
> have to detect XFS and regenerate every write they've made to the file.
>
> When running xfs/314 on arm64, I noticed a UAF bug when xfs_discard_folio
> invalidates multipage folios that could be undergoing writeback. If,
> say, we have a 256K folio caching a mix of written and unwritten
> extents, it's possible that we could start writeback of the first (say)
> 64K of the folio and then hit a writeback error on the next 64K. We
> then free the iop attached to the folio, which is really bad because
> writeback completion on the first 64k will trip over the "blocks per
> folio > 1 && !iop" assertion.
>
> This can't be fixed by only invalidating the folio if writeback fails at
> the start of the folio, since the folio is marked !uptodate, which trips
> other assertions elsewhere. Get rid of the whole behavior entirely.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/iomap/buffered-io.c | 1 -
> fs/xfs/xfs_aops.c | 4 +---
> 2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8fb9b2797fc5..94b53cbdefad 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1387,7 +1387,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> if (wpc->ops->discard_folio)
> wpc->ops->discard_folio(folio, pos);
> if (!count) {
> - folio_clear_uptodate(folio);
> folio_unlock(folio);
> goto done;
> }
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 90b7f4d127de..f6216d0fb0c2 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -464,7 +464,7 @@ xfs_discard_folio(
> int error;
>
> if (xfs_is_shutdown(mp))
> - goto out_invalidate;
> + return;
>
> xfs_alert_ratelimited(mp,
> "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> @@ -474,8 +474,6 @@ xfs_discard_folio(
> i_blocks_per_folio(inode, folio) - pageoff_fsb);
> if (error && !xfs_is_shutdown(mp))
> xfs_alert(mp, "page discard unable to remove delalloc mapping.");
> -out_invalidate:
> - iomap_invalidate_folio(folio, offset, folio_size(folio) - offset);
> }
>
> static const struct iomap_writeback_ops xfs_writeback_ops = {
Nice to start bringing some consistency to this behavior across the
kernel!
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] iomap: don't invalidate folios after writeback errors
2022-05-16 3:37 [PATCH] iomap: don't invalidate folios after writeback errors Darrick J. Wong
2022-05-16 12:27 ` Matthew Wilcox
2022-05-16 12:53 ` Jeff Layton
@ 2022-05-16 13:43 ` Christoph Hellwig
2 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2022-05-16 13:43 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Dave Chinner, Matthew Wilcox, linux-fsdevel, xfs, Christoph Hellwig
The changes looks good to me:
Reviewed-by: Christoph Hellwig <hch@lst.de>
I did a little digging and the invalidatepage call first appeared
in
commit 90ce6b6a90f62578a141359ffe9261e65f2c5265
Author: Steve Lord <lord@sgi.com>
Date: Mon Sep 2 12:36:59 2002 +0000
move page_buf_io.c to xfs_aops.c
The old page_buf_io.c file seems to not have made it through to
git, so this seems rather prehistoric.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-05-16 13:43 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-16 3:37 [PATCH] iomap: don't invalidate folios after writeback errors Darrick J. Wong
2022-05-16 12:27 ` Matthew Wilcox
2022-05-16 12:53 ` Jeff Layton
2022-05-16 13:43 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.