* writeback fixlets and tidyups v2
@ 2023-05-31  6:04 Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 01/16] btrfs: fix range_end calculation in extent_write_locked_range Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Hi all,

this little series fixes a few minor bugs found by code inspection in the
writeback code, and then goes on to remove the use of PageError.

This replaces the earlier version in the for-next branch.

Changes:
 - rebased to the latest misc-next tree
 - remove a set but unused variable in pages_processed
 - remove a spurious folio_set_error
 - return bool from btrfs_verify_page
 - update various commit messages

Diffstat:
 compression.c |    2 
 extent_io.c   |  359 +++++++++++++++++++---------------------------------------
 extent_io.h   |    6 
 inode.c       |  180 ++++++++++++-----------------
 subpage.c     |   35 -----
 subpage.h     |   10 -
 6 files changed, 209 insertions(+), 383 deletions(-)


* [PATCH 01/16] btrfs: fix range_end calculation in extent_write_locked_range
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 02/16] btrfs: factor out a btrfs_verify_page helper Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The range_end field in struct writeback_control is inclusive, just like
the end parameter passed to extent_write_locked_range, so don't add 1 to
the end when setting it.  The old exclusive value could cause extra
writeout, which is harmless but suboptimal.

Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5b3b15f7ab222..0726c82db309a2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2310,7 +2310,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 	struct writeback_control wbc_writepages = {
 		.sync_mode	= WB_SYNC_ALL,
 		.range_start	= start,
-		.range_end	= end + 1,
+		.range_end	= end,
 		.no_cgroup_owner = 1,
 	};
 	struct btrfs_bio_ctrl bio_ctrl = {
-- 
2.39.2
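
As a side note for readers: the inclusive convention matters because the
writeback code derives page indexes from range_end.  A minimal userspace
sketch (PAGE_SHIFT hardcoded to 12 for illustration, roughly mirroring
the index calculation in write_cache_pages()) shows the extra page an
exclusive end + 1 would pull in:

#include <stdio.h>

#define PAGE_SHIFT 12	/* assume 4K pages for the example */

int main(void)
{
	unsigned long long start = 0, end = 8191;	/* two pages, inclusive */
	unsigned long long first_index = start >> PAGE_SHIFT;		/* 0 */
	unsigned long long last_index = end >> PAGE_SHIFT;		/* 1 */
	unsigned long long off_by_one = (end + 1) >> PAGE_SHIFT;	/* 2 */

	printf("pages %llu..%llu; exclusive end would suggest %llu\n",
	       first_index, last_index, off_by_one);
	return 0;
}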



* [PATCH 02/16] btrfs: factor out a btrfs_verify_page helper
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 01/16] btrfs: fix range_end calculation in extent_write_locked_range Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 03/16] btrfs: fix fsverity read error handling in end_page_read Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Split all the conditionals for the fsverity calls in end_page_read into
a btrfs_verify_page helper to keep the code readable and to make
additional refactoring easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0726c82db309a2..8e42ce48b52e4a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -481,6 +481,15 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 			       start, end, page_ops, NULL);
 }
 
+static bool btrfs_verify_page(struct page *page, u64 start)
+{
+	if (!fsverity_active(page->mapping->host) ||
+	    PageError(page) || PageUptodate(page) ||
+	    start >= i_size_read(page->mapping->host))
+		return true;
+	return fsverity_verify_page(page);
+}
+
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
@@ -489,11 +498,7 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 	       start + len <= page_offset(page) + PAGE_SIZE);
 
 	if (uptodate) {
-		if (fsverity_active(page->mapping->host) &&
-		    !PageError(page) &&
-		    !PageUptodate(page) &&
-		    start < i_size_read(page->mapping->host) &&
-		    !fsverity_verify_page(page)) {
+		if (!btrfs_verify_page(page, start)) {
 			btrfs_page_set_error(fs_info, page, start, len);
 		} else {
 			btrfs_page_set_uptodate(fs_info, page, start, len);
-- 
2.39.2



* [PATCH 03/16] btrfs: fix fsverity read error handling in end_page_read
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 01/16] btrfs: fix range_end calculation in extent_write_locked_range Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 02/16] btrfs: factor out a btrfs_verify_page helper Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 04/16] btrfs: don't check PageError in btrfs_verify_page Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

If fsverity verification fails, also clear the uptodate bit in addition
to setting the error bit, to make sure the page isn't seen as uptodate
in the page cache.

Fixes: 146054090b08 ("btrfs: initial fsverity support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e42ce48b52e4a..a943a66224891a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -497,12 +497,8 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 	ASSERT(page_offset(page) <= start &&
 	       start + len <= page_offset(page) + PAGE_SIZE);
 
-	if (uptodate) {
-		if (!btrfs_verify_page(page, start)) {
-			btrfs_page_set_error(fs_info, page, start, len);
-		} else {
-			btrfs_page_set_uptodate(fs_info, page, start, len);
-		}
+	if (uptodate && btrfs_verify_page(page, start)) {
+		btrfs_page_set_uptodate(fs_info, page, start, len);
 	} else {
 		btrfs_page_clear_uptodate(fs_info, page, start, len);
 		btrfs_page_set_error(fs_info, page, start, len);
-- 
2.39.2



* [PATCH 04/16] btrfs: don't check PageError in btrfs_verify_page
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 03/16] btrfs: fix fsverity read error handling in end_page_read Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 05/16] btrfs: don't fail writeback when allocating the compression context fails Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

btrfs_verify_page is called from the readpage completion handler, which
is only used to read pages, or parts of pages, that aren't uptodate yet.
The only case where PageError could be set on a page in btrfs is if we
had a previous writeback error, but in that case we won't call readpage
on it, as it has previously been marked uptodate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a943a66224891a..178e8230c28af9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -484,7 +484,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 static bool btrfs_verify_page(struct page *page, u64 start)
 {
 	if (!fsverity_active(page->mapping->host) ||
-	    PageError(page) || PageUptodate(page) ||
+	    PageUptodate(page) ||
 	    start >= i_size_read(page->mapping->host))
 		return true;
 	return fsverity_verify_page(page);
-- 
2.39.2
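
To make the reasoning above concrete, here is a hedged userspace sketch
(the helper name is invented for illustration) of the page cache rule
being relied on: reads are only issued for pages that are not uptodate,
so a page left uptodate after a writeback error never reaches the
readpage completion handler:

#include <stdbool.h>
#include <stdio.h>

static bool will_call_readpage(bool uptodate)
{
	/* the read path only reads pages that are not yet uptodate */
	return !uptodate;
}

int main(void)
{
	/* a page that hit a writeback error stays uptodate in the cache */
	printf("read issued: %d\n", will_call_readpage(true));	/* prints 0 */
	return 0;
}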



* [PATCH 05/16] btrfs: don't fail writeback when allocating the compression context fails
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 04/16] btrfs: don't check PageError in btrfs_verify_page Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 06/16] btrfs: rename cow_file_range_async to run_delalloc_compressed Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

If cow_file_range_async fails to allocate the asynchronous writeback
context, it currently returns an error and entirely fails the writeback.
This is not a good idea, as a writeback failure is a non-temporary error
condition that will make the file system unusable.  Just fall back to
synchronous uncompressed writeback instead.  This requires us to delay
setting the BTRFS_INODE_HAS_ASYNC_EXTENT flag until we've committed to
the async writeback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 74 ++++++++++++++++++------------------------------
 1 file changed, 28 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e05978cb89cdce..b4b6bd621264cf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1720,58 +1720,36 @@ static noinline void async_cow_free(struct btrfs_work *work)
 		kvfree(async_cow);
 }
 
-static int cow_file_range_async(struct btrfs_inode *inode,
-				struct writeback_control *wbc,
-				struct page *locked_page,
-				u64 start, u64 end, int *page_started,
-				unsigned long *nr_written)
+static bool cow_file_range_async(struct btrfs_inode *inode,
+				 struct writeback_control *wbc,
+				 struct page *locked_page,
+				 u64 start, u64 end, int *page_started,
+				 unsigned long *nr_written)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
 	struct async_cow *ctx;
 	struct async_chunk *async_chunk;
 	unsigned long nr_pages;
-	u64 cur_end;
 	u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
 	int i;
-	bool should_compress;
 	unsigned nofs_flag;
 	const blk_opf_t write_flags = wbc_to_write_flags(wbc);
 
-	unlock_extent(&inode->io_tree, start, end, NULL);
-
-	if (inode->flags & BTRFS_INODE_NOCOMPRESS &&
-	    !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
-		num_chunks = 1;
-		should_compress = false;
-	} else {
-		should_compress = true;
-	}
-
 	nofs_flag = memalloc_nofs_save();
 	ctx = kvmalloc(struct_size(ctx, chunks, num_chunks), GFP_KERNEL);
 	memalloc_nofs_restore(nofs_flag);
+	if (!ctx)
+		return false;
 
-	if (!ctx) {
-		unsigned clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC |
-			EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
-			EXTENT_DO_ACCOUNTING;
-		unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
-					 PAGE_END_WRITEBACK | PAGE_SET_ERROR;
-
-		extent_clear_unlock_delalloc(inode, start, end, locked_page,
-					     clear_bits, page_ops);
-		return -ENOMEM;
-	}
+	unlock_extent(&inode->io_tree, start, end, NULL);
+	set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
 
 	async_chunk = ctx->chunks;
 	atomic_set(&ctx->num_chunks, num_chunks);
 
 	for (i = 0; i < num_chunks; i++) {
-		if (should_compress)
-			cur_end = min(end, start + SZ_512K - 1);
-		else
-			cur_end = end;
+		u64 cur_end = min(end, start + SZ_512K - 1);
 
 		/*
 		 * igrab is called higher up in the call chain, take only the
@@ -1832,7 +1810,7 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		start = cur_end + 1;
 	}
 	*page_started = 1;
-	return 0;
+	return true;
 }
 
 static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
@@ -2413,8 +2391,8 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 		u64 start, u64 end, int *page_started, unsigned long *nr_written,
 		struct writeback_control *wbc)
 {
-	int ret;
 	const bool zoned = btrfs_is_zoned(inode->root->fs_info);
+	int ret = 0;
 
 	/*
 	 * The range must cover part of the @locked_page, or the returned
@@ -2434,19 +2412,23 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 		ASSERT(!zoned || btrfs_is_data_reloc_root(inode->root));
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, nr_written);
-	} else if (!btrfs_inode_can_compress(inode) ||
-		   !inode_need_compress(inode, start, end)) {
-		if (zoned)
-			ret = run_delalloc_zoned(inode, locked_page, start, end,
-						 page_started, nr_written);
-		else
-			ret = cow_file_range(inode, locked_page, start, end,
-					     page_started, nr_written, 1, NULL);
-	} else {
-		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
-		ret = cow_file_range_async(inode, wbc, locked_page, start, end,
-					   page_started, nr_written);
+		goto out;
 	}
+
+	if (btrfs_inode_can_compress(inode) &&
+	    inode_need_compress(inode, start, end) &&
+	    cow_file_range_async(inode, wbc, locked_page, start,
+				 end, page_started, nr_written))
+		goto out;
+
+	if (zoned)
+		ret = run_delalloc_zoned(inode, locked_page, start, end,
+					 page_started, nr_written);
+	else
+		ret = cow_file_range(inode, locked_page, start, end,
+				     page_started, nr_written, 1, NULL);
+
+out:
 	ASSERT(ret <= 0);
 	if (ret)
 		btrfs_cleanup_ordered_extents(inode, locked_page, start,
-- 
2.39.2
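
The fallback pattern is worth spelling out.  A minimal sketch in plain
userspace C, with try_async() standing in for the async setup (names
invented, not btrfs code): returning false on allocation failure lets
the caller degrade to the synchronous path instead of failing writeback
outright:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static bool try_async(void)
{
	void *ctx = malloc(1024);	/* may fail under memory pressure */

	if (!ctx)
		return false;		/* no error, the caller falls back */
	/* ... hand ctx off to a worker thread here ... */
	free(ctx);
	return true;
}

int main(void)
{
	if (try_async())
		puts("async compressed writeback queued");
	else
		puts("falling back to synchronous uncompressed writeback");
	return 0;
}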



* [PATCH 06/16] btrfs: rename cow_file_range_async to run_delalloc_compressed
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 05/16] btrfs: don't fail writeback when allocating the compression context fails Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 07/16] btrfs: don't check PageError in __extent_writepage Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs, Anand Jain

cow_file_range_async is only used for compressed writeback.  Rename it
to run_delalloc_compressed, which also fits in with run_delalloc_nocow
and run_delalloc_zoned.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b4b6bd621264cf..9a7c0564fb7160 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1693,7 +1693,7 @@ static noinline void async_cow_submit(struct btrfs_work *work)
 	 * ->inode could be NULL if async_chunk_start has failed to compress,
 	 * in which case we don't have anything to submit, yet we need to
 	 * always adjust ->async_delalloc_pages as its paired with the init
-	 * happening in cow_file_range_async
+	 * happening in run_delalloc_compressed
 	 */
 	if (async_chunk->inode)
 		submit_compressed_extents(async_chunk);
@@ -1720,11 +1720,11 @@ static noinline void async_cow_free(struct btrfs_work *work)
 		kvfree(async_cow);
 }
 
-static bool cow_file_range_async(struct btrfs_inode *inode,
-				 struct writeback_control *wbc,
-				 struct page *locked_page,
-				 u64 start, u64 end, int *page_started,
-				 unsigned long *nr_written)
+static bool run_delalloc_compressed(struct btrfs_inode *inode,
+				    struct writeback_control *wbc,
+				    struct page *locked_page,
+				    u64 start, u64 end, int *page_started,
+				    unsigned long *nr_written)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc);
@@ -2417,8 +2417,8 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 
 	if (btrfs_inode_can_compress(inode) &&
 	    inode_need_compress(inode, start, end) &&
-	    cow_file_range_async(inode, wbc, locked_page, start,
-				 end, page_started, nr_written))
+	    run_delalloc_compressed(inode, wbc, locked_page, start,
+				    end, page_started, nr_written))
 		goto out;
 
 	if (zoned)
-- 
2.39.2



* [PATCH 07/16] btrfs: don't check PageError in __extent_writepage
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 06/16] btrfs: rename cow_file_range_async to run_delalloc_compressed Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 08/16] btrfs: stop setting PageError in the data I/O path Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

__extent_writepage currently sets PageError whenever any error happens,
and then also checks PageError to decide whether to run the error
handling code.  This leads to very unclear responsibility for cleaning
up on errors.  In the VM and generic writeback helpers the basic idea
is that once I/O is fired off, all error handling responsibility is
delegated to the end I/O handler.  But if that end I/O handler sets the
PageError bit, and the submitter checks it, the bit could in some cases
leak into the submission context for fast enough I/O.

Fix this by simply not checking PageError and just using the local ret
variable to check for submission errors.  This also fundamentally
solves the long-standing problem documented in a comment in
__extent_writepage by never leaking the error bit into the submission
context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 178e8230c28af9..33e5f0a31c21a7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1557,38 +1557,7 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 		set_page_writeback(page);
 		end_page_writeback(page);
 	}
-	/*
-	 * Here we used to have a check for PageError() and then set @ret and
-	 * call end_extent_writepage().
-	 *
-	 * But in fact setting @ret here will cause different error paths
-	 * between subpage and regular sectorsize.
-	 *
-	 * For regular page size, we never submit current page, but only add
-	 * current page to current bio.
-	 * The bio submission can only happen in next page.
-	 * Thus if we hit the PageError() branch, @ret is already set to
-	 * non-zero value and will not get updated for regular sectorsize.
-	 *
-	 * But for subpage case, it's possible we submit part of current page,
-	 * thus can get PageError() set by submitted bio of the same page,
-	 * while our @ret is still 0.
-	 *
-	 * So here we unify the behavior and don't set @ret.
-	 * Error can still be properly passed to higher layer as page will
-	 * be set error, here we just don't handle the IO failure.
-	 *
-	 * NOTE: This is just a hotfix for subpage.
-	 * The root fix will be properly ending ordered extent when we hit
-	 * an error during writeback.
-	 *
-	 * But that needs a bigger refactoring, as we not only need to grab the
-	 * submitted OE, but also need to know exactly at which bytenr we hit
-	 * the error.
-	 * Currently the full page based __extent_writepage_io() is not
-	 * capable of that.
-	 */
-	if (PageError(page))
+	if (ret)
 		end_extent_writepage(page, ret, page_start, page_end);
 	if (bio_ctrl->extent_locked) {
 		struct writeback_control *wbc = bio_ctrl->wbc;
-- 
2.39.2
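
To illustrate the leak described above with a hedged sketch (names
invented, not the real code paths): when the submitter and the end-I/O
handler communicate through shared page state, a completion that runs
before the submitter's check makes the submitter act on an error the
completion is already handling; a local return value cannot leak that
way:

#include <stdbool.h>
#include <stdio.h>

/*
 * In the real code this is set by the end-I/O handler, possibly before
 * submit_io() even returns for fast enough I/O.
 */
static volatile bool page_error;

static int submit_io(void)
{
	return 0;	/* the submission itself succeeded */
}

static void writepage(void)
{
	int ret = submit_io();

	/* old, racy scheme: this can see an error the completion set */
	if (page_error)
		puts("old scheme: error handled twice");

	/* new scheme: only act on the local submission result */
	if (ret)
		puts("handle submission failure");
}

int main(void)
{
	writepage();
	return 0;
}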



* [PATCH 08/16] btrfs: stop setting PageError in the data I/O path
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 07/16] btrfs: don't check PageError in __extent_writepage Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 09/16] btrfs: remove PAGE_SET_ERROR Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

PageError is not used by the VFS/MM and is deprecated because it uses
up a page bit and has no coherent rules.  Instead, read errors are
usually propagated by not setting or by clearing the uptodate bit, and
write errors are propagated through the address_space.  Btrfs now only
sets the flag and never clears it for data pages, so just remove all
the places setting it, along with the subpage error bit.

Note that the error propagation for superblock writes that work on the
block device mapping still uses PageError for now, but that will be
addressed in a separate series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/compression.c |  2 --
 fs/btrfs/extent_io.c   | 24 +++++-------------------
 fs/btrfs/inode.c       |  3 ---
 fs/btrfs/subpage.c     | 35 -----------------------------------
 fs/btrfs/subpage.h     | 10 ++++------
 5 files changed, 9 insertions(+), 65 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 04cd5de4f00f60..bb2d1a4ceca12f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -211,8 +211,6 @@ static noinline void end_compressed_writeback(const struct compressed_bio *cb)
 		for (i = 0; i < ret; i++) {
 			struct folio *folio = fbatch.folios[i];
 
-			if (errno)
-				folio_set_error(folio);
 			btrfs_page_clamp_clear_writeback(fs_info, &folio->page,
 							 cb->start, cb->len);
 		}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 33e5f0a31c21a7..b7f26a4bb663cd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -223,8 +223,6 @@ static int process_one_page(struct btrfs_fs_info *fs_info,
 
 	if (page_ops & PAGE_SET_ORDERED)
 		btrfs_page_clamp_set_ordered(fs_info, page, start, len);
-	if (page_ops & PAGE_SET_ERROR)
-		btrfs_page_clamp_set_error(fs_info, page, start, len);
 	if (page_ops & PAGE_START_WRITEBACK) {
 		btrfs_page_clamp_clear_dirty(fs_info, page, start, len);
 		btrfs_page_clamp_set_writeback(fs_info, page, start, len);
@@ -497,12 +495,10 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 	ASSERT(page_offset(page) <= start &&
 	       start + len <= page_offset(page) + PAGE_SIZE);
 
-	if (uptodate && btrfs_verify_page(page, start)) {
+	if (uptodate && btrfs_verify_page(page, start))
 		btrfs_page_set_uptodate(fs_info, page, start, len);
-	} else {
+	else
 		btrfs_page_clear_uptodate(fs_info, page, start, len);
-		btrfs_page_set_error(fs_info, page, start, len);
-	}
 
 	if (!btrfs_is_subpage(fs_info, page))
 		unlock_page(page);
@@ -530,7 +526,6 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
 		len = end + 1 - start;
 
 		btrfs_page_clear_uptodate(fs_info, page, start, len);
-		btrfs_page_set_error(fs_info, page, start, len);
 		ret = err < 0 ? err : -EIO;
 		mapping_set_error(page->mapping, ret);
 	}
@@ -1059,7 +1054,6 @@ static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	ret = set_page_extent_mapped(page);
 	if (ret < 0) {
 		unlock_extent(tree, start, end, NULL);
-		btrfs_page_set_error(fs_info, page, start, PAGE_SIZE);
 		unlock_page(page);
 		return ret;
 	}
@@ -1263,11 +1257,9 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		}
 		ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
 				delalloc_end, &page_started, &nr_written, wbc);
-		if (ret) {
-			btrfs_page_set_error(inode->root->fs_info, page,
-					     page_offset(page), PAGE_SIZE);
+		if (ret)
 			return ret;
-		}
+
 		/*
 		 * delalloc_end is already one less than the total length, so
 		 * we don't subtract one from PAGE_SIZE
@@ -1420,7 +1412,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 
 		em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1);
 		if (IS_ERR(em)) {
-			btrfs_page_set_error(fs_info, page, cur, end - cur + 1);
 			ret = PTR_ERR_OR_ZERO(em);
 			goto out_error;
 		}
@@ -1519,9 +1510,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 
 	WARN_ON(!PageLocked(page));
 
-	btrfs_page_clear_error(btrfs_sb(inode->i_sb), page,
-			       page_offset(page), PAGE_SIZE);
-
 	pg_offset = offset_in_page(i_size);
 	if (page->index > end_index ||
 	   (page->index == end_index && !pg_offset)) {
@@ -1534,10 +1522,8 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 		memzero_page(page, pg_offset, PAGE_SIZE - pg_offset);
 
 	ret = set_page_extent_mapped(page);
-	if (ret < 0) {
-		SetPageError(page);
+	if (ret < 0)
 		goto done;
-	}
 
 	if (!bio_ctrl->extent_locked) {
 		ret = writepage_delalloc(BTRFS_I(inode), page, bio_ctrl->wbc);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9a7c0564fb7160..2e1e1f18649d90 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1153,8 +1153,6 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 			const u64 page_start = page_offset(locked_page);
 			const u64 page_end = page_start + PAGE_SIZE - 1;
 
-			btrfs_page_set_error(inode->root->fs_info, locked_page,
-					     page_start, PAGE_SIZE);
 			set_page_writeback(locked_page);
 			end_page_writeback(locked_page);
 			end_extent_writepage(locked_page, ret, page_start, page_end);
@@ -2942,7 +2940,6 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
 		clear_page_dirty_for_io(page);
-		SetPageError(page);
 	}
 	btrfs_page_clear_checked(inode->root->fs_info, page, page_start, PAGE_SIZE);
 	unlock_page(page);
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 74bc43040531fd..1b999c6e419307 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -100,9 +100,6 @@ void btrfs_init_subpage_info(struct btrfs_subpage_info *subpage_info, u32 sector
 	subpage_info->uptodate_offset = cur;
 	cur += nr_bits;
 
-	subpage_info->error_offset = cur;
-	cur += nr_bits;
-
 	subpage_info->dirty_offset = cur;
 	cur += nr_bits;
 
@@ -416,35 +413,6 @@ void btrfs_subpage_clear_uptodate(const struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
-void btrfs_subpage_set_error(const struct btrfs_fs_info *fs_info,
-		struct page *page, u64 start, u32 len)
-{
-	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
-	unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
-							error, start, len);
-	unsigned long flags;
-
-	spin_lock_irqsave(&subpage->lock, flags);
-	bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
-	SetPageError(page);
-	spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
-void btrfs_subpage_clear_error(const struct btrfs_fs_info *fs_info,
-		struct page *page, u64 start, u32 len)
-{
-	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
-	unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
-							error, start, len);
-	unsigned long flags;
-
-	spin_lock_irqsave(&subpage->lock, flags);
-	bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
-	if (subpage_test_bitmap_all_zero(fs_info, subpage, error))
-		ClearPageError(page);
-	spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
 void btrfs_subpage_set_dirty(const struct btrfs_fs_info *fs_info,
 		struct page *page, u64 start, u32 len)
 {
@@ -606,7 +574,6 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
 	return ret;							\
 }
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
-IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered);
@@ -674,7 +641,6 @@ bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,	\
 }
 IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 			 PageUptodate);
-IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
 IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io,
 			 PageDirty);
 IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback,
@@ -769,7 +735,6 @@ void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
 
 	spin_lock_irqsave(&subpage->lock, flags);
 	GET_SUBPAGE_BITMAP(subpage, subpage_info, uptodate, &uptodate_bitmap);
-	GET_SUBPAGE_BITMAP(subpage, subpage_info, error, &error_bitmap);
 	GET_SUBPAGE_BITMAP(subpage, subpage_info, dirty, &dirty_bitmap);
 	GET_SUBPAGE_BITMAP(subpage, subpage_info, writeback, &writeback_bitmap);
 	GET_SUBPAGE_BITMAP(subpage, subpage_info, ordered, &ordered_bitmap);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 5905caea840970..5cbf67ccbdeb1d 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -8,17 +8,17 @@
 /*
  * Extra info for subpapge bitmap.
  *
- * For subpage we pack all uptodate/error/dirty/writeback/ordered bitmaps into
+ * For subpage we pack all uptodate/dirty/writeback/ordered bitmaps into
  * one larger bitmap.
  *
  * This structure records how they are organized in the bitmap:
  *
- * /- uptodate_offset	/- error_offset	/- dirty_offset
+ * /- uptodate_offset	/- dirty_offset	/- ordered_offset
  * |			|		|
  * v			v		v
- * |u|u|u|u|........|u|u|e|e|.......|e|e| ...	|o|o|
+ * |u|u|u|u|........|u|u|d|d|.......|d|d|o|o|.......|o|o|
  * |<- bitmap_nr_bits ->|
- * |<--------------- total_nr_bits ---------------->|
+ * |<----------------- total_nr_bits ------------------>|
  */
 struct btrfs_subpage_info {
 	/* Number of bits for each bitmap */
@@ -32,7 +32,6 @@ struct btrfs_subpage_info {
 	 * @bitmap_size, which is calculated from PAGE_SIZE / sectorsize.
 	 */
 	unsigned int uptodate_offset;
-	unsigned int error_offset;
 	unsigned int dirty_offset;
 	unsigned int writeback_offset;
 	unsigned int ordered_offset;
@@ -141,7 +140,6 @@ bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len);
 
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
-DECLARE_BTRFS_SUBPAGE_OPS(error);
 DECLARE_BTRFS_SUBPAGE_OPS(dirty);
 DECLARE_BTRFS_SUBPAGE_OPS(writeback);
 DECLARE_BTRFS_SUBPAGE_OPS(ordered);
-- 
2.39.2
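
For background, a hedged userspace analogue of the error model this
moves to (the struct and helpers are simplified stand-ins for the
kernel's address_space error tracking, not its actual API): a write
error is recorded once on the mapping and reported at fsync time,
rather than being tagged on every page:

#include <errno.h>
#include <stdio.h>

struct mapping {
	int wb_err;	/* stand-in for the address_space error state */
};

static void mapping_set_error(struct mapping *m, int err)
{
	if (err && !m->wb_err)
		m->wb_err = err;	/* remember the first error */
}

static int filemap_check_errors(struct mapping *m)
{
	int err = m->wb_err;

	m->wb_err = 0;	/* consumed, as fsync reports it once */
	return err;
}

int main(void)
{
	struct mapping m = { 0 };

	mapping_set_error(&m, -EIO);	/* a page failed to write back */
	printf("fsync would return %d\n", filemap_check_errors(&m));
	return 0;
}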



* [PATCH 09/16] btrfs: remove PAGE_SET_ERROR
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 08/16] btrfs: stop setting PageError in the data I/O path Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2023-05-31  6:04 ` [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Now that the btrfs writeback code has stopped using PageError, using
PAGE_SET_ERROR to just set the per-address_space error flag is confusing.
Just open code the mapping_set_error calls in the callers and remove
the PAGE_SET_ERROR flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c |  3 ---
 fs/btrfs/extent_io.h |  1 -
 fs/btrfs/inode.c     | 11 ++++++-----
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b7f26a4bb663cd..09a9973c27ccfb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -268,9 +268,6 @@ static int __process_pages_contig(struct address_space *mapping,
 		ASSERT(processed_end && *processed_end == start);
 	}
 
-	if ((page_ops & PAGE_SET_ERROR) && start_index <= end_index)
-		mapping_set_error(mapping, -EIO);
-
 	folio_batch_init(&fbatch);
 	while (index <= end_index) {
 		int found_folios;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 2d91ca91dca5c1..6723bf3483d9f9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -40,7 +40,6 @@ enum {
 	ENUM_BIT(PAGE_START_WRITEBACK),
 	ENUM_BIT(PAGE_END_WRITEBACK),
 	ENUM_BIT(PAGE_SET_ORDERED),
-	ENUM_BIT(PAGE_SET_ERROR),
 	ENUM_BIT(PAGE_LOCK),
 };
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2e1e1f18649d90..ea9e880c8cee76 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -835,6 +835,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 {
 	struct btrfs_inode *inode = async_chunk->inode;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct address_space *mapping = inode->vfs_inode.i_mapping;
 	u64 blocksize = fs_info->sectorsize;
 	u64 start = async_chunk->start;
 	u64 end = async_chunk->end;
@@ -949,7 +950,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 		/* Compression level is applied here and only here */
 		ret = btrfs_compress_pages(
 			compress_type | (fs_info->compress_level << 4),
-					   inode->vfs_inode.i_mapping, start,
+					   mapping, start,
 					   pages,
 					   &nr_pages,
 					   &total_in,
@@ -992,9 +993,9 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 			unsigned long clear_flags = EXTENT_DELALLOC |
 				EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
 				EXTENT_DO_ACCOUNTING;
-			unsigned long page_error_op;
 
-			page_error_op = ret < 0 ? PAGE_SET_ERROR : 0;
+			if (ret < 0)
+				mapping_set_error(mapping, -EIO);
 
 			/*
 			 * inline extent creation worked or returned error,
@@ -1011,7 +1012,6 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 						     clear_flags,
 						     PAGE_UNLOCK |
 						     PAGE_START_WRITEBACK |
-						     page_error_op |
 						     PAGE_END_WRITEBACK);
 
 			/*
@@ -1271,12 +1271,13 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
 out_free:
+	mapping_set_error(inode->vfs_inode.i_mapping, -EIO);
 	extent_clear_unlock_delalloc(inode, start, end,
 				     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW |
 				     EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
 				     PAGE_UNLOCK | PAGE_START_WRITEBACK |
-				     PAGE_END_WRITEBACK | PAGE_SET_ERROR);
+				     PAGE_END_WRITEBACK);
 	free_async_extent_pages(async_extent);
 	goto done;
 }
-- 
2.39.2



* [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 09/16] btrfs: remove PAGE_SET_ERROR Christoph Hellwig
@ 2023-05-31  6:04 ` Christoph Hellwig
  2024-02-10  9:39   ` Qu Wenruo
  2023-05-31  6:05 ` [PATCH 11/16] btrfs: move nr_to_write to __extent_writepage Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:04 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

__extent_writepage_io is never called for compressed or inline extents,
or holes.  Remove the not quite working code for them and replace it with
asserts that these cases don't happen.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 09a9973c27ccfb..a2e1dbd9b92309 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1361,7 +1361,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	struct extent_map *em;
 	int ret = 0;
 	int nr = 0;
-	bool compressed;
 
 	ret = btrfs_writepage_cow_fixup(page);
 	if (ret) {
@@ -1419,10 +1418,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		ASSERT(cur < end);
 		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
 		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
+
 		block_start = em->block_start;
-		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
 		disk_bytenr = em->block_start + extent_offset;
 
+		ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
+		ASSERT(block_start != EXTENT_MAP_HOLE);
+		ASSERT(block_start != EXTENT_MAP_INLINE);
+
 		/*
 		 * Note that em_end from extent_map_end() and dirty_range_end from
 		 * find_next_dirty_byte() are all exclusive
@@ -1431,22 +1434,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		free_extent_map(em);
 		em = NULL;
 
-		/*
-		 * compressed and inline extents are written through other
-		 * paths in the FS
-		 */
-		if (compressed || block_start == EXTENT_MAP_HOLE ||
-		    block_start == EXTENT_MAP_INLINE) {
-			if (compressed)
-				nr++;
-			else
-				btrfs_writepage_endio_finish_ordered(inode,
-						page, cur, cur + iosize - 1, true);
-			btrfs_page_clear_dirty(fs_info, page, cur, iosize);
-			cur += iosize;
-			continue;
-		}
-
 		btrfs_set_range_writeback(inode, cur, cur + iosize - 1);
 		if (!PageWriteback(page)) {
 			btrfs_err(inode->root->fs_info,
-- 
2.39.2



* [PATCH 11/16] btrfs: move nr_to_write to __extent_writepage
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2023-05-31  6:04 ` [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io Christoph Hellwig
@ 2023-05-31  6:05 ` Christoph Hellwig
  2023-05-31  6:05 ` [PATCH 12/16] btrfs: only call __extent_writepage_io from extent_write_locked_range Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs, Anand Jain

Move the nr_to_write accounting from __extent_writepage_io to
__extent_writepage, as we'll soon grow another caller of
__extent_writepage_io that doesn't want this accounting.  Also drop the
obsolete comment - decrementing a counter in the on-stack
writeback_control data structure doesn't need the page lock.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a2e1dbd9b92309..f773ee12ce1cee 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1370,12 +1370,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		return 1;
 	}
 
-	/*
-	 * we don't want to touch the inode after unlocking the page,
-	 * so we update the mapping writeback index now
-	 */
-	bio_ctrl->wbc->nr_to_write--;
-
 	bio_ctrl->end_io_func = end_bio_extent_writepage;
 	while (cur <= end) {
 		u64 disk_bytenr;
@@ -1521,6 +1515,8 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 	if (ret == 1)
 		return 0;
 
+	bio_ctrl->wbc->nr_to_write--;
+
 done:
 	if (nr == 0) {
 		/* make sure the mapping tag for page dirty gets cleared */
-- 
2.39.2
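
For context, wbc->nr_to_write is the budget the flusher hands a single
writeback invocation, and each page written consumes one unit.  A
minimal sketch with invented helpers (not btrfs code) of how such a
budget bounds the work:

#include <stdbool.h>
#include <stdio.h>

static bool have_dirty_page(int written)
{
	return written < 8;	/* pretend 8 dirty pages exist */
}

static void write_one_page(void)
{
}

int main(void)
{
	long nr_to_write = 4;	/* budget set by the caller */
	int written = 0;

	while (nr_to_write > 0 && have_dirty_page(written)) {
		write_one_page();
		written++;
		nr_to_write--;	/* one decrement per page written */
	}
	printf("wrote %d pages, budget left %ld\n", written, nr_to_write);
	return 0;
}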



* [PATCH 12/16] btrfs: only call __extent_writepage_io from extent_write_locked_range
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2023-05-31  6:05 ` [PATCH 11/16] btrfs: move nr_to_write to __extent_writepage Christoph Hellwig
@ 2023-05-31  6:05 ` Christoph Hellwig
  2023-05-31  6:05 ` [PATCH 13/16] btrfs: don't treat zoned writeback as being from an async helper thread Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

__extent_writepage does a lot of things that make no sense for
extent_write_locked_range, given that extent_write_locked_range itself is
called from __extent_writepage either directly or through a workqueue,
and all this work has already been done in the first invocation and the
pages haven't been unlocked since.  Just call __extent_writepage_io
directly instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 65 ++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f773ee12ce1cee..1da247e753b08a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -103,12 +103,6 @@ struct btrfs_bio_ctrl {
 	blk_opf_t opf;
 	btrfs_bio_end_io_t end_io_func;
 	struct writeback_control *wbc;
-
-	/*
-	 * Tell writepage not to lock the state bits for this range, it still
-	 * does the unlocking.
-	 */
-	bool extent_locked;
 };
 
 static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
@@ -1475,7 +1469,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 {
 	struct folio *folio = page_folio(page);
 	struct inode *inode = page->mapping->host;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const u64 page_start = page_offset(page);
 	const u64 page_end = page_start + PAGE_SIZE - 1;
 	int ret;
@@ -1503,13 +1496,11 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 	if (ret < 0)
 		goto done;
 
-	if (!bio_ctrl->extent_locked) {
-		ret = writepage_delalloc(BTRFS_I(inode), page, bio_ctrl->wbc);
-		if (ret == 1)
-			return 0;
-		if (ret)
-			goto done;
-	}
+	ret = writepage_delalloc(BTRFS_I(inode), page, bio_ctrl->wbc);
+	if (ret == 1)
+		return 0;
+	if (ret)
+		goto done;
 
 	ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size, &nr);
 	if (ret == 1)
@@ -1525,21 +1516,7 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 	}
 	if (ret)
 		end_extent_writepage(page, ret, page_start, page_end);
-	if (bio_ctrl->extent_locked) {
-		struct writeback_control *wbc = bio_ctrl->wbc;
-
-		/*
-		 * If bio_ctrl->extent_locked, it's from extent_write_locked_range(),
-		 * the page can either be locked by lock_page() or
-		 * process_one_page().
-		 * Let btrfs_page_unlock_writer() handle both cases.
-		 */
-		ASSERT(wbc);
-		btrfs_page_unlock_writer(fs_info, page, wbc->range_start,
-					 wbc->range_end + 1 - wbc->range_start);
-	} else {
-		unlock_page(page);
-	}
+	unlock_page(page);
 	ASSERT(ret <= 0);
 	return ret;
 }
@@ -2239,10 +2216,10 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 	int first_error = 0;
 	int ret = 0;
 	struct address_space *mapping = inode->i_mapping;
-	struct page *page;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	const u32 sectorsize = fs_info->sectorsize;
+	loff_t i_size = i_size_read(inode);
 	u64 cur = start;
-	unsigned long nr_pages;
-	const u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
 	struct writeback_control wbc_writepages = {
 		.sync_mode	= WB_SYNC_ALL,
 		.range_start	= start,
@@ -2254,17 +2231,15 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 		/* We're called from an async helper function */
 		.opf = REQ_OP_WRITE | REQ_BTRFS_CGROUP_PUNT |
 			wbc_to_write_flags(&wbc_writepages),
-		.extent_locked = 1,
 	};
 
 	ASSERT(IS_ALIGNED(start, sectorsize) && IS_ALIGNED(end + 1, sectorsize));
-	nr_pages = (round_up(end, PAGE_SIZE) - round_down(start, PAGE_SIZE)) >>
-		   PAGE_SHIFT;
-	wbc_writepages.nr_to_write = nr_pages * 2;
 
 	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
 	while (cur <= end) {
 		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
+		struct page *page;
+		int nr = 0;
 
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
 		/*
@@ -2275,12 +2250,25 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 		ASSERT(PageLocked(page));
 		ASSERT(PageDirty(page));
 		clear_page_dirty_for_io(page);
-		ret = __extent_writepage(page, &bio_ctrl);
-		ASSERT(ret <= 0);
+
+		ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
+					    i_size, &nr);
+		if (ret == 1)
+			goto next_page;
+
+		/* Make sure the mapping tag for page dirty gets cleared. */
+		if (nr == 0) {
+			set_page_writeback(page);
+			end_page_writeback(page);
+		}
+		if (ret)
+			end_extent_writepage(page, ret, cur, cur_end);
+		btrfs_page_unlock_writer(fs_info, page, cur, cur_end + 1 - cur);
 		if (ret < 0) {
 			found_error = true;
 			first_error = ret;
 		}
+next_page:
 		put_page(page);
 		cur = cur_end + 1;
 	}
@@ -2301,7 +2289,6 @@ int extent_writepages(struct address_space *mapping,
 	struct btrfs_bio_ctrl bio_ctrl = {
 		.wbc = wbc,
 		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
-		.extent_locked = 0,
 	};
 
 	/*
-- 
2.39.2



* [PATCH 13/16] btrfs: don't treat zoned writeback as being from an async helper thread
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2023-05-31  6:05 ` [PATCH 12/16] btrfs: only call __extent_writepage_io from extent_write_locked_range Christoph Hellwig
@ 2023-05-31  6:05 ` Christoph Hellwig
  2023-05-31  6:05 ` [PATCH 14/16] btrfs: don't redirty the locked page for extent_write_locked_range Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

When extent_write_locked_range was originally added, it was only used
for writing back compressed pages from an async helper thread.  But it
is now also used for writing back pages on zoned devices, where it is
called directly from the ->writepage context.  In this case we want to
be able to pass on the writeback_control instead of creating a new one,
and more importantly want to use all the normal cgroup interaction
instead of potentially deferring writeback to another helper.

Fixes: 898793d992c2 ("btrfs: zoned: write out partially allocated region")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 20 +++++++-------------
 fs/btrfs/extent_io.h |  3 ++-
 fs/btrfs/inode.c     | 20 +++++++++++++++-----
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1da247e753b08a..f4d3c56b29009b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2210,7 +2210,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
  * already been ran (aka, ordered extent inserted) and all pages are still
  * locked.
  */
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
+			      struct writeback_control *wbc)
 {
 	bool found_error = false;
 	int first_error = 0;
@@ -2220,22 +2221,16 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 	const u32 sectorsize = fs_info->sectorsize;
 	loff_t i_size = i_size_read(inode);
 	u64 cur = start;
-	struct writeback_control wbc_writepages = {
-		.sync_mode	= WB_SYNC_ALL,
-		.range_start	= start,
-		.range_end	= end,
-		.no_cgroup_owner = 1,
-	};
 	struct btrfs_bio_ctrl bio_ctrl = {
-		.wbc = &wbc_writepages,
-		/* We're called from an async helper function */
-		.opf = REQ_OP_WRITE | REQ_BTRFS_CGROUP_PUNT |
-			wbc_to_write_flags(&wbc_writepages),
+		.wbc = wbc,
+		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
 	};
 
+	if (wbc->no_cgroup_owner)
+		bio_ctrl.opf |= REQ_BTRFS_CGROUP_PUNT;
+
 	ASSERT(IS_ALIGNED(start, sectorsize) && IS_ALIGNED(end + 1, sectorsize));
 
-	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
 	while (cur <= end) {
 		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
 		struct page *page;
@@ -2275,7 +2270,6 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 
 	submit_write_bio(&bio_ctrl, found_error ? ret : 0);
 
-	wbc_detach_inode(&wbc_writepages);
 	if (found_error)
 		return first_error;
 	return ret;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 6723bf3483d9f9..c5fae3a7d911bf 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -178,7 +178,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 
 int btrfs_read_folio(struct file *file, struct folio *folio);
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end);
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
+			      struct writeback_control *wbc);
 int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ea9e880c8cee76..54b4b241b354fc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1133,6 +1133,12 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 	unsigned long nr_written = 0;
 	int page_started = 0;
 	int ret;
+	struct writeback_control wbc = {
+		.sync_mode		= WB_SYNC_ALL,
+		.range_start		= start,
+		.range_end		= end,
+		.no_cgroup_owner	= 1,
+	};
 
 	/*
 	 * Call cow_file_range() to run the delalloc range directly, since we
@@ -1162,7 +1168,10 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 	}
 
 	/* All pages will be unlocked, including @locked_page */
-	return extent_write_locked_range(&inode->vfs_inode, start, end);
+	wbc_attach_fdatawrite_inode(&wbc, &inode->vfs_inode);
+	ret = extent_write_locked_range(&inode->vfs_inode, start, end, &wbc);
+	wbc_detach_inode(&wbc);
+	return ret;
 }
 
 static int submit_one_async_extent(struct btrfs_inode *inode,
@@ -1815,7 +1824,8 @@ static bool run_delalloc_compressed(struct btrfs_inode *inode,
 static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 				       struct page *locked_page, u64 start,
 				       u64 end, int *page_started,
-				       unsigned long *nr_written)
+				       unsigned long *nr_written,
+				       struct writeback_control *wbc)
 {
 	u64 done_offset = end;
 	int ret;
@@ -1847,8 +1857,8 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 			account_page_redirty(locked_page);
 		}
 		locked_page_done = true;
-		extent_write_locked_range(&inode->vfs_inode, start, done_offset);
-
+		extent_write_locked_range(&inode->vfs_inode, start, done_offset,
+					  wbc);
 		start = done_offset + 1;
 	}
 
@@ -2422,7 +2432,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 
 	if (zoned)
 		ret = run_delalloc_zoned(inode, locked_page, start, end,
-					 page_started, nr_written);
+					 page_started, nr_written, wbc);
 	else
 		ret = cow_file_range(inode, locked_page, start, end,
 				     page_started, nr_written, 1, NULL);
-- 
2.39.2



* [PATCH 14/16] btrfs: don't redirty the locked page for extent_write_locked_range
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2023-05-31  6:05 ` [PATCH 13/16] btrfs: don't treat zoned writeback as being from an async helper thread Christoph Hellwig
@ 2023-05-31  6:05 ` Christoph Hellwig
  2023-06-05 21:00   ` David Sterba
  2023-05-31  6:05 ` [PATCH 15/16] btrfs: refactor the zoned device handling in cow_file_range Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Instead of redirtying the locked page before calling
extent_write_locked_range, just pass a locked_page argument similar to
many other functions in the btrfs writeback code, and then exclude the
locked page from clearing the dirty bit in extent_write_locked_range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 17 ++++++++++-------
 fs/btrfs/extent_io.h |  3 ++-
 fs/btrfs/inode.c     | 25 ++++++-------------------
 3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f4d3c56b29009b..bc061b2ac55a12 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2210,8 +2210,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
  * already been ran (aka, ordered extent inserted) and all pages are still
  * locked.
  */
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
-			      struct writeback_control *wbc)
+int extent_write_locked_range(struct inode *inode, struct page *locked_page,
+			      u64 start, u64 end, struct writeback_control *wbc)
 {
 	bool found_error = false;
 	int first_error = 0;
@@ -2237,14 +2237,17 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 		int nr = 0;
 
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
+
 		/*
-		 * All pages in the range are locked since
-		 * btrfs_run_delalloc_range(), thus there is no way to clear
-		 * the page dirty flag.
+		 * All pages have been locked by btrfs_run_delalloc_range(),
+		 * thus the dirty bit can't have been cleared.
 		 */
 		ASSERT(PageLocked(page));
-		ASSERT(PageDirty(page));
-		clear_page_dirty_for_io(page);
+		if (page != locked_page) {
+			/* already cleared by extent_write_cache_pages */
+			ASSERT(PageDirty(page));
+			clear_page_dirty_for_io(page);
+		}
 
 		ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
 					    i_size, &nr);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c5fae3a7d911bf..00c468aea010a1 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -178,7 +178,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 
 int btrfs_read_folio(struct file *file, struct folio *folio);
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
+int extent_write_locked_range(struct inode *inode, struct page *locked_page,
+			      u64 start, u64 end,
 			      struct writeback_control *wbc);
 int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 54b4b241b354fc..68ae20a3f785e3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1088,17 +1088,9 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 cleanup_and_bail_uncompressed:
 	/*
 	 * No compression, but we still need to write the pages in the file
-	 * we've been given so far.  redirty the locked page if it corresponds
-	 * to our extent and set things up for the async work queue to run
-	 * cow_file_range to do the normal delalloc dance.
+	 * we've been given so far.  Set things up for the async work queue to
+	 * run cow_file_range to do the normal delalloc dance.
 	 */
-	if (async_chunk->locked_page &&
-	    (page_offset(async_chunk->locked_page) >= start &&
-	     page_offset(async_chunk->locked_page)) <= end) {
-		__set_page_dirty_nobuffers(async_chunk->locked_page);
-		/* unlocked later on in the async handlers */
-	}
-
 	if (redirty)
 		extent_range_redirty_for_io(&inode->vfs_inode, start, end);
 	add_async_extent(async_chunk, start, end - start + 1, 0, NULL, 0,
@@ -1169,7 +1161,8 @@ static int submit_uncompressed_range(struct btrfs_inode *inode,
 
 	/* All pages will be unlocked, including @locked_page */
 	wbc_attach_fdatawrite_inode(&wbc, &inode->vfs_inode);
-	ret = extent_write_locked_range(&inode->vfs_inode, start, end, &wbc);
+	ret = extent_write_locked_range(&inode->vfs_inode, locked_page, start,
+					end, &wbc);
 	wbc_detach_inode(&wbc);
 	return ret;
 }
@@ -1829,7 +1822,6 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 {
 	u64 done_offset = end;
 	int ret;
-	bool locked_page_done = false;
 
 	while (start <= end) {
 		ret = cow_file_range(inode, locked_page, start, end, page_started,
@@ -1852,13 +1844,8 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 			continue;
 		}
 
-		if (!locked_page_done) {
-			__set_page_dirty_nobuffers(locked_page);
-			account_page_redirty(locked_page);
-		}
-		locked_page_done = true;
-		extent_write_locked_range(&inode->vfs_inode, start, done_offset,
-					  wbc);
+		extent_write_locked_range(&inode->vfs_inode, locked_page, start,
+					  done_offset, wbc);
 		start = done_offset + 1;
 	}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 15/16] btrfs: refactor the zoned device handling in cow_file_range
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2023-05-31  6:05 ` [PATCH 14/16] btrfs: don't redirty the locked page for extent_write_locked_range Christoph Hellwig
@ 2023-05-31  6:05 ` Christoph Hellwig
  2023-06-06 14:20   ` Naohiro Aota
  2023-05-31  6:05 ` [PATCH 16/16] btrfs: split page locking out of __process_pages_contig Christoph Hellwig
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Handling of the done_offset argument to cow_file_range is a bit
confusing, as it is not updated at all when the function succeeds, and
the -EAGAIN status is used both for the case where we need to wait for
a zone finish and the one where the allocation was only partially
successful.

Change the calling convention so that done_offset is always updated,
and 0 is returned if some allocation was successful (partial allocation
can still only happen for zoned devices), and -EAGAIN is only returned
when the caller needs to wait for a zone finish.
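
For reference, this is a condensed sketch of the caller loop that the
new convention enables (the full version is in the diff below; the
*page_started handling is trimmed):

	while (start <= end) {
		ret = cow_file_range(inode, locked_page, start, end,
				     page_started, nr_written, 0,
				     &done_offset);
		if (ret == -EAGAIN) {
			/* nothing was allocated, wait for a zone finish */
			wait_on_bit_io(&inode->root->fs_info->flags,
				       BTRFS_FS_NEED_ZONE_FINISH,
				       TASK_UNINTERRUPTIBLE);
			continue;
		}
		if (ret)
			return ret;
		/* ret == 0: [start, done_offset] has been allocated */
		extent_write_locked_range(&inode->vfs_inode, locked_page,
					  start, done_offset, wbc);
		start = done_offset + 1;
	}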

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 53 ++++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 68ae20a3f785e3..a12d4f77859706 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1403,7 +1403,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	unsigned clear_bits;
 	unsigned long page_ops;
 	bool extent_reserved = false;
-	int ret = 0;
+	int ret;
 
 	if (btrfs_is_free_space_inode(inode)) {
 		ret = -EINVAL;
@@ -1462,7 +1462,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 			 * inline extent or a compressed extent.
 			 */
 			unlock_page(locked_page);
-			goto out;
+			goto done;
 		} else if (ret < 0) {
 			goto out_unlock;
 		}
@@ -1491,6 +1491,23 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 		ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
 					   min_alloc_size, 0, alloc_hint,
 					   &ins, 1, 1);
+		if (ret == -EAGAIN) {
+			/*
+			 * For zoned devices, let the caller retry after writing
+			 * out the already allocated regions or waiting for a
+			 * zone to finish if no allocation was possible at all.
+			 *
+			 * Else convert to -ENOSPC since the caller cannot
+			 * retry.
+			 */
+			if (btrfs_is_zoned(fs_info)) {
+				if (start == orig_start)
+					return -EAGAIN;
+				*done_offset = start - 1;
+				return 0;
+			}
+			ret = -ENOSPC;
+		}
 		if (ret < 0)
 			goto out_unlock;
 		cur_alloc_size = ins.offset;
@@ -1571,8 +1588,10 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 		if (ret)
 			goto out_unlock;
 	}
-out:
-	return ret;
+done:
+	if (done_offset)
+		*done_offset = end;
+	return 0;
 
 out_drop_extent_cache:
 	btrfs_drop_extent_map_range(inode, start, start + ram_size - 1, false);
@@ -1580,21 +1599,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
 out_unlock:
-	/*
-	 * If done_offset is non-NULL and ret == -EAGAIN, we expect the
-	 * caller to write out the successfully allocated region and retry.
-	 */
-	if (done_offset && ret == -EAGAIN) {
-		if (orig_start < start)
-			*done_offset = start - 1;
-		else
-			*done_offset = start;
-		return ret;
-	} else if (ret == -EAGAIN) {
-		/* Convert to -ENOSPC since the caller cannot retry. */
-		ret = -ENOSPC;
-	}
-
 	/*
 	 * Now, we have three regions to clean up:
 	 *
@@ -1826,23 +1830,20 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 	while (start <= end) {
 		ret = cow_file_range(inode, locked_page, start, end, page_started,
 				     nr_written, 0, &done_offset);
-		if (ret && ret != -EAGAIN)
-			return ret;
-
 		if (*page_started) {
 			ASSERT(ret == 0);
 			return 0;
 		}
+		if (ret == -EAGAIN) {
+			ASSERT(btrfs_is_zoned(inode->root->fs_info));
 
-		if (ret == 0)
-			done_offset = end;
-
-		if (done_offset == start) {
 			wait_on_bit_io(&inode->root->fs_info->flags,
 				       BTRFS_FS_NEED_ZONE_FINISH,
 				       TASK_UNINTERRUPTIBLE);
 			continue;
 		}
+		if (ret)
+			return ret;
 
 		extent_write_locked_range(&inode->vfs_inode, locked_page, start,
 					  done_offset, wbc);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 16/16] btrfs: split page locking out of __process_pages_contig
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2023-05-31  6:05 ` [PATCH 15/16] btrfs: refactor the zoned device handling in cow_file_range Christoph Hellwig
@ 2023-05-31  6:05 ` Christoph Hellwig
  2023-05-31 16:39 ` writeback fixlets and tidyups v2 Josef Bacik
  2023-06-05 21:01 ` David Sterba
  17 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-05-31  6:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

There is a lot of complexity in __process_pages_contig to deal with the
PAGE_LOCK case, which, unlike all the other actions, can return an error.

Open code the page iteration for page locking in lock_delalloc_pages and
remove all the now unused code from __process_pages_contig.
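
The net effect on the callers, condensed from the diff below:

	/* Before: PAGE_LOCK could fail partway, so callers had to pass a
	 * progress cursor and handle -EAGAIN:
	 */
	ret = __process_pages_contig(mapping, locked_page, start, end,
				     PAGE_LOCK, &processed_end);

	/* After: the remaining page_ops cannot fail, the helper becomes
	 * void, and the locking loop lives in lock_delalloc_pages():
	 */
	__process_pages_contig(mapping, locked_page, start, end,
			       PAGE_UNLOCK);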

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 151 +++++++++++++++++--------------------------
 fs/btrfs/extent_io.h |   1 -
 2 files changed, 59 insertions(+), 93 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bc061b2ac55a12..ca0601ec73f831 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -197,18 +197,9 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
 	}
 }
 
-/*
- * Process one page for __process_pages_contig().
- *
- * Return >0 if we hit @page == @locked_page.
- * Return 0 if we updated the page status.
- * Return -EGAIN if the we need to try again.
- * (For PAGE_LOCK case but got dirty page or page not belong to mapping)
- */
-static int process_one_page(struct btrfs_fs_info *fs_info,
-			    struct address_space *mapping,
-			    struct page *page, struct page *locked_page,
-			    unsigned long page_ops, u64 start, u64 end)
+static void process_one_page(struct btrfs_fs_info *fs_info,
+			     struct page *page, struct page *locked_page,
+			     unsigned long page_ops, u64 start, u64 end)
 {
 	u32 len;
 
@@ -224,94 +215,36 @@ static int process_one_page(struct btrfs_fs_info *fs_info,
 	if (page_ops & PAGE_END_WRITEBACK)
 		btrfs_page_clamp_clear_writeback(fs_info, page, start, len);
 
-	if (page == locked_page)
-		return 1;
-
-	if (page_ops & PAGE_LOCK) {
-		int ret;
-
-		ret = btrfs_page_start_writer_lock(fs_info, page, start, len);
-		if (ret)
-			return ret;
-		if (!PageDirty(page) || page->mapping != mapping) {
-			btrfs_page_end_writer_lock(fs_info, page, start, len);
-			return -EAGAIN;
-		}
-	}
-	if (page_ops & PAGE_UNLOCK)
+	if (page != locked_page && (page_ops & PAGE_UNLOCK))
 		btrfs_page_end_writer_lock(fs_info, page, start, len);
-	return 0;
 }
 
-static int __process_pages_contig(struct address_space *mapping,
-				  struct page *locked_page,
-				  u64 start, u64 end, unsigned long page_ops,
-				  u64 *processed_end)
+static void __process_pages_contig(struct address_space *mapping,
+				   struct page *locked_page, u64 start, u64 end,
+				   unsigned long page_ops)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(mapping->host->i_sb);
 	pgoff_t start_index = start >> PAGE_SHIFT;
 	pgoff_t end_index = end >> PAGE_SHIFT;
 	pgoff_t index = start_index;
-	unsigned long pages_processed = 0;
 	struct folio_batch fbatch;
-	int err = 0;
 	int i;
 
-	if (page_ops & PAGE_LOCK) {
-		ASSERT(page_ops == PAGE_LOCK);
-		ASSERT(processed_end && *processed_end == start);
-	}
-
 	folio_batch_init(&fbatch);
 	while (index <= end_index) {
 		int found_folios;
 
 		found_folios = filemap_get_folios_contig(mapping, &index,
 				end_index, &fbatch);
-
-		if (found_folios == 0) {
-			/*
-			 * Only if we're going to lock these pages, we can find
-			 * nothing at @index.
-			 */
-			ASSERT(page_ops & PAGE_LOCK);
-			err = -EAGAIN;
-			goto out;
-		}
-
 		for (i = 0; i < found_folios; i++) {
-			int process_ret;
 			struct folio *folio = fbatch.folios[i];
-			process_ret = process_one_page(fs_info, mapping,
-					&folio->page, locked_page, page_ops,
-					start, end);
-			if (process_ret < 0) {
-				err = -EAGAIN;
-				folio_batch_release(&fbatch);
-				goto out;
-			}
-			pages_processed += folio_nr_pages(folio);
+
+			process_one_page(fs_info, &folio->page, locked_page,
+					 page_ops, start, end);
 		}
 		folio_batch_release(&fbatch);
 		cond_resched();
 	}
-out:
-	if (err && processed_end) {
-		/*
-		 * Update @processed_end. I know this is awful since it has
-		 * two different return value patterns (inclusive vs exclusive).
-		 *
-		 * But the exclusive pattern is necessary if @start is 0, or we
-		 * underflow and check against processed_end won't work as
-		 * expected.
-		 */
-		if (pages_processed)
-			*processed_end = min(end,
-			((u64)(start_index + pages_processed) << PAGE_SHIFT) - 1);
-		else
-			*processed_end = start;
-	}
-	return err;
 }
 
 static noinline void __unlock_for_delalloc(struct inode *inode,
@@ -326,29 +259,63 @@ static noinline void __unlock_for_delalloc(struct inode *inode,
 		return;
 
 	__process_pages_contig(inode->i_mapping, locked_page, start, end,
-			       PAGE_UNLOCK, NULL);
+			       PAGE_UNLOCK);
 }
 
 static noinline int lock_delalloc_pages(struct inode *inode,
 					struct page *locked_page,
-					u64 delalloc_start,
-					u64 delalloc_end)
+					u64 start,
+					u64 end)
 {
-	unsigned long index = delalloc_start >> PAGE_SHIFT;
-	unsigned long end_index = delalloc_end >> PAGE_SHIFT;
-	u64 processed_end = delalloc_start;
-	int ret;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct address_space *mapping = inode->i_mapping;
+	pgoff_t start_index = start >> PAGE_SHIFT;
+	pgoff_t end_index = end >> PAGE_SHIFT;
+	pgoff_t index = start_index;
+	u64 processed_end = start;
+	struct folio_batch fbatch;
 
-	ASSERT(locked_page);
 	if (index == locked_page->index && index == end_index)
 		return 0;
 
-	ret = __process_pages_contig(inode->i_mapping, locked_page, delalloc_start,
-				     delalloc_end, PAGE_LOCK, &processed_end);
-	if (ret == -EAGAIN && processed_end > delalloc_start)
-		__unlock_for_delalloc(inode, locked_page, delalloc_start,
-				      processed_end);
-	return ret;
+	folio_batch_init(&fbatch);
+	while (index <= end_index) {
+		unsigned int found_folios, i;
+
+		found_folios = filemap_get_folios_contig(mapping, &index,
+				end_index, &fbatch);
+		if (found_folios == 0)
+			goto out;
+
+		for (i = 0; i < found_folios; i++) {
+			struct page *page = &fbatch.folios[i]->page;
+			u32 len = end + 1 - start;
+
+			if (page == locked_page)
+				continue;
+
+			if (btrfs_page_start_writer_lock(fs_info, page, start,
+							 len))
+				goto out;
+
+			if (!PageDirty(page) || page->mapping != mapping) {
+				btrfs_page_end_writer_lock(fs_info, page, start,
+							   len);
+				goto out;
+			}
+
+			processed_end = page_offset(page) + PAGE_SIZE - 1;
+		}
+		folio_batch_release(&fbatch);
+		cond_resched();
+	}
+
+	return 0;
+out:
+	folio_batch_release(&fbatch);
+	if (processed_end > start)
+		__unlock_for_delalloc(inode, locked_page, start, processed_end);
+	return -EAGAIN;
 }
 
 /*
@@ -467,7 +434,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, NULL);
 
 	__process_pages_contig(inode->vfs_inode.i_mapping, locked_page,
-			       start, end, page_ops, NULL);
+			       start, end, page_ops);
 }
 
 static bool btrfs_verify_page(struct page *page, u64 start)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 00c468aea010a1..b57a2068b4287b 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -40,7 +40,6 @@ enum {
 	ENUM_BIT(PAGE_START_WRITEBACK),
 	ENUM_BIT(PAGE_END_WRITEBACK),
 	ENUM_BIT(PAGE_SET_ORDERED),
-	ENUM_BIT(PAGE_LOCK),
 };
 
 /*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: writeback fixlets and tidyups v2
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2023-05-31  6:05 ` [PATCH 16/16] btrfs: split page locking out of __process_pages_contig Christoph Hellwig
@ 2023-05-31 16:39 ` Josef Bacik
  2023-06-05 21:01 ` David Sterba
  17 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2023-05-31 16:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, David Sterba, linux-btrfs

On Wed, May 31, 2023 at 08:04:49AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this little series fixes a few minors bugs found by code inspection in the
> writeback code, and then goes on to remove the use of PageError.
> 
> This replaces the earlier version in the for-next branch.
> 
> Changes:
>  - rebased to the latest misc-next tree
>  - remove a set but unused variable in pages_processed
>  - remove a spurious folio_set_error
>  - return bool from btrfs_verify_page
>  - update various commit message
> 

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/16] btrfs: don't redirty the locked page for extent_write_locked_range
  2023-05-31  6:05 ` [PATCH 14/16] btrfs: don't redirty the locked page for extent_write_locked_range Christoph Hellwig
@ 2023-06-05 21:00   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2023-06-05 21:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, May 31, 2023 at 08:05:03AM +0200, Christoph Hellwig wrote:
> Instead of redirtying the locked page before calling
> extent_write_locked_range, just pass a locked_page argument similar to
> many other functions in the btrfs writeback code, and then exclude the
> locked page from clearing the dirty bit in extent_write_locked_range.

I'm still not convinced this is safe; the changelog should be much
longer for this kind of change. I remember some bugs with redirtying
(e.g. 1d53c9e6723022b12e4a, 4adaa611020fa6ac65b0a). The code has evolved
since then, so I'd like to see more than an "instead of this, just do
that" style of changelog.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: writeback fixlets and tidyups v2
  2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2023-05-31 16:39 ` writeback fixlets and tidyups v2 Josef Bacik
@ 2023-06-05 21:01 ` David Sterba
  2023-06-06  6:24   ` Christoph Hellwig
  17 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2023-06-05 21:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, May 31, 2023 at 08:04:49AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this little series fixes a few minors bugs found by code inspection in the
> writeback code, and then goes on to remove the use of PageError.
> 
> This replaces the earlier version in the for-next branch.
> 
> Changes:
>  - rebased to the latest misc-next tree
>  - remove a set but unused variable in pages_processed
>  - remove a spurious folio_set_error
>  - return bool from btrfs_verify_page
>  - update various commit message
> 
> Subject:
>  compression.c |    2 
>  extent_io.c   |  359 +++++++++++++++++++---------------------------------------
>  extent_io.h   |    6 
>  inode.c       |  180 ++++++++++++-----------------
>  subpage.c     |   35 -----
>  subpage.h     |   10 -
>  6 files changed, 209 insertions(+), 383 deletions(-)

Patches 14, 15 and 16 got postponed, the rest is now in misc-next,
thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: writeback fixlets and tidyups v2
  2023-06-05 21:01 ` David Sterba
@ 2023-06-06  6:24   ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-06-06  6:24 UTC (permalink / raw)
  To: David Sterba
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jun 05, 2023 at 11:01:59PM +0200, David Sterba wrote:
> Patches 14, 15 and 16 got postponed, the rest is now in misc-next,
> thanks.

I'll reply to your comments on patch 14.  Why did you not take 15 and 16?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 15/16] btrfs: refactor the zoned device handling in cow_file_range
  2023-05-31  6:05 ` [PATCH 15/16] btrfs: refactor the zoned device handling in cow_file_range Christoph Hellwig
@ 2023-06-06 14:20   ` Naohiro Aota
  2023-06-07  7:27     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Naohiro Aota @ 2023-06-06 14:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Wed, May 31, 2023 at 08:05:04AM +0200, Christoph Hellwig wrote:
> Handling of the done_offset argument to cow_file_range is a bit
> confusing, as it is not updated at all when the function succeeds, and
> the -EAGAIN status is used both for the case where we need to wait for
> a zone finish and the one where the allocation was only partially
> successful.
> 
> Change the calling convention so that done_offset is always updated,
> and 0 is returned if some allocation was successful (partial allocation
> can still only happen for zoned devices), and -EAGAIN is only returned
> when the caller needs to wait for a zone finish.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/btrfs/inode.c | 53 ++++++++++++++++++++++++------------------------
>  1 file changed, 27 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 68ae20a3f785e3..a12d4f77859706 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1403,7 +1403,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>  	unsigned clear_bits;
>  	unsigned long page_ops;
>  	bool extent_reserved = false;
> -	int ret = 0;
> +	int ret;
>  
>  	if (btrfs_is_free_space_inode(inode)) {
>  		ret = -EINVAL;
> @@ -1462,7 +1462,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>  			 * inline extent or a compressed extent.
>  			 */
>  			unlock_page(locked_page);
> -			goto out;
> +			goto done;
>  		} else if (ret < 0) {
>  			goto out_unlock;
>  		}
> @@ -1491,6 +1491,23 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>  		ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
>  					   min_alloc_size, 0, alloc_hint,
>  					   &ins, 1, 1);
> +		if (ret == -EAGAIN) {

Yeah, moving this check here is reasonable. Currently, we check the return
value of cow_file_range_inline(), btrfs_reserve_extent(), and
btrfs_reloc_clone_csums(), but -EAGAIN is only meaningful for
btrfs_reserve_extent().

> +			/*
> +			 * For zoned devices, let the caller retry after writing
> +			 * out the already allocated regions or waiting for a
> +			 * zone to finish if no allocation was possible at all.
> +			 *
> +			 * Else convert to -ENOSPC since the caller cannot
> +			 * retry.
> +			 */
> +			if (btrfs_is_zoned(fs_info)) {
> +				if (start == orig_start)
> +					return -EAGAIN;
> +				*done_offset = start - 1;

This will hit a NULL pointer dereference when it is called from
submit_uncompressed_range(). Well, that means we also need to implement
partial writes in submit_uncompressed_range().
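
Something like the following would at least avoid the crash (a rough,
untested sketch; falling back to -ENOSPC throws the partial allocation
away, so proper partial write support is still the real fix):

	if (ret == -EAGAIN) {
		if (btrfs_is_zoned(fs_info)) {
			if (start == orig_start)
				return -EAGAIN;
			if (done_offset) {
				*done_offset = start - 1;
				return 0;
			}
			/*
			 * Hypothetical fallback: the caller cannot consume
			 * partial progress, so treat this like the
			 * non-zoned case.
			 */
		}
		ret = -ENOSPC;
	}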

BTW, we also need to handle -EAGAIN from btrfs_reserve_extent() in
btrfs_new_extent_direct() as it currently returns -EAGAIN to userland.

> +				return 0;
> +			}

Once the above issue is solved, we can just ASSERT(btrfs_is_zoned()) for
the ret == -EAGAIN case.

> +			ret = -ENOSPC;
> +		}
>  		if (ret < 0)
>  			goto out_unlock;
>  		cur_alloc_size = ins.offset;
> @@ -1571,8 +1588,10 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>  		if (ret)
>  			goto out_unlock;
>  	}
> -out:
> -	return ret;
> +done:
> +	if (done_offset)
> +		*done_offset = end;
> +	return 0;
>  
>  out_drop_extent_cache:
>  	btrfs_drop_extent_map_range(inode, start, start + ram_size - 1, false);
> @@ -1580,21 +1599,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>  	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
>  	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
>  out_unlock:
> -	/*
> -	 * If done_offset is non-NULL and ret == -EAGAIN, we expect the
> -	 * caller to write out the successfully allocated region and retry.
> -	 */
> -	if (done_offset && ret == -EAGAIN) {
> -		if (orig_start < start)
> -			*done_offset = start - 1;
> -		else
> -			*done_offset = start;
> -		return ret;
> -	} else if (ret == -EAGAIN) {
> -		/* Convert to -ENOSPC since the caller cannot retry. */
> -		ret = -ENOSPC;
> -	}
> -
>  	/*
>  	 * Now, we have three regions to clean up:
>  	 *
> @@ -1826,23 +1830,20 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
>  	while (start <= end) {
>  		ret = cow_file_range(inode, locked_page, start, end, page_started,
>  				     nr_written, 0, &done_offset);
> -		if (ret && ret != -EAGAIN)
> -			return ret;
> -
>  		if (*page_started) {
>  			ASSERT(ret == 0);
>  			return 0;
>  		}
> +		if (ret == -EAGAIN) {
> +			ASSERT(btrfs_is_zoned(inode->root->fs_info));

Is this necessary? run_delalloc_zoned() should be called only for zoned
mode anyway.

>  
> -		if (ret == 0)
> -			done_offset = end;
> -
> -		if (done_offset == start) {
>  			wait_on_bit_io(&inode->root->fs_info->flags,
>  				       BTRFS_FS_NEED_ZONE_FINISH,
>  				       TASK_UNINTERRUPTIBLE);
>  			continue;
>  		}
> +		if (ret)
> +			return ret;
>  
>  		extent_write_locked_range(&inode->vfs_inode, locked_page, start,
>  					  done_offset, wbc);
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 15/16] btrfs: refactor the zoned device handling in cow_file_range
  2023-06-06 14:20   ` Naohiro Aota
@ 2023-06-07  7:27     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-06-07  7:27 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Tue, Jun 06, 2023 at 02:20:50PM +0000, Naohiro Aota wrote:
> > +			/*
> > +			 * For zoned devices, let the caller retry after writing
> > +			 * out the already allocated regions or waiting for a
> > +			 * zone to finish if no allocation was possible at all.
> > +			 *
> > +			 * Else convert to -ENOSPC since the caller cannot
> > +			 * retry.
> > +			 */
> > +			if (btrfs_is_zoned(fs_info)) {
> > +				if (start == orig_start)
> > +					return -EAGAIN;
> > +				*done_offset = start - 1;
> 
> This will hit a NULL pointer dereference when it is called from
> submit_uncompressed_range(). Well, that means we also need to implement
> partial writes in submit_uncompressed_range().

I think we need to do that anyway, don't we?  As far as I can tell
submit_uncompressed_range is simply broken right now on zoned devices
if cow_file_range fails to allocate all blocks.

> BTW, we also need to handle -EAGAIN from btrfs_reserve_extent() in
> btrfs_new_extent_direct() as it currently returns -EAGAIN to userland.

Indeed.

> >  			ASSERT(ret == 0);
> >  			return 0;
> >  		}
> > +		if (ret == -EAGAIN) {
> > +			ASSERT(btrfs_is_zoned(inode->root->fs_info));
> 
> Is this necessary? run_delalloc_zoned() should be called only for zoned
> mode anyway.

True.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io
  2023-05-31  6:04 ` [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io Christoph Hellwig
@ 2024-02-10  9:39   ` Qu Wenruo
  2024-02-12 15:52     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2024-02-10  9:39 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/5/31 15:34, Christoph Hellwig wrote:
> __extent_writepage_io is never called for compressed or inline extents,
> or holes.  Remove the not quite working code for them and replace it with
> asserts that these cases don't happen.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

It's a little too late, but this patch is causing crashes for subpage.

> ---
>   fs/btrfs/extent_io.c | 23 +++++------------------
>   1 file changed, 5 insertions(+), 18 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 09a9973c27ccfb..a2e1dbd9b92309 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1361,7 +1361,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   	struct extent_map *em;
>   	int ret = 0;
>   	int nr = 0;
> -	bool compressed;
>
>   	ret = btrfs_writepage_cow_fixup(page);
>   	if (ret) {
> @@ -1419,10 +1418,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   		ASSERT(cur < end);
>   		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
>   		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
> +
>   		block_start = em->block_start;
> -		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
>   		disk_bytenr = em->block_start + extent_offset;
>
> +		ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
> +		ASSERT(block_start != EXTENT_MAP_HOLE);

For subpage cases, __extent_writepage_io() can be triggered to write
only a subset of the page, from extent_write_locked_range().

In that case, once the target range has been submitted, we can still
hit a hole, since our @len extends to the end of the page.

In that case, this ASSERT() would be triggered.
And even worse, if CONFIG_BTRFS_ASSERT() is not enabled, we can end up
doing writeback with the wrong disk_bytenr.

So at least we need to skip the hole ranges for subpage.
And thankfully the remaining two cases are impossible for subpage.

Thanks,
Qu

> +		ASSERT(block_start != EXTENT_MAP_INLINE);
> +
>   		/*
>   		 * Note that em_end from extent_map_end() and dirty_range_end from
>   		 * find_next_dirty_byte() are all exclusive
> @@ -1431,22 +1434,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   		free_extent_map(em);
>   		em = NULL;
>
> -		/*
> -		 * compressed and inline extents are written through other
> -		 * paths in the FS
> -		 */
> -		if (compressed || block_start == EXTENT_MAP_HOLE ||
> -		    block_start == EXTENT_MAP_INLINE) {
> -			if (compressed)
> -				nr++;
> -			else
> -				btrfs_writepage_endio_finish_ordered(inode,
> -						page, cur, cur + iosize - 1, true);
> -			btrfs_page_clear_dirty(fs_info, page, cur, iosize);
> -			cur += iosize;
> -			continue;
> -		}
> -
>   		btrfs_set_range_writeback(inode, cur, cur + iosize - 1);
>   		if (!PageWriteback(page)) {
>   			btrfs_err(inode->root->fs_info,

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io
  2024-02-10  9:39   ` Qu Wenruo
@ 2024-02-12 15:52     ` Christoph Hellwig
  2024-02-12 23:06       ` Qu Wenruo
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2024-02-12 15:52 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Sat, Feb 10, 2024 at 08:09:47PM +1030, Qu Wenruo wrote:
>> @@ -1419,10 +1418,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>>   		ASSERT(cur < end);
>>   		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
>>   		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
>> +
>>   		block_start = em->block_start;
>> -		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
>>   		disk_bytenr = em->block_start + extent_offset;
>>
>> +		ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
>> +		ASSERT(block_start != EXTENT_MAP_HOLE);
>
> For subpage cases, __extent_writepage_io() can be triggered to write
> only a subset of the page, from extent_write_locked_range().

Yes.

> In that case, once the target range has been submitted, we can still
> hit a hole, since our @len extends to the end of the page.
>
> In that case, this ASSERT() would be triggered.
> And even worse, if CONFIG_BTRFS_ASSERT() is not enabled, we can end up
> doing writeback with the wrong disk_bytenr.
>
> So at least we need to skip the hole ranges for subpage.
> And thankfully the remaining two cases are impossible for subpage.

The patch below reinstates the hole handling.  I don't have a system
that tests the btrfs subpage code right now, so this is untested:

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cfd2967f04a293..a106036641104c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1388,7 +1388,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		disk_bytenr = em->block_start + extent_offset;
 
 		ASSERT(!extent_map_is_compressed(em));
-		ASSERT(block_start != EXTENT_MAP_HOLE);
 		ASSERT(block_start != EXTENT_MAP_INLINE);
 
 		/*
@@ -1399,6 +1398,15 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		free_extent_map(em);
 		em = NULL;
 
+		if (block_start == EXTENT_MAP_HOLE) {
+			btrfs_mark_ordered_io_finished(inode, page, cur, iosize,
+						       true);
+			btrfs_folio_clear_dirty(fs_info, page_folio(page), cur,
+						iosize);
+			cur += iosize;
+			continue;
+		}
+
 		btrfs_set_range_writeback(inode, cur, cur + iosize - 1);
 		if (!PageWriteback(page)) {
 			btrfs_err(inode->root->fs_info,

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io
  2024-02-12 15:52     ` Christoph Hellwig
@ 2024-02-12 23:06       ` Qu Wenruo
  0 siblings, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2024-02-12 23:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2024/2/13 02:22, Christoph Hellwig wrote:
> On Sat, Feb 10, 2024 at 08:09:47PM +1030, Qu Wenruo wrote:
>>> @@ -1419,10 +1418,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>>>    		ASSERT(cur < end);
>>>    		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
>>>    		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
>>> +
>>>    		block_start = em->block_start;
>>> -		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
>>>    		disk_bytenr = em->block_start + extent_offset;
>>>
>>> +		ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
>>> +		ASSERT(block_start != EXTENT_MAP_HOLE);
>>
>> For subpage cases, __extent_writepage_io() can be triggered to write
>> only a subset of the page, from extent_write_locked_range().
>
> Yes.
>
>> In that case, if we have submitted the target range, since our @len is
>> to the end of the page, we can hit a hole.
>>
>> In that case, this ASSERT() would be triggered.
>> And even worse, if CONFIG_BTRFS_ASSERT() is not enabled, we can do wrong
>> writeback using the wrong disk_bytenr.
>>
>> So at least we need to skip the hole ranges for subpage.
>> And thankfully the remaining two cases are impossible for subpage.
>
> The patch below reinstates the hole handling.  I don't have a system
> that tests the btrfs subpage code right now, so this is untested:
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index cfd2967f04a293..a106036641104c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1388,7 +1388,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   		disk_bytenr = em->block_start + extent_offset;
>
>   		ASSERT(!extent_map_is_compressed(em));
> -		ASSERT(block_start != EXTENT_MAP_HOLE);
>   		ASSERT(block_start != EXTENT_MAP_INLINE);
>
>   		/*
> @@ -1399,6 +1398,15 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   		free_extent_map(em);
>   		em = NULL;
>
> +		if (block_start == EXTENT_MAP_HOLE) {
> +			btrfs_mark_ordered_io_finished(inode, page, cur, iosize,
> +						       true);
> +			btrfs_folio_clear_dirty(fs_info, page_folio(page), cur,
> +						iosize);
> +			cur += iosize;
> +			continue;
> +		}
> +
>   		btrfs_set_range_writeback(inode, cur, cur + iosize - 1);
>   		if (!PageWriteback(page)) {
>   			btrfs_err(inode->root->fs_info,

There are more problems here than I initially thought.

In fact, I went down another path and made __extent_writepage_io() only
submit IO for the desired range.

But we would have another problem related to @locked_page handling.

For extent_write_locked_range(), we expect to unlock @locked_page.
But for the subpage case, we can have multiple dirty ranges.

In that case, if we unlock @locked_page in the first iteration, the
next extent_write_locked_range() iteration can still grab the same
page, find it unlocked, and trigger an ASSERT().

So the patch itself is not the root cause; it's the lack of subpage
locking that causes the problem.

Sorry for the false alarm.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-02-12 23:06 UTC | newest]

Thread overview: 26+ messages
2023-05-31  6:04 writeback fixlets and tidyups v2 Christoph Hellwig
2023-05-31  6:04 ` [PATCH 01/16] btrfs: fix range_end calculation in extent_write_locked_range Christoph Hellwig
2023-05-31  6:04 ` [PATCH 02/16] btrfs: factor out a btrfs_verify_page helper Christoph Hellwig
2023-05-31  6:04 ` [PATCH 03/16] btrfs: fix fsverify read error handling in end_page_read Christoph Hellwig
2023-05-31  6:04 ` [PATCH 04/16] btrfs: don't check PageError in btrfs_verify_page Christoph Hellwig
2023-05-31  6:04 ` [PATCH 05/16] btrfs: don't fail writeback when allocating the compression context fails Christoph Hellwig
2023-05-31  6:04 ` [PATCH 06/16] btrfs: rename cow_file_range_async to run_delalloc_compressed Christoph Hellwig
2023-05-31  6:04 ` [PATCH 07/16] btrfs: don't check PageError in __extent_writepage Christoph Hellwig
2023-05-31  6:04 ` [PATCH 08/16] btrfs: stop setting PageError in the data I/O path Christoph Hellwig
2023-05-31  6:04 ` [PATCH 09/16] btrfs: remove PAGE_SET_ERROR Christoph Hellwig
2023-05-31  6:04 ` [PATCH 10/16] btrfs: remove non-standard extent handling in __extent_writepage_io Christoph Hellwig
2024-02-10  9:39   ` Qu Wenruo
2024-02-12 15:52     ` Christoph Hellwig
2024-02-12 23:06       ` Qu Wenruo
2023-05-31  6:05 ` [PATCH 11/16] btrfs: move nr_to_write to __extent_writepage Christoph Hellwig
2023-05-31  6:05 ` [PATCH 12/16] btrfs: only call __extent_writepage_io from extent_write_locked_range Christoph Hellwig
2023-05-31  6:05 ` [PATCH 13/16] btrfs: don't treat zoned writeback as being from an async helper thread Christoph Hellwig
2023-05-31  6:05 ` [PATCH 14/16] btrfs: don't redirty the locked page for extent_write_locked_range Christoph Hellwig
2023-06-05 21:00   ` David Sterba
2023-05-31  6:05 ` [PATCH 15/16] btrfs: refactor the zoned device handling in cow_file_range Christoph Hellwig
2023-06-06 14:20   ` Naohiro Aota
2023-06-07  7:27     ` Christoph Hellwig
2023-05-31  6:05 ` [PATCH 16/16] btrfs: split page locking out of __process_pages_contig Christoph Hellwig
2023-05-31 16:39 ` writeback fixlets and tidyups v2 Josef Bacik
2023-06-05 21:01 ` David Sterba
2023-06-06  6:24   ` Christoph Hellwig
