From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 1/2] btrfs: do not clear page dirty inside extent_write_locked_range()
Date: Wed,  6 Mar 2024 09:05:33 +1030
Message-ID: <ebf001731e2ebafd5c2a435a7e0848634a421ed7.1709677986.git.wqu@suse.com>
In-Reply-To: <cover.1709677986.git.wqu@suse.com>

[BUG]
For the subpage + zoned case, btrfs can easily hang with the following
workload, even with the previous subpage delalloc rework applied:

 # mkfs.btrfs -s 4k -f $dev
 # mount $dev $mnt
 # xfs_io -f -c "pwrite 32k 128k" $mnt/foobar
 # umount $mnt

The system would hang at unmount due to unfinished ordered extents.

The $dev above is a tcmu-runner emulated zoned HDD with a max zone
append size of 64K, and the system uses a 64K page size.

[CAUSE]
There is a bug in extent_write_locked_range() (I am already surprised
by how much subpage-incompatible code is inside that function):

- If @pages_dirty is true, we clear the page dirty flag for the whole
  page, for every page except @locked_page

  For the case above, since the max zone append size is 64K, we get an
  ordered extent of 64K, resulting in the following writeback range:

  0               64K                 128K            192K
  |       |///////|///////////////////|/////////|
          32K               96K
           \       OE      /

  |///| = subpage dirty range

  When we call __extent_writepage_io() to submit [32K, 64K),
  extent_write_locked_range() finds that this is the locked page and
  does not clear its dirty flag, so the submission goes through without
  any problem.

  But when we reach [64K, 96K), the second half of the ordered extent,
  extent_write_locked_range() clears the page dirty flag for the whole
  page range [64K, 128K), resulting in the following layout:

  0               64K                 128K            192K
  |       |///////|         |         |/////////|
          32K               96K
           \       OE      /

  Then inside __extent_writepage_io(), since the page is no longer
  dirty, we skip the submission, so only half of the ordered extent can
  ever finish, hanging the whole system.

  Furthermore, this causes more problems when we move to the next
  delalloc range [96K, 160K): the dirty flag of the original dirty range
  [96K, 128K) has been cleared without releasing its data/metadata
  reservation, so we end up with a reservation (rsv) leak.

This bug only affects the subpage + zoned case.
For the non-subpage + zoned case, find_next_dirty_byte() just returns the
whole page, no matter whether it has dirty flags or not.

For the subpage + non-zoned case, we never call
extent_write_locked_range().

[FIX]
Just do not clear the page dirty flag at all, as __extent_writepage_io()
will do a more accurate, subpage-compatible clear of the page dirty flag
anyway.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fb63055f42f3..bdd0e29ba848 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2290,10 +2290,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
 		ASSERT(PageLocked(page));
-		if (pages_dirty && page != locked_page) {
+		if (pages_dirty && page != locked_page)
 			ASSERT(PageDirty(page));
-			clear_page_dirty_for_io(page);
-		}
 
 		ret = __extent_writepage_io(BTRFS_I(inode), page, cur, cur_len,
 					    &bio_ctrl, i_size, &nr);
-- 
2.44.0


Thread overview: 5+ messages
2024-03-05 22:35 [PATCH v2 0/2] btrfs: fix data corruption/hang/rsv leak in subpage zoned cases Qu Wenruo
2024-03-05 22:35 ` Qu Wenruo [this message]
2024-05-10 14:25   ` [PATCH v2 1/2] btrfs: do not clear page dirty inside extent_write_locked_range() Josef Bacik
2024-03-05 22:35 ` [PATCH v2 2/2] btrfs: make extent_write_locked_range() to handle subpage writeback correctly Qu Wenruo
2024-05-10 14:25   ` Josef Bacik
