All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 23/31] btrfs: disable inline extent creation for subpage
Date: Fri, 21 May 2021 14:40:42 +0800	[thread overview]
Message-ID: <20210521064050.191164-24-wqu@suse.com> (raw)
In-Reply-To: <20210521064050.191164-1-wqu@suse.com>

[BUG]
When running the following fsx command (extracted from generic/127) on
subpage btrfs, it can create inline extent with regular extents:

	fsx -q -l 262144 -o 65536 -S 191110531 -N 9057 -R -W $mnt/file > /tmp/fsx

The offending extent would look like:

        item 9 key (257 INODE_REF 256) itemoff 15703 itemsize 14
                index 2 namelen 4 name: file
        item 10 key (257 EXTENT_DATA 0) itemoff 14975 itemsize 728
                generation 7 type 0 (inline)
                inline extent data size 707 ram_bytes 707 compression 0 (none)
        item 11 key (257 EXTENT_DATA 4096) itemoff 14922 itemsize 53
                generation 7 type 2 (prealloc)
                prealloc data disk byte 102346752 nr 4096
                prealloc data offset 0 nr 4096

[CAUSE]
For subpage btrfs, the writeback is triggered in page unit, which means,
even if we just want to writeback range [16K, 20K) for 64K page system,
we will still try to writeback any dirty sector of range [0, 64K).

This is never a problem if sectorsize == PAGE_SIZE, but for subpage,
this can cause unexpected problems.

For above test case, the last several operations from fsx are:

 9055 trunc      from 0x40000 to 0x2c3
 9057 falloc     from 0x164c to 0x19d2 (0x386 bytes)

In operation 9055, we dirtied sector [0, 4096), then in falloc, we call
btrfs_wait_ordered_range(inode, start=4096, len=4096), only expecting to
writeback any dirty data in [4096, 8192), but nothing else.

Unfortunately, in subpage case, above btrfs_wait_ordered_range() will
trigger writeback of the range [0, 64K), which includes the data at [0,
4096).

And since at the call site, we haven't yet increased i_size, which is
still 707, this means cow_file_range() can insert an inline extent.

Resulting above inline + regular extent.

[WORKAROUND]
I don't really have any good short-term solution yet, as this means all
operations that would trigger writeback need to be reviewed for any
isize change.

So here I choose to disable inline extent creation for subpage case as a
workaround.
We have done tons of work just to avoid such extent, so I don't to
create an exception just for subpage.

This only affects inline extent creation, btrfs subpage support has no
problem reading existing inline extents at all.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f82055ba09c8..f90cb8676e36 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -682,7 +682,11 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 		}
 	}
 cont:
-	if (start == 0) {
+	/*
+	 * Check cow_file_range() for why we don't even try to create
+	 * inline extent for subpage case.
+	 */
+	if (start == 0 && fs_info->sectorsize == PAGE_SIZE) {
 		/* lets try to make an inline extent */
 		if (ret || total_in < actual_end) {
 			/* we didn't compress the entire range, try
@@ -1080,7 +1084,17 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 
 	inode_should_defrag(inode, start, end, num_bytes, SZ_64K);
 
-	if (start == 0) {
+	/*
+	 * Due to the page size limit, for subpage we can only trigger the
+	 * writeback for the dirty sectors of page, that means data writeback
+	 * is doing more writeback than what we want.
+	 *
+	 * This is especially unexpected for some call sites like fallocate,
+	 * where we only increase isize after everything is done.
+	 * This means we can trigger inline extent even we didn't want.
+	 * So here we skip inline extent creation completely.
+	 */
+	if (start == 0 && fs_info->sectorsize == PAGE_SIZE) {
 		/* lets try to make an inline extent */
 		ret = cow_file_range_inline(inode, start, end, 0,
 					    BTRFS_COMPRESS_NONE, NULL);
-- 
2.31.1


  parent reply	other threads:[~2021-05-21  6:43 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-21  6:40 [PATCH v3 00/31] btrfs: add data write support for subpage Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 01/31] btrfs: pass bytenr directly to __process_pages_contig() Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 02/31] btrfs: refactor the page status update into process_one_page() Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 03/31] btrfs: provide btrfs_page_clamp_*() helpers Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 04/31] btrfs: only require sector size alignment for end_bio_extent_writepage() Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 05/31] btrfs: make btrfs_dirty_pages() to be subpage compatible Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 06/31] btrfs: make __process_pages_contig() to handle subpage dirty/error/writeback status Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 07/31] btrfs: make end_bio_extent_writepage() to be subpage compatible Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 08/31] btrfs: make process_one_page() to handle subpage locking Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 09/31] btrfs: introduce helpers for subpage ordered status Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 10/31] btrfs: make page Ordered bit to be subpage compatible Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 11/31] btrfs: update locked page dirty/writeback/error bits in __process_pages_contig Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 12/31] btrfs: prevent extent_clear_unlock_delalloc() to unlock page not locked by __process_pages_contig() Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 13/31] btrfs: make btrfs_set_range_writeback() subpage compatible Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 14/31] btrfs: make __extent_writepage_io() only submit dirty range for subpage Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 15/31] btrfs: make btrfs_truncate_block() to be subpage compatible Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 16/31] btrfs: make btrfs_page_mkwrite() " Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 17/31] btrfs: reflink: make copy_inline_to_page() " Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 18/31] btrfs: fix the filemap_range_has_page() call in btrfs_punch_hole_lock_range() Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 19/31] btrfs: don't clear page extent mapped if we're not invalidating the full page Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 20/31] btrfs: extract relocation page read and dirty part into its own function Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 21/31] btrfs: make relocate_one_page() to handle subpage case Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 22/31] btrfs: fix wild subpage writeback which does not have ordered extent Qu Wenruo
2021-05-21  6:40 ` Qu Wenruo [this message]
2021-05-21  6:40 ` [PATCH v3 24/31] btrfs: allow submit_extent_page() to do bio split for subpage Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 25/31] btrfs: make defrag to be semi subpage compatible Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 26/31] btrfs: reject raid5/6 fs for subpage Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 27/31] btrfs: fix a crash caused by race between prepare_pages() and btrfs_releasepage() Qu Wenruo
2021-05-24 10:56   ` Filipe Manana
2021-05-24 11:58     ` Qu Wenruo
2021-05-24 12:10       ` Filipe Manana
2021-05-21  6:40 ` [PATCH v3 28/31] btrfs: fix a use-after-free bug in writeback subpage helper Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 29/31] btrfs: fix a subpage false alert for relocating partial preallocated data extents Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 30/31] btrfs: fix a subpage relocation data corruption Qu Wenruo
2021-05-21  6:40 ` [PATCH v3 31/31] btrfs: allow read-write for 4K sectorsize on 64K page size systems Qu Wenruo
2021-05-30  0:12 ` [PATCH v3 00/31] btrfs: add data write support for subpage Neal Gompa
2021-05-30  0:24   ` Qu Wenruo
2021-05-31  1:32   ` Su Yue
2021-05-31  1:52     ` Neal Gompa
2021-05-31  2:26       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210521064050.191164-24-wqu@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.