All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] btrfs: fixes for the 13 subpage preparation patches
@ 2021-05-18  2:38 Qu Wenruo
  2021-05-18  2:38 ` [PATCH v2 1/2] btrfs: fix the never finishing ordered extent when it get cleaned up Qu Wenruo
  2021-05-18  2:38 ` [PATCH v2 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range() Qu Wenruo
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2021-05-18  2:38 UTC (permalink / raw)
  To: linux-btrfs

Although Ritesh and Goldwyn were both testing full subpage patches and
got pretty good results, our awesome maintainer David still found some
crashes:

- btrfs/125 hang on x86
  This test case is not in auto group, and for subpage case we reject
  RAID56 thus it doesn't trigger the crash.
  It's a behavior change in page Ordered (private2) cleanup sequence.
  We can no longer cleanup Private2 and then finish ordered extent.
  We have to do them in the same run, or later
  btrfs_mark_ordered_io_finished() will skip pages without Ordered bit,
  and causing the hang.

  This needs a dedicated fix, but it's still pretty small, and won't
  affect normal routines.

- generic/521 random crash on x86
  Which I can't reproduce, but according to the dying message, it's not
  hard to find the problem.
  An assignment out of the protection of spinlock.
  This small fix can be folded into patch "btrfs: introduce
  btrfs_lookup_first_ordered_range()"

Changelog:
v2:
- Fix the typos in the commit message of the 1st patch
- Remove the ClearPageOrdered() call in the 1st patch
  As btrfs_mark_ordered_io_finished() will clear it in a more accurate
  way.
  And ClearPageOrdered() is not subpage compatible.

Qu Wenruo (2):
  btrfs: fix the never finishing ordered extent when it get cleaned up
  btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range()

 fs/btrfs/inode.c        | 21 ++++++++++++++++++++-
 fs/btrfs/ordered-data.c |  3 ++-
 2 files changed, 22 insertions(+), 2 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v2 1/2] btrfs: fix the never finishing ordered extent when it get cleaned up
  2021-05-18  2:38 [PATCH v2 0/2] btrfs: fixes for the 13 subpage preparation patches Qu Wenruo
@ 2021-05-18  2:38 ` Qu Wenruo
  2021-05-18  2:38 ` [PATCH v2 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range() Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2021-05-18  2:38 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
When running subpage preparation patches on x86, btrfs/125 will hang
forever with one ordered extent never finished.

[CAUSE]
The test case btrfs/125 itself will always fail as the fix is never merged.

When the test fails at balance, btrfs needs to cleanup the ordered
extent in btrfs_cleanup_ordered_extents() for data reloc inode.

The problem is in the sequence how we cleanup the page Order bit.

Currently it works like:

  btrfs_cleanup_ordered_extents()
  |- find_get_page();
  |- btrfs_page_clear_ordered(page);
  |  Now the page doesn't have Ordered bit anymore.
  |  !!! This also includes the first (locked) page !!!
  |
  |- offset += PAGE_SIZE
  |  This is to skip the first page
  |- __endio_write_update_ordered()
     |- btrfs_mark_ordered_io_finished(NULL)
        Except the first page, all ordered extent is finished.

Then the locked page is cleaned up in __extent_writepage():

  __extent_writepage()
  |- If (PageError(page))
  |- end_extent_writepage()
     |- btrfs_mark_ordered_io_finished(page)
        |- if (btrfs_test_page_ordered(page))
        |-  !!! The page get skipped !!!
            The ordered extent is not decreased as the page doesn't
            have ordered bit anymore.

This leaves the ordered extent with bytes_left == sectorsize, thus never
finish.

[FIX]
The proper fix is to ensure the one cleaning up the locked page to clear
page Ordered and finish ordered io range at the same time.

Thus it's no longer possible to clear page ordered then do the ordered
extent io accounting.

This patch choose to fix the call site in btrfs_cleanup_ordered_extents().

Also since we're here, also make btrfs_cleanup_ordered_extents() to
properly clamp the range for the incoming subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 48bcb351aad6..6004f72d0474 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -144,6 +144,8 @@ void btrfs_inode_unlock(struct inode *inode, unsigned int ilock_flags)
 		inode_unlock(inode);
 }
 
+static void finish_ordered_fn(struct btrfs_work *work);
+
 /*
  * Cleanup all submitted ordered extents in specified range to handle errors
  * from the btrfs_run_delalloc_range() callback.
@@ -166,11 +168,28 @@ static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode,
 	struct page *page;
 
 	while (index <= end_index) {
+		u64 range_start;
+		u64 range_len;
+
 		page = find_get_page(inode->vfs_inode.i_mapping, index);
 		index++;
 		if (!page)
 			continue;
-		ClearPageOrdered(page);
+
+		range_start = max_t(u64, page_offset(page), offset);
+		range_len = min(offset + bytes, page_offset(page) + PAGE_SIZE) -
+			    range_start;
+		/*
+		 * Clear page Ordered and its ordered range manually.
+		 *
+		 * We used to clear page Ordered first, but since Ordered bit
+		 * indicates whether we have ordered extent, if it get cleared
+		 * without finishing the ordered io range, the ordered extent
+		 * will hang forever, as later btrfs_mark_ordered_io_finished()
+		 * will just skip the range.
+		 */
+		btrfs_mark_ordered_io_finished(inode, page, range_start,
+				range_len, finish_ordered_fn, false);
 		put_page(page);
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH v2 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range()
  2021-05-18  2:38 [PATCH v2 0/2] btrfs: fixes for the 13 subpage preparation patches Qu Wenruo
  2021-05-18  2:38 ` [PATCH v2 1/2] btrfs: fix the never finishing ordered extent when it get cleaned up Qu Wenruo
@ 2021-05-18  2:38 ` Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2021-05-18  2:38 UTC (permalink / raw)
  To: linux-btrfs

Please fold this fix into patch "btrfs: introduce btrfs_lookup_first_ordered_range()".

[BUG]
David reported a failure in generic/521 which
btrfs_lookup_first_ordered_range() got a poisoned pointer:

 run fstests generic/521 at 2021-05-14 00:33:06
 general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6a9b: 0000 [#1] PREEMPT SMP
 CPU: 0 PID: 20046 Comm: fsx Not tainted 5.13.0-rc1-default+ #1463
 RIP: 0010:btrfs_lookup_first_ordered_range+0x46/0x140 [btrfs]
 RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffffffffffffffff
 RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc01b3e09 RDI: ffff93c444e397d0
 Call Trace:
  btrfs_invalidatepage+0xd3/0x390 [btrfs]
  truncate_cleanup_page+0xda/0x170
  truncate_inode_pages_range+0x131/0x5a0
  ? trace_btrfs_space_reservation+0x33/0xf0 [btrfs]
  ? lock_acquire+0xa0/0x150
  ? unmap_mapping_pages+0x4d/0x130
  ? do_raw_spin_unlock+0x4b/0xa0
  ? unmap_mapping_pages+0x5e/0x130
  btrfs_punch_hole_lock_range+0xc5/0x130 [btrfs]
  btrfs_zero_range+0x1d7/0x4b0 [btrfs]
  btrfs_fallocate+0x6b4/0x890 [btrfs]
  ? __x64_sys_fallocate+0x3e/0x70
  ? __do_sys_newfstatat+0x40/0x70
  vfs_fallocate+0x12e/0x420
  __x64_sys_fallocate+0x3e/0x70
  do_syscall_64+0x3f/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

[CAUSE]
Although I can't reproduce, according to the line number, it's in the btree
search code, and just lines before that, I use some copied code from
tree_search():

	struct rb_node *node = tree->tree.rb_node;

But that assignment is out of spinlock, which is not safe to access,
thus lead to above poisoned pointer.

Unlike tree_search(), which callers have already hold the spinlock.

[FIX]
Fix it by only assign @node after we have hold the spinlock.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ordered-data.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 4fa377da40e4..b1b377ad99a0 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -943,13 +943,14 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
 			struct btrfs_inode *inode, u64 file_offset, u64 len)
 {
 	struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
-	struct rb_node *node = tree->tree.rb_node;
+	struct rb_node *node;
 	struct rb_node *cur;
 	struct rb_node *prev;
 	struct rb_node *next;
 	struct btrfs_ordered_extent *entry = NULL;
 
 	spin_lock_irq(&tree->lock);
+	node = tree->tree.rb_node;
 	/*
 	 * Here we don't want to use tree_search() which will use tree->last
 	 * and screw up the search order.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-05-18  2:38 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-18  2:38 [PATCH v2 0/2] btrfs: fixes for the 13 subpage preparation patches Qu Wenruo
2021-05-18  2:38 ` [PATCH v2 1/2] btrfs: fix the never finishing ordered extent when it get cleaned up Qu Wenruo
2021-05-18  2:38 ` [PATCH v2 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range() Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.