All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches
@ 2021-05-18  7:09 Qu Wenruo
  2021-05-18  7:09 ` [PATCH v3 1/2] btrfs: fix the fs hang when run_delalloc_range() failed Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Qu Wenruo @ 2021-05-18  7:09 UTC (permalink / raw)
  To: linux-btrfs

Although Ritesh and Goldwyn were both testing full subpage patches and
got pretty good results, our awesome maintainer David still found some
crash and hang:

- btrfs/125 hang on x86
  This test case is not in auto group, and for subpage case we reject
  RAID56 thus it doesn't trigger the hang.

  It's a behavior change in page Ordered (private2) cleanup sequence.
  In btrfs_cleanup_ordered_extents() we cleaned up the page Ordered bit,
  but doesn't run the ordered extent accounting for the locked page.
  This leaves the ordered extent accounting never run on the locked
  page, hanging the fs.

  This needs a dedicated fix, but it's still pretty small (mostly
  comments), and won't affect normal routines.

- generic/521 random crash on x86
  Which I can't reproduce, but according to the dying message, it's not
  hard to find the problem.
  An assignment out of the protection of spinlock.
  This small fix can be folded into patch "btrfs: introduce
  btrfs_lookup_first_ordered_range()"

Changelog:
v2:
- Fix the typos in the commit message of the 1st patch
- Remove the ClearPageOrdered() call in the 1st patch
  As btrfs_mark_ordered_io_finished() will clear it in a more accurate
  way.
  And ClearPageOrdered() is not subpage compatible.

v3:
- Change the fix for the btrfs/215 hang
  Now we choose to skip the locked page, leaving both its Ordered bit
  and ordered extent accounting to be handled by run_delalloc_range()
  error hanlding path.

  This fixes a possible ordered extent underflow, and removes a forward
  declaration.

Qu Wenruo (2):
  btrfs: fix the fs hang when run_delalloc_range() failed
  btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range()

 fs/btrfs/inode.c        | 21 +++++++++++++++++++++
 fs/btrfs/ordered-data.c |  3 ++-
 2 files changed, 23 insertions(+), 1 deletion(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 1/2] btrfs: fix the fs hang when run_delalloc_range() failed
  2021-05-18  7:09 [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches Qu Wenruo
@ 2021-05-18  7:09 ` Qu Wenruo
  2021-05-18  7:09 ` [PATCH v3 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range() Qu Wenruo
  2021-05-18 21:09 ` [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches David Sterba
  2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2021-05-18  7:09 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
When running subpage preparation patches on x86, btrfs/125 will hang
forever with one ordered extent never finished.

[CAUSE]
The test case btrfs/125 itself will always fail as the fix is never merged.

When the test fails at balance, btrfs needs to cleanup the ordered
extent in btrfs_cleanup_ordered_extents() for data reloc inode.

The problem is in the sequence how we cleanup the page Order bit.

Currently it works like:

  btrfs_cleanup_ordered_extents()
  |- find_get_page();
  |- btrfs_page_clear_ordered(page);
  |  Now the page doesn't have Ordered bit anymore.
  |  !!! This also includes the first (locked) page !!!
  |
  |- offset += PAGE_SIZE
  |  This is to skip the first page
  |- __endio_write_update_ordered()
     |- btrfs_mark_ordered_io_finished(NULL)
        Except the first page, all ordered extent is finished.

Then the locked page is cleaned up in __extent_writepage():

  __extent_writepage()
  |- If (PageError(page))
  |- end_extent_writepage()
     |- btrfs_mark_ordered_io_finished(page)
        |- if (btrfs_test_page_ordered(page))
        |-  !!! The page get skipped !!!
            The ordered extent is not decreased as the page doesn't
            have ordered bit anymore.

This leaves the ordered extent with bytes_left == sectorsize, thus never
finish.

[FIX]
The fix is to ensure we never clear page Ordered bit without running the
ordered extent accounting.

Here we choose to skip the locked page in
btrfs_cleanup_ordered_extents() so that later end_extent_writepage() can
properly finish the ordered extent.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 48bcb351aad6..83b014c40d42 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -166,10 +166,31 @@ static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode,
 	struct page *page;
 
 	while (index <= end_index) {
+		/*
+		 * For locked page, we will call end_extent_writepage() on it
+		 * in run_delalloc_range() for the error handling.
+		 * That end_extent_writepage() function will call
+		 * btrfs_mark_ordered_io_finished() to clear page Ordered and
+		 * run the ordered extent accounting.
+		 *
+		 * Here we can't just clear the Ordered bit, or
+		 * btrfs_mark_ordered_io_finished() will skip the accounting
+		 * for the page range, and the ordered extent will never finish.
+		 */
+		if (index == (page_offset(locked_page) >> PAGE_SHIFT)) {
+			index++;
+			continue;
+		}
 		page = find_get_page(inode->vfs_inode.i_mapping, index);
 		index++;
 		if (!page)
 			continue;
+
+		/*
+		 * Here we just clear all Ordered bit for every page in the
+		 * range, then __endio_write_update_ordered() will handle
+		 * the ordered extent accounting for the range.
+		 */
 		ClearPageOrdered(page);
 		put_page(page);
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range()
  2021-05-18  7:09 [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches Qu Wenruo
  2021-05-18  7:09 ` [PATCH v3 1/2] btrfs: fix the fs hang when run_delalloc_range() failed Qu Wenruo
@ 2021-05-18  7:09 ` Qu Wenruo
  2021-05-18 21:09 ` [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches David Sterba
  2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2021-05-18  7:09 UTC (permalink / raw)
  To: linux-btrfs

Please fold this fix into patch "btrfs: introduce btrfs_lookup_first_ordered_range()".

[BUG]
David reported a failure in generic/521 which
btrfs_lookup_first_ordered_range() got a poisoned pointer:

 run fstests generic/521 at 2021-05-14 00:33:06
 general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6a9b: 0000 [#1] PREEMPT SMP
 CPU: 0 PID: 20046 Comm: fsx Not tainted 5.13.0-rc1-default+ #1463
 RIP: 0010:btrfs_lookup_first_ordered_range+0x46/0x140 [btrfs]
 RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffffffffffffffff
 RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc01b3e09 RDI: ffff93c444e397d0
 Call Trace:
  btrfs_invalidatepage+0xd3/0x390 [btrfs]
  truncate_cleanup_page+0xda/0x170
  truncate_inode_pages_range+0x131/0x5a0
  ? trace_btrfs_space_reservation+0x33/0xf0 [btrfs]
  ? lock_acquire+0xa0/0x150
  ? unmap_mapping_pages+0x4d/0x130
  ? do_raw_spin_unlock+0x4b/0xa0
  ? unmap_mapping_pages+0x5e/0x130
  btrfs_punch_hole_lock_range+0xc5/0x130 [btrfs]
  btrfs_zero_range+0x1d7/0x4b0 [btrfs]
  btrfs_fallocate+0x6b4/0x890 [btrfs]
  ? __x64_sys_fallocate+0x3e/0x70
  ? __do_sys_newfstatat+0x40/0x70
  vfs_fallocate+0x12e/0x420
  __x64_sys_fallocate+0x3e/0x70
  do_syscall_64+0x3f/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

[CAUSE]
Although I can't reproduce, according to the line number, it's in the btree
search code, and just lines before that, I use some copied code from
tree_search():

	struct rb_node *node = tree->tree.rb_node;

But that assignment is out of spinlock, which is not safe to access,
thus lead to above poisoned pointer.

Unlike tree_search(), which callers have already hold the spinlock.

[FIX]
Fix it by only assign @node after we have hold the spinlock.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ordered-data.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 4fa377da40e4..b1b377ad99a0 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -943,13 +943,14 @@ struct btrfs_ordered_extent *btrfs_lookup_first_ordered_range(
 			struct btrfs_inode *inode, u64 file_offset, u64 len)
 {
 	struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
-	struct rb_node *node = tree->tree.rb_node;
+	struct rb_node *node;
 	struct rb_node *cur;
 	struct rb_node *prev;
 	struct rb_node *next;
 	struct btrfs_ordered_extent *entry = NULL;
 
 	spin_lock_irq(&tree->lock);
+	node = tree->tree.rb_node;
 	/*
 	 * Here we don't want to use tree_search() which will use tree->last
 	 * and screw up the search order.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches
  2021-05-18  7:09 [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches Qu Wenruo
  2021-05-18  7:09 ` [PATCH v3 1/2] btrfs: fix the fs hang when run_delalloc_range() failed Qu Wenruo
  2021-05-18  7:09 ` [PATCH v3 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range() Qu Wenruo
@ 2021-05-18 21:09 ` David Sterba
  2021-05-20 13:47   ` David Sterba
  2 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2021-05-18 21:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, May 18, 2021 at 03:09:40PM +0800, Qu Wenruo wrote:
> v3:
> - Change the fix for the btrfs/215 hang
>   Now we choose to skip the locked page, leaving both its Ordered bit
>   and ordered extent accounting to be handled by run_delalloc_range()
>   error hanlding path.
> 
>   This fixes a possible ordered extent underflow, and removes a forward
>   declaration.

There are 2 finished fstests runs and it did hang or crash so it seems
to be fixed, thanks. I haven't verified individual tests that reported
some output mismatch, so can't say if they're known false positives or
possibly related to your patches. I'll try to take a look tomorrow.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches
  2021-05-18 21:09 ` [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches David Sterba
@ 2021-05-20 13:47   ` David Sterba
  0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2021-05-20 13:47 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Tue, May 18, 2021 at 11:09:28PM +0200, David Sterba wrote:
> On Tue, May 18, 2021 at 03:09:40PM +0800, Qu Wenruo wrote:
> > v3:
> > - Change the fix for the btrfs/215 hang
> >   Now we choose to skip the locked page, leaving both its Ordered bit
> >   and ordered extent accounting to be handled by run_delalloc_range()
> >   error hanlding path.
> > 
> >   This fixes a possible ordered extent underflow, and removes a forward
> >   declaration.
> 
> There are 2 finished fstests runs and it did hang or crash so it seems
> to be fixed, thanks. I haven't verified individual tests that reported
> some output mismatch, so can't say if they're known false positives or
> possibly related to your patches. I'll try to take a look tomorrow.

Patch 2 folded and patch 1 added to the rest of the patchset, now merged
to misc-next. Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-05-20 13:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-18  7:09 [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches Qu Wenruo
2021-05-18  7:09 ` [PATCH v3 1/2] btrfs: fix the fs hang when run_delalloc_range() failed Qu Wenruo
2021-05-18  7:09 ` [PATCH v3 2/2] btrfs: fix the unsafe access in btrfs_lookup_first_ordered_range() Qu Wenruo
2021-05-18 21:09 ` [PATCH v3 0/2] btrfs: fixes for the 13 subpage preparation patches David Sterba
2021-05-20 13:47   ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.