linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] btrfs: make __extent_writepage() not return error if the page is marked error
@ 2021-07-20 11:45 Qu Wenruo
  2021-07-21 19:30 ` Josef Bacik
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2021-07-20 11:45 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
When running btrfs/160 in a loop for subpage with experimental
compression support, it has a high chance to crash (~20%):

 BTRFS critical (device dm-7): panic in __btrfs_add_ordered_extent:238: inconsistency in ordered tree at offset 0 (errno=-17 Object already exists)
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/ordered-data.c:238!
 Internal error: Oops - BUG: 0 [#1] SMP
 pc : __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
 lr : __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
 Call trace:
  __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
  btrfs_add_ordered_extent+0x2c/0x50 [btrfs]
  run_delalloc_nocow+0x81c/0x8fc [btrfs]
  btrfs_run_delalloc_range+0xa4/0x390 [btrfs]
  writepage_delalloc+0xc0/0x1ac [btrfs]
  __extent_writepage+0xf4/0x370 [btrfs]
  extent_write_cache_pages+0x288/0x4f4 [btrfs]
  extent_writepages+0x58/0xe0 [btrfs]
  btrfs_writepages+0x1c/0x30 [btrfs]
  do_writepages+0x60/0x110
  __filemap_fdatawrite_range+0x108/0x170
  filemap_fdatawrite_range+0x20/0x30
  btrfs_fdatawrite_range+0x34/0x4dc [btrfs]
  __btrfs_write_out_cache+0x34c/0x480 [btrfs]
  btrfs_write_out_cache+0x144/0x220 [btrfs]
  btrfs_start_dirty_block_groups+0x3ac/0x6b0 [btrfs]
  btrfs_commit_transaction+0xd0/0xbb4 [btrfs]
  btrfs_sync_fs+0x64/0x1cc [btrfs]
  sync_fs_one_sb+0x3c/0x50
  iterate_supers+0xcc/0x1d4
  ksys_sync+0x6c/0xd0
  __arm64_sys_sync+0x1c/0x30
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0x4c/0xd4
  do_el0_svc+0x30/0x9c
  el0_svc+0x2c/0x54
  el0_sync_handler+0x1a8/0x1b0
  el0_sync+0x198/0x1c0
 ---[ end trace 336f67369ae6e0af ]---

[CAUSE]
For subpage case, we can have multiple sectors inside a page, this makes
it possible for __extent_writepage() to have part of its page submitted
before returning.

In btrfs/160, we are using dm-dust to emulate write error, this means
for certain pages, we could have everything running fine, but at the end
of __extent_writepage(), one of the submitted bios fails due to dm-dust.

Then the page is marked Error, and we change @ret from 0 to -EIO.

This makes the caller extent_write_cache_pages() to error out, without
submitting the remaining pages.

Furthermore, since we're erroring out for free space cache, it doesn't
really care about the error and will update the inode and retry the
writeback.

Then we re-run the delalloc range, and will try to insert the same
delalloc range while previous delalloc range is still hanging there,
triggering the above error.

[FIX]
Ironically, to fix the problem we need to ignore the PageError() part,
at least not to populate @ret.

By this, we fallback to the same behavior of non-subpage case, continue
to submit the remaining pages so that the ordered extent can finish.

The failed bio will still mark the page error and properly inform the
caller that some IO failed, we just don't need to bother the IO error so
early.

Since we're here, also convert the PageError() macros to subpage
compatible helpers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fff1a4d8fe25..237824caaa84 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3815,7 +3815,8 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
 				delalloc_end, &page_started, nr_written, wbc);
 		if (ret) {
-			SetPageError(page);
+			btrfs_page_set_error(inode->root->fs_info, page,
+					     page_offset(page), PAGE_SIZE);
 			/*
 			 * btrfs_run_delalloc_range should return < 0 for error
 			 * but just in case, we use > 0 here meaning the IO is
@@ -4092,7 +4093,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 
 	WARN_ON(!PageLocked(page));
 
-	ClearPageError(page);
+	btrfs_page_clear_error(fs_info, page, page_offset(page), PAGE_SIZE);
 
 	pg_offset = offset_in_page(i_size);
 	if (page->index > end_index ||
@@ -4133,10 +4134,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		set_page_writeback(page);
 		end_page_writeback(page);
 	}
-	if (PageError(page)) {
-		ret = ret < 0 ? ret : -EIO;
+	if (PageError(page))
 		end_extent_writepage(page, ret, page_start, page_end);
-	}
 	if (epd->extent_locked) {
 		/*
 		 * If epd->extent_locked, it's from extent_write_locked_range(),
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: make __extent_writepage() not return error if the page is marked error
  2021-07-20 11:45 [PATCH] btrfs: make __extent_writepage() not return error if the page is marked error Qu Wenruo
@ 2021-07-21 19:30 ` Josef Bacik
  2021-07-21 21:18   ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2021-07-21 19:30 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 7/20/21 7:45 AM, Qu Wenruo wrote:
> [BUG]
> When running btrfs/160 in a loop for subpage with experimental
> compression support, it has a high chance to crash (~20%):
> 
>   BTRFS critical (device dm-7): panic in __btrfs_add_ordered_extent:238: inconsistency in ordered tree at offset 0 (errno=-17 Object already exists)
>   ------------[ cut here ]------------
>   kernel BUG at fs/btrfs/ordered-data.c:238!
>   Internal error: Oops - BUG: 0 [#1] SMP
>   pc : __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
>   lr : __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
>   Call trace:
>    __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
>    btrfs_add_ordered_extent+0x2c/0x50 [btrfs]
>    run_delalloc_nocow+0x81c/0x8fc [btrfs]
>    btrfs_run_delalloc_range+0xa4/0x390 [btrfs]
>    writepage_delalloc+0xc0/0x1ac [btrfs]
>    __extent_writepage+0xf4/0x370 [btrfs]
>    extent_write_cache_pages+0x288/0x4f4 [btrfs]
>    extent_writepages+0x58/0xe0 [btrfs]
>    btrfs_writepages+0x1c/0x30 [btrfs]
>    do_writepages+0x60/0x110
>    __filemap_fdatawrite_range+0x108/0x170
>    filemap_fdatawrite_range+0x20/0x30
>    btrfs_fdatawrite_range+0x34/0x4dc [btrfs]
>    __btrfs_write_out_cache+0x34c/0x480 [btrfs]
>    btrfs_write_out_cache+0x144/0x220 [btrfs]
>    btrfs_start_dirty_block_groups+0x3ac/0x6b0 [btrfs]
>    btrfs_commit_transaction+0xd0/0xbb4 [btrfs]
>    btrfs_sync_fs+0x64/0x1cc [btrfs]
>    sync_fs_one_sb+0x3c/0x50
>    iterate_supers+0xcc/0x1d4
>    ksys_sync+0x6c/0xd0
>    __arm64_sys_sync+0x1c/0x30
>    invoke_syscall+0x50/0x120
>    el0_svc_common.constprop.0+0x4c/0xd4
>    do_el0_svc+0x30/0x9c
>    el0_svc+0x2c/0x54
>    el0_sync_handler+0x1a8/0x1b0
>    el0_sync+0x198/0x1c0
>   ---[ end trace 336f67369ae6e0af ]---
> 
> [CAUSE]
> For subpage case, we can have multiple sectors inside a page, this makes
> it possible for __extent_writepage() to have part of its page submitted
> before returning.
> 
> In btrfs/160, we are using dm-dust to emulate write error, this means
> for certain pages, we could have everything running fine, but at the end
> of __extent_writepage(), one of the submitted bios fails due to dm-dust.
> 
> Then the page is marked Error, and we change @ret from 0 to -EIO.
> 
> This makes the caller extent_write_cache_pages() to error out, without
> submitting the remaining pages.
> 
> Furthermore, since we're erroring out for free space cache, it doesn't
> really care about the error and will update the inode and retry the
> writeback.
> 
> Then we re-run the delalloc range, and will try to insert the same
> delalloc range while previous delalloc range is still hanging there,
> triggering the above error.

This seems like the real bug.  We should do the proper cleanup for the range in 
this case, not ignore errors during writeback.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: make __extent_writepage() not return error if the page is marked error
  2021-07-21 19:30 ` Josef Bacik
@ 2021-07-21 21:18   ` Qu Wenruo
  2021-07-22 14:18     ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2021-07-21 21:18 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2021/7/22 上午3:30, Josef Bacik wrote:
> On 7/20/21 7:45 AM, Qu Wenruo wrote:
>> [BUG]
>> When running btrfs/160 in a loop for subpage with experimental
>> compression support, it has a high chance to crash (~20%):
>>
>>   BTRFS critical (device dm-7): panic in
>> __btrfs_add_ordered_extent:238: inconsistency in ordered tree at
>> offset 0 (errno=-17 Object already exists)
>>   ------------[ cut here ]------------
>>   kernel BUG at fs/btrfs/ordered-data.c:238!
>>   Internal error: Oops - BUG: 0 [#1] SMP
>>   pc : __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
>>   lr : __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
>>   Call trace:
>>    __btrfs_add_ordered_extent+0x550/0x670 [btrfs]
>>    btrfs_add_ordered_extent+0x2c/0x50 [btrfs]
>>    run_delalloc_nocow+0x81c/0x8fc [btrfs]
>>    btrfs_run_delalloc_range+0xa4/0x390 [btrfs]
>>    writepage_delalloc+0xc0/0x1ac [btrfs]
>>    __extent_writepage+0xf4/0x370 [btrfs]
>>    extent_write_cache_pages+0x288/0x4f4 [btrfs]
>>    extent_writepages+0x58/0xe0 [btrfs]
>>    btrfs_writepages+0x1c/0x30 [btrfs]
>>    do_writepages+0x60/0x110
>>    __filemap_fdatawrite_range+0x108/0x170
>>    filemap_fdatawrite_range+0x20/0x30
>>    btrfs_fdatawrite_range+0x34/0x4dc [btrfs]
>>    __btrfs_write_out_cache+0x34c/0x480 [btrfs]
>>    btrfs_write_out_cache+0x144/0x220 [btrfs]
>>    btrfs_start_dirty_block_groups+0x3ac/0x6b0 [btrfs]
>>    btrfs_commit_transaction+0xd0/0xbb4 [btrfs]
>>    btrfs_sync_fs+0x64/0x1cc [btrfs]
>>    sync_fs_one_sb+0x3c/0x50
>>    iterate_supers+0xcc/0x1d4
>>    ksys_sync+0x6c/0xd0
>>    __arm64_sys_sync+0x1c/0x30
>>    invoke_syscall+0x50/0x120
>>    el0_svc_common.constprop.0+0x4c/0xd4
>>    do_el0_svc+0x30/0x9c
>>    el0_svc+0x2c/0x54
>>    el0_sync_handler+0x1a8/0x1b0
>>    el0_sync+0x198/0x1c0
>>   ---[ end trace 336f67369ae6e0af ]---
>>
>> [CAUSE]
>> For subpage case, we can have multiple sectors inside a page, this makes
>> it possible for __extent_writepage() to have part of its page submitted
>> before returning.
>>
>> In btrfs/160, we are using dm-dust to emulate write error, this means
>> for certain pages, we could have everything running fine, but at the end
>> of __extent_writepage(), one of the submitted bios fails due to dm-dust.
>>
>> Then the page is marked Error, and we change @ret from 0 to -EIO.
>>
>> This makes the caller extent_write_cache_pages() to error out, without
>> submitting the remaining pages.
>>
>> Furthermore, since we're erroring out for free space cache, it doesn't
>> really care about the error and will update the inode and retry the
>> writeback.
>>
>> Then we re-run the delalloc range, and will try to insert the same
>> delalloc range while previous delalloc range is still hanging there,
>> triggering the above error.
>
> This seems like the real bug.

That's true.

>  We should do the proper cleanup for the
> range in this case, not ignore errors during writeback.  Thanks,

But if you change the view point, __extent_writepage() is only
submitting the pages to bio.
It shouldn't bother the bio error, but only care the error that affects
the bio submission.

This PageError() check makes the behavior different between subpage and
regular page size, thus this can be considered as a quick fix for subpage.

For a proper fix for both subpage and non-subpage case, I'm trying a
completely different way, and will send out a RFC for comment later.

Thanks,
Qu

>
> Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: make __extent_writepage() not return error if the page is marked error
  2021-07-21 21:18   ` Qu Wenruo
@ 2021-07-22 14:18     ` David Sterba
  2021-07-22 21:48       ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2021-07-22 14:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, Qu Wenruo, linux-btrfs

On Thu, Jul 22, 2021 at 05:18:19AM +0800, Qu Wenruo wrote:
> >> For subpage case, we can have multiple sectors inside a page, this makes
> >> it possible for __extent_writepage() to have part of its page submitted
> >> before returning.
> >>
> >> In btrfs/160, we are using dm-dust to emulate write error, this means
> >> for certain pages, we could have everything running fine, but at the end
> >> of __extent_writepage(), one of the submitted bios fails due to dm-dust.
> >>
> >> Then the page is marked Error, and we change @ret from 0 to -EIO.
> >>
> >> This makes the caller extent_write_cache_pages() to error out, without
> >> submitting the remaining pages.
> >>
> >> Furthermore, since we're erroring out for free space cache, it doesn't
> >> really care about the error and will update the inode and retry the
> >> writeback.
> >>
> >> Then we re-run the delalloc range, and will try to insert the same
> >> delalloc range while previous delalloc range is still hanging there,
> >> triggering the above error.
> >
> > This seems like the real bug.
> 
> That's true.
> 
> >  We should do the proper cleanup for the
> > range in this case, not ignore errors during writeback.  Thanks,
> 
> But if you change the view point, __extent_writepage() is only
> submitting the pages to bio.
> It shouldn't bother the bio error, but only care the error that affects
> the bio submission.
> 
> This PageError() check makes the behavior different between subpage and
> regular page size, thus this can be considered as a quick fix for subpage.
> 
> For a proper fix for both subpage and non-subpage case, I'm trying a
> completely different way, and will send out a RFC for comment later.

I tend to agree with Josef, the change is conunter intuitive. The proper
fix would be probably bigger than just a line removing the 'ret'
updates, so at least a comment explaining what's going on should be
there incase we decie to take this patch first.

I assume that for non-subpage case it's working as before.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: make __extent_writepage() not return error if the page is marked error
  2021-07-22 14:18     ` David Sterba
@ 2021-07-22 21:48       ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2021-07-22 21:48 UTC (permalink / raw)
  To: dsterba, Josef Bacik, Qu Wenruo, linux-btrfs



On 2021/7/22 下午10:18, David Sterba wrote:
> On Thu, Jul 22, 2021 at 05:18:19AM +0800, Qu Wenruo wrote:
>>>> For subpage case, we can have multiple sectors inside a page, this makes
>>>> it possible for __extent_writepage() to have part of its page submitted
>>>> before returning.
>>>>
>>>> In btrfs/160, we are using dm-dust to emulate write error, this means
>>>> for certain pages, we could have everything running fine, but at the end
>>>> of __extent_writepage(), one of the submitted bios fails due to dm-dust.
>>>>
>>>> Then the page is marked Error, and we change @ret from 0 to -EIO.
>>>>
>>>> This makes the caller extent_write_cache_pages() to error out, without
>>>> submitting the remaining pages.
>>>>
>>>> Furthermore, since we're erroring out for free space cache, it doesn't
>>>> really care about the error and will update the inode and retry the
>>>> writeback.
>>>>
>>>> Then we re-run the delalloc range, and will try to insert the same
>>>> delalloc range while previous delalloc range is still hanging there,
>>>> triggering the above error.
>>>
>>> This seems like the real bug.
>>
>> That's true.
>>
>>>    We should do the proper cleanup for the
>>> range in this case, not ignore errors during writeback.  Thanks,
>>
>> But if you change the view point, __extent_writepage() is only
>> submitting the pages to bio.
>> It shouldn't bother the bio error, but only care the error that affects
>> the bio submission.
>>
>> This PageError() check makes the behavior different between subpage and
>> regular page size, thus this can be considered as a quick fix for subpage.
>>
>> For a proper fix for both subpage and non-subpage case, I'm trying a
>> completely different way, and will send out a RFC for comment later.
>
> I tend to agree with Josef, the change is conunter intuitive. The proper
> fix would be probably bigger than just a line removing the 'ret'
> updates, so at least a comment explaining what's going on should be
> there incase we decie to take this patch first.

Right, the commit message and missing comment is indeed concerning.

>
> I assume that for non-subpage case it's working as before.
>

Ironically, for non-subpage case, that branch never get executed for bio
error.

For non-subpage case, a page will only be *added* to current bio, but
submission never happens for current page.

Thus that if (PageError()) branch only get executed for fatal errors
like failure to allocate memory.

Only for subpage case that we can have part of the page submitted while
we're still at that page.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-07-22 21:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-20 11:45 [PATCH] btrfs: make __extent_writepage() not return error if the page is marked error Qu Wenruo
2021-07-21 19:30 ` Josef Bacik
2021-07-21 21:18   ` Qu Wenruo
2021-07-22 14:18     ` David Sterba
2021-07-22 21:48       ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).