* [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
From: Qu Wenruo @ 2021-08-03 5:53 UTC
To: linux-btrfs
[BUG]
When running generic/475 with 64K page size and 4K sectorsize (aka
subpage), it can trigger the following BUG_ON() inside
btrfs_csum_one_bio(), with a probability of around 1/20 to 1/5:

	bio_for_each_segment(bvec, bio, iter) {
		if (!contig)
			offset = page_offset(bvec.bv_page) + bvec.bv_offset;

		if (!ordered) {
			ordered = btrfs_lookup_ordered_extent(inode, offset);
			BUG_ON(!ordered); /* Logic error */ <<<<
		}

		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,

[CAUSE]
Test case generic/475 uses dm-errors to emulate IO failure.
Suppose a page in the page cache has the following delalloc ranges:

0               32K             64K
|/////|         |////|          |
\- [0, 4K)      \- [32K, 36K)

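As a rough userspace illustration of the layout above (not kernel code): with a 64K page and 4K sectors, one page holds 16 sectors, so the two delalloc ranges correspond to sector indexes 0 and 8. The names PAGE_SIZE_BYTES, SECTORSIZE and range_to_bitmap() are made up for this sketch and do not exist in btrfs.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES (64u * 1024u) /* assumed 64K page size */
#define SECTORSIZE      (4u * 1024u)  /* assumed 4K sectorsize */

/*
 * Hypothetical helper: return a 16-bit map with one bit set for every
 * sector covered by the in-page byte range [start, start + len).
 */
static uint16_t range_to_bitmap(uint32_t start, uint32_t len)
{
	uint16_t bitmap = 0;
	uint32_t first = start / SECTORSIZE;
	uint32_t nr = len / SECTORSIZE;

	/* The sketch only handles sector-aligned ranges inside one page. */
	assert(start % SECTORSIZE == 0 && len % SECTORSIZE == 0);
	assert(start + len <= PAGE_SIZE_BYTES);

	for (uint32_t i = 0; i < nr; i++)
		bitmap |= (uint16_t)(1u << (first + i));
	return bitmap;
}
```

For the two ranges in the diagram, range_to_bitmap(0, 4096) | range_to_bitmap(32768, 4096) gives 0x0101: two separate dirty sectors in one page, which is why a single page can carry two independent ordered extents.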
And then __extent_writepage() can go through the following race:

         T1 (writeback)         |     T2 (endio)
--------------------------------+----------------------------------
__extent_writepage()            |
|- writepage_delalloc()         |
|  |- run_delalloc_range()      |
|  |  Add OE for [0, 4K)        |
|  |- run_delalloc_range()      |
|     Add OE for [32K, 36K)     |
|                               |
|- __extent_writepage_io()      |
|  |- submit_extent_page()      |
|  |  |- Assemble the bio for   |
|  |     range [0, 4K)          |
|  |- submit_extent_page()      |
|  |  |- Submit the bio for     |
|  |  |  range [0, 4K)          |
|  |  |                         |  end_bio_extent_writepage()
|  |  |                         |  |- error = -EIO;
|  |  |                         |  |- end_extent_writepage( error=-EIO);
|  |  |                         |     |- writepage_endio_finish_ordered()
|  |  |                         |     |  Remove OE for range [0, 4K)
|  |  |                         |     |- btrfs_page_set_error()
|  |- submit_extent_page()      |
|     |- Assemble the bio for   |
|        range [32K, 36K)       |
|- if (PageError(page))         |
|- end_extent_writepage()       |
   |- endio_finish_ordered()    |
      Remove OE [32K, 36K)      |
                                |
Submit bio for [32K, 36K)       |
|- btrfs_csum_one_bio()         |
   |- BUG_ON(!ordered_extent)   |
      OE [32K, 36K) is already  |
      removed.                  |

This can only happen in the subpage case: with regular sectorsize we
never submit the current page here (we only add it to the current bio),
so an IO error can never mark the current page as Error at this point.
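The interleaving above can be replayed deterministically in a small userspace model. This is only a sketch under simplifying assumptions: an ordered extent is reduced to a "pending" byte range, the two threads are run in the order shown in the diagram, and all toy_* names are hypothetical. The flag selects whether the tail of the toy writepage function finishes the ordered extents of the whole page when the page error bit is set, which is the behavior this patch is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy ordered extent: a byte range that is still waiting for its IO. */
struct toy_oe {
	uint64_t start;
	uint64_t len;
	bool pending;
};

struct toy_page {
	struct toy_oe oe[2];
	bool error; /* stands in for PageError() */
};

/* Find the pending OE covering @offset, or NULL if there is none. */
static struct toy_oe *toy_lookup_oe(struct toy_page *page, uint64_t offset)
{
	for (size_t i = 0; i < 2; i++) {
		struct toy_oe *oe = &page->oe[i];

		if (oe->pending && offset >= oe->start &&
		    offset < oe->start + oe->len)
			return oe;
	}
	return NULL;
}

/* Finish (remove) every pending OE overlapping the inclusive range. */
static void toy_finish_ordered(struct toy_page *page, uint64_t start,
			       uint64_t end)
{
	assert(start <= end);
	for (size_t i = 0; i < 2; i++) {
		struct toy_oe *oe = &page->oe[i];

		if (oe->pending && oe->start <= end &&
		    oe->start + oe->len - 1 >= start)
			oe->pending = false;
	}
}

/*
 * Replay the diagram. Returns true if the OE for [32K, 36K) can still
 * be found when the second bio is finally submitted.
 */
static bool toy_replay(bool finish_page_on_error)
{
	struct toy_page page = {
		.oe = { { 0, 4096, true }, { 32768, 4096, true } },
		.error = false,
	};

	/* T2: the bio for [0, 4K) fails, its OE is removed, page marked. */
	toy_finish_ordered(&page, 0, 4095);
	page.error = true;

	/* T1: bio for [32K, 36K) is assembled but not submitted yet. */

	/* T1: tail of the toy writepage function (old behavior if set). */
	if (finish_page_on_error && page.error)
		toy_finish_ordered(&page, 0, 65535);

	/* T1: late submission needs the OE for checksumming. */
	return toy_lookup_oe(&page, 32768) != NULL;
}
```

In this model toy_replay(true) returns false, matching the failed lookup behind the BUG_ON(), while toy_replay(false) returns true because the second ordered extent is left for its own bio to finish.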
[FIX]
Just remove the end_extent_writepage() call and the if (PageError())
check.

As mentioned, end_extent_writepage() never really gets executed for
regular sectorsize, and could cause the above BUG_ON() for subpage.

This also means that inside __extent_writepage() we should not bother
with any IO failure, but only focus on errors hit during bio assembly
and submission.
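The resulting split of responsibilities can be sketched as follows. This is a simplified userspace model with hypothetical toy_* names, not the kernel code: the endio side records the IO failure on the page, and the writeback side returns only its own assembly/submission result without looking at the page error bit. The mapping_error field is an assumption of this sketch, standing in for whatever a higher layer later checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_wb_page {
	bool error;        /* set asynchronously by the endio side */
	int mapping_error; /* assumed: what a higher layer sees later */
};

/* Endio side: an IO failure is recorded on the page, nothing else. */
static void toy_endio(struct toy_wb_page *page, int io_err)
{
	assert(io_err <= 0);
	if (io_err) {
		page->error = true;
		page->mapping_error = io_err;
	}
}

/*
 * Writeback side: report only the error hit while assembling or
 * submitting bios. The page error bit is deliberately ignored here.
 */
static int toy_writepage_tail(const struct toy_wb_page *page, int submit_ret)
{
	(void)page;
	assert(submit_ret <= 0);
	return submit_ret;
}
```

With a page that has seen toy_endio(&page, -EIO), toy_writepage_tail(&page, 0) still returns 0, and the -EIO stays visible through the page state; a submission failure such as -ENOMEM is returned unchanged.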
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e665779c046d..a1a6ac787faf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4111,8 +4111,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
* Here we used to have a check for PageError() and then set @ret and
* call end_extent_writepage().
*
- * But in fact setting @ret here will cause different error paths
- * between subpage and regular sectorsize.
+ * But in fact setting @ret and call end_extent_writepage() here will
+ * cause different error paths between subpage and regular sectorsize.
*
* For regular page size, we never submit current page, but only add
* current page to current bio.
@@ -4124,7 +4124,12 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
* thus can get PageError() set by submitted bio of the same page,
* while our @ret is still 0.
*
- * So here we unify the behavior and don't set @ret.
+ * The same is also for end_extent_writepage(), which can finish
+ * ordered extent before submitting the real bio, causing
+ * BUG_ON() in btrfs_csum_one_bio().
+ *
+ * So here we unify the behavior and don't set @ret nor call
+ * end_extent_writepage().
* Error can still be properly passed to higher layer as page will
* be set error, here we just don't handle the IO failure.
*
@@ -4138,8 +4143,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
* Currently the full page based __extent_writepage_io() is not
* capable of that.
*/
- if (PageError(page))
- end_extent_writepage(page, ret, start, page_end);
+
unlock_page(page);
ASSERT(ret <= 0);
return ret;
--
2.32.0
* Re: [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
From: Boris Burkov @ 2021-09-08 17:03 UTC
To: Qu Wenruo; +Cc: linux-btrfs
On Tue, Aug 03, 2021 at 01:53:48PM +0800, Qu Wenruo wrote:
> [FIX]
> Just remove the end_extent_writepage() call and the if (PageError())
> check.
>
> As mentioned, the end_extent_writepage() never really get executed for
> regular sectorsize, and could cause above BUG_ON() for subpage.

I was a little surprised to see this assertion, because it raises the
question: "why was this call added in the first place?"

As best I can tell, it was introduced by Filipe in
"Btrfs: fix hang on error (such as ENOSPC) when writing extent pages"

That looks like a reasonably niche case that might not be covered by
xfstests, so I was wondering if you had already convinced yourself that
it no longer applies.

I'll try to see if I can reproduce his issue with this patch, or if the
code has changed enough that it no longer reproduces.
* Re: [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
From: Qu Wenruo @ 2021-09-08 22:36 UTC
To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs
On 2021/9/9 1:03 AM, Boris Burkov wrote:
> On Tue, Aug 03, 2021 at 01:53:48PM +0800, Qu Wenruo wrote:
>> [FIX]
>> Just remove the end_extent_writepage() call and the if (PageError())
>> check.
>>
>> As mentioned, the end_extent_writepage() never really get executed for
>> regular sectorsize, and could cause above BUG_ON() for subpage.
>
> I was a little surprised to see this assertion, because it begs the
> question: "why was this call added in the first place?"
>
> As best as I can tell, it was introduced by Filipe in
> "Btrfs: fix hang on error (such as ENOSPC) when writing extent pages"
>
> That looks like a reasonably niche case that might not be covered by
> xfstests, so I was wondering if you had already convinced yourself that
> it no longer applies.
Not that niche, since the commit message provides a reproducer.
>
> I'll try to see if I can reproduce his issue with this patch, or if the
> code has changed by enough that it no longer reproduces.
There have been a lot more code changes since 2014. One of the core
changes is 524272607e88 ("btrfs: Handle delalloc error correctly to avoid
ordered extent hang"), which adds proper error handling in
run_delalloc_range().

Feel free to add to the list if you find more commits enhancing the
error handling path.

But for now, given the original reproducer and the existing ENOSPC test
groups, I don't think there is anything extra you need to worry about.

Thanks,
Qu