All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: setup the page before calling any subpage helper
@ 2021-07-30  5:58 Qu Wenruo
  2021-07-30 11:08 ` David Sterba
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2021-07-30  5:58 UTC (permalink / raw)
  To: linux-btrfs

Function set_page_extent_mapped() will setup the data page cache so that
for subpage cases those pages will have page->private to store subpage
specific info.

Normally this happens when we create a new page for the page cache.
But there is a special call site, __extent_writepage(), as we have
special cases where upper layer can mark some page dirty without going
through set_page_dirty() interface.

I haven't yet seen any real world case for this, but if that's possible
then in __extent_writepage() we will call btrfs_page_clear_error()
before setting up the page->private, which can lead to NULL pointer
dereference.

Fix it by moving set_page_extent_mapped() call before
btrfs_page_clear_error().
And make sure in the error path we won't call anything subpage helper.

Fixes: 32443de3382b ("btrfs: introduce btrfs_subpage for data inodes")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
I really hope we can have a more explicit comment about in exactly which
cases we can have such page, and maybe some test cases for it.

In fact, I haven't really seen any case like this, and it doesn't really
make sense for me to make some MM layer code to mark a page dirty
without going through set_page_dirty() interface.
---
 fs/btrfs/extent_io.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e665779c046d..e82328bcb281 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4065,6 +4065,13 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 
 	WARN_ON(!PageLocked(page));
 
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		SetPageError(page);
+		unlock_page(page);
+		return ret;
+	}
+
 	btrfs_page_clear_error(btrfs_sb(inode->i_sb), page,
 			       page_offset(page), PAGE_SIZE);
 
@@ -4081,12 +4088,6 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		flush_dcache_page(page);
 	}
 
-	ret = set_page_extent_mapped(page);
-	if (ret < 0) {
-		SetPageError(page);
-		goto done;
-	}
-
 	if (!epd->extent_locked) {
 		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
 					 &nr_written);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: setup the page before calling any subpage helper
  2021-07-30  5:58 [PATCH] btrfs: setup the page before calling any subpage helper Qu Wenruo
@ 2021-07-30 11:08 ` David Sterba
  2021-07-30 11:27   ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: David Sterba @ 2021-07-30 11:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jul 30, 2021 at 01:58:57PM +0800, Qu Wenruo wrote:
> Function set_page_extent_mapped() will setup the data page cache so that
> for subpage cases those pages will have page->private to store subpage
> specific info.
> 
> Normally this happens when we create a new page for the page cache.
> But there is a special call site, __extent_writepage(), as we have
> special cases where upper layer can mark some page dirty without going
> through set_page_dirty() interface.
> 
> I haven't yet seen any real world case for this, but if that's possible
> then in __extent_writepage() we will call btrfs_page_clear_error()
> before setting up the page->private, which can lead to NULL pointer
> dereference.

Yeah it's hard to believe, but it's been there since almost the
beginning. Back then there was a hard BUG() in the fixup worker, I've
hit it randomly on x86_64,

https://lore.kernel.org/linux-btrfs/20111031154139.GF19328@twin.jikos.cz/

you could find a lot of other reports where it crashed inside
btrfs_writepage_fixup_worker.

> Fix it by moving set_page_extent_mapped() call before
> btrfs_page_clear_error().
> And make sure in the error path we won't call anything subpage helper.

I'm not sure about the fix, because the whole fixup thing is not
entirely clear.

> Fixes: 32443de3382b ("btrfs: introduce btrfs_subpage for data inodes")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> I really hope we can have a more explicit comment about in exactly which
> cases we can have such page, and maybe some test cases for it.

The only reliable test case was on s390 with a particular seed for fsx,
I still have it stored somewhere. On x86_64 it's very hard to hit.

> In fact, I haven't really seen any case like this, and it doesn't really
> make sense for me to make some MM layer code to mark a page dirty
> without going through set_page_dirty() interface.

On s390 it's quick because the page state bits are stored in 2 places
and need to be synced. On x86_64 it's very unclear and low level arch
specific MM stuff but it is still a problem.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: setup the page before calling any subpage helper
  2021-07-30 11:08 ` David Sterba
@ 2021-07-30 11:27   ` Qu Wenruo
  2021-08-12 16:06     ` David Sterba
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2021-07-30 11:27 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 2021/7/30 下午7:08, David Sterba wrote:
> On Fri, Jul 30, 2021 at 01:58:57PM +0800, Qu Wenruo wrote:
>> Function set_page_extent_mapped() will setup the data page cache so that
>> for subpage cases those pages will have page->private to store subpage
>> specific info.
>>
>> Normally this happens when we create a new page for the page cache.
>> But there is a special call site, __extent_writepage(), as we have
>> special cases where upper layer can mark some page dirty without going
>> through set_page_dirty() interface.
>>
>> I haven't yet seen any real world case for this, but if that's possible
>> then in __extent_writepage() we will call btrfs_page_clear_error()
>> before setting up the page->private, which can lead to NULL pointer
>> dereference.
> 
> Yeah it's hard to believe, but it's been there since almost the
> beginning. Back then there was a hard BUG() in the fixup worker, I've
> hit it randomly on x86_64,
> 
> https://lore.kernel.org/linux-btrfs/20111031154139.GF19328@twin.jikos.cz/
> 
> you could find a lot of other reports where it crashed inside
> btrfs_writepage_fixup_worker.

Well, over 10 years ago.

But finally I have seen a real world report for it.

This makes me wonder, wouldn't other sites like iomap, which also 
utilize a private structure for subpage bitmaps, also hit such crash?

> 
>> Fix it by moving set_page_extent_mapped() call before
>> btrfs_page_clear_error().
>> And make sure in the error path we won't call anything subpage helper.
> 
> I'm not sure about the fix, because the whole fixup thing is not
> entirely clear.

Indeed, but the idea should be straightfoward:

Don't call any subpage helper before set_page_extent_mapped().

So that such page without private get passed in, we can still handle it 
well.

> 
>> Fixes: 32443de3382b ("btrfs: introduce btrfs_subpage for data inodes")
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> I really hope we can have a more explicit comment about in exactly which
>> cases we can have such page, and maybe some test cases for it.
> 
> The only reliable test case was on s390 with a particular seed for fsx,
> I still have it stored somewhere. On x86_64 it's very hard to hit.

Can it be reproduced by S390 qemu tcc emulation?

And by S390, did you mean modern Power8/9/10 systems too?

> 
>> In fact, I haven't really seen any case like this, and it doesn't really
>> make sense for me to make some MM layer code to mark a page dirty
>> without going through set_page_dirty() interface.
> 
> On s390 it's quick because the page state bits are stored in 2 places
> and need to be synced. On x86_64 it's very unclear and low level arch
> specific MM stuff but it is still a problem.
> 

The hard to hit part is really harming our test coverage...

Thanks,
Qu


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: setup the page before calling any subpage helper
  2021-07-30 11:27   ` Qu Wenruo
@ 2021-08-12 16:06     ` David Sterba
  0 siblings, 0 replies; 4+ messages in thread
From: David Sterba @ 2021-08-12 16:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Fri, Jul 30, 2021 at 07:27:41PM +0800, Qu Wenruo wrote:
> On 2021/7/30 下午7:08, David Sterba wrote:
> > On Fri, Jul 30, 2021 at 01:58:57PM +0800, Qu Wenruo wrote:
> > Yeah it's hard to believe, but it's been there since almost the
> > beginning. Back then there was a hard BUG() in the fixup worker, I've
> > hit it randomly on x86_64,
> > 
> > https://lore.kernel.org/linux-btrfs/20111031154139.GF19328@twin.jikos.cz/
> > 
> > you could find a lot of other reports where it crashed inside
> > btrfs_writepage_fixup_worker.
> 
> Well, over 10 years ago.
> 
> But finally I have seen a real world report for it.
> 
> This makes me wonder, wouldn't other sites like iomap, which also 
> utilize a private structure for subpage bitmaps, also hit such crash?

I don't know.

> >> Fix it by moving set_page_extent_mapped() call before
> >> btrfs_page_clear_error().
> >> And make sure in the error path we won't call anything subpage helper.
> > 
> > I'm not sure about the fix, because the whole fixup thing is not
> > entirely clear.
> 
> Indeed, but the idea should be straightfoward:
> 
> Don't call any subpage helper before set_page_extent_mapped().
> 
> So that such page without private get passed in, we can still handle it 
> well.
> 
> > 
> >> Fixes: 32443de3382b ("btrfs: introduce btrfs_subpage for data inodes")
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> I really hope we can have a more explicit comment about in exactly which
> >> cases we can have such page, and maybe some test cases for it.
> > 
> > The only reliable test case was on s390 with a particular seed for fsx,
> > I still have it stored somewhere. On x86_64 it's very hard to hit.
> 
> Can it be reproduced by S390 qemu tcc emulation?

I think the reproducer was deterministic, depending on the fsx seed it
created a file layout that at some point triggered the desync between hw
page and memory management page, requiring the fixup.

So even if the emulation is slow, it should work to verify the test. If
you set that up, you could also restore the BUG() inside the fixup
worker to see for yourself that this mysterious fixup thing is real.

> And by S390, did you mean modern Power8/9/10 systems too?

I haven't been testing on Power machines a lot so I don't know, it's
possible that the success rate hitting that would be similar as on
x86_64.

> >> In fact, I haven't really seen any case like this, and it doesn't really
> >> make sense for me to make some MM layer code to mark a page dirty
> >> without going through set_page_dirty() interface.
> > 
> > On s390 it's quick because the page state bits are stored in 2 places
> > and need to be synced. On x86_64 it's very unclear and low level arch
> > specific MM stuff but it is still a problem.
> 
> The hard to hit part is really harming our test coverage...

I don't disagree with that.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-12 16:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-30  5:58 [PATCH] btrfs: setup the page before calling any subpage helper Qu Wenruo
2021-07-30 11:08 ` David Sterba
2021-07-30 11:27   ` Qu Wenruo
2021-08-12 16:06     ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.