linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()
@ 2024-01-25 13:09 Roman Smirnov
  2024-01-25 13:09 ` [PATCH 5.10/5.15 v2 1/1 RFC] mm/truncate: Replace page_mapped() call in invalidate_inode_page() Roman Smirnov
  2024-01-25 14:06 ` [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty() Matthew Wilcox
  0 siblings, 2 replies; 8+ messages in thread
From: Roman Smirnov @ 2024-01-25 13:09 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Roman Smirnov, Matthew Wilcox (Oracle),
	Andrew Morton, Alexey Khoroshilov, Sergey Shtylyov,
	Karina Yankevich, lvc-project, linux-fsdevel, linux-kernel,
	linux-mm

Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
stable releases. It happens because invalidate_inode_page() frees pages
that are needed for the system. To fix this we need to add additional
checks to the function. page_mapped() checks if a page exists in the 
page tables, but this is not enough. The page can be used in other places:
https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71

Kernel outputs an error line related to direct I/O:
https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000

The problem can be fixed in 5.10 and 5.15 stable releases by the 
following patch.

The patch replaces page_mapped() call with check that finds additional
references to the page excluding page cache and filesystem private data.
If additional references exist, the page cannot be freed.

This version does not include the first patch from the first version.
The problem can be fixed without it. 

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Link: https://syzkaller.appspot.com/bug?extid=02f21431b65c214aa1d6

Matthew Wilcox (Oracle) (1):
  mm/truncate: Replace page_mapped() call in invalidate_inode_page()

 mm/truncate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 5.10/5.15 v2 1/1 RFC] mm/truncate: Replace page_mapped() call in invalidate_inode_page()
  2024-01-25 13:09 [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty() Roman Smirnov
@ 2024-01-25 13:09 ` Roman Smirnov
  2024-01-25 14:06 ` [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty() Matthew Wilcox
  1 sibling, 0 replies; 8+ messages in thread
From: Roman Smirnov @ 2024-01-25 13:09 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Roman Smirnov, Matthew Wilcox (Oracle),
	Andrew Morton, Alexey Khoroshilov, Sergey Shtylyov,
	Karina Yankevich, lvc-project, linux-fsdevel, linux-kernel,
	linux-mm

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

commit e41c81d0d30e1a6ebf408feaf561f80cac4457dc upstream.

folio_mapped() is expensive because it has to check each page's mapcount
field.  A cheaper check is whether there are any extra references to
the page, other than the one we own, one from the page private data and
the ones held by the page cache.

The call to remove_mapping() will fail in any case if it cannot freeze
the refcount, but failing here avoids cycling the i_pages spinlock.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
[Roman: replaced folio_ref_count() call with page_ref_count(),
folio_nr_pages() call with compound_nr(), and
folio_has_private() call with page_has_private()]
Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
---
 mm/truncate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 8914ca4ce4b1..989bc7785d55 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -256,7 +256,9 @@ int invalidate_inode_page(struct page *page)
 		return 0;
 	if (PageDirty(page) || PageWriteback(page))
 		return 0;
-	if (page_mapped(page))
+	/* The refcount will be elevated if the page is used by the system */
+	if (page_ref_count(page) >
+			compound_nr(page) + page_has_private(page) + 1)
 		return 0;
 	return invalidate_complete_page(mapping, page);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()
  2024-01-25 13:09 [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty() Roman Smirnov
  2024-01-25 13:09 ` [PATCH 5.10/5.15 v2 1/1 RFC] mm/truncate: Replace page_mapped() call in invalidate_inode_page() Roman Smirnov
@ 2024-01-25 14:06 ` Matthew Wilcox
  2024-01-29  9:11   ` Jan Kara
  1 sibling, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2024-01-25 14:06 UTC (permalink / raw)
  To: Roman Smirnov
  Cc: stable, Greg Kroah-Hartman, Andrew Morton, Alexey Khoroshilov,
	Sergey Shtylyov, Karina Yankevich, lvc-project, linux-fsdevel,
	linux-kernel, linux-mm, linux-ext4, Theodore Ts'o,
	Andreas Dilger, Jan Kara

On Thu, Jan 25, 2024 at 01:09:46PM +0000, Roman Smirnov wrote:
> Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
> stable releases. It happens because invalidate_inode_page() frees pages
> that are needed for the system. To fix this we need to add additional
> checks to the function. page_mapped() checks if a page exists in the 
> page tables, but this is not enough. The page can be used in other places:
> https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71
> 
> Kernel outputs an error line related to direct I/O:
> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000

OK, this is making a lot more sense.

The invalidate_inode_page() path (after the page_mapped check) calls
try_to_release_page() which strips the buffers from the page.
__remove_mapping() tries to freeze the page and presuambly fails.

ext4 is checking there are still buffer heads attached to the page.
I'm not sure why it's doing that; it's legitimate to strip the
bufferheads from a page and then reattach them later (if they're
attached to a dirty page, they are created dirty).

So the only question in my mind is whether ext4 is right to have this
assert in the first place.  It seems wrong to me, but perhaps someone
from ext4 can explain why it's correct.

> The problem can be fixed in 5.10 and 5.15 stable releases by the 
> following patch.
> 
> The patch replaces page_mapped() call with check that finds additional
> references to the page excluding page cache and filesystem private data.
> If additional references exist, the page cannot be freed.
> 
> This version does not include the first patch from the first version.
> The problem can be fixed without it. 
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Link: https://syzkaller.appspot.com/bug?extid=02f21431b65c214aa1d6
> 
> Matthew Wilcox (Oracle) (1):
>   mm/truncate: Replace page_mapped() call in invalidate_inode_page()
> 
>  mm/truncate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()
  2024-01-25 14:06 ` [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty() Matthew Wilcox
@ 2024-01-29  9:11   ` Jan Kara
  2024-01-29 14:41     ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2024-01-29  9:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Roman Smirnov, stable, Greg Kroah-Hartman, Andrew Morton,
	Alexey Khoroshilov, Sergey Shtylyov, Karina Yankevich,
	lvc-project, linux-fsdevel, linux-kernel, linux-mm, linux-ext4,
	Theodore Ts'o, Andreas Dilger, Jan Kara

On Thu 25-01-24 14:06:58, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 01:09:46PM +0000, Roman Smirnov wrote:
> > Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
> > stable releases. It happens because invalidate_inode_page() frees pages
> > that are needed for the system. To fix this we need to add additional
> > checks to the function. page_mapped() checks if a page exists in the 
> > page tables, but this is not enough. The page can be used in other places:
> > https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71
> > 
> > Kernel outputs an error line related to direct I/O:
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000
> 
> OK, this is making a lot more sense.
> 
> The invalidate_inode_page() path (after the page_mapped check) calls
> try_to_release_page() which strips the buffers from the page.
> __remove_mapping() tries to freeze the page and presuambly fails.

Yep, likely.

> ext4 is checking there are still buffer heads attached to the page.
> I'm not sure why it's doing that; it's legitimate to strip the
> bufferheads from a page and then reattach them later (if they're
> attached to a dirty page, they are created dirty).

Well, we really need to track dirtiness on per fs-block basis in ext4
(which makes a difference when blocksize < page size). For example for
delayed block allocation we reserve exactly as many blocks as we need
(which need not be all the blocks in the page e.g. when writing just one
block in the middle of a large hole). So when all buffers would be marked
as dirty we would overrun our reservation. Hence at the moment of dirtying
we really need buffers to be attached to the page and stay there until the
page is written back.
 
								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()
  2024-01-29  9:11   ` Jan Kara
@ 2024-01-29 14:41     ` Matthew Wilcox
  2024-01-29 16:09       ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2024-01-29 14:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: Roman Smirnov, stable, Greg Kroah-Hartman, Andrew Morton,
	Alexey Khoroshilov, Sergey Shtylyov, Karina Yankevich,
	lvc-project, linux-fsdevel, linux-kernel, linux-mm, linux-ext4,
	Theodore Ts'o, Andreas Dilger, Jan Kara

On Mon, Jan 29, 2024 at 10:11:24AM +0100, Jan Kara wrote:
> On Thu 25-01-24 14:06:58, Matthew Wilcox wrote:
> > On Thu, Jan 25, 2024 at 01:09:46PM +0000, Roman Smirnov wrote:
> > > Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
> > > stable releases. It happens because invalidate_inode_page() frees pages
> > > that are needed for the system. To fix this we need to add additional
> > > checks to the function. page_mapped() checks if a page exists in the 
> > > page tables, but this is not enough. The page can be used in other places:
> > > https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71
> > > 
> > > Kernel outputs an error line related to direct I/O:
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000
> > 
> > OK, this is making a lot more sense.
> > 
> > The invalidate_inode_page() path (after the page_mapped check) calls
> > try_to_release_page() which strips the buffers from the page.
> > __remove_mapping() tries to freeze the page and presuambly fails.
> 
> Yep, likely.
> 
> > ext4 is checking there are still buffer heads attached to the page.
> > I'm not sure why it's doing that; it's legitimate to strip the
> > bufferheads from a page and then reattach them later (if they're
> > attached to a dirty page, they are created dirty).
> 
> Well, we really need to track dirtiness on per fs-block basis in ext4
> (which makes a difference when blocksize < page size). For example for
> delayed block allocation we reserve exactly as many blocks as we need
> (which need not be all the blocks in the page e.g. when writing just one
> block in the middle of a large hole). So when all buffers would be marked
> as dirty we would overrun our reservation. Hence at the moment of dirtying
> we really need buffers to be attached to the page and stay there until the
> page is written back.

Thanks for the clear explanation!

Isn't the correct place to ensure that this is true in
ext4_release_folio()?  I think all paths to remove buffer_heads from a
folio go through ext4_release_folio() and so it can be prohibited here
if the folio is part of a delalloc extent?

I worry that the proposed fix here cuts off only one path to hitting
this WARN_ON and we need a more comprehensive fix.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()
  2024-01-29 14:41     ` Matthew Wilcox
@ 2024-01-29 16:09       ` Jan Kara
  2024-02-13  7:07         ` Roman Smirnov
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2024-01-29 16:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, Roman Smirnov, stable, Greg Kroah-Hartman,
	Andrew Morton, Alexey Khoroshilov, Sergey Shtylyov,
	Karina Yankevich, lvc-project, linux-fsdevel, linux-kernel,
	linux-mm, linux-ext4, Theodore Ts'o, Andreas Dilger,
	Jan Kara

On Mon 29-01-24 14:41:56, Matthew Wilcox wrote:
> On Mon, Jan 29, 2024 at 10:11:24AM +0100, Jan Kara wrote:
> > On Thu 25-01-24 14:06:58, Matthew Wilcox wrote:
> > > On Thu, Jan 25, 2024 at 01:09:46PM +0000, Roman Smirnov wrote:
> > > > Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
> > > > stable releases. It happens because invalidate_inode_page() frees pages
> > > > that are needed for the system. To fix this we need to add additional
> > > > checks to the function. page_mapped() checks if a page exists in the 
> > > > page tables, but this is not enough. The page can be used in other places:
> > > > https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71
> > > > 
> > > > Kernel outputs an error line related to direct I/O:
> > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000
> > > 
> > > OK, this is making a lot more sense.
> > > 
> > > The invalidate_inode_page() path (after the page_mapped check) calls
> > > try_to_release_page() which strips the buffers from the page.
> > > __remove_mapping() tries to freeze the page and presuambly fails.
> > 
> > Yep, likely.
> > 
> > > ext4 is checking there are still buffer heads attached to the page.
> > > I'm not sure why it's doing that; it's legitimate to strip the
> > > bufferheads from a page and then reattach them later (if they're
> > > attached to a dirty page, they are created dirty).
> > 
> > Well, we really need to track dirtiness on per fs-block basis in ext4
> > (which makes a difference when blocksize < page size). For example for
> > delayed block allocation we reserve exactly as many blocks as we need
> > (which need not be all the blocks in the page e.g. when writing just one
> > block in the middle of a large hole). So when all buffers would be marked
> > as dirty we would overrun our reservation. Hence at the moment of dirtying
> > we really need buffers to be attached to the page and stay there until the
> > page is written back.
> 
> Thanks for the clear explanation!
> 
> Isn't the correct place to ensure that this is true in
> ext4_release_folio()?  I think all paths to remove buffer_heads from a
> folio go through ext4_release_folio() and so it can be prohibited here
> if the folio is part of a delalloc extent?

OK, I tried to keep it simple but now I have to go into more intricate
details of GUP and the IO path so please bear with me. Normally, how things
happen on write or page_mkwrite time is:

lock_page(page)
check we have buffers, create if not
do stuff with page
mark appropriate buffers (and thus the page) dirty
unlock_page(page)

Now the page and buffers are dirty so nothing can be freed as reclaim
doesn't touch such pages (and neither does try_to_free_buffers()). So we
are safe until page writeback time.

But GUP users such as direct IO are different. They do the page_mkwrite()
dance at GUP time so we are fine at that moment. But on direct IO
completion they recheck page dirty bits and call set_page_dirty() *again*
if they find the page has been cleaned in the mean time. And this is where
the problem really happens. If writeback of the pages serving as direct IO
buffer happen while the IO is running, buffers get cleaned, and can be
reclaimed, and we then hit the warning in ext4_set_page_dirty().

So what we really need is "don't reclaim page buffers if the page is pinned
by GUP". This is what MM checks in recent kernels (since d824ec2a15467 "mm:
do not reclaim private data from pinned page") and the patch discussed here
is effectively an equivalent of it for stable. So AFAICT it really closes
all the problematic paths. Sure we could implement that check in
ext4_release_folio() but I don't think there's a great reason for that.

And yes, technically I assume we could reconstruct the buffer state from
other data structures if we find the buffers are missing. But in
ext4_set_page_dirty() that is not easily possible as it may be called in
softirq context. And elsewhere it is prone to hiding other bugs we may
introduce. So just not stripping the buffer heads when the page is pinned
is by far the easiest solution for ext4, in particular for stable...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()
  2024-01-29 16:09       ` Jan Kara
@ 2024-02-13  7:07         ` Roman Smirnov
  2024-02-13  9:43           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Smirnov @ 2024-02-13  7:07 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox
  Cc: stable, Greg Kroah-Hartman, Andrew Morton, Alexey Khoroshilov,
	Sergey Shtylyov, Karina Yankevich, lvc-project, linux-fsdevel,
	linux-kernel, linux-mm, linux-ext4, Theodore Ts'o,
	Andreas Dilger, Jan Kara

Is there something else to do to make the patch accepted?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()
  2024-02-13  7:07         ` Roman Smirnov
@ 2024-02-13  9:43           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 8+ messages in thread
From: Greg Kroah-Hartman @ 2024-02-13  9:43 UTC (permalink / raw)
  To: Roman Smirnov
  Cc: Jan Kara, Matthew Wilcox, stable, Andrew Morton,
	Alexey Khoroshilov, Sergey Shtylyov, Karina Yankevich,
	lvc-project, linux-fsdevel, linux-kernel, linux-mm, linux-ext4,
	Theodore Ts'o, Andreas Dilger, Jan Kara

On Tue, Feb 13, 2024 at 07:07:18AM +0000, Roman Smirnov wrote:
> Is there something else to do to make the patch accepted?

What patch?  No context here...

Also, for obvious reasons, we don't apply "RFC" patches as you yourself
don't think they are good enough to be merged :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-02-13  9:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25 13:09 [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty() Roman Smirnov
2024-01-25 13:09 ` [PATCH 5.10/5.15 v2 1/1 RFC] mm/truncate: Replace page_mapped() call in invalidate_inode_page() Roman Smirnov
2024-01-25 14:06 ` [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty() Matthew Wilcox
2024-01-29  9:11   ` Jan Kara
2024-01-29 14:41     ` Matthew Wilcox
2024-01-29 16:09       ` Jan Kara
2024-02-13  7:07         ` Roman Smirnov
2024-02-13  9:43           ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).