[RFC PATCH] filemap: add mapping_mapped check in filemap_unaccount

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH] filemap: add mapping_mapped check in filemap_unaccount_folio()
@ 2024-01-19  9:20 Peng Zhang
  2024-01-19 13:40 ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Peng Zhang @ 2024-01-19  9:20 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel, netdev
  Cc: willy, akpm, edumazet, davem, dsahern, kuba, pabeni, arjunroy,
	wangkefeng.wang

From: ZhangPeng <zhangpeng362@huawei.com>

Recently, we discovered a syzkaller issue that triggers
VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM
enabled, or bad page without CONFIG_DEBUG_VM.

The specific scenarios are as follows:
(1) mmap: Use socket fd to create a TCP VMA.
(2) open(O_CREAT) + fallocate + sendfile: Read the ext4 file and create
the page cache. The mapping of the page cache is ext4 inode->i_mapping.
Send the ext4 page cache to the socket fd through sendfile.
(3) getsockopt TCP_ZEROCOPY_RECEIVE: Receive the ext4 page cache and use
vm_insert_pages() to insert the ext4 page cache to the TCP VMA. In this
case, mapcount changes from - 1 to 0. The page cache mapping is ext4
inode->i_mapping, but the VMA of the page cache is the TCP VMA and
folio->mapping->i_mmap is empty.
(4) open(O_TRUNC): Deletes the ext4 page cache. In this case, the page
cache is still in the xarray tree of mapping->i_pages and these page
cache should also be deleted. However, folio->mapping->i_mmap is empty.
Therefore, truncate_cleanup_folio()->unmap_mapping_folio() can't unmap
i_mmap tree. In filemap_unaccount_folio(), the mapcount of the folio is
0, causing BUG ON.

Syz log that can be used to reproduce the issue:
r3 = socket$inet_tcp(0x2, 0x1, 0x0)
mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
r4 = socket$inet_tcp(0x2, 0x1, 0x0)
bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)
fallocate(r5, 0x0, 0x0, 0x85b8)
sendfile(r4, r5, 0x0, 0x8ba0)
getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
&(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)

In the current TCP zerocopy scenario, folio will be released normally .
When the process exits, if the page cache is truncated before the
process exits, BUG ON or Bad page occurs, which does not meet the
expectation.
To fix this issue, the mapping_mapped() check is added to
filemap_unaccount_folio(). In addition, to reduce the impact on
performance, no lock is added when mapping_mapped() is checked.

Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
---
 mm/filemap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index ea49677c6338..6a669eb24816 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -148,10 +148,11 @@ static void page_cache_delete(struct address_space *mapping,
 static void filemap_unaccount_folio(struct address_space *mapping,
 		struct folio *folio)
 {
+	bool mapped = folio_mapped(folio) && mapping_mapped(mapping);
 	long nr;

-	VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
-	if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(folio_mapped(folio))) {
+	VM_BUG_ON_FOLIO(mapped, folio);
+	if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(mapped)) {
 		pr_alert("BUG: Bad page cache in process %s  pfn:%05lx\n",
 			 current->comm, folio_pfn(folio));
 		dump_page(&folio->page, "still mapped when deleted");
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] filemap: add mapping_mapped check in filemap_unaccount_folio()
  2024-01-19  9:20 [RFC PATCH] filemap: add mapping_mapped check in filemap_unaccount_folio() Peng Zhang
@ 2024-01-19 13:40 ` Matthew Wilcox
  2024-01-20  6:46   ` zhangpeng (AS)
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2024-01-19 13:40 UTC (permalink / raw)
  To: Peng Zhang
  Cc: linux-mm, linux-fsdevel, netdev, akpm, edumazet, davem, dsahern,
	kuba, pabeni, arjunroy, wangkefeng.wang

On Fri, Jan 19, 2024 at 05:20:24PM +0800, Peng Zhang wrote:
> Recently, we discovered a syzkaller issue that triggers
> VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM
> enabled, or bad page without CONFIG_DEBUG_VM.
> 
> The specific scenarios are as follows:
> (1) mmap: Use socket fd to create a TCP VMA.
> (2) open(O_CREAT) + fallocate + sendfile: Read the ext4 file and create
> the page cache. The mapping of the page cache is ext4 inode->i_mapping.
> Send the ext4 page cache to the socket fd through sendfile.
> (3) getsockopt TCP_ZEROCOPY_RECEIVE: Receive the ext4 page cache and use
> vm_insert_pages() to insert the ext4 page cache to the TCP VMA. In this
> case, mapcount changes from - 1 to 0. The page cache mapping is ext4
> inode->i_mapping, but the VMA of the page cache is the TCP VMA and
> folio->mapping->i_mmap is empty.

I think this is the bug.  We shouldn't be incrementing the mapcount
in this scenario.  Assuming we want to support doing this at all and
we don't want to include something like ...

	if (folio->mapping) {
		if (folio->mapping != vma->vm_file->f_mapping)
			return -EINVAL;
		if (page_to_pgoff(page) != linear_page_index(vma, address))
			return -EINVAL;
	}

But maybe there's a reason for networking needing to map pages in this
scenario?

> (4) open(O_TRUNC): Deletes the ext4 page cache. In this case, the page
> cache is still in the xarray tree of mapping->i_pages and these page
> cache should also be deleted. However, folio->mapping->i_mmap is empty.
> Therefore, truncate_cleanup_folio()->unmap_mapping_folio() can't unmap
> i_mmap tree. In filemap_unaccount_folio(), the mapcount of the folio is
> 0, causing BUG ON.
> 
> Syz log that can be used to reproduce the issue:
> r3 = socket$inet_tcp(0x2, 0x1, 0x0)
> mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
> r4 = socket$inet_tcp(0x2, 0x1, 0x0)
> bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
> connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
> r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> 0x181e42, 0x0)
> fallocate(r5, 0x0, 0x0, 0x85b8)
> sendfile(r4, r5, 0x0, 0x8ba0)
> getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
> &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
> r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> 0x181e42, 0x0)
> 
> In the current TCP zerocopy scenario, folio will be released normally .
> When the process exits, if the page cache is truncated before the
> process exits, BUG ON or Bad page occurs, which does not meet the
> expectation.
> To fix this issue, the mapping_mapped() check is added to
> filemap_unaccount_folio(). In addition, to reduce the impact on
> performance, no lock is added when mapping_mapped() is checked.

NAK this patch, you're just preventing the assertion from firing.
I think there's a deeper problem here.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] filemap: add mapping_mapped check in filemap_unaccount_folio()
  2024-01-19 13:40 ` Matthew Wilcox
@ 2024-01-20  6:46   ` zhangpeng (AS)
  2024-01-22 16:04     ` SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: zhangpeng (AS) @ 2024-01-20  6:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-fsdevel, netdev, akpm, edumazet, davem, dsahern,
	kuba, pabeni, arjunroy, wangkefeng.wang

On 2024/1/19 21:40, Matthew Wilcox wrote:

> On Fri, Jan 19, 2024 at 05:20:24PM +0800, Peng Zhang wrote:
>> Recently, we discovered a syzkaller issue that triggers
>> VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM
>> enabled, or bad page without CONFIG_DEBUG_VM.
>>
>> The specific scenarios are as follows:
>> (1) mmap: Use socket fd to create a TCP VMA.
>> (2) open(O_CREAT) + fallocate + sendfile: Read the ext4 file and create
>> the page cache. The mapping of the page cache is ext4 inode->i_mapping.
>> Send the ext4 page cache to the socket fd through sendfile.
>> (3) getsockopt TCP_ZEROCOPY_RECEIVE: Receive the ext4 page cache and use
>> vm_insert_pages() to insert the ext4 page cache to the TCP VMA. In this
>> case, mapcount changes from - 1 to 0. The page cache mapping is ext4
>> inode->i_mapping, but the VMA of the page cache is the TCP VMA and
>> folio->mapping->i_mmap is empty.
> I think this is the bug.  We shouldn't be incrementing the mapcount
> in this scenario.  Assuming we want to support doing this at all and
> we don't want to include something like ...
>
> 	if (folio->mapping) {
> 		if (folio->mapping != vma->vm_file->f_mapping)
> 			return -EINVAL;
> 		if (page_to_pgoff(page) != linear_page_index(vma, address))
> 			return -EINVAL;
> 	}
>
> But maybe there's a reason for networking needing to map pages in this
> scenario?

Agreed, and I'm also curious why.

>> (4) open(O_TRUNC): Deletes the ext4 page cache. In this case, the page
>> cache is still in the xarray tree of mapping->i_pages and these page
>> cache should also be deleted. However, folio->mapping->i_mmap is empty.
>> Therefore, truncate_cleanup_folio()->unmap_mapping_folio() can't unmap
>> i_mmap tree. In filemap_unaccount_folio(), the mapcount of the folio is
>> 0, causing BUG ON.
>>
>> Syz log that can be used to reproduce the issue:
>> r3 = socket$inet_tcp(0x2, 0x1, 0x0)
>> mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
>> r4 = socket$inet_tcp(0x2, 0x1, 0x0)
>> bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
>> connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
>> r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
>> 0x181e42, 0x0)
>> fallocate(r5, 0x0, 0x0, 0x85b8)
>> sendfile(r4, r5, 0x0, 0x8ba0)
>> getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
>> &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
>> r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
>> 0x181e42, 0x0)
>>
>> In the current TCP zerocopy scenario, folio will be released normally .
>> When the process exits, if the page cache is truncated before the
>> process exits, BUG ON or Bad page occurs, which does not meet the
>> expectation.
>> To fix this issue, the mapping_mapped() check is added to
>> filemap_unaccount_folio(). In addition, to reduce the impact on
>> performance, no lock is added when mapping_mapped() is checked.
> NAK this patch, you're just preventing the assertion from firing.
> I think there's a deeper problem here.

-- 
Best Regards,
Peng


^ permalink raw reply	[flat|nested] 13+ messages in thread

* SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-20  6:46   ` zhangpeng (AS)
@ 2024-01-22 16:04     ` Matthew Wilcox
  2024-01-22 16:30       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2024-01-22 16:04 UTC (permalink / raw)
  To: zhangpeng (AS)
  Cc: linux-mm, linux-fsdevel, netdev, akpm, edumazet, davem, dsahern,
	kuba, pabeni, arjunroy, wangkefeng.wang

I'm disappointed to have no reaction from netdev so far.  Let's see if a
more exciting subject line evinces some interest.

On Sat, Jan 20, 2024 at 02:46:49PM +0800, zhangpeng (AS) wrote:
> On 2024/1/19 21:40, Matthew Wilcox wrote:
> 
> > On Fri, Jan 19, 2024 at 05:20:24PM +0800, Peng Zhang wrote:
> > > Recently, we discovered a syzkaller issue that triggers
> > > VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM
> > > enabled, or bad page without CONFIG_DEBUG_VM.
> > > 
> > > The specific scenarios are as follows:
> > > (1) mmap: Use socket fd to create a TCP VMA.
> > > (2) open(O_CREAT) + fallocate + sendfile: Read the ext4 file and create
> > > the page cache. The mapping of the page cache is ext4 inode->i_mapping.
> > > Send the ext4 page cache to the socket fd through sendfile.
> > > (3) getsockopt TCP_ZEROCOPY_RECEIVE: Receive the ext4 page cache and use
> > > vm_insert_pages() to insert the ext4 page cache to the TCP VMA. In this
> > > case, mapcount changes from - 1 to 0. The page cache mapping is ext4
> > > inode->i_mapping, but the VMA of the page cache is the TCP VMA and
> > > folio->mapping->i_mmap is empty.
> > I think this is the bug.  We shouldn't be incrementing the mapcount
> > in this scenario.  Assuming we want to support doing this at all and
> > we don't want to include something like ...
> > 
> > 	if (folio->mapping) {
> > 		if (folio->mapping != vma->vm_file->f_mapping)
> > 			return -EINVAL;
> > 		if (page_to_pgoff(page) != linear_page_index(vma, address))
> > 			return -EINVAL;
> > 	}
> > 
> > But maybe there's a reason for networking needing to map pages in this
> > scenario?
> 
> Agreed, and I'm also curious why.
> 
> > > (4) open(O_TRUNC): Deletes the ext4 page cache. In this case, the page
> > > cache is still in the xarray tree of mapping->i_pages and these page
> > > cache should also be deleted. However, folio->mapping->i_mmap is empty.
> > > Therefore, truncate_cleanup_folio()->unmap_mapping_folio() can't unmap
> > > i_mmap tree. In filemap_unaccount_folio(), the mapcount of the folio is
> > > 0, causing BUG ON.
> > > 
> > > Syz log that can be used to reproduce the issue:
> > > r3 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
> > > r4 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
> > > connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
> > > r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > 0x181e42, 0x0)
> > > fallocate(r5, 0x0, 0x0, 0x85b8)
> > > sendfile(r4, r5, 0x0, 0x8ba0)
> > > getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
> > > &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
> > > 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
> > > r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > 0x181e42, 0x0)
> > > 
> > > In the current TCP zerocopy scenario, folio will be released normally .
> > > When the process exits, if the page cache is truncated before the
> > > process exits, BUG ON or Bad page occurs, which does not meet the
> > > expectation.
> > > To fix this issue, the mapping_mapped() check is added to
> > > filemap_unaccount_folio(). In addition, to reduce the impact on
> > > performance, no lock is added when mapping_mapped() is checked.
> > NAK this patch, you're just preventing the assertion from firing.
> > I think there's a deeper problem here.
> 
> -- 
> Best Regards,
> Peng
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-22 16:04     ` SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY Matthew Wilcox
@ 2024-01-22 16:30       ` Eric Dumazet
  2024-01-22 17:12         ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2024-01-22 16:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: zhangpeng (AS),
	linux-mm, linux-fsdevel, netdev, akpm, davem, dsahern, kuba,
	pabeni, arjunroy, wangkefeng.wang

On Mon, Jan 22, 2024 at 5:04 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> I'm disappointed to have no reaction from netdev so far.  Let's see if a
> more exciting subject line evinces some interest.

Hmm, perhaps some of us were enjoying their weekend ?

I also see '[RFC PATCH] filemap: add mapping_mapped check in
filemap_unaccount_folio()',
and during the merge window, network maintainers tend to prioritize
their work based on tags.

If a stack trace was added, perhaps our attention would have been caught.

I don't really know what changed recently, all I know is that TCP zero
copy is for real network traffic.

Real trafic uses order-0 pages, 4K at a time.

If can_map_frag() needs to add another safety check, let's add it.

syzbot is usually quite good at bisections, was a bug origin found ?


>
> On Sat, Jan 20, 2024 at 02:46:49PM +0800, zhangpeng (AS) wrote:
> > On 2024/1/19 21:40, Matthew Wilcox wrote:
> >
> > > On Fri, Jan 19, 2024 at 05:20:24PM +0800, Peng Zhang wrote:
> > > > Recently, we discovered a syzkaller issue that triggers
> > > > VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM
> > > > enabled, or bad page without CONFIG_DEBUG_VM.
> > > >
> > > > The specific scenarios are as follows:
> > > > (1) mmap: Use socket fd to create a TCP VMA.
> > > > (2) open(O_CREAT) + fallocate + sendfile: Read the ext4 file and create
> > > > the page cache. The mapping of the page cache is ext4 inode->i_mapping.
> > > > Send the ext4 page cache to the socket fd through sendfile.
> > > > (3) getsockopt TCP_ZEROCOPY_RECEIVE: Receive the ext4 page cache and use
> > > > vm_insert_pages() to insert the ext4 page cache to the TCP VMA. In this
> > > > case, mapcount changes from - 1 to 0. The page cache mapping is ext4
> > > > inode->i_mapping, but the VMA of the page cache is the TCP VMA and
> > > > folio->mapping->i_mmap is empty.
> > > I think this is the bug.  We shouldn't be incrementing the mapcount
> > > in this scenario.  Assuming we want to support doing this at all and
> > > we don't want to include something like ...
> > >
> > >     if (folio->mapping) {
> > >             if (folio->mapping != vma->vm_file->f_mapping)
> > >                     return -EINVAL;
> > >             if (page_to_pgoff(page) != linear_page_index(vma, address))
> > >                     return -EINVAL;
> > >     }
> > >
> > > But maybe there's a reason for networking needing to map pages in this
> > > scenario?
> >
> > Agreed, and I'm also curious why.
> >
> > > > (4) open(O_TRUNC): Deletes the ext4 page cache. In this case, the page
> > > > cache is still in the xarray tree of mapping->i_pages and these page
> > > > cache should also be deleted. However, folio->mapping->i_mmap is empty.
> > > > Therefore, truncate_cleanup_folio()->unmap_mapping_folio() can't unmap
> > > > i_mmap tree. In filemap_unaccount_folio(), the mapcount of the folio is
> > > > 0, causing BUG ON.
> > > >
> > > > Syz log that can be used to reproduce the issue:
> > > > r3 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > > mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
> > > > r4 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > > bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
> > > > connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
> > > > r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > > 0x181e42, 0x0)
> > > > fallocate(r5, 0x0, 0x0, 0x85b8)
> > > > sendfile(r4, r5, 0x0, 0x8ba0)
> > > > getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
> > > > &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
> > > > 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
> > > > r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > > 0x181e42, 0x0)
> > > >
> > > > In the current TCP zerocopy scenario, folio will be released normally .
> > > > When the process exits, if the page cache is truncated before the
> > > > process exits, BUG ON or Bad page occurs, which does not meet the
> > > > expectation.
> > > > To fix this issue, the mapping_mapped() check is added to
> > > > filemap_unaccount_folio(). In addition, to reduce the impact on
> > > > performance, no lock is added when mapping_mapped() is checked.
> > > NAK this patch, you're just preventing the assertion from firing.
> > > I think there's a deeper problem here.
> >
> > --
> > Best Regards,
> > Peng
> >
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-22 16:30       ` Eric Dumazet
@ 2024-01-22 17:12         ` Matthew Wilcox
  2024-01-22 17:39           ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2024-01-22 17:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: zhangpeng (AS),
	linux-mm, linux-fsdevel, netdev, akpm, davem, dsahern, kuba,
	pabeni, arjunroy, wangkefeng.wang

On Mon, Jan 22, 2024 at 05:30:18PM +0100, Eric Dumazet wrote:
> On Mon, Jan 22, 2024 at 5:04 PM Matthew Wilcox <willy@infradead.org> wrote:
> > I'm disappointed to have no reaction from netdev so far.  Let's see if a
> > more exciting subject line evinces some interest.
> 
> Hmm, perhaps some of us were enjoying their weekend ?

I am all in favour of people taking time off!  However the report came
in on Friday at 9am UTC so it had been more than a work day for anyone
anywhere in the world without response.

> I don't really know what changed recently, all I know is that TCP zero
> copy is for real network traffic.
> 
> Real trafic uses order-0 pages, 4K at a time.
> 
> If can_map_frag() needs to add another safety check, let's add it.

So it's your opinion that people don't actually use sendfile() from
a local file, and we can make this fail to zerocopy?  That's good
because I had a slew of questions about what expectations we had around
cache coherency between pages mapped this way and write()/mmap() of
the original file.  If we can just disallow this, we don't need to
have a discussion about it.

> syzbot is usually quite good at bisections, was a bug origin found ?

I have the impression that Huawei run syzkaller themselves without
syzbot.  I suspect this bug has been there for a good long time.
Wonder why nobody's found it before; it doesn't seem complicated for a
fuzzer to stumble into.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-22 17:12         ` Matthew Wilcox
@ 2024-01-22 17:39           ` Eric Dumazet
  2024-01-24  9:30             ` zhangpeng (AS)
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2024-01-22 17:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: zhangpeng (AS),
	linux-mm, linux-fsdevel, netdev, akpm, davem, dsahern, kuba,
	pabeni, arjunroy, wangkefeng.wang

On Mon, Jan 22, 2024 at 6:12 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Jan 22, 2024 at 05:30:18PM +0100, Eric Dumazet wrote:
> > On Mon, Jan 22, 2024 at 5:04 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > I'm disappointed to have no reaction from netdev so far.  Let's see if a
> > > more exciting subject line evinces some interest.
> >
> > Hmm, perhaps some of us were enjoying their weekend ?
>
> I am all in favour of people taking time off!  However the report came
> in on Friday at 9am UTC so it had been more than a work day for anyone
> anywhere in the world without response.
>
> > I don't really know what changed recently, all I know is that TCP zero
> > copy is for real network traffic.
> >
> > Real trafic uses order-0 pages, 4K at a time.
> >
> > If can_map_frag() needs to add another safety check, let's add it.
>
> So it's your opinion that people don't actually use sendfile() from
> a local file, and we can make this fail to zerocopy?

Certainly we do not do that at Google.
I am not sure if anybody else would have used this.



 That's good
> because I had a slew of questions about what expectations we had around
> cache coherency between pages mapped this way and write()/mmap() of
> the original file.  If we can just disallow this, we don't need to
> have a discussion about it.
>
> > syzbot is usually quite good at bisections, was a bug origin found ?
>
> I have the impression that Huawei run syzkaller themselves without
> syzbot.  I suspect this bug has been there for a good long time.
> Wonder why nobody's found it before; it doesn't seem complicated for a
> fuzzer to stumble into.

I is strange syzbot (The Google fuzzer) have not found this yet, I
suspect it might be caused
by a recent change somewhere ?

A repro would definitely help, I could start a bisection.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-22 17:39           ` Eric Dumazet
@ 2024-01-24  9:30             ` zhangpeng (AS)
  2024-01-24 10:11               ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: zhangpeng (AS) @ 2024-01-24  9:30 UTC (permalink / raw)
  To: Eric Dumazet, Matthew Wilcox
  Cc: linux-mm, linux-fsdevel, netdev, akpm, davem, dsahern, kuba,
	pabeni, arjunroy, wangkefeng.wang

On 2024/1/23 1:39, Eric Dumazet wrote:

> On Mon, Jan 22, 2024 at 6:12 PM Matthew Wilcox<willy@infradead.org>  wrote:
>> On Mon, Jan 22, 2024 at 05:30:18PM +0100, Eric Dumazet wrote:
>>> On Mon, Jan 22, 2024 at 5:04 PM Matthew Wilcox<willy@infradead.org>  wrote:
>>>> I'm disappointed to have no reaction from netdev so far.  Let's see if a
>>>> more exciting subject line evinces some interest.
>>> Hmm, perhaps some of us were enjoying their weekend ?
>> I am all in favour of people taking time off!  However the report came
>> in on Friday at 9am UTC so it had been more than a work day for anyone
>> anywhere in the world without response.
>>
>>> I don't really know what changed recently, all I know is that TCP zero
>>> copy is for real network traffic.
>>>
>>> Real trafic uses order-0 pages, 4K at a time.
>>>
>>> If can_map_frag() needs to add another safety check, let's add it.
>> So it's your opinion that people don't actually use sendfile() from
>> a local file, and we can make this fail to zerocopy?
> Certainly we do not do that at Google.
> I am not sure if anybody else would have used this.
>
>
>
>   That's good
>> because I had a slew of questions about what expectations we had around
>> cache coherency between pages mapped this way and write()/mmap() of
>> the original file.  If we can just disallow this, we don't need to
>> have a discussion about it.
>>
>>> syzbot is usually quite good at bisections, was a bug origin found ?
>> I have the impression that Huawei run syzkaller themselves without
>> syzbot.  I suspect this bug has been there for a good long time.
>> Wonder why nobody's found it before; it doesn't seem complicated for a
>> fuzzer to stumble into.
> I is strange syzbot (The Google fuzzer) have not found this yet, I
> suspect it might be caused
> by a recent change somewhere ?
>
> A repro would definitely help, I could start a bisection.

By using git-bisect, the patch that introduces this issue is 05255b823a617
("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive."). v4.18-rc1.

Currently, there are no other repro or c reproduction programs can reproduce
the issue. The syz log used to reproduce the issue is as follows:

r3 = socket$inet_tcp(0x2, 0x1, 0x0)
mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
r4 = socket$inet_tcp(0x2, 0x1, 0x0)
bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)
fallocate(r5, 0x0, 0x0, 0x85b8818)
sendfile(r4, r5, 0x0, 0x3000)
getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
&(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x10)
r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)

-- 
Best Regards,
Peng


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-24  9:30             ` zhangpeng (AS)
@ 2024-01-24 10:11               ` Eric Dumazet
  2024-01-25  2:18                 ` zhangpeng (AS)
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2024-01-24 10:11 UTC (permalink / raw)
  To: zhangpeng (AS)
  Cc: Matthew Wilcox, linux-mm, linux-fsdevel, netdev, akpm, davem,
	dsahern, kuba, pabeni, arjunroy, wangkefeng.wang

On Wed, Jan 24, 2024 at 10:30 AM zhangpeng (AS) <zhangpeng362@huawei.com> wrote:
>
>
> By using git-bisect, the patch that introduces this issue is 05255b823a617
> ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive."). v4.18-rc1.
>
> Currently, there are no other repro or c reproduction programs can reproduce
> the issue. The syz log used to reproduce the issue is as follows:
>
> r3 = socket$inet_tcp(0x2, 0x1, 0x0)
> mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
> r4 = socket$inet_tcp(0x2, 0x1, 0x0)
> bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
> connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
> r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> 0x181e42, 0x0)
> fallocate(r5, 0x0, 0x0, 0x85b8818)
> sendfile(r4, r5, 0x0, 0x3000)
> getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
> &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x10)
> r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> 0x181e42, 0x0)
>

Could you try the following fix then ?

(We also could remove the !skb_frag_off(frag) condition, as the
!PageCompound() is necessary it seems :/)

Thanks a lot !

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1baa484d21902d2492fc2830d960100dc09683bf..ee954ae7778a651a9da4de057e3bafe35a6e10d6
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1785,7 +1785,9 @@ static skb_frag_t *skb_advance_to_frag(struct
sk_buff *skb, u32 offset_skb,

 static bool can_map_frag(const skb_frag_t *frag)
 {
-       return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
+       return skb_frag_size(frag) == PAGE_SIZE &&
+              !skb_frag_off(frag) &&
+              !PageCompound(skb_frag_page(frag));
 }

 static int find_next_mappable_frag(const skb_frag_t *frag,

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-24 10:11               ` Eric Dumazet
@ 2024-01-25  2:18                 ` zhangpeng (AS)
  2024-01-25  8:57                   ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: zhangpeng (AS) @ 2024-01-25  2:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthew Wilcox, linux-mm, linux-fsdevel, netdev, akpm, davem,
	dsahern, kuba, pabeni, arjunroy, wangkefeng.wang

On 2024/1/24 18:11, Eric Dumazet wrote:

> On Wed, Jan 24, 2024 at 10:30 AM zhangpeng (AS) <zhangpeng362@huawei.com> wrote:
>>
>> By using git-bisect, the patch that introduces this issue is 05255b823a617
>> ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive."). v4.18-rc1.
>>
>> Currently, there are no other repro or c reproduction programs can reproduce
>> the issue. The syz log used to reproduce the issue is as follows:
>>
>> r3 = socket$inet_tcp(0x2, 0x1, 0x0)
>> mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
>> r4 = socket$inet_tcp(0x2, 0x1, 0x0)
>> bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
>> connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
>> r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
>> 0x181e42, 0x0)
>> fallocate(r5, 0x0, 0x0, 0x85b8818)
>> sendfile(r4, r5, 0x0, 0x3000)
>> getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
>> &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x10)
>> r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
>> 0x181e42, 0x0)
>>
> Could you try the following fix then ?
>
> (We also could remove the !skb_frag_off(frag) condition, as the
> !PageCompound() is necessary it seems :/)
>
> Thanks a lot !
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1baa484d21902d2492fc2830d960100dc09683bf..ee954ae7778a651a9da4de057e3bafe35a6e10d6
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1785,7 +1785,9 @@ static skb_frag_t *skb_advance_to_frag(struct
> sk_buff *skb, u32 offset_skb,
>
>   static bool can_map_frag(const skb_frag_t *frag)
>   {
> -       return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
> +       return skb_frag_size(frag) == PAGE_SIZE &&
> +              !skb_frag_off(frag) &&
> +              !PageCompound(skb_frag_page(frag));
>   }
>
>   static int find_next_mappable_frag(const skb_frag_t *frag,

This patch doesn't fix this issue. The page cache that can trigger this issue
doesn't necessarily need to be compound. 🙁

-- 
Best Regards,
Peng


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-25  2:18                 ` zhangpeng (AS)
@ 2024-01-25  8:57                   ` Eric Dumazet
  2024-01-25  9:22                     ` zhangpeng (AS)
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2024-01-25  8:57 UTC (permalink / raw)
  To: zhangpeng (AS)
  Cc: Matthew Wilcox, linux-mm, linux-fsdevel, netdev, akpm, davem,
	dsahern, kuba, pabeni, arjunroy, wangkefeng.wang

On Thu, Jan 25, 2024 at 3:18 AM zhangpeng (AS) <zhangpeng362@huawei.com> wrote:
>
> On 2024/1/24 18:11, Eric Dumazet wrote:
>
> > On Wed, Jan 24, 2024 at 10:30 AM zhangpeng (AS) <zhangpeng362@huawei.com> wrote:
> >>
> >> By using git-bisect, the patch that introduces this issue is 05255b823a617
> >> ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive."). v4.18-rc1.
> >>
> >> Currently, there are no other repro or c reproduction programs can reproduce
> >> the issue. The syz log used to reproduce the issue is as follows:
> >>
> >> r3 = socket$inet_tcp(0x2, 0x1, 0x0)
> >> mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
> >> r4 = socket$inet_tcp(0x2, 0x1, 0x0)
> >> bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
> >> connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
> >> r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> >> 0x181e42, 0x0)
> >> fallocate(r5, 0x0, 0x0, 0x85b8818)
> >> sendfile(r4, r5, 0x0, 0x3000)
> >> getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
> >> &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0,
> >> 0x0, 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x10)
> >> r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> >> 0x181e42, 0x0)
> >>
> > Could you try the following fix then ?
> >
> > (We also could remove the !skb_frag_off(frag) condition, as the
> > !PageCompound() is necessary it seems :/)
> >
> > Thanks a lot !
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 1baa484d21902d2492fc2830d960100dc09683bf..ee954ae7778a651a9da4de057e3bafe35a6e10d6
> > 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1785,7 +1785,9 @@ static skb_frag_t *skb_advance_to_frag(struct
> > sk_buff *skb, u32 offset_skb,
> >
> >   static bool can_map_frag(const skb_frag_t *frag)
> >   {
> > -       return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
> > +       return skb_frag_size(frag) == PAGE_SIZE &&
> > +              !skb_frag_off(frag) &&
> > +              !PageCompound(skb_frag_page(frag));
> >   }
> >
> >   static int find_next_mappable_frag(const skb_frag_t *frag,
>
> This patch doesn't fix this issue. The page cache that can trigger this issue
> doesn't necessarily need to be compound. 🙁

Ah, too bad :/

So the issue is that the page had a mapping. I am no mm expert,
I am not sure if we need to add more tests (like testing various
illegal page flags) ?

Can you test this ?

(I am still  converting the repro into C)

Thanks.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1baa484d21902d2492fc2830d960100dc09683bf..2128015227a5066ea74b3911ecaefe7992da132f
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1785,7 +1785,17 @@ static skb_frag_t *skb_advance_to_frag(struct
sk_buff *skb, u32 offset_skb,

 static bool can_map_frag(const skb_frag_t *frag)
 {
-       return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
+       struct page *page;
+
+       if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
+               return false;
+
+       page = skb_frag_page(frag);
+
+       if (PageCompound(page) || page->mapping)
+               return false;
+
+       return true;
 }

 static int find_next_mappable_frag(const skb_frag_t *frag,

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-25  8:57                   ` Eric Dumazet
@ 2024-01-25  9:22                     ` zhangpeng (AS)
  2024-01-25 10:31                       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: zhangpeng (AS) @ 2024-01-25  9:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthew Wilcox, linux-mm, linux-fsdevel, netdev, akpm, davem,
	dsahern, kuba, pabeni, arjunroy, wangkefeng.wang

On 2024/1/25 16:57, Eric Dumazet wrote:

> On Thu, Jan 25, 2024 at 3:18 AM zhangpeng (AS) <zhangpeng362@huawei.com> wrote:
>> On 2024/1/24 18:11, Eric Dumazet wrote:
>>
>>> On Wed, Jan 24, 2024 at 10:30 AM zhangpeng (AS) <zhangpeng362@huawei.com> wrote:
>>>> By using git-bisect, the patch that introduces this issue is 05255b823a617
>>>> ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive."). v4.18-rc1.
>>>>
>>>> Currently, there are no other repro or c reproduction programs can reproduce
>>>> the issue. The syz log used to reproduce the issue is as follows:
>>>>
>>>> r3 = socket$inet_tcp(0x2, 0x1, 0x0)
>>>> mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
>>>> r4 = socket$inet_tcp(0x2, 0x1, 0x0)
>>>> bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
>>>> connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
>>>> r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
>>>> 0x181e42, 0x0)
>>>> fallocate(r5, 0x0, 0x0, 0x85b8818)
>>>> sendfile(r4, r5, 0x0, 0x3000)
>>>> getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
>>>> &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0,
>>>> 0x0, 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x10)
>>>> r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
>>>> 0x181e42, 0x0)
>>>>
>>> Could you try the following fix then ?
>>>
>>> (We also could remove the !skb_frag_off(frag) condition, as the
>>> !PageCompound() is necessary it seems :/)
>>>
>>> Thanks a lot !
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 1baa484d21902d2492fc2830d960100dc09683bf..ee954ae7778a651a9da4de057e3bafe35a6e10d6
>>> 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -1785,7 +1785,9 @@ static skb_frag_t *skb_advance_to_frag(struct
>>> sk_buff *skb, u32 offset_skb,
>>>
>>>    static bool can_map_frag(const skb_frag_t *frag)
>>>    {
>>> -       return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
>>> +       return skb_frag_size(frag) == PAGE_SIZE &&
>>> +              !skb_frag_off(frag) &&
>>> +              !PageCompound(skb_frag_page(frag));
>>>    }
>>>
>>>    static int find_next_mappable_frag(const skb_frag_t *frag,
>> This patch doesn't fix this issue. The page cache that can trigger this issue
>> doesn't necessarily need to be compound. 🙁
> Ah, too bad :/
>
> So the issue is that the page had a mapping. I am no mm expert,
> I am not sure if we need to add more tests (like testing various
> illegal page flags) ?
>
> Can you test this ?
>
> (I am still  converting the repro into C)
>
> Thanks.
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1baa484d21902d2492fc2830d960100dc09683bf..2128015227a5066ea74b3911ecaefe7992da132f
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1785,7 +1785,17 @@ static skb_frag_t *skb_advance_to_frag(struct
> sk_buff *skb, u32 offset_skb,
>
>   static bool can_map_frag(const skb_frag_t *frag)
>   {
> -       return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
> +       struct page *page;
> +
> +       if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
> +               return false;
> +
> +       page = skb_frag_page(frag);
> +
> +       if (PageCompound(page) || page->mapping)
> +               return false;
> +
> +       return true;
>   }
>
>   static int find_next_mappable_frag(const skb_frag_t *frag,

This patch can fix this issue.

In this scenario, page->mapping is inode->i_mapping of ext4,
but VMA is tcp VMA. It's weird.

If all the pages that need to be inserted by TCP zerocopy are
page->mapping == NULL, this solution could be used.

Thanks!

-- 
Best Regards,
Peng


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY
  2024-01-25  9:22                     ` zhangpeng (AS)
@ 2024-01-25 10:31                       ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2024-01-25 10:31 UTC (permalink / raw)
  To: zhangpeng (AS)
  Cc: Matthew Wilcox, linux-mm, linux-fsdevel, netdev, akpm, davem,
	dsahern, kuba, pabeni, arjunroy, wangkefeng.wang

On Thu, Jan 25, 2024 at 10:22 AM zhangpeng (AS) <zhangpeng362@huawei.com> wrote:
>

>
> This patch can fix this issue.
>
>

Great, I will submit this patch for review then, thanks a lot !

>
> If all the pages that need to be inserted by TCP zerocopy are
> page->mapping == NULL, this solution could be used.

At least the patch looks sane for stable submission.

If we need to extend functionality, it can be done in future kernels.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-01-25 10:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-19  9:20 [RFC PATCH] filemap: add mapping_mapped check in filemap_unaccount_folio() Peng Zhang
2024-01-19 13:40 ` Matthew Wilcox
2024-01-20  6:46   ` zhangpeng (AS)
2024-01-22 16:04     ` SECURITY PROBLEM: Any user can crash the kernel with TCP ZEROCOPY Matthew Wilcox
2024-01-22 16:30       ` Eric Dumazet
2024-01-22 17:12         ` Matthew Wilcox
2024-01-22 17:39           ` Eric Dumazet
2024-01-24  9:30             ` zhangpeng (AS)
2024-01-24 10:11               ` Eric Dumazet
2024-01-25  2:18                 ` zhangpeng (AS)
2024-01-25  8:57                   ` Eric Dumazet
2024-01-25  9:22                     ` zhangpeng (AS)
2024-01-25 10:31                       ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).