From: David Hildenbrand <david@redhat.com>
To: Hugh Dickins <hughd@google.com>,
	Yuanzheng Song <songyuanzheng@huawei.com>
Cc: akpm@linux-foundation.org, gregkh@linuxfoundation.org,
	peterx@redhat.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH STABLE 5.10] mm/memory: add non-anonymous page check in the copy_present_page()
Date: Thu, 27 Oct 2022 09:54:01 +0200
Message-ID: <9ffe3cbf-98bb-f958-9c80-547ec217c32f@redhat.com>
In-Reply-To: <3823471f-6dda-256e-e082-718879c05449@google.com>

On 26.10.22 23:51, Hugh Dickins wrote:
> On Mon, 24 Oct 2022, Yuanzheng Song wrote:
> 
>> The vma->anon_vma of the child process may be NULL because
>> the entire vma does not contain anonymous pages. In this
>> case, a BUG will occur when the copy_present_page() passes
>> a copy of a non-anonymous page of that vma to the
>> page_add_new_anon_rmap() to set up new anonymous rmap.
>>
>> ------------[ cut here ]------------
>> kernel BUG at mm/rmap.c:1044!
>> Internal error: Oops - BUG: 0 [#1] SMP
>> Modules linked in:
>> CPU: 2 PID: 3617 Comm: test Not tainted 5.10.149 #1
>> Hardware name: linux,dummy-virt (DT)
>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>> pc : __page_set_anon_rmap+0xbc/0xf8
>> lr : __page_set_anon_rmap+0xbc/0xf8
>> sp : ffff800014c1b870
>> x29: ffff800014c1b870 x28: 0000000000000001
>> x27: 0000000010100073 x26: ffff1d65c517baa8
>> x25: ffff1d65cab0f000 x24: ffff1d65c416d800
>> x23: ffff1d65cab5f248 x22: 0000000020000000
>> x21: 0000000000000001 x20: 0000000000000000
>> x19: fffffe75970023c0 x18: 0000000000000000
>> x17: 0000000000000000 x16: 0000000000000000
>> x15: 0000000000000000 x14: 0000000000000000
>> x13: 0000000000000000 x12: 0000000000000000
>> x11: 0000000000000000 x10: 0000000000000000
>> x9 : ffffc3096d5fb858 x8 : 0000000000000000
>> x7 : 0000000000000011 x6 : ffff5a5c9089c000
>> x5 : 0000000000020000 x4 : ffff5a5c9089c000
>> x3 : ffffc3096d200000 x2 : ffffc3096e8d0000
>> x1 : ffff1d65ca3da740 x0 : 0000000000000000
>> Call trace:
>>   __page_set_anon_rmap+0xbc/0xf8
>>   page_add_new_anon_rmap+0x1e0/0x390
>>   copy_pte_range+0xd00/0x1248
>>   copy_page_range+0x39c/0x620
>>   dup_mmap+0x2e0/0x5a8
>>   dup_mm+0x78/0x140
>>   copy_process+0x918/0x1a20
>>   kernel_clone+0xac/0x638
>>   __do_sys_clone+0x78/0xb0
>>   __arm64_sys_clone+0x30/0x40
>>   el0_svc_common.constprop.0+0xb0/0x308
>>   do_el0_svc+0x48/0xb8
>>   el0_svc+0x24/0x38
>>   el0_sync_handler+0x160/0x168
>>   el0_sync+0x180/0x1c0
>> Code: 97f8ff85 f9400294 17ffffeb 97f8ff82 (d4210000)
>> ---[ end trace a972347688dc9bd4 ]---
>> Kernel panic - not syncing: Oops - BUG: Fatal exception
>> SMP: stopping secondary CPUs
>> Kernel Offset: 0x43095d200000 from 0xffff800010000000
>> PHYS_OFFSET: 0xffffe29a80000000
>> CPU features: 0x08200022,61806082
>> Memory Limit: none
>> ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
>>
>> This problem has been fixed by the fb3d824d1a46
>> ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()"),
>> but still exists in the linux-5.10.y branch.
>>
>> This patch is not applicable to this version because
>> of the large version differences. Therefore, fix it by
>> adding non-anonymous page check in the copy_present_page().
>>
>> Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
>> Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
> 
> It's a good point, but this patch should not go into any stable release
> without an explicit Ack from either Peter Xu or David Hildenbrand.
> 
> To my eye, it's simply avoiding the issue, rather than fixing
> it properly; and even if the issue is so rare, and fixing properly
> too difficult or inefficient (a cached anon_vma?), that a workaround
> is good enough, it still looks like the wrong workaround (checking
> dst_vma->anon_vma instead of PageAnon seems more to the point, and
> less lenient).
> 
> But my eye on COW is very poor nowadays, and I may be plain wrong.

I am not aware of any reason for copying a !anon page during fork(). COW
regarding fork() is all about sharing private (anon) pages between the
parent and the child. The semantics of other pages are untouched.
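
To illustrate the fork() semantics for private anon pages from userspace,
here is a minimal sketch (not kernel code). The function name
child_write_is_private() is made up for this example. It populates an
anonymous MAP_PRIVATE page in the parent, forks, lets the child write to
the page, and checks that the parent still sees its own data.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if the child's write stayed invisible to the parent,
 * 0 if it leaked into the parent, and -1 on a setup error.
 */
int child_write_is_private(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    char *mem = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    mem[0] = 'P';               /* populate an anon page in the parent */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, (size_t)page_size);
        return -1;
    }
    if (pid == 0) {
        mem[0] = 'C';           /* write fault in the child: gets its own copy */
        _exit(mem[0] == 'C' ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        munmap(mem, (size_t)page_size);
        return -1;
    }

    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int result = child_ok && mem[0] == 'P';  /* parent data is untouched */

    munmap(mem, (size_t)page_size);
    return result;
}
```

The sketch only shows the visible semantics; whether the kernel shares the
page and copies on the later write, or copies early during fork(), is not
observable from here.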


Yes, I am working on reliable longterm R/O pinning improvements, whereby 
we never pin such pages in a MAP_PRIVATE mapping but instead break COW 
before pinning; but this only applies to longterm pinning 
(FOLL_LONGTERM) and is independent of fork() here.


Let me elaborate: if you have a pagecache page (or the shared zeropage) 
in a MAP_PRIVATE mapping pinned R/O, the next write fault will replace 
the page by a copy, *independent* of fork() or not: the page is already 
mapped write-protected into the page table.
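
The write-fault behaviour can be observed from userspace with a small
sketch like the following. It does not pin anything; it only shows that a
write through a MAP_PRIVATE file mapping ends up in a private copy and
never reaches the file. The function name and the scratch file path are
assumptions made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if a write through a MAP_PRIVATE file mapping stayed private
 * (the file still holds the old byte), 0 if not, and -1 on a setup error.
 */
int private_write_stays_private(void)
{
    char path[] = "/tmp/cow-demo-XXXXXX";   /* assumed scratch location */
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                            /* file vanishes once fd is closed */

    /* One page of file data whose first byte is 'A'. */
    if (ftruncate(fd, (off_t)page_size) != 0 || pwrite(fd, "A", 1, 0) != 1) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    char before = map[0];   /* read fault: pagecache page is mapped */
    map[0] = 'B';           /* write fault: page is replaced by a private copy */

    char in_file = 0;
    int result = -1;
    if (pread(fd, &in_file, 1, 0) == 1)
        result = (before == 'A' && map[0] == 'B' && in_file == 'A');

    munmap(map, (size_t)page_size);
    close(fd);
    return result;
}
```

No fork() is involved anywhere in this sketch, which matches the point
above: the copy happens on the write fault alone.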


IIUC, the problem here is that we have a writable private mapping (COW 
mapping) of, say, a file, whereby we never had to COW -- so no anon 
pages were mapped.

Then the process pinned some page once (which sets src_mm->has_pinned),
and during fork() we detect a pagecache page / the shared zeropage as
"maybe pinned". That can happen easily, for example due to another
process's actions or false positives. As a result, we end up duplicating
a !anon page.

Restricting copying during fork() to anon pages is IMHO the right thing 
to do.
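
As a rough model of that restriction, consider the userspace sketch below.
It is not the kernel's copy_present_page() and not the proposed patch; the
struct and function names (model_mm, model_page,
model_needs_early_copy()) are hypothetical and only mirror the conditions
discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state discussed above. */
struct model_mm {
    bool has_pinned;     /* the process pinned some page at least once */
};

struct model_page {
    bool anon;           /* anonymous page vs. pagecache page / zeropage */
    bool maybe_pinned;   /* "maybe pinned" heuristic, may be a false positive */
};

/*
 * Decide whether fork() should copy the page for the child right away
 * instead of sharing it. Only anon pages in a COW mapping qualify.
 */
bool model_needs_early_copy(bool cow_mapping,
                            const struct model_mm *src_mm,
                            const struct model_page *page)
{
    assert(src_mm != NULL && page != NULL);

    if (!cow_mapping)
        return false;    /* shared mappings are never copied */
    if (!src_mm->has_pinned)
        return false;    /* nothing was ever pinned in this mm */
    if (!page->anon)
        return false;    /* the restriction: never duplicate a !anon page */
    return page->maybe_pinned;
}
```

With the !anon check in place, a pagecache page that merely looks "maybe
pinned" is shared as usual, so no anon rmap is ever set up for it in a
child vma without an anon_vma.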

-- 
Thanks,

David / dhildenb

