linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Yu Zhao <yuzhao@google.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory
Date: Wed, 19 Apr 2023 11:12:09 +0100	[thread overview]
Message-ID: <5b6fe242-a19e-70bf-adba-240f2d5b8548@arm.com> (raw)
In-Reply-To: <568b5b73-f0e9-c385-f628-93e45825fb7b@redhat.com>

Hi David,

On 17/04/2023 15:05, David Hildenbrand wrote:
> [...]
> 
>>> Just a note (that you maybe already know) that we have to be a bit careful in
>>> the wp_copy path with replacing sub-pages that are marked exclusive.
>>
>> Ahh, no I wasn't aware of this - thanks for taking the time to explain it. I
>> think I have a bug.
>>
>> (I'm guessing the GUP fast path assumes that if it sees an exclusive page then
>> that page can't go away? And if I then wp_copy it, I'm breaking the assumption?
>> But surely user space can munmap it at any time and the consequences are
>> similar? It's probably clear that I don't know much about the GUP implementation
>> details...)
> 
> If GUP finds a read-only PTE pointing at an exclusive subpage, it assumes that
> this page cannot randomly be replaced by core MM due to COW. See
> gup_must_unshare(). So it can go ahead and pin the page. As long as user space
> doesn't do something stupid with the mapping (MADV_DONTNEED, munmap()) the
> pinned page must correspond to the mapped page.
> 
> If GUP finds a writeable PTE, it assumes that this page cannot randomly be
> replaced by core MM due to COW -- because writable implies exclusive. See, for
> example the VM_BUG_ON_PAGE() in follow_page_pte(). So, similarly, GUP can simply
> go ahead and pin the page.
> 
> GUP-fast runs lockless, not even taking the PT locks. It syncs against
> concurrent fork() using a special seqlock, and essentially unpins whatever it
> temporarily pinned when it detects that fork() was running concurrently. But it
> might result in some pages temporarily being flagged as "maybe pinned".
> 
> In other cases (!fork()), GUP-fast synchronizes against concurrent sharing (KSM)
> or unmapping (migration, swapout) that implies clearing of the PG_anon_flag of
> the subpage by first unmapping the PTE and conditionally remapping it. See
> mm/ksm.c:write_protect_page() as an example for the sharing side (especially: if
> page_try_share_anon_rmap() fails because the page may be pinned).
> 
> Long story short: replacing a r-o "maybe shared" (!exclusive) PTE is easy.
> Replacing an exclusive PTE (including writable PTEs) requires some work to sync
> with GUP-fast and goes rather in the "maybe just don't bother" terriroty.

I'm looking to fix this problem in my code, but am struggling to see how the
current code is safe. I'm thinking about the following scenario:

 - A page is CoW mapped into processes A and B.
 - The page takes a fault in process A, and do_wp_page() determines that it is
   "maybe-shared" and therefore must copy. So drops the PTL and calls
   wp_page_copy().
 - Process B exits.
 - Another thread in process A faults on the page. This time dw_wp_page()
   determines that the page is exclusive (due to the ref count), and reuses it,
   marking it exclusive along the way.
 - wp_page_copy() from the original thread in process A retakes the PTL and
   copies the _now exclusive_ page.

Having typed it up, I guess this can't happen, because wp_page_copy() will only
do the copy if the PTE hasn't changed and it will have changed because it is now
writable? So this is safe?

To make things more convoluted, what happens if the second thread does an
mprotect() to make the page RO after its write fault was handled? I think
mprotect() will serialize on the mmap write lock so this is safe too?

Sorry if this is a bit rambly, just trying to make sure I've understood
everything correctly.

Thanks,
Ryan



  parent reply	other threads:[~2023-04-19 10:12 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 13:02 [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 01/17] mm: Expose clear_huge_page() unconditionally Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 02/17] mm: pass gfp flags and order to vma_alloc_zeroed_movable_folio() Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 03/17] mm: Introduce try_vma_alloc_movable_folio() Ryan Roberts
2023-04-17  8:49   ` Yin, Fengwei
2023-04-17 10:11     ` Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 04/17] mm: Implement folio_add_new_anon_rmap_range() Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 05/17] mm: Routines to determine max anon folio allocation order Ryan Roberts
2023-04-14 14:09   ` Kirill A. Shutemov
2023-04-14 14:38     ` Ryan Roberts
2023-04-14 15:37       ` Kirill A. Shutemov
2023-04-14 16:06         ` Ryan Roberts
2023-04-14 16:18           ` Matthew Wilcox
2023-04-14 16:31             ` Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 06/17] mm: Allocate large folios for anonymous memory Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 07/17] mm: Allow deferred splitting of arbitrary large anon folios Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 08/17] mm: Implement folio_move_anon_rmap_range() Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 09/17] mm: Update wp_page_reuse() to operate on range of pages Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 10/17] mm: Reuse large folios for anonymous memory Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 11/17] mm: Split __wp_page_copy_user() into 2 variants Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 12/17] mm: ptep_clear_flush_range_notify() macro for batch operation Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 13/17] mm: Implement folio_remove_rmap_range() Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 14/17] mm: Copy large folios for anonymous memory Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 15/17] mm: Convert zero page to large folios on write Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 16/17] mm: mmap: Align unhinted maps to highest anon folio order Ryan Roberts
2023-04-17  8:25   ` Yin, Fengwei
2023-04-17 10:13     ` Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 17/17] mm: Batch-zap large anonymous folio PTE mappings Ryan Roberts
2023-04-17  8:04 ` [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory Yin, Fengwei
2023-04-17 10:19   ` Ryan Roberts
2023-04-17  8:19 ` Yin, Fengwei
2023-04-17 10:28   ` Ryan Roberts
2023-04-17 10:54 ` David Hildenbrand
2023-04-17 11:43   ` Ryan Roberts
2023-04-17 14:05     ` David Hildenbrand
2023-04-17 15:38       ` Ryan Roberts
2023-04-17 15:44         ` David Hildenbrand
2023-04-17 16:15           ` Ryan Roberts
2023-04-26 10:41           ` Ryan Roberts
2023-05-17 13:58             ` David Hildenbrand
2023-05-18 11:23               ` Ryan Roberts
2023-04-19 10:12       ` Ryan Roberts [this message]
2023-04-19 10:51         ` David Hildenbrand
2023-04-19 11:13           ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5b6fe242-a19e-70bf-adba-240f2d5b8548@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=fengwei.yin@intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).