All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Hugh Dickins <hughd@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH] mm/khugepaged: Detecting uffd-wp vma more efficiently
Date: Wed, 22 Sep 2021 22:18:30 -0400	[thread overview]
Message-ID: <YUvj9r3Y954pYPnf@xz-m1.local> (raw)
In-Reply-To: <472a3497-ba70-ac6b-5828-bc5c4c93e9ab@google.com>

[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]

On Wed, Sep 22, 2021 at 06:22:45PM -0700, Hugh Dickins wrote:
> No, I think I misunderstood you before: thanks for re-explaining.
> (And Axel's !userfaultfd_minor() check before calling do_fault_around()
> plays an important part in making sure that it does reach shmem_fault().)

Still thanks for confirming this, Hugh.

Said that, Axel, I didn't mean I'm against doing something similar like
uffd-wp; it's just a heads-up that maybe you won't find a reproducer with real
issues with minor mode.

Even if I think minor mode should be fine with current code, we could still
choose to disable khugepaged from removing the pmd for VM_UFFD_MINOR vmas, just
like what we'll do with VM_UFFD_WP.  At least it can still reduce false
positives.

So far in my local branch I queued the patch which I attached, that's required
for uffd-wp shmem afaict.  If you think minor mode would like that too, I can
post it separately with minor mode added in.

Note that it's slightly different from what I pasted in reply to Yang Shi - I
made it slightly more complicated just to make sure there's no race.  I
mentioned the possible race (I think) in the commit log.

Let me know your preference.

Thanks,

-- 
Peter Xu

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 2697 bytes --]

From 989d36914ac144177e17f9aacbf2785bb8f21420 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 22 Sep 2021 16:23:33 -0400
Subject: [PATCH] mm/khugepaged: Don't recycle vma pgtable if uffd-wp
 registered

When we're trying to collapse a 2M huge shmem page, don't retract pgtable pmd
page if it's registered with uffd-wp, because that pgtable could have pte
markers installed.  Recycling of that pgtable means we'll lose the pte markers.
That could cause data loss for an uffd-wp enabled application on shmem.

Instead of disabling khugepaged on these files, simply skip retracting these
special VMAs, then the page cache can still be merged into a huge thp, and
other mm/vma can still map the range of file with a huge thp when proper.

Note that checking VM_UFFD_WP needs to be done with mmap_sem held for write,
that avoids race like:

         khugepaged                             user thread
         ==========                             ===========
     check VM_UFFD_WP, not set
                                       UFFDIO_REGISTER with uffd-wp on shmem
                                       wr-protect some pages (install markers)
     take mmap_sem write lock
     erase pmd and free pmd page
      --> pte markers are dropped unnoticed!

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 045cc579f724..23e1d03156b3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1451,6 +1451,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
+	if (userfaultfd_wp(vma))
+		return;
+
 	hpage = find_lock_page(vma->vm_file->f_mapping,
 			       linear_page_index(vma, haddr));
 	if (!hpage)
@@ -1591,7 +1595,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
 		if (mmap_write_trylock(mm)) {
-			if (!khugepaged_test_exit(mm)) {
+			/*
+			 * When a vma is registered with uffd-wp, we can't
+			 * recycle the pmd pgtable because there can be pte
+			 * markers installed.  Skip it only, so the rest mm/vma
+			 * can still have the same file mapped hugely, however
+			 * it'll always mapped in small page size for uffd-wp
+			 * registered ranges.
+			 */
+			if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma)) {
 				spinlock_t *ptl = pmd_lock(mm, pmd);
 				/* assume page table is clear */
 				_pmd = pmdp_collapse_flush(vma, addr, pmd);
-- 
2.31.1


  reply	other threads:[~2021-09-23  2:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-22 17:51 [PATCH] mm/khugepaged: Detecting uffd-wp vma more efficiently Peter Xu
2021-09-22 18:21 ` David Hildenbrand
2021-09-22 18:58   ` Peter Xu
2021-09-22 19:29     ` Yang Shi
2021-09-22 19:29       ` Yang Shi
2021-09-22 20:04       ` Peter Xu
2021-09-22 20:23         ` Peter Xu
2021-09-24 10:05     ` David Hildenbrand
2021-09-22 19:33 ` Peter Xu
2021-09-22 20:49 ` Axel Rasmussen
2021-09-22 20:49   ` Axel Rasmussen
2021-09-22 21:20   ` Peter Xu
2021-09-22 23:18     ` Hugh Dickins
2021-09-22 23:18       ` Hugh Dickins
2021-09-22 23:44       ` Peter Xu
2021-09-23  1:22         ` Hugh Dickins
2021-09-23  1:22           ` Hugh Dickins
2021-09-23  2:18           ` Peter Xu [this message]
2021-09-23 16:47             ` Axel Rasmussen
2021-09-23 16:47               ` Axel Rasmussen
2021-09-23 17:53               ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUvj9r3Y954pYPnf@xz-m1.local \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.