All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Hugh Dickins <hughd@google.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Alistair Popple <apopple@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	peterx@redhat.com, Jerome Glisse <jglisse@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	David Hildenbrand <david@redhat.com>
Subject: [PATCH v4 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
Date: Wed, 14 Jul 2021 18:24:33 -0400	[thread overview]
Message-ID: <20210714222433.48637-1-peterx@redhat.com> (raw)
In-Reply-To: <20210714222117.47648-1-peterx@redhat.com>

File-backed memory differs from anonymous memory in that even if the pte is
missing, the data could still resides either in the file or in page/swap cache.
So when wr-protect a pte, we need to consider none ptes too.

We do that by installing the uffd-wp special swap pte as a marker.  So when
there's a future write to the pte, the fault handler will go the special path
to first fault-in the page as read-only, then report to userfaultfd server with
the wr-protect message.

On the other hand, when unprotecting a page, it's also possible that the pte
got unmapped but replaced by the special uffd-wp marker.  Then we'll need to be
able to recover from a uffd-wp special swap pte into a none pte, so that the
next access to the page will fault in correctly as usual when trigger the fault
handler next time, rather than sending a uffd-wp message.

Special care needs to be taken throughout the change_protection_range()
process.  Since now we allow user to wr-protect a none pte, we need to be able
to pre-populate the page table entries if we see !anonymous && MM_CP_UFFD_WP
requests, otherwise change_protection_range() will always skip when the pgtable
entry does not exist.

Note that this patch only covers the small pages (pte level) but not covering
any of the transparent huge pages yet.  But this will be a base for thps too.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 4b743394afbe..8ec85b276975 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -29,6 +29,7 @@
 #include <linux/uaccess.h>
 #include <linux/mm_inline.h>
 #include <linux/pgtable.h>
+#include <linux/userfaultfd_k.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
@@ -186,6 +187,32 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				set_pte_at(vma->vm_mm, addr, pte, newpte);
 				pages++;
 			}
+		} else if (unlikely(is_swap_special_pte(oldpte))) {
+			if (uffd_wp_resolve && !vma_is_anonymous(vma) &&
+			    pte_swp_uffd_wp_special(oldpte)) {
+				/*
+				 * This is uffd-wp special pte and we'd like to
+				 * unprotect it.  What we need to do is simply
+				 * recover the pte into a none pte; the next
+				 * page fault will fault in the page.
+				 */
+				pte_clear(vma->vm_mm, addr, pte);
+				pages++;
+			}
+		} else {
+			/* It must be an none page, or what else?.. */
+			WARN_ON_ONCE(!pte_none(oldpte));
+			if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+				/*
+				 * For file-backed mem, we need to be able to
+				 * wr-protect even for a none pte!  Because
+				 * even if the pte is null, the page/swap cache
+				 * could exist.
+				 */
+				set_pte_at(vma->vm_mm, addr, pte,
+					   pte_swp_mkuffd_wp_special(vma));
+				pages++;
+			}
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -219,6 +246,25 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 	return 0;
 }
 
+/*
+ * File-backed vma allows uffd wr-protect upon none ptes, because even if pte
+ * is missing, page/swap cache could exist.  When that happens, the wr-protect
+ * information will be stored in the page table entries with the marker (e.g.,
+ * PTE_SWP_UFFD_WP_SPECIAL).  Prepare for that by always populating the page
+ * tables to pte level, so that we'll install the markers in change_pte_range()
+ * where necessary.
+ *
+ * Note that we only need to do this in pmd level, because if pmd does not
+ * exist, it means the whole range covered by the pmd entry (of a pud) does not
+ * contain any valid data but all zeros.  Then nothing to wr-protect.
+ */
+#define  change_protection_prepare(vma, pmd, addr, cp_flags)		\
+	do {								\
+		if (unlikely((cp_flags & MM_CP_UFFD_WP) && pmd_none(*pmd) && \
+			     !vma_is_anonymous(vma)))			\
+			WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd));	\
+	} while (0)
+
 static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pud_t *pud, unsigned long addr, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
@@ -237,6 +283,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		next = pmd_addr_end(addr, end);
 
+		change_protection_prepare(vma, pmd, addr, cp_flags);
+
 		/*
 		 * Automatic NUMA balancing walks the tables with mmap_lock
 		 * held for read. It's possible a parallel update to occur
-- 
2.31.1


  parent reply	other threads:[~2021-07-14 22:24 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-07-14 22:20 ` [PATCH v4 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
2021-07-14 22:20 ` [PATCH v4 02/26] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-14 22:20 ` [PATCH v4 03/26] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-07-14 22:20 ` [PATCH v4 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-07-14 22:20 ` [PATCH v4 05/26] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-07-14 22:20 ` [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-07-15  6:20   ` kernel test robot
2021-07-15  6:20     ` kernel test robot
2021-07-15 18:50     ` Peter Xu
2021-07-15 18:50       ` Peter Xu
2021-07-14 22:20 ` [PATCH v4 07/26] mm: Drop first_index/last_index in zap_details Peter Xu
2021-07-14 22:20 ` [PATCH v4 08/26] mm: Introduce zap_details.zap_flags Peter Xu
2021-07-14 22:21 ` [PATCH v4 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-07-14 22:21 ` [PATCH v4 10/26] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-07-14 22:24 ` Peter Xu [this message]
2021-07-14 22:24 ` [PATCH v4 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-07-14 22:24 ` [PATCH v4 13/26] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-07-14 22:24 ` [PATCH v4 14/26] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
2021-07-15  5:49   ` kernel test robot
2021-07-15  5:49     ` kernel test robot
2021-07-15  8:10   ` kernel test robot
2021-07-15  8:10     ` kernel test robot
2021-07-15 17:05   ` kernel test robot
2021-07-15 17:05     ` kernel test robot
2021-07-15 17:05   ` [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static kernel test robot
2021-07-15 17:05     ` kernel test robot
2021-07-15 18:53     ` Peter Xu
2021-07-15 18:53       ` Peter Xu
2021-07-14 22:24 ` [PATCH v4 16/26] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Peter Xu
2021-07-14 22:24 ` [PATCH v4 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-07-14 22:25 ` [PATCH v4 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-14 22:25 ` [PATCH v4 19/26] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-07-14 22:25 ` [PATCH v4 20/26] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-07-14 22:25 ` [PATCH v4 21/26] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-07-14 22:25 ` [PATCH v4 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-07-14 22:25 ` [PATCH v4 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
2021-07-14 22:25 ` [PATCH v4 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs Peter Xu
2021-07-14 22:25 ` [PATCH v4 25/26] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-07-14 22:25 ` [PATCH v4 26/26] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210714222433.48637-1-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.