From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen <axelrasmussen@google.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Hugh Dickins <hughd@google.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Alistair Popple <apopple@nvidia.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	peterx@redhat.com, David Hildenbrand <david@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: [PATCH v6 02/23] mm: Teach core mm about pte markers
Date: Mon, 15 Nov 2021 15:55:01 +0800
Message-ID: <20211115075522.73795-3-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>

This patch still does not use pte markers in any way, but it teaches the
core mm about the pte marker idea.

For example, handle_pte_marker() is introduced to parse and handle all
pte marker faults.
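
For reference, handle_pte_marker() builds on the helpers introduced in
the previous patch ("mm: Introduce PTE_MARKER swap entry").  A minimal
sketch of what those helpers look like, assuming the names from patch
01 (the authoritative definitions live there):

  /* Sketch only; see patch 01 for the exact definitions. */
  static inline bool is_pte_marker_entry(swp_entry_t entry)
  {
          /* Markers are carried in their own dedicated swap type. */
          return swp_type(entry) == SWP_PTE_MARKER;
  }

  static inline pte_marker pte_marker_get(swp_entry_t entry)
  {
          /* The marker payload is stored in the swap offset bits. */
          return swp_offset(entry) & PTE_MARKER_MASK;
  }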

Many of the changes are simply comments, so that we know where a pte
marker may show up and why we don't need special code for those cases.
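
A few call sites are also switched from pte_none() to pte_none_mostly()
(and huge_pte_none() to huge_pte_none_mostly()), so that marker ptes
keep being treated the same as none ptes on those paths.  A sketch of
the intended semantics, again assuming the patch 01 helpers:

  /* Sketch only; pte_none_mostly() is defined in patch 01. */
  static inline bool is_pte_marker(pte_t pte)
  {
          return is_swap_pte(pte) &&
                 is_pte_marker_entry(pte_to_swp_entry(pte));
  }

  static inline int pte_none_mostly(pte_t pte)
  {
          /* "Mostly none": either a real none pte, or a pte marker. */
          return pte_none(pte) || is_pte_marker(pte);
  }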

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 10 ++++++----
 mm/filemap.c     |  5 +++++
 mm/hmm.c         |  2 +-
 mm/memcontrol.c  |  8 ++++++--
 mm/memory.c      | 23 +++++++++++++++++++++++
 mm/mincore.c     |  3 ++-
 mm/mprotect.c    |  3 +++
 7 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 22bf14ab2d16..fa24c72a849e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -245,9 +245,10 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us.  PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (huge_pte_none(pte))
+	if (huge_pte_none_mostly(pte))
 		ret = true;
 	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
@@ -326,9 +327,10 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	pte = pte_offset_map(pmd, address);
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us.  PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (pte_none(*pte))
+	if (pte_none_mostly(*pte))
 		ret = true;
 	if (!pte_write(*pte) && (reason & VM_UFFD_WP))
 		ret = true;
diff --git a/mm/filemap.c b/mm/filemap.c
index daa0e23a6ee6..9a7228b95b30 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3327,6 +3327,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
 
+		/*
+		 * NOTE: If there're PTE markers, we'll leave them to be
+		 * handled in the specific fault path, and it'll prohibit the
+		 * fault-around logic.
+		 */
 		if (!pte_none(*vmf->pte))
 			goto unlock;
 
diff --git a/mm/hmm.c b/mm/hmm.c
index 842e26599238..a0f72a540dc3 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,7 +239,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	pte_t pte = *ptep;
 	uint64_t pfn_req_flags = *hmm_pfn;
 
-	if (pte_none(pte)) {
+	if (pte_none_mostly(pte)) {
 		required_fault =
 			hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
 		if (required_fault)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..eaddbc77aa5a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5692,10 +5692,14 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 	if (pte_present(ptent))
 		page = mc_handle_present_pte(vma, addr, ptent);
+	else if (pte_none_mostly(ptent))
+		/*
+		 * PTE markers should be treated as a none pte here, separated
+		 * from other swap handling below.
+		 */
+		page = mc_handle_file_pte(vma, addr, ptent);
 	else if (is_swap_pte(ptent))
 		page = mc_handle_swap_pte(vma, ptent, &ent);
-	else if (pte_none(ptent))
-		page = mc_handle_file_pte(vma, addr, ptent);
 
 	if (!page && !ent.val)
 		return ret;
diff --git a/mm/memory.c b/mm/memory.c
index e5d59a6b6479..04662b010005 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -98,6 +98,8 @@ struct page *mem_map;
 EXPORT_SYMBOL(mem_map);
 #endif
 
+static vm_fault_t do_fault(struct vm_fault *vmf);
+
 /*
  * A number of key systems in x86 including ioremap() rely on the assumption
  * that high_memory defines the upper bound on direct map memory, then end
@@ -1380,6 +1382,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			if (unlikely(zap_skip_check_mapping(details, page)))
 				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_pte_marker_entry(entry)) {
+			/* By default, simply drop all pte markers when zap */
 		} else if (!non_swap_entry(entry)) {
 			rss[MM_SWAPENTS]--;
 			if (unlikely(!free_swap_and_cache(entry)))
@@ -3448,6 +3452,23 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
+static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
+{
+	swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
+	unsigned long marker = pte_marker_get(entry);
+
+	/*
+	 * PTE markers should always be with file-backed memories, and the
+	 * marker should never be empty.  If anything weird happened, the best
+	 * thing to do is to kill the process along with its mm.
+	 */
+	if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
+		return VM_FAULT_SIGBUS;
+
+	/* TODO: handle pte markers */
+	return 0;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3484,6 +3505,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_pte_marker_entry(entry)) {
+			ret = handle_pte_marker(vmf);
 		} else {
 			print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..736869f4b409 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -121,7 +121,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
-		if (pte_none(pte))
+		/* We need to do cache lookup too for pte markers */
+		if (pte_none_mostly(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
 						 vma, vec);
 		else if (pte_present(pte))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e552f5e0ccbd..890bc1f9ca24 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -173,6 +173,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					newpte = pte_swp_mksoft_dirty(newpte);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
+			} else if (is_pte_marker_entry(entry)) {
+				/* Skip it, the same as none pte */
+				continue;
 			} else {
 				newpte = oldpte;
 			}
-- 
2.32.0


