From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Wei Zhang <wzam@amazon.com>, Matthew Wilcox <willy@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Gal Pressman <galpress@amazon.com>,
	peterx@redhat.com, Christoph Hellwig <hch@lst.de>,
	Andrea Arcangeli <aarcange@redhat.com>, Jan Kara <jack@suse.cz>,
	Kirill Shutemov <kirill@shutemov.name>,
	David Gibson <david@gibson.dropbear.id.au>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>, Jann Horn <jannh@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 4/4] hugetlb: Do early cow when page pinned on src mm
Date: Wed,  3 Feb 2021 16:08:32 -0500
Message-ID: <20210203210832.113685-5-peterx@redhat.com>
In-Reply-To: <20210203210832.113685-1-peterx@redhat.com>

This is the last missing piece of the COW-during-fork effort for pinned
pages.  See 70e806e4e645 ("mm: Do early cow for pinned pages during fork()
for ptes", 2020-09-27) for more information: this patch does the same
thing, but for hugetlb pages rather than for ptes.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
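For readers skimming the diff, the locking pattern used in the new branch
(drop the locks, preallocate, retake the locks and retry) can be sketched in
userspace as below.  This is only an illustration under stated assumptions,
not kernel code: struct table, struct slot, copy_table(), NSLOTS and
SLOT_SIZE are hypothetical names, pthread mutexes stand in for the page
table spinlocks, malloc() for alloc_huge_page(), and the "pinned" flag for
page_needs_cow_for_dma().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NSLOTS    4	/* hypothetical table size */
#define SLOT_SIZE 64	/* hypothetical stand-in for a huge page */

struct slot {
	bool present;
	bool pinned;	/* stand-in for page_needs_cow_for_dma() */
	char *data;	/* shared with the parent unless pinned */
};

struct table {
	pthread_mutex_t lock;
	struct slot slots[NSLOTS];
};

/*
 * Copy src into dst.  Pinned slots get a private copy right away, all
 * other slots share the parent's buffer.  Returns 0, or -1 if the
 * allocation fails.
 */
static int copy_table(struct table *dst, struct table *src)
{
	char *prealloc = NULL;
	int ret = 0;

	for (size_t i = 0; i < NSLOTS; i++) {
		struct slot *s, *d;
again:
		pthread_mutex_lock(&dst->lock);
		pthread_mutex_lock(&src->lock);
		s = &src->slots[i];
		d = &dst->slots[i];

		if (s->present && s->pinned) {
			if (!prealloc) {
				/* Allocate with both locks dropped, then retry. */
				pthread_mutex_unlock(&src->lock);
				pthread_mutex_unlock(&dst->lock);
				prealloc = malloc(SLOT_SIZE);
				if (!prealloc) {
					ret = -1;
					break;
				}
				goto again;
			}
			/* The buffer is ready, so copy under the locks. */
			assert(prealloc != NULL);
			memcpy(prealloc, s->data, SLOT_SIZE);
			d->present = true;
			d->pinned = false;
			d->data = prealloc;
			prealloc = NULL;
		} else if (s->present) {
			*d = *s;	/* share the parent's buffer */
		}

		pthread_mutex_unlock(&src->lock);
		pthread_mutex_unlock(&dst->lock);
	}

	/* The slot may have changed while unlocked; free an unused buffer. */
	free(prealloc);
	return ret;
}
```

As in the patch, the state of the slot is re-read after the locks are
retaken, and a buffer that ends up unused is either carried over to the next
pinned slot or freed once at the end.
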
 mm/hugetlb.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 71 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9e6ea96bf33b..931bf1a81c16 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3734,11 +3734,27 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 		return false;
 }
 
+static void
+hugetlb_copy_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
+		  struct page *old_page, struct page *new_page)
+{
+	struct hstate *h = hstate_vma(vma);
+	unsigned int psize = pages_per_huge_page(h);
+
+	copy_user_huge_page(new_page, old_page, addr, vma, psize);
+	__SetPageUptodate(new_page);
+	ClearPagePrivate(new_page);
+	set_page_huge_active(new_page);
+	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1));
+	hugepage_add_new_anon_rmap(new_page, vma, addr);
+	hugetlb_count_add(psize, vma->vm_mm);
+}
+
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *vma)
 {
 	pte_t *src_pte, *dst_pte, entry, dst_entry;
-	struct page *ptepage;
+	struct page *ptepage, *prealloc = NULL;
 	unsigned long addr;
 	int cow;
 	struct hstate *h = hstate_vma(vma);
@@ -3787,7 +3803,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		dst_entry = huge_ptep_get(dst_pte);
 		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
 			continue;
-
+again:
 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
 		src_ptl = huge_pte_lockptr(h, src, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -3816,6 +3832,54 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 			set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
 		} else {
+			entry = huge_ptep_get(src_pte);
+			ptepage = pte_page(entry);
+			get_page(ptepage);
+
+			if (unlikely(page_needs_cow_for_dma(vma, ptepage))) {
+				/* This is very possibly a pinned huge page */
+				if (!prealloc) {
+					/*
+					 * Preallocate the huge page without
+					 * tons of locks since we could sleep.
+					 * Note: we can't use any reservation
+					 * because the page will be exclusively
+					 * owned by the child later.
+					 */
+					put_page(ptepage);
+					spin_unlock(src_ptl);
+					spin_unlock(dst_ptl);
+					prealloc = alloc_huge_page(vma, addr, 0);
+					if (IS_ERR(prealloc)) {
+						/*
+						 * alloc_huge_page() returns
+						 * ERR_PTR() on failure.
+						 * hugetlb_cow() is more careful
+						 * here, but fork() can be
+						 * strict: no one references
+						 * the child mm yet, and if
+						 * resources are scarce we had
+						 * better fail fork() early.
+						 */
+						ret = PTR_ERR(prealloc);
+						prealloc = NULL;
+						break;
+					}
+					goto again;
+				}
+				/*
+				 * We have page preallocated so that we can do
+				 * the copy right now.
+				 */
+				hugetlb_copy_page(vma, dst_pte, addr, ptepage,
+						  prealloc);
+				put_page(ptepage);
+				spin_unlock(src_ptl);
+				spin_unlock(dst_ptl);
+				prealloc = NULL;
+				continue;
+			}
+
 			if (cow) {
 				/*
 				 * No need to notify as we are downgrading page
@@ -3826,9 +3890,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				 */
 				huge_ptep_set_wrprotect(src, addr, src_pte);
 			}
-			entry = huge_ptep_get(src_pte);
-			ptepage = pte_page(entry);
-			get_page(ptepage);
+
 			page_dup_rmap(ptepage, true);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 			hugetlb_count_add(pages_per_huge_page(h), dst);
@@ -3842,6 +3904,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	else
 		i_mmap_unlock_read(mapping);
 
+	/* Free the preallocated page if it was never used */
+	if (prealloc)
+		put_page(prealloc);
+
 	return ret;
 }
 
-- 
2.26.2

