From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, songmuchun@bytedance.com,
	mike.kravetz@oracle.com
Cc: baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size hugetlb page
Date: Tue, 23 Aug 2022 15:50:01 +0800
Message-ID: <0e5d92da043d147a867f634b17acbcc97a7f0e64.1661240170.git.baolin.wang@linux.alibaba.com>
In-Reply-To: <cover.1661240170.git.baolin.wang@linux.alibaba.com>

Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G),
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K base page
size is used.
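
As an illustration of where the four sizes above come from, here is a
minimal userspace sketch. It assumes the arm64 4K-granule geometry (a
12-bit page shift, a 21-bit PMD shift, a 30-bit PUD shift, and 16
contiguous entries per contiguous range). The MODEL_* macros and the
model_*() helpers are hypothetical names for this example, not kernel
symbols.

```c
#include <stdint.h>

/* Assumed arm64 geometry for a 4K base page size (illustrative only). */
#define MODEL_PAGE_SHIFT   12u  /* 4K base page */
#define MODEL_PMD_SHIFT    21u  /* one PMD entry maps 2M */
#define MODEL_PUD_SHIFT    30u  /* one PUD entry maps 1G */
#define MODEL_CONT_ENTRIES 16u  /* entries grouped by the contiguous hint */

/* Size mapped by a run of contiguous PTEs: 16 * 4K. */
static uint64_t model_cont_pte_size(void)
{
	return (uint64_t)MODEL_CONT_ENTRIES << MODEL_PAGE_SHIFT;
}

/* Size mapped by a single PMD entry. */
static uint64_t model_pmd_size(void)
{
	return UINT64_C(1) << MODEL_PMD_SHIFT;
}

/* Size mapped by a run of contiguous PMDs: 16 * 2M. */
static uint64_t model_cont_pmd_size(void)
{
	return (uint64_t)MODEL_CONT_ENTRIES << MODEL_PMD_SHIFT;
}

/* Size mapped by a single PUD entry. */
static uint64_t model_pud_size(void)
{
	return UINT64_C(1) << MODEL_PUD_SHIFT;
}
```

Under these assumptions the helpers evaluate to 64K, 2M, 32M and 1G
respectively. Only the 64K case is mapped at the PTE level, which is
why it reaches follow_page_pte().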

When follow_page() looks up a CONT-PTE size hugetlb page,
follow_page_pte() uses pte_offset_map_lock() to take the pte entry lock.
However, this is the wrong lock for a CONT-PTE size hugetlb page: we
should use huge_pte_lock() to take the correct lock, which is
mm->page_table_lock.
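
The lock mismatch can be sketched as a small userspace model. It
assumes split page table locks are enabled (without them, every path
falls back to mm->page_table_lock and the two choices coincide), and it
mirrors my understanding of the selection rule in huge_pte_lockptr() at
the time of this patch: PMD size hugetlb uses the PMD lock, every other
size uses mm->page_table_lock. The enum and the model_*() functions are
hypothetical names for this example.

```c
#include <stdint.h>

#define MODEL_SZ_64K (UINT64_C(64) << 10)
#define MODEL_SZ_2M  (UINT64_C(2) << 20)
#define MODEL_SZ_32M (UINT64_C(32) << 20)

enum model_lock_kind {
	MODEL_LOCK_SPLIT_PTE,     /* per-PTE-table lock (pte_offset_map_lock()) */
	MODEL_LOCK_SPLIT_PMD,     /* per-PMD-table lock (pmd_lockptr()) */
	MODEL_LOCK_MM_PAGE_TABLE, /* mm->page_table_lock */
};

/* follow_page_pte() takes the PTE-table lock for every mapping it sees. */
static enum model_lock_kind model_follow_page_pte_lock(void)
{
	return MODEL_LOCK_SPLIT_PTE;
}

/* Assumed huge_pte_lockptr() rule: PMD size -> PMD lock, else mm lock. */
static enum model_lock_kind model_huge_pte_lock(uint64_t huge_page_size)
{
	if (huge_page_size == MODEL_SZ_2M)
		return MODEL_LOCK_SPLIT_PMD;
	return MODEL_LOCK_MM_PAGE_TABLE;
}

/* Nonzero when the lookup path and the hugetlb code agree on the lock. */
static int model_locks_match(uint64_t huge_page_size)
{
	return model_follow_page_pte_lock() ==
	       model_huge_pte_lock(huge_page_size);
}
```

For a 64K hugetlb page the model reports a mismatch: the lookup holds
the PTE-table lock while migration and poisoning serialize on
mm->page_table_lock, so the two sides never exclude each other.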

That means the pte entry of a CONT-PTE size hugetlb page is not stable
under the lock currently taken in follow_page_pte(): the page can still
be migrated or poisoned concurrently, which can cause races. The
pte_xxx() checks that follow are therefore also unreliable in
follow_page_pte(), even though they run under the 'pte lock'.

Moreover, we should use huge_ptep_get() to read the pte entry value of
a CONT-PTE size hugetlb page. It already takes the subpages' dirty and
young bits into account, so we do not miss the dirty or young state of
the CONT-PTE size hugetlb page.
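
To show why a plain read of the first entry can miss state, here is a
minimal userspace model of folding dirty and young bits across a
contiguous range. It is a sketch of the idea only: the bit positions,
the model_pte_t type and model_huge_ptep_get() are assumptions made for
this example and do not match the real arm64 PTE layout or the real
huge_ptep_get() signature.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_pte_t;

/* Arbitrary bit positions chosen for this model only. */
#define MODEL_PTE_PRESENT (UINT64_C(1) << 0)
#define MODEL_PTE_CONT    (UINT64_C(1) << 1)
#define MODEL_PTE_DIRTY   (UINT64_C(1) << 2)
#define MODEL_PTE_YOUNG   (UINT64_C(1) << 3)

/*
 * Return the first entry of a contiguous range with the dirty and young
 * bits of all ncontig entries folded in. Entries that are not present
 * or not marked contiguous are returned unchanged.
 */
static model_pte_t model_huge_ptep_get(const model_pte_t *ptep,
				       size_t ncontig)
{
	model_pte_t orig = ptep[0];
	size_t i;

	if (!(orig & MODEL_PTE_PRESENT) || !(orig & MODEL_PTE_CONT))
		return orig;

	for (i = 0; i < ncontig; i++)
		orig |= ptep[i] & (MODEL_PTE_DIRTY | MODEL_PTE_YOUNG);

	return orig;
}
```

If hardware sets the dirty bit only on, say, the sixth entry of a
16-entry range, reading the first entry directly reports a clean page,
while the folded read reports it as dirty.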

To fix the above issues, introduce a new helper follow_huge_pte() to
look up a CONT-PTE size hugetlb page. It uses huge_pte_lock() to take
the correct pte entry lock, which keeps the pte entry stable, and it
also handles non-present ptes.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/hugetlb.h |  8 ++++++++
 mm/gup.c                | 11 ++++++++++
 mm/hugetlb.c            | 53 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3ec981a..d491138 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -207,6 +207,8 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
 struct page *follow_huge_pd(struct vm_area_struct *vma,
 			    unsigned long address, hugepd_t hpd,
 			    int flags, int pdshift);
+struct page *follow_huge_pte(struct vm_area_struct *vma, unsigned long address,
+			     pmd_t *pmd, int flags);
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 				pmd_t *pmd, int flags);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
@@ -312,6 +314,12 @@ static inline struct page *follow_huge_pd(struct vm_area_struct *vma,
 	return NULL;
 }
 
+static inline struct page *follow_huge_pte(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmd, int flags)
+{
+	return NULL;
+}
+
 static inline struct page *follow_huge_pmd(struct mm_struct *mm,
 				unsigned long address, pmd_t *pmd, int flags)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 3b656b7..87a94f5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -534,6 +534,17 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
 
+	/*
+	 * Handle PTE level hugetlb, such as contiguous-PTE hugetlb on the
+	 * ARM64 architecture.
+	 */
+	if (is_vm_hugetlb_page(vma)) {
+		page = follow_huge_pte(vma, address, pmd, flags);
+		if (page)
+			return page;
+		return no_page_table(vma, flags);
+	}
+
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
 	if (!pte_present(pte)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6c00ba1..cf742d1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6981,6 +6981,59 @@ struct page * __weak
 	return NULL;
 }
 
+/* Support looking up a CONT-PTE size hugetlb page. */
+struct page * __weak
+follow_huge_pte(struct vm_area_struct *vma, unsigned long address,
+		pmd_t *pmd, int flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct hstate *hstate = hstate_vma(vma);
+	unsigned long size = huge_page_size(hstate);
+	struct page *page = NULL;
+	spinlock_t *ptl;
+	pte_t *ptep, pte;
+
+	/*
+	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
+	 * follow_hugetlb_page().
+	 */
+	if (WARN_ON_ONCE(flags & FOLL_PIN))
+		return NULL;
+
+	ptep = huge_pte_offset(mm, address, size);
+	if (!ptep)
+		return NULL;
+
+retry:
+	ptl = huge_pte_lock(hstate, mm, ptep);
+	pte = huge_ptep_get(ptep);
+	if (pte_present(pte)) {
+		page = pte_page(pte);
+		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
+			page = NULL;
+			goto out;
+		}
+	} else {
+		if (!(flags & FOLL_MIGRATION)) {
+			page = NULL;
+			goto out;
+		}
+
+		if (is_hugetlb_entry_migration(pte)) {
+			spin_unlock(ptl);
+			__migration_entry_wait_huge(ptep, ptl);
+			goto retry;
+		}
+		/*
+		 * hwpoisoned entry is treated as no_page_table in
+		 * follow_page_mask().
+		 */
+	}
+out:
+	spin_unlock(ptl);
+	return page;
+}
+
 struct page * __weak
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd, int flags)
-- 
1.8.3.1

