From: Peter Xu <peterx@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	James Houghton <jthoughton@google.com>,
	Jann Horn <jannh@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Muchun Song <songmuchun@bytedance.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare
Date: Tue, 6 Dec 2022 11:45:09 -0500	[thread overview]
Message-ID: <Y49xlV8I2/92Flha@x1n> (raw)
In-Reply-To: <0813b9ed-3c92-088c-4fb9-45fb648c6e73@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 2030 bytes --]

On Mon, Dec 05, 2022 at 03:52:51PM -0800, John Hubbard wrote:
> On 12/5/22 15:33, Mike Kravetz wrote:
> > On 11/29/22 14:35, Peter Xu wrote:
> > > Since walk_hugetlb_range() walks the pgtable, it needs the vma lock
> > > to make sure the pgtable page will not be freed concurrently.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >   mm/pagewalk.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > > index 7f1c9b274906..d98564a7be57 100644
> > > --- a/mm/pagewalk.c
> > > +++ b/mm/pagewalk.c
> > > @@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> > >   	const struct mm_walk_ops *ops = walk->ops;
> > >   	int err = 0;
> > > +	hugetlb_vma_lock_read(vma);
> > >   	do {
> > >   		next = hugetlb_entry_end(h, addr, end);
> > >   		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
> > 
> > For each found pte, we will be calling mm_walk_ops->hugetlb_entry() with
> > the vma_lock held.  I looked into the various hugetlb_entry routines, and
> > I am not sure about hmm_vma_walk_hugetlb_entry.  It seems like it could
> > possibly call hmm_vma_fault -> handle_mm_fault -> hugetlb_fault.  If this
> > can happen, then we may have an issue as hugetlb_fault will also need to
> > acquire the vma_lock in read mode.

Thanks for spotting that, Mike.

I used to treat that path specially, but that was when I was still using
RCU locks, which don't have this issue.  I overlooked this one during the
switchover.
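
For reference, the nesting Mike describes is roughly the following (a
simplified sketch of the call chain, not the exact call sites):

  walk_hugetlb_range()
    hugetlb_vma_lock_read(vma)
    ops->hugetlb_entry()                /* hmm_vma_walk_hugetlb_entry() */
      hmm_vma_fault()
        handle_mm_fault()
          hugetlb_fault()
            hugetlb_vma_lock_read(vma)  /* same rwsem taken again */

Taking the same rwsem for read twice in one task can block forever once a
writer queues up in between, so the callback must drop the vma lock before
anything that can reach hugetlb_fault().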

> > 
> > I do not know the hmm code well enough to know if this may be an actual
> > issue?
> 
> Oh, this sounds like a serious concern. If we add a new lock, and hold it
> during callbacks that also need to take it, that's not going to work out,
> right?
> 
> And yes, hmm_range_fault() and related things do a good job of revealing
> this kind of deadlock. :)

I've got a fixup attached.  John, since this got your attention, please
have a look too in case there are further issues.

Thanks,

-- 
Peter Xu

[-- Attachment #2: 0001-fixup-mm-hugetlb-Make-walk_hugetlb_range-safe-to-pmd.patch --]
[-- Type: text/plain, Size: 2966 bytes --]

From 9ad1e65a31f51a0dc687cd9d6083b9e920d2da61 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 6 Dec 2022 11:38:47 -0500
Subject: [PATCH] fixup! mm/hugetlb: Make walk_hugetlb_range() safe to pmd
 unshare
Content-type: text/plain

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/s390/mm/gmap.c      | 2 ++
 fs/proc/task_mmu.c       | 2 ++
 include/linux/pagewalk.h | 8 +++++++-
 mm/hmm.c                 | 8 +++++++-
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8947451ae021..292a54c490d4 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2643,7 +2643,9 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
 	end = start + HPAGE_SIZE - 1;
 	__storage_key_init_range(start, end);
 	set_bit(PG_arch_1, &page->flags);
+	hugetlb_vma_unlock_read(walk->vma);
 	cond_resched();
+	hugetlb_vma_lock_read(walk->vma);
 	return 0;
 }
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 89338950afd3..d7155f3bb678 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1612,7 +1612,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 			frame++;
 	}
 
+	hugetlb_vma_unlock_read(walk->vma);
 	cond_resched();
+	hugetlb_vma_lock_read(walk->vma);
 
 	return err;
 }
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 959f52e5867d..1f7c2011f6cb 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -21,7 +21,13 @@ struct mm_walk;
  *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
  *			Any folded depths (where PTRS_PER_P?D is equal to 1)
  *			are skipped.
- * @hugetlb_entry:	if set, called for each hugetlb entry
+ * @hugetlb_entry:	if set, called for each hugetlb entry.	Note that
+ *			currently the hook function is protected by the
+ *			hugetlb vma lock, to make sure the pte_t* and the
+ *			spinlock are valid to access.  If the hook function
+ *			needs to yield the thread or retake the vma lock for
+ *			some reason, it needs to release the vma lock manually
+ *			and retake it before the function returns.
  * @test_walk:		caller specific callback function to determine whether
  *			we walk over the current vma or not. Returning 0 means
  *			"do page table walk over the current vma", returning
diff --git a/mm/hmm.c b/mm/hmm.c
index 3850fb625dda..dcd624f28bcf 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -493,8 +493,14 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
+		int ret;
+
 		spin_unlock(ptl);
-		return hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_unlock_read(vma);
+		/* hmm_vma_fault() can retake the vma lock */
+		ret = hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_lock_read(vma);
+		return ret;
 	}
 
 	pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
-- 
2.37.3
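
For hugetlb_entry implementers, the rule documented in the pagewalk.h hunk
above boils down to the pattern below.  This is only a minimal sketch:
my_hugetlb_entry() and do_nonsleeping_work() are made-up names for
illustration, while the hugetlb_vma_*lock_read() helpers, cond_resched()
and the hook signature are the real interfaces.

static int my_hugetlb_entry(pte_t *pte, unsigned long hmask,
			    unsigned long addr, unsigned long end,
			    struct mm_walk *walk)
{
	int ret;

	/*
	 * Entered with the hugetlb vma lock held for read, so *pte and
	 * the pte spinlock are safe to access here.
	 */
	ret = do_nonsleeping_work(pte, addr);

	/*
	 * Anything that may sleep or re-enter hugetlb_fault() must drop
	 * the vma lock first and retake it before returning, so the
	 * unlock in walk_hugetlb_range() stays balanced.
	 */
	hugetlb_vma_unlock_read(walk->vma);
	cond_resched();
	hugetlb_vma_lock_read(walk->vma);

	return ret;
}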


Thread overview: 60+ messages
2022-11-29 19:35 [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Peter Xu
2022-11-29 19:35 ` [PATCH 01/10] mm/hugetlb: Let vma_offset_start() to return start Peter Xu
2022-11-30 10:11   ` David Hildenbrand
2022-11-29 19:35 ` [PATCH 02/10] mm/hugetlb: Don't wait for migration entry during follow page Peter Xu
2022-11-30  4:37   ` Mike Kravetz
2022-11-30 10:15   ` David Hildenbrand
2022-11-29 19:35 ` [PATCH 03/10] mm/hugetlb: Document huge_pte_offset usage Peter Xu
2022-11-30  4:55   ` Mike Kravetz
2022-11-30 15:58     ` Peter Xu
2022-12-05 21:47       ` Mike Kravetz
2022-11-30 10:21   ` David Hildenbrand
2022-11-30 10:24   ` David Hildenbrand
2022-11-30 16:09     ` Peter Xu
2022-11-30 16:11       ` David Hildenbrand
2022-11-30 16:25         ` Peter Xu
2022-11-30 16:31           ` David Hildenbrand
2022-11-29 19:35 ` [PATCH 04/10] mm/hugetlb: Move swap entry handling into vma lock when faulted Peter Xu
2022-12-05 22:14   ` Mike Kravetz
2022-12-05 23:36     ` Peter Xu
2022-11-29 19:35 ` [PATCH 05/10] mm/hugetlb: Make userfaultfd_huge_must_wait() safe to pmd unshare Peter Xu
2022-11-30 16:08   ` David Hildenbrand
2022-12-05 22:23   ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 06/10] mm/hugetlb: Make hugetlb_follow_page_mask() " Peter Xu
2022-11-30 16:09   ` David Hildenbrand
2022-12-05 22:29   ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 07/10] mm/hugetlb: Make follow_hugetlb_page() " Peter Xu
2022-11-30 16:09   ` David Hildenbrand
2022-12-05 22:45   ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 08/10] mm/hugetlb: Make walk_hugetlb_range() " Peter Xu
2022-11-30 16:11   ` David Hildenbrand
2022-12-05 23:33   ` Mike Kravetz
2022-12-05 23:52     ` John Hubbard
2022-12-06 16:45       ` Peter Xu [this message]
2022-12-06 18:50         ` Mike Kravetz
2022-12-06 21:03         ` John Hubbard
2022-12-06 21:51           ` Peter Xu
2022-12-06 22:31             ` John Hubbard
2022-12-07  0:07               ` Peter Xu
2022-12-07  2:38                 ` John Hubbard
2022-12-07 14:58                   ` Peter Xu
2022-11-29 19:35 ` [PATCH 09/10] mm/hugetlb: Make page_vma_mapped_walk() " Peter Xu
2022-11-30 16:18   ` David Hildenbrand
2022-11-30 16:32     ` Peter Xu
2022-11-30 16:39       ` David Hildenbrand
2022-12-05 23:52   ` Mike Kravetz
2022-12-06 17:10     ` Mike Kravetz
2022-12-06 17:39       ` Peter Xu
2022-12-06 17:43         ` Peter Xu
2022-12-06 19:58           ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 10/10] mm/hugetlb: Introduce hugetlb_walk() Peter Xu
2022-11-30  5:18   ` Eric Biggers
2022-11-30 15:37     ` Peter Xu
2022-12-06  0:21       ` Mike Kravetz
2022-11-29 20:49 ` [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Andrew Morton
2022-11-29 21:19   ` Peter Xu
2022-11-29 21:26     ` Andrew Morton
2022-11-29 20:51 ` Andrew Morton
2022-11-29 21:36   ` Peter Xu
2022-11-30  9:46 ` David Hildenbrand
2022-11-30 16:23   ` Peter Xu
