linux-kernel.vger.kernel.org archive mirror
From: John Hubbard <jhubbard@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	James Houghton <jthoughton@google.com>,
	"Jann Horn" <jannh@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Muchun Song <songmuchun@bytedance.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare
Date: Tue, 6 Dec 2022 18:38:54 -0800
Message-ID: <3f032921-c373-e942-a857-4328d7993ef0@nvidia.com>
In-Reply-To: <Y4/ZT3ab9TL1j5TL@x1n>

On 12/6/22 16:07, Peter Xu wrote:
> I thought I answered this one at [1] above.  If not, I can extend the
> answer.

[1] explains it, but it doesn't mention why it's safe to drop and reacquire.

...
> 
> If we touch it, it's a potential bug as you mentioned.  But we didn't.
> 
> Hope it explains.

I think it's OK after all, because hmm_vma_fault() does revalidate after
it takes the vma lock, so that closes the loop that I was fretting over.

I was just also worried that I'd missed some other place, but it looks
like that's not the case.

So, good.

How about this incremental diff on top, as an attempt to clarify what's
going on? Or is this too much wordage? Sometimes I write too many words:


diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 1f7c2011f6cb..27a6df448ee5 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -21,13 +21,16 @@ struct mm_walk;
   *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
   *			Any folded depths (where PTRS_PER_P?D is equal to 1)
   *			are skipped.
- * @hugetlb_entry:	if set, called for each hugetlb entry.	Note that
- *			currently the hook function is protected by hugetlb
- *			vma lock to make sure pte_t* and the spinlock is valid
- *			to access.  If the hook function needs to yield the
- *			thread or retake the vma lock for some reason, it
- *			needs to properly release the vma lock manually,
- *			and retake it before the function returns.
+ * @hugetlb_entry:	if set, called for each hugetlb entry. This hook
+ *			function is called with the vma lock held, in order to
+ *			protect against a concurrent freeing of the pte_t* or
+ *			the ptl. In some cases, the hook function needs to drop
+ *			and retake the vma lock in order to avoid deadlocks
+ *			while calling other functions. In such cases the hook
+ *			function must either refrain from accessing the pte or
+ *			ptl after dropping the vma lock, or else revalidate
+ *			those items after re-acquiring the vma lock and before
+ *			accessing them.
   * @test_walk:		caller specific callback function to determine whether
   *			we walk over the current vma or not. Returning 0 means
   *			"do page table walk over the current vma", returning
diff --git a/mm/hmm.c b/mm/hmm.c
index dcd624f28bcf..b428f2011cfd 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -497,7 +497,13 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
  
  		spin_unlock(ptl);
  		hugetlb_vma_unlock_read(vma);
-		/* hmm_vma_fault() can retake the vma lock */
+		/*
+		 * Avoid deadlock: drop the vma lock before calling
+		 * hmm_vma_fault(), which will itself potentially take and drop
+		 * the vma lock. This is also correct from a protection point of
+		 * view, because there is no further use here of either pte or
+		 * ptl after dropping the vma lock.
+		 */
  		ret = hmm_vma_fault(addr, end, required_fault, walk);
  		hugetlb_vma_lock_read(vma);
  		return ret;
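
To make the locking rule in the two comments above concrete, here is a
minimal single-threaded userspace model of it. This is only a sketch: every
toy_* name is hypothetical and not a kernel API, and the "lock" is just a
flag whose assertions stand in for the deadlock that a nested acquisition
could cause. The hook is entered and left with the lock held, drops it
around the fault call, and never dereferences its stale entry pointer
afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct toy_vma {
	bool read_locked;	/* models "the vma lock is held for read" */
	int *entry;		/* stands in for the pte_t* given to the hook */
};

static void toy_lock_read(struct toy_vma *vma)
{
	/* Model the deadlock risk as an assertion: no nested acquisition. */
	assert(!vma->read_locked);
	vma->read_locked = true;
}

static void toy_unlock_read(struct toy_vma *vma)
{
	assert(vma->read_locked);
	vma->read_locked = false;
}

/* Stands in for hmm_vma_fault(): takes and drops the lock on its own. */
static int toy_fault(struct toy_vma *vma)
{
	int ret;

	toy_lock_read(vma);
	/* Revalidate under the lock instead of trusting a stale pointer. */
	ret = vma->entry ? 0 : -1;
	toy_unlock_read(vma);
	return ret;
}

/* Stands in for the hugetlb_entry hook: entered and left with lock held. */
static int toy_hook(struct toy_vma *vma, int *entry)
{
	int ret;

	assert(vma->read_locked);
	assert(entry != NULL);
	if (*entry != 0)	/* safe: the lock is held here */
		return 0;	/* nothing to fault in */

	toy_unlock_read(vma);
	/* 'entry' is stale from here on and is never dereferenced again. */
	ret = toy_fault(vma);
	toy_lock_read(vma);
	return ret;
}

/* Stands in for the walker, which owns the outer lock/unlock pair. */
static int toy_walk(struct toy_vma *vma)
{
	int ret;

	toy_lock_read(vma);
	ret = toy_hook(vma, vma->entry);
	toy_unlock_read(vma);
	return ret;
}
```

In this model, calling toy_fault() without first dropping the lock trips
the assertion in toy_lock_read(), which is the toy equivalent of the
deadlock that the new mm/hmm.c comment describes.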

>> I guess it's on me to think of something cleaner, so if I do I'll pipe
>> up. :)
> 
> That'll be very much appreciated.
> 
> It's really that I don't know how to make this better, or I can rework the
> series as long as it hasn't landed upstream.
> 

It's always 10x easier to notice an imperfection, than it is to improve on
it. :)

thanks,
-- 
John Hubbard
NVIDIA