From: John Hubbard <jhubbard@nvidia.com>
To: Peter Xu <peterx@redhat.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	David Hildenbrand <david@redhat.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Jann Horn <jannh@google.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	James Houghton <jthoughton@google.com>,
	Rik van Riel <riel@surriel.com>,
	Muchun Song <songmuchun@bytedance.com>
Subject: Re: [PATCH v3 8/9] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare
Date: Fri, 9 Dec 2022 11:58:50 -0800
Message-ID: <bbe8baf3-45b5-e843-2dcd-e7f0de035066@nvidia.com>
In-Reply-To: <20221209170100.973970-9-peterx@redhat.com>

On 12/9/22 09:00, Peter Xu wrote:
> Since walk_hugetlb_range() walks the pgtable, it needs the hugetlb vma
> lock to make sure the pgtable page will not be freed concurrently by a
> pmd unshare.
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   include/linux/pagewalk.h | 11 ++++++++++-
>   mm/hmm.c                 | 15 ++++++++++++++-
>   mm/pagewalk.c            |  2 ++
>   3 files changed, 26 insertions(+), 2 deletions(-)
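
To spell out, for future readers, the race that holding the vma lock
prevents here: huge_pte_offset() returns a pointer into a page table
page that pmd unsharing is otherwise free to drop and release. A
conceptual sketch (not code from this series):

    pte = huge_pte_offset(mm, addr, sz); /* pte points into a pgtable page */
                                         /* ...meanwhile another thread    */
                                         /* calls huge_pmd_unshare(),      */
                                         /* which can free that page       */
    ptl = huge_pte_lock(h, mm, pte);     /* use-after-free */

With walk_hugetlb_range() taking the vma lock around the walk, the
unshare (and hence the free) is excluded for the whole lookup-and-use
window.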

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> 
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index 959f52e5867d..27a6df448ee5 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -21,7 +21,16 @@ struct mm_walk;
>    *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
>    *			Any folded depths (where PTRS_PER_P?D is equal to 1)
>    *			are skipped.
> - * @hugetlb_entry:	if set, called for each hugetlb entry
> + * @hugetlb_entry:	if set, called for each hugetlb entry. This hook
> + *			function is called with the vma lock held, in order to
> + *			protect against a concurrent freeing of the pte_t* or
> + *			the ptl. In some cases, the hook function needs to drop
> + *			and retake the vma lock in order to avoid deadlocks
> + *			while calling other functions. In such cases the hook
> + *			function must either refrain from accessing the pte or
> + *			ptl after dropping the vma lock, or else revalidate
> + *			those items after re-acquiring the vma lock and before
> + *			accessing them.
>    * @test_walk:		caller specific callback function to determine whether
>    *			we walk over the current vma or not. Returning 0 means
>    *			"do page table walk over the current vma", returning
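
The documented contract above is nicely explicit. For the "revalidate"
option (the branch that the hmm change below does not need), I would
picture a hook doing roughly the following after dropping the lock;
my_blocking_work() is made up for illustration, and this assumes
h = hstate_vma(vma) and that entry holds the pte_t value read earlier:

    hugetlb_vma_unlock_read(vma);
    ret = my_blocking_work(addr, end, walk);    /* may sleep */
    hugetlb_vma_lock_read(vma);
    /* Revalidate before touching the pte or ptl again: */
    ptep = huge_pte_offset(walk->mm, addr & hmask, huge_page_size(h));
    if (!ptep || !pte_same(huge_ptep_get(ptep), entry))
        return -EAGAIN;    /* pte changed underneath us */
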
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 3850fb625dda..796de6866089 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -493,8 +493,21 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
>   	required_fault =
>   		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
>   	if (required_fault) {
> +		int ret;
> +
>   		spin_unlock(ptl);
> -		return hmm_vma_fault(addr, end, required_fault, walk);
> +		hugetlb_vma_unlock_read(vma);
> +		/*
> +		 * Avoid deadlock: drop the vma lock before calling
> +		 * hmm_vma_fault(), which will itself potentially take and
> +		 * drop the vma lock. This is also correct from a
> +		 * protection point of view, because there is no further
> +		 * use here of either pte or ptl after dropping the vma
> +		 * lock.
> +		 */
> +		ret = hmm_vma_fault(addr, end, required_fault, walk);
> +		hugetlb_vma_lock_read(vma);
> +		return ret;
>   	}
>   
>   	pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 7f1c9b274906..d98564a7be57 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
>   	const struct mm_walk_ops *ops = walk->ops;
>   	int err = 0;
>   
> +	hugetlb_vma_lock_read(vma);
>   	do {
>   		next = hugetlb_entry_end(h, addr, end);
>   		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
> @@ -314,6 +315,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
>   		if (err)
>   			break;
>   	} while (addr = next, addr != end);
> +	hugetlb_vma_unlock_read(vma);
>   
>   	return err;
>   }
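
A nice property of taking the lock inside walk_hugetlb_range() itself
is that existing walk_page_range() users pick up the protection with no
changes on their side. A hypothetical caller (made-up ops and function
names) still looks like:

    static const struct mm_walk_ops my_walk_ops = {
        .hugetlb_entry = my_hugetlb_entry, /* now runs with vma lock held */
    };

    static int scan_range(struct mm_struct *mm, unsigned long start,
                          unsigned long end)
    {
        int ret;

        mmap_read_lock(mm);
        ret = walk_page_range(mm, start, end, &my_walk_ops, NULL);
        mmap_read_unlock(mm);
        return ret;
    }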

