All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org
Subject: Re: [PATCH v2 2/2] mm/hugetlb: support write-faults in shared mappings
Date: Thu, 11 Aug 2022 11:59:09 -0700	[thread overview]
Message-ID: <YvVRfSYsPOraTo6o@monkey> (raw)
In-Reply-To: <20220811103435.188481-3-david@redhat.com>

On 08/11/22 12:34, David Hildenbrand wrote:
> If we ever get a write-fault on a write-protected page in a shared mapping,
> we'd be in trouble (again). Instead, we can simply map the page writable.
> 
<snip>
> 
> Reason is that uffd-wp doesn't clear the uffd-wp PTE bit when
> unregistering and consequently keeps the PTE writeprotected. Reason for
> this is to avoid the additional overhead when unregistering. Note
> that this is the case also for !hugetlb and that we will end up with
> writable PTEs that still have the uffd-wp PTE bit set once we return
> from hugetlb_wp(). I'm not touching the uffd-wp PTE bit for now, because it
> seems to be a generic thing -- wp_page_reuse() also doesn't clear it.
> 
> VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
> indicates that MAP_SHARED handling was at least envisioned, but could never
> have worked as expected.
> 
> While at it, make sure that we never end up in hugetlb_wp() on write
> faults without VM_WRITE, because we don't support maybe_mkwrite()
> semantics as commonly used in the !hugetlb case -- for example, in
> wp_page_reuse().

Nit,
to me 'make sure that we never end up in hugetlb_wp()' implies that
we would check for condition in callers as opposed to first thing in
hugetlb_wp().  However, I am OK with description as it.

> Note that there is no need to do any kind of reservation in hugetlb_fault()
> in this case ... because we already have a hugetlb page mapped R/O
> that we will simply map writable and we are not dealing with COW/unsharing.

Note that we are not really doing any reservation adjustment in
hugetlb_fault().  That code does pre-allocation of reservation data in
case we might need it in hugetlb_wp.  Since hugetlb_wp will certainly
not do an allocation in this case, we do not even need to do the
preallocation here.  This change is more of an optimization.  I am still
happy with it.

> 
> Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
> Cc: <stable@vger.kernel.org> # v5.19
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/hugetlb.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)

Thanks,

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 0aee2f3ae15c..2480ba627aa5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5241,6 +5241,21 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  	VM_BUG_ON(unshare && (flags & FOLL_WRITE));
>  	VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));
>  
> +	/*
> +	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
> +	 * PTE mapped R/O such as maybe_mkwrite() would do.
> +	 */
> +	if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
> +		return VM_FAULT_SIGSEGV;
> +
> +	/* Let's take out MAP_SHARED mappings first. */
> +	if (vma->vm_flags & VM_MAYSHARE) {
> +		if (unlikely(unshare))
> +			return 0;
> +		set_huge_ptep_writable(vma, haddr, ptep);
> +		return 0;
> +	}
> +
>  	pte = huge_ptep_get(ptep);
>  	old_page = pte_page(pte);
>  
> @@ -5781,12 +5796,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * If we are going to COW/unshare the mapping later, we examine the
>  	 * pending reservations for this page now. This will ensure that any
>  	 * allocations necessary to record that reservation occur outside the
> -	 * spinlock. For private mappings, we also lookup the pagecache
> -	 * page now as it is used to determine if a reservation has been
> -	 * consumed.
> +	 * spinlock. Also lookup the pagecache page now as it is used to
> +	 * determine if a reservation has been consumed.
>  	 */
>  	if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
> -	    !huge_pte_write(entry)) {
> +	    !(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(entry)) {
>  		if (vma_needs_reservation(h, vma, haddr) < 0) {
>  			ret = VM_FAULT_OOM;
>  			goto out_mutex;
> @@ -5794,9 +5808,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		/* Just decrements count, does not deallocate */
>  		vma_end_reservation(h, vma, haddr);
>  
> -		if (!(vma->vm_flags & VM_MAYSHARE))
> -			pagecache_page = hugetlbfs_pagecache_page(h,
> -								vma, haddr);
> +		pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr);
>  	}
>  
>  	ptl = huge_pte_lock(h, mm, ptep);
> -- 
> 2.35.3
> 
> 

  parent reply	other threads:[~2022-08-11 18:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-11 10:34 [PATCH v2 0/2] mm/hugetlb: fix write-fault handling for shared mappings David Hildenbrand
2022-08-11 10:34 ` [PATCH v2 1/2] mm/hugetlb: fix hugetlb not supporting softdirty tracking David Hildenbrand
2022-08-11 18:27   ` Mike Kravetz
2022-08-11 10:34 ` [PATCH v2 2/2] mm/hugetlb: support write-faults in shared mappings David Hildenbrand
2022-08-11 13:59   ` Peter Xu
2022-08-11 16:24     ` David Hildenbrand
2022-08-11 18:59   ` Mike Kravetz [this message]
2022-08-15 13:35     ` Gerald Schaefer
2022-08-15 15:07       ` David Hildenbrand
2022-08-15 15:59         ` Gerald Schaefer
2022-08-15 18:03           ` David Hildenbrand
2022-08-15 18:38             ` Gerald Schaefer
2022-08-15 21:43               ` Mike Kravetz
2022-08-16  9:33                 ` Gerald Schaefer
2022-08-16 20:43                   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YvVRfSYsPOraTo6o@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.