All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wei Chen <harperchen1110@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
Date: Sat, 29 Oct 2022 17:15:55 -0700	[thread overview]
Message-ID: <Y13CO8iIGfDnV24u@monkey> (raw)
In-Reply-To: <Y1xjyLWNCK7p0XSv@x1n>

On 10/28/22 19:20, Peter Xu wrote:
> On Fri, Oct 28, 2022 at 02:17:01PM -0700, Mike Kravetz wrote:
> > On 10/28/22 12:13, Peter Xu wrote:
> > > On Fri, Oct 28, 2022 at 08:23:25AM -0700, Mike Kravetz wrote:
> > > > On 10/26/22 21:12, Peter Xu wrote:
> > > > > On Wed, Oct 26, 2022 at 04:54:01PM -0700, Mike Kravetz wrote:
> > > > > > On 10/26/22 17:42, Peter Xu wrote:
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index c7105ec6d08c..d8b4d7e56939 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -790,7 +790,10 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> > > >  static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
> > > >  					unsigned long start, unsigned long end)
> > > >  {
> > > > -	zap_page_range(vma, start, end - start);
> > > > +	if (!is_vm_hugetlb_page(vma))
> > > > +		zap_page_range(vma, start, end - start);
> > > > +	else
> > > > +		clear_hugetlb_page_range(vma, start, end);
> > > 
> > > With the new ZAP_FLAG_UNMAP flag, clear_hugetlb_page_range() can be dropped
> > > completely?  As zap_page_range() won't be with ZAP_FLAG_UNMAP so we can
> > > identify things?
> > > 
> > > IIUC that's the major reason why I thought the zap flag could be helpful..
> > 
> > Argh.  I went to drop clear_hugetlb_page_range() but there is one issue.
> > In zap_page_range() the MMU_NOTIFY_CLEAR notifier is certainly called.
> > However, we really need to have a 'adjust_range_if_pmd_sharing_possible'
> > call in there because the 'range' may be part of a shared pmd.  :(
> > 
> > I think we need to either have a separate routine like clear_hugetlb_page_range
> > that sets up the appropriate range, or special case hugetlb in zap_page_range.
> > What do you think?
> > I think clear_hugetlb_page_range is the least bad of the two options.
> 
> How about special case hugetlb as you mentioned?  If I'm not wrong, it
> should be 3 lines change:
> 
> ---8<---
> diff --git a/mm/memory.c b/mm/memory.c
> index c5599a9279b1..0a1632e44571 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1706,11 +1706,13 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>         lru_add_drain();
>         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
>                                 start, start + size);
> +       if (is_vm_hugetlb_page(vma))
> +               adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
>         tlb_gather_mmu(&tlb, vma->vm_mm);
>         update_hiwater_rss(vma->vm_mm);
>         mmu_notifier_invalidate_range_start(&range);
>         do {
> -               unmap_single_vma(&tlb, vma, start, range.end, NULL);
> +               unmap_single_vma(&tlb, vma, start, start + size, NULL);
>         } while ((vma = mas_find(&mas, end - 1)) != NULL);
>         mmu_notifier_invalidate_range_end(&range);
>         tlb_finish_mmu(&tlb);
> ---8<---
> 
> As zap_page_range() is already vma-oriented anyway.  But maybe I missed
> something important?

zap_page_range is a bit confusing.  It appears that the passed range can
span multiple vmas.  Otherwise, there would be no do while loop.  Yet, there
is only one mmu_notifier_range_init call specifying the passed vma.

It appears all callers pass a range entirely within a single vma.

The modifications above would work for a range within a single vma.  However,
things would be more complicated if the range can indeed span multiple vmas.
For multiple vmas, we would need to check the first and last vmas for
pmd sharing.

Anyone know more about this seeming confusing behavior?  Perhaps, range
spanning multiple vmas was left over earlier code?
-- 
Mike Kravetz

  reply	other threads:[~2022-10-30  0:18 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-23  2:50 [PATCH v2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing Mike Kravetz
2022-10-24 21:55 ` Mike Kravetz
2022-10-24 23:14   ` Mike Kravetz
2022-10-26 21:42 ` Peter Xu
2022-10-26 23:54   ` Mike Kravetz
2022-10-27  1:12     ` Peter Xu
2022-10-28 15:23       ` Mike Kravetz
2022-10-28 16:13         ` Peter Xu
2022-10-28 21:17           ` Mike Kravetz
2022-10-28 23:20             ` Peter Xu
2022-10-30  0:15               ` Mike Kravetz [this message]
2022-10-30  0:54                 ` Nadav Amit
2022-10-30 18:43                   ` Peter Xu
2022-10-30 18:52                     ` Nadav Amit
2022-10-31  1:44                       ` Mike Kravetz
2022-11-02 19:24                         ` Peter Xu
2022-11-07 20:01                           ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y13CO8iIGfDnV24u@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=harperchen1110@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.