From: David Rientjes <rientjes@google.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Suleiman Souhlal <suleiman@google.com>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [patch for-3.17] mm, thp: fix collapsing of hugepages on madvise
Date: Mon, 6 Oct 2014 02:42:19 -0700 (PDT)
Message-ID: <alpine.DEB.2.02.1410060237580.12568@chino.kir.corp.google.com>
In-Reply-To: <20141005184115.GA21713@node.dhcp.inet.fi>

On Sun, 5 Oct 2014, Kirill A. Shutemov wrote:

> On Sat, Oct 04, 2014 at 07:48:04PM -0700, David Rientjes wrote:
> > If an anonymous mapping is not allowed to fault thp memory and then
> > madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
> > collapse this memory into thp memory.
> > 
> > This occurs because the madvise(2) handler for thp, hugepage_advise(),
> > clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
> > until the final action of madvise_behavior().  This causes the
> > khugepaged_enter_vma_merge() to be a no-op in hugepage_advise() when the

This should be hugepage_madvise().

> > vma had previously had VM_NOHUGEPAGE set.
> > 
> > Fix this by passing the correct vma flags to the khugepaged mm slot
> > handler.  There's no chance khugepaged can run on this vma until after
> > madvise_behavior() returns since we hold mm->mmap_sem.
> > 
> > It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
> > in hugepage_advise(), but I didn't want to introduce special case
> > behavior into madvise_behavior().  I think it's best to just let it
> > always set vma->vm_flags itself.
> > 
> > Cc: <stable@vger.kernel.org>
> > Reported-by: Suleiman Souhlal <suleiman@google.com>
> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> Looks like a rather complex fix for a not-that-complex bug.
> What about the untested patch below?
> 
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Sun, 5 Oct 2014 21:22:43 +0300
> Subject: [PATCH] thp: fix registering VMA into khugepaged on
>  madvise(MADV_HUGEPAGE)
> 
> hugepage_madvise() tries to register the VMA with khugepaged via
> khugepaged_enter_vma_merge() on madvise(MADV_HUGEPAGE). Unfortunately
> it's effectively a nop, since khugepaged_enter_vma_merge() relies on
> vma->vm_flags, which has not yet been updated by the time
> hugepage_madvise() runs.
> 
> Let's move khugepaged_enter_vma_merge() to the end of madvise_behavior().
> Now we also have a chance to catch VMAs which become good for THP after
> vma_merge().
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/huge_memory.c | 8 +++-----
>  mm/madvise.c     | 6 ++++++
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f8ffd9412ec5..f84d52158a66 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1966,12 +1966,10 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		*vm_flags &= ~VM_NOHUGEPAGE;
>  		*vm_flags |= VM_HUGEPAGE;
>  		/*
> -		 * If the vma become good for khugepaged to scan,
> -		 * register it here without waiting a page fault that
> -		 * may not happen any time soon.
> +		 * vma->vm_flags is not yet updated here. madvise_behavior()
> +		 * will take care to register it in khugepaged once flags
> +		 * updated.
>  		 */
> -		if (unlikely(khugepaged_enter_vma_merge(vma)))
> -			return -ENOMEM;
>  		break;
>  	case MADV_NOHUGEPAGE:
>  		/*
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 0938b30da4ab..60effd2c5e9c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -128,6 +128,12 @@ success:
>  	 */
>  	vma->vm_flags = new_flags;
>  
> +	/*
> +	 * If the vma become good for khugepaged to scan, register it here
> +	 * without waiting a page fault that may not happen any time soon.
> +	 */
> +	if (unlikely(khugepaged_enter_vma_merge(vma)))
> +		error = -ENOMEM;
>  out:
>  	if (error == -ENOMEM)
>  		error = -EAGAIN;

I'm pretty sure this won't compile.  I'm also pretty sure it's easy to 
come up with an madvise() bit for anon vmas that would trigger the 
BUG_ON() for CONFIG_DEBUG_VM, and that with this patch madvise() calls 
other than MADV_HUGEPAGE that go through the madvise_behavior() path 
would unnecessarily do alloc_mm_slot().  For that reason it's probably 
not as extendable as we'd like.  I can verify this tomorrow if you'd 
like.

This is the point of the last paragraph of my changelog: isolate all thp 
behavior changes for MADV_HUGEPAGE and MADV_NOHUGEPAGE in one place, as 
is currently done, and not add any special handling to 
madvise_behavior().
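
The ordering problem discussed above can be illustrated with a rough
userspace model.  This is only a sketch, not kernel code: the struct,
the flag values, and every function name below are hypothetical
stand-ins, and the real khugepaged checks are more involved.  It
assumes only what the changelog states: the MADV_HUGEPAGE handler
updates an on-stack copy of the flags, and vma->vm_flags is stored as
the final action of madvise_behavior().

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are made up for this model; they are not the kernel's. */
#define VM_HUGEPAGE   0x1UL
#define VM_NOHUGEPAGE 0x2UL

struct vma_model {
	unsigned long vm_flags;
	bool registered;	/* stands in for "registered with khugepaged" */
};

/* Buggy variant: eligibility is read from the (stale) vma->vm_flags. */
static void enter_from_vma(struct vma_model *vma)
{
	if (vma->vm_flags & VM_NOHUGEPAGE)
		return;		/* stale flag turns this into a no-op */
	vma->registered = true;
}

/* Fixed variant: eligibility is read from the flags passed by the caller. */
static void enter_from_flags(struct vma_model *vma, unsigned long vm_flags)
{
	if (vm_flags & VM_NOHUGEPAGE)
		return;
	vma->registered = true;
}

/* Models the MADV_HUGEPAGE handler; only the on-stack copy is updated. */
static void advise_hugepage(struct vma_model *vma, unsigned long *vm_flags,
			    bool pass_flags)
{
	*vm_flags &= ~VM_NOHUGEPAGE;
	*vm_flags |= VM_HUGEPAGE;
	if (pass_flags)
		enter_from_flags(vma, *vm_flags);
	else
		enter_from_vma(vma);
}

/* Models madvise_behavior(): vma->vm_flags is stored as the last step. */
static bool madvise_hugepage_model(struct vma_model *vma, bool pass_flags)
{
	unsigned long new_flags = vma->vm_flags;

	advise_hugepage(vma, &new_flags, pass_flags);
	vma->vm_flags = new_flags;
	return vma->registered;
}

/* Exercises both variants on a vma that starts with VM_NOHUGEPAGE set. */
static void model_selftest(void)
{
	struct vma_model stale = { VM_NOHUGEPAGE, false };
	struct vma_model fixed = { VM_NOHUGEPAGE, false };

	assert(!madvise_hugepage_model(&stale, false));	/* never registered */
	assert(madvise_hugepage_model(&fixed, true));	/* registered */
}
```

In this model both variants end with vm_flags == VM_HUGEPAGE; only the
variant that is handed the updated flags registers the vma, and all of
the thp-specific logic stays in the MADV_HUGEPAGE handler.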
