From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Suleiman Souhlal <suleiman@google.com>,
stable@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [patch for-3.17] mm, thp: fix collapsing of hugepages on madvise
Date: Mon, 6 Oct 2014 18:03:51 +0300 [thread overview]
Message-ID: <20141006150351.GA23754@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.DEB.2.02.1410041947080.7055@chino.kir.corp.google.com>
On Sat, Oct 04, 2014 at 07:48:04PM -0700, David Rientjes wrote:
> If an anonymous mapping is not allowed to fault thp memory and then
> madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
> collapse this memory into thp memory.
>
> This occurs because the madvise(2) handler for thp, hugepage_advise(),
> clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
> until the final action of madvise_behavior(). This causes the
> khugepaged_enter_vma_merge() to be a no-op in hugepage_advise() when the
> vma had previously had VM_NOHUGEPAGE set.
>
> Fix this by passing the correct vma flags to the khugepaged mm slot
> handler. There's no chance khugepaged can run on this vma until after
> madvise_behavior() returns since we hold mm->mmap_sem.
>
> It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
> in hugepage_advise(), but I didn't want to introduce special case
> behavior into madvise_behavior(). I think it's best to just let it
> always set vma->vm_flags itself.
>
> Cc: <stable@vger.kernel.org>
> Reported-by: Suleiman Souhlal <suleiman@google.com>
> Signed-off-by: David Rientjes <rientjes@google.com>
Okay, I've looked once again and it seems your approach is better.
Although, I don't like that we need to pass down vma->vm_flags to every
khugepaged_enter() and khugepaged_enter_vma_merge().
My proposal is below. Build-tested only.
And I don't think this is subject for stable@: no crash or serious
misbehaviour. Registering to khugepaged is postponed until first page
fault. Not a big deal.
>From 6434d87a313317bcbc98313794840c366ba39ba1 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Mon, 6 Oct 2014 17:52:17 +0300
Subject: [PATCH] thp: fix registering VMA into khugepaged on
madvise(MADV_HUGEPAGE)
hugepage_madvise() tries to register VMA into khugepaged with
khugepaged_enter_vma_merge() on madvise(MADV_HUGEPAGE). Unfortunately
it's effectevely nop, since khugepaged_enter_vma_merge() rely on
vma->vm_flags which has not yet updated by the time of
hugepage_madvise().
Let's create a variant of khugepaged_enter_vma_merge() which takes
external vm_flags to consider. At the moment there's only two users for
such function: hugepage_madvise() and vma_merge().
Reported-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/khugepaged.h | 9 +++++++++
mm/huge_memory.c | 27 ++++++++++++++++++++-------
mm/mmap.c | 6 +++---
3 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 6b394f0b5148..bb25e323ee2d 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -7,6 +7,8 @@
extern int __khugepaged_enter(struct mm_struct *mm);
extern void __khugepaged_exit(struct mm_struct *mm);
extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma);
+extern int __khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ vm_flags_t vm_flags);
#define khugepaged_enabled() \
(transparent_hugepage_flags & \
@@ -62,6 +64,13 @@ static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
{
return 0;
}
+
+static inline int __khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ vm_flags_t vm_flags)
+{
+ return 0;
+}
+
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif /* _LINUX_KHUGEPAGED_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 017aff657ef5..4b0afc076827 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1944,7 +1944,7 @@ out:
#define VM_NO_THP (VM_SPECIAL | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)
int hugepage_madvise(struct vm_area_struct *vma,
- unsigned long *vm_flags, int advice)
+ vm_flags_t *vm_flags, int advice)
{
switch (advice) {
case MADV_HUGEPAGE:
@@ -1969,8 +1969,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
* register it here without waiting a page fault that
* may not happen any time soon.
*/
- if (unlikely(khugepaged_enter_vma_merge(vma)))
- return -ENOMEM;
+ return __khugepaged_enter_vma_merge(vma, *vm_flags);
break;
case MADV_NOHUGEPAGE:
/*
@@ -2070,7 +2069,8 @@ int __khugepaged_enter(struct mm_struct *mm)
return 0;
}
-int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
+int __khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ vm_flags_t vm_flags)
{
unsigned long hstart, hend;
if (!vma->anon_vma)
@@ -2082,14 +2082,27 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
if (vma->vm_ops)
/* khugepaged not yet working on file or special mappings */
return 0;
- VM_BUG_ON(vma->vm_flags & VM_NO_THP);
+ VM_BUG_ON(vm_flags & VM_NO_THP);
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
hend = vma->vm_end & HPAGE_PMD_MASK;
- if (hstart < hend)
- return khugepaged_enter(vma);
+ if (hstart >= hend)
+ return 0;
+ if (test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
+ return 0;
+ if (vm_flags & VM_NOHUGEPAGE)
+ return 0;
+ if (khugepaged_always())
+ return __khugepaged_enter(vma->vm_mm);
+ if (khugepaged_req_madv() && vm_flags & VM_HUGEPAGE)
+ return __khugepaged_enter(vma->vm_mm);
return 0;
}
+int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
+{
+ return __khugepaged_enter_vma_merge(vma, vma->vm_flags);
+}
+
void __khugepaged_exit(struct mm_struct *mm)
{
struct mm_slot *mm_slot;
diff --git a/mm/mmap.c b/mm/mmap.c
index c0a3637cdb64..19fee85f0b12 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1009,8 +1009,8 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
*/
struct vm_area_struct *vma_merge(struct mm_struct *mm,
struct vm_area_struct *prev, unsigned long addr,
- unsigned long end, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
+ unsigned long end, vm_flags_t vm_flags,
+ struct anon_vma *anon_vma, struct file *file,
pgoff_t pgoff, struct mempolicy *policy)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
@@ -1056,7 +1056,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
end, prev->vm_pgoff, NULL);
if (err)
return NULL;
- khugepaged_enter_vma_merge(prev);
+ __khugepaged_enter_vma_merge(prev, vm_flags);
return prev;
}
--
Kirill A. Shutemov
next prev parent reply other threads:[~2014-10-06 15:04 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-05 2:48 [patch for-3.17] mm, thp: fix collapsing of hugepages on madvise David Rientjes
2014-10-05 17:15 ` Linus Torvalds
2014-10-05 18:41 ` Kirill A. Shutemov
2014-10-05 18:51 ` Linus Torvalds
2014-10-06 9:42 ` David Rientjes
2014-10-06 15:03 ` Kirill A. Shutemov [this message]
2014-10-06 20:53 ` David Rientjes
2014-10-15 21:13 ` [patch resend] " David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141006150351.GA23754@node.dhcp.inet.fi \
--to=kirill@shutemov.name \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=stable@vger.kernel.org \
--cc=suleiman@google.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).