* [PATCH 0/4] THP updates
@ 2015-12-24 11:51 Kirill A. Shutemov
  2015-12-24 11:51 ` [PATCH 1/4] thp: add debugfs handle to split all huge pages Kirill A. Shutemov
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-24 11:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sasha Levin, linux-mm, Kirill A. Shutemov

Hi Andrew,

The patches below fix two mlock-related bugs and increase the success
rate of split_huge_page().

I have also implemented a debugfs handle to split all huge pages in the
system. It's useful for debugging.

Kirill A. Shutemov (4):
  thp: add debugfs handle to split all huge pages
  thp: fix regression in handling mlocked pages in __split_huge_pmd()
  mm: stop __munlock_pagevec_fill() if THP encountered
  thp: increase split_huge_page() success rate

 mm/huge_memory.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 mm/mlock.c       |  7 ++++++
 2 files changed, 72 insertions(+), 5 deletions(-)

-- 
2.6.4


* [PATCH 1/4] thp: add debugfs handle to split all huge pages
  2015-12-24 11:51 [PATCH 0/4] THP updates Kirill A. Shutemov
@ 2015-12-24 11:51 ` Kirill A. Shutemov
  2016-01-05  9:44   ` Vlastimil Babka
  2015-12-24 11:51 ` [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd() Kirill A. Shutemov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-24 11:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sasha Levin, linux-mm, Kirill A. Shutemov

Writing 1 into 'split_huge_pages' will try to find and split all huge
pages in the system. This is useful for debugging.
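
For example (a usage sketch, assuming debugfs is mounted at the
conventional /sys/kernel/debug; writing any value other than 1 returns
-EINVAL):

	# split every anonymous THP in the system
	echo 1 > /sys/kernel/debug/split_huge_pages
	# the result goes to the kernel log, e.g. "42 of 100 THP split"
	dmesg | tail -n1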

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a880f9addba5..99f2a0ecb621 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -27,6 +27,7 @@
 #include <linux/userfaultfd_k.h>
 #include <linux/page_idle.h>
 #include <linux/swapops.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -3535,3 +3536,61 @@ static struct shrinker deferred_split_shrinker = {
 	.scan_objects = deferred_split_scan,
 	.seeks = DEFAULT_SEEKS,
 };
+
+#ifdef CONFIG_DEBUG_FS
+static int split_huge_pages_set(void *data, u64 val)
+{
+	struct zone *zone;
+	struct page *page;
+	unsigned long pfn, max_zone_pfn;
+	unsigned long total = 0, split = 0;
+
+	if (val != 1)
+		return -EINVAL;
+
+	for_each_populated_zone(zone) {
+		max_zone_pfn = zone_end_pfn(zone);
+		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
+			if (!pfn_valid(pfn))
+				continue;
+
+			page = pfn_to_page(pfn);
+			if (!get_page_unless_zero(page))
+				continue;
+
+			if (zone != page_zone(page))
+				goto next;
+
+			if (!PageHead(page) || !PageAnon(page) ||
+					PageHuge(page))
+				goto next;
+
+			total++;
+			lock_page(page);
+			if (!split_huge_page(page))
+				split++;
+			unlock_page(page);
+next:
+			put_page(page);
+		}
+	}
+
+	pr_info("%lu of %lu THP split", split, total);
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(split_huge_pages_fops, NULL, split_huge_pages_set,
+		"%llu\n");
+
+static int __init split_huge_pages_debugfs(void)
+{
+	void *ret;
+
+	ret = debugfs_create_file("split_huge_pages", 0644, NULL, NULL,
+			&split_huge_pages_fops);
+	if (!ret)
+		pr_warn("Failed to create fault_around_bytes in debugfs");
+	return 0;
+}
+late_initcall(split_huge_pages_debugfs);
+#endif
-- 
2.6.4


* [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd()
  2015-12-24 11:51 [PATCH 0/4] THP updates Kirill A. Shutemov
  2015-12-24 11:51 ` [PATCH 1/4] thp: add debugfs handle to split all huge pages Kirill A. Shutemov
@ 2015-12-24 11:51 ` Kirill A. Shutemov
  2015-12-24 18:51   ` Dan Williams
  2015-12-25  1:10   ` Sasha Levin
  2015-12-24 11:51 ` [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered Kirill A. Shutemov
  2015-12-24 11:51 ` [PATCH 4/4] thp: increase split_huge_page() success rate Kirill A. Shutemov
  3 siblings, 2 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-24 11:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sasha Levin, linux-mm, Kirill A. Shutemov, Dan Williams

This patch fixes a regression caused by the patch
 "mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd"

That patch moved the pmd_trans_huge() check and "page = pmd_page(*pmd)"
to after __split_huge_pmd_locked(). The check can never succeed there,
since by then the pmd already points to a page table, so the page never
gets munlocked.

It causes crashes like this:
 http://lkml.kernel.org/r/5661FBB6.6050307@oracle.com

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
 mm/huge_memory.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 99f2a0ecb621..1a988d9b86ef 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3024,14 +3024,12 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)))
 		goto out;
-	__split_huge_pmd_locked(vma, pmd, haddr, false);
-
-	if (pmd_trans_huge(*pmd))
-		page = pmd_page(*pmd);
-	if (page && PageMlocked(page))
+	page = pmd_page(*pmd);
+	if (PageMlocked(page))
 		get_page(page);
 	else
 		page = NULL;
+	__split_huge_pmd_locked(vma, pmd, haddr, false);
 out:
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(mm, haddr, haddr + HPAGE_PMD_SIZE);
-- 
2.6.4


* [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered
  2015-12-24 11:51 [PATCH 0/4] THP updates Kirill A. Shutemov
  2015-12-24 11:51 ` [PATCH 1/4] thp: add debugfs handle to split all huge pages Kirill A. Shutemov
  2015-12-24 11:51 ` [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd() Kirill A. Shutemov
@ 2015-12-24 11:51 ` Kirill A. Shutemov
  2015-12-25  1:09   ` Sasha Levin
                     ` (2 more replies)
  2015-12-24 11:51 ` [PATCH 4/4] thp: increase split_huge_page() success rate Kirill A. Shutemov
  3 siblings, 3 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-24 11:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sasha Levin, linux-mm, Kirill A. Shutemov

THP is properly handled in munlock_vma_pages_range().

It fixes crashes like this:
 http://lkml.kernel.org/r/565C5C38.3040705@oracle.com

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/mlock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mlock.c b/mm/mlock.c
index af421d8bd6da..9197b6721a1e 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -393,6 +393,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
 		if (!page || page_zone_id(page) != zoneid)
 			break;
 
+		/*
+		 * Do not use pagevec for PTE-mapped THP,
+		 * munlock_vma_pages_range() will handle them.
+		 */
+		if (PageTransCompound(page))
+			break;
+
 		get_page(page);
 		/*
 		 * Increase the address that will be returned *before* the
-- 
2.6.4


* [PATCH 4/4] thp: increase split_huge_page() success rate
  2015-12-24 11:51 [PATCH 0/4] THP updates Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2015-12-24 11:51 ` [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered Kirill A. Shutemov
@ 2015-12-24 11:51 ` Kirill A. Shutemov
  2015-12-28 23:30   ` Andrew Morton
  3 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-24 11:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sasha Levin, linux-mm, Kirill A. Shutemov

During freeze_page(), we remove the page from rmap, which munlocks the
page if it was mlocked. clear_page_mlock() uses the lru cache, which
temporarily pins the page.

Let's drain the lru cache before checking the page's count vs. mapcount.
With this change, a mlocked page is split on the first attempt, if it
was not pinned by somebody else.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a988d9b86ef..4c1c292b7ddd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3417,6 +3417,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	freeze_page(anon_vma, head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
+	/* Make sure the page is not on per-CPU pagevec as it takes pin */
+	lru_add_drain();
+
 	/* Prevent deferred_split_scan() touching ->_count */
 	spin_lock(&split_queue_lock);
 	count = page_count(head);
-- 
2.6.4


* Re: [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd()
  2015-12-24 11:51 ` [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd() Kirill A. Shutemov
@ 2015-12-24 18:51   ` Dan Williams
  2015-12-24 22:56     ` Kirill A. Shutemov
  2015-12-25  1:10   ` Sasha Levin
  1 sibling, 1 reply; 19+ messages in thread
From: Dan Williams @ 2015-12-24 18:51 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Andrew Morton, Sasha Levin, Linux MM

On Thu, Dec 24, 2015 at 3:51 AM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> This patch fixes a regression caused by the patch
>  "mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd"
>
> That patch moved the pmd_trans_huge() check and "page = pmd_page(*pmd)"
> to after __split_huge_pmd_locked(). The check can never succeed there,
> since by then the pmd already points to a page table, so the page never
> gets munlocked.
>
> It causes crashes like this:
>  http://lkml.kernel.org/r/5661FBB6.6050307@oracle.com
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/huge_memory.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 99f2a0ecb621..1a988d9b86ef 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3024,14 +3024,12 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>         ptl = pmd_lock(mm, pmd);
>         if (unlikely(!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)))
>                 goto out;
> -       __split_huge_pmd_locked(vma, pmd, haddr, false);
> -
> -       if (pmd_trans_huge(*pmd))
> -               page = pmd_page(*pmd);
> -       if (page && PageMlocked(page))
> +       page = pmd_page(*pmd);
> +       if (PageMlocked(page))
>                 get_page(page);
>         else
>                 page = NULL;
> +       __split_huge_pmd_locked(vma, pmd, haddr, false);

Since dax pmd mappings may not have a backing struct page, I think this
additionally needs the following:

8<-----
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4eae97325e95..c4eccfa836f4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3025,11 +3025,13 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
       ptl = pmd_lock(mm, pmd);
       if (unlikely(!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)))
               goto out;
-       page = pmd_page(*pmd);
-       if (PageMlocked(page))
-               get_page(page);
-       else
-               page = NULL;
+       else if (pmd_trans_huge(*pmd)) {
+               page = pmd_page(*pmd);
+               if (PageMlocked(page))
+                       get_page(page);
+               else
+                       page = NULL;
+       }
       __split_huge_pmd_locked(vma, pmd, haddr, false);
out:
       spin_unlock(ptl);


* Re: [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd()
  2015-12-24 18:51   ` Dan Williams
@ 2015-12-24 22:56     ` Kirill A. Shutemov
  0 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-24 22:56 UTC (permalink / raw)
  To: Dan Williams; +Cc: Kirill A. Shutemov, Andrew Morton, Sasha Levin, Linux MM

On Thu, Dec 24, 2015 at 10:51:43AM -0800, Dan Williams wrote:
> On Thu, Dec 24, 2015 at 3:51 AM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > This patch fixes a regression caused by the patch
> >  "mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd"
> >
> > That patch moved the pmd_trans_huge() check and "page = pmd_page(*pmd)"
> > to after __split_huge_pmd_locked(). The check can never succeed there,
> > since by then the pmd already points to a page table, so the page never
> > gets munlocked.
> >
> > It causes crashes like this:
> >  http://lkml.kernel.org/r/5661FBB6.6050307@oracle.com
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reported-by: Sasha Levin <sasha.levin@oracle.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  mm/huge_memory.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 99f2a0ecb621..1a988d9b86ef 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3024,14 +3024,12 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >         ptl = pmd_lock(mm, pmd);
> >         if (unlikely(!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)))
> >                 goto out;
> > -       __split_huge_pmd_locked(vma, pmd, haddr, false);
> > -
> > -       if (pmd_trans_huge(*pmd))
> > -               page = pmd_page(*pmd);
> > -       if (page && PageMlocked(page))
> > +       page = pmd_page(*pmd);
> > +       if (PageMlocked(page))
> >                 get_page(page);
> >         else
> >                 page = NULL;
> > +       __split_huge_pmd_locked(vma, pmd, haddr, false);
> 
> Since dax pmd mappings may not have a backing struct page, I think this
> additionally needs the following:
> 
> 8<-----
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4eae97325e95..c4eccfa836f4 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3025,11 +3025,13 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>        ptl = pmd_lock(mm, pmd);
>        if (unlikely(!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)))
>                goto out;
> -       page = pmd_page(*pmd);
> -       if (PageMlocked(page))
> -               get_page(page);
> -       else
> -               page = NULL;
> +       else if (pmd_trans_huge(*pmd)) {
> +               page = pmd_page(*pmd);
> +               if (PageMlocked(page))
> +                       get_page(page);
> +               else
> +                       page = NULL;
> +       }
>        __split_huge_pmd_locked(vma, pmd, haddr, false);
> out:
>        spin_unlock(ptl);
> 

Right, I missed that. Here's the updated patch.


* Re: [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered
  2015-12-24 11:51 ` [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered Kirill A. Shutemov
@ 2015-12-25  1:09   ` Sasha Levin
  2015-12-28 23:22   ` Andrew Morton
  2016-01-05 10:18   ` Vlastimil Babka
  2 siblings, 0 replies; 19+ messages in thread
From: Sasha Levin @ 2015-12-25  1:09 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: linux-mm

On 12/24/2015 06:51 AM, Kirill A. Shutemov wrote:
> THP is properly handled in munlock_vma_pages_range().
> 
> It fixes crashes like this:
>  http://lkml.kernel.org/r/565C5C38.3040705@oracle.com
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Looks like this issue is fixed for me.

	Tested-by: Sasha Levin <sasha.levin@oracle.com>


Thanks,
Sasha


* Re: [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd()
  2015-12-24 11:51 ` [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd() Kirill A. Shutemov
  2015-12-24 18:51   ` Dan Williams
@ 2015-12-25  1:10   ` Sasha Levin
  2015-12-25  1:12     ` Dan Williams
  1 sibling, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2015-12-25  1:10 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: linux-mm, Dan Williams

On 12/24/2015 06:51 AM, Kirill A. Shutemov wrote:
> This patch fixes a regression caused by the patch
>  "mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd"
>
> That patch moved the pmd_trans_huge() check and "page = pmd_page(*pmd)"
> to after __split_huge_pmd_locked(). The check can never succeed there,
> since by then the pmd already points to a page table, so the page never
> gets munlocked.
> 
> It causes crashes like this:
>  http://lkml.kernel.org/r/5661FBB6.6050307@oracle.com

So this patch didn't fix the issue for me. I've sent Kirill the trace
off-list, but it's essentially the same thing.


Thanks,
Sasha


* Re: [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd()
  2015-12-25  1:10   ` Sasha Levin
@ 2015-12-25  1:12     ` Dan Williams
  2015-12-25  1:17       ` Sasha Levin
  0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2015-12-25  1:12 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM

On Thu, Dec 24, 2015 at 5:10 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> On 12/24/2015 06:51 AM, Kirill A. Shutemov wrote:
>> This patch fixes a regression caused by the patch
>>  "mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd"
>>
>> That patch moved the pmd_trans_huge() check and "page = pmd_page(*pmd)"
>> to after __split_huge_pmd_locked(). The check can never succeed there,
>> since by then the pmd already points to a page table, so the page never
>> gets munlocked.
>>
>> It causes crashes like this:
>>  http://lkml.kernel.org/r/5661FBB6.6050307@oracle.com
>
> So this patch didn't fix the issue for me. I've sent Kirill the trace
> off-list, but it's essentially the same thing.

Can you send me the trace as well, and the reproducer?


* Re: [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd()
  2015-12-25  1:12     ` Dan Williams
@ 2015-12-25  1:17       ` Sasha Levin
  2015-12-28 12:58         ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2015-12-25  1:17 UTC (permalink / raw)
  To: Dan Williams; +Cc: Kirill A. Shutemov, Andrew Morton, Linux MM

[-- Attachment #1: Type: text/plain, Size: 2803 bytes --]

On 12/24/2015 08:12 PM, Dan Williams wrote:
> On Thu, Dec 24, 2015 at 5:10 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> > On 12/24/2015 06:51 AM, Kirill A. Shutemov wrote:
>>> >> This patch fixes a regression caused by the patch
>>> >>  "mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd"
>>> >>
>>> >> That patch moved the pmd_trans_huge() check and "page = pmd_page(*pmd)"
>>> >> to after __split_huge_pmd_locked(). The check can never succeed there,
>>> >> since by then the pmd already points to a page table, so the page never
>>> >> gets munlocked.
>>> >>
>>> >> It causes crashes like this:
>>> >>  http://lkml.kernel.org/r/5661FBB6.6050307@oracle.com
>> >
>> > So this patch didn't fix the issue for me. I've sent Kirill the trace
>> > off-list, but it's essentially the same thing.
> Can you send me the trace as well, and the reproducer?

I don't have a simple reproducer; it reproduces rather quickly when
running under trinity within a KVM guest, using the kernel config I've
attached.

Here's the trace:

[ 2885.040719] BUG: Bad page state in process kswapd0  pfn:ba000
[ 2885.040734] page:ffffea0002e80000 count:0 mapcount:0 mapping:          (null) index:0x800
[ 2885.040745] flags: 0x9fffff80144008(uptodate|head|swapbacked|mlocked)
[ 2885.040747] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 2885.040749] bad because of flags: 0x100000(mlocked)
[ 2885.040774] Modules linked in:
[ 2885.040798] CPU: 0 PID: 3740 Comm: kswapd0 Not tainted 4.4.0-rc5-next-20151221-sasha-00026-g627e275-dirty #2758
[ 2885.040821]  0000000000000000 00000000b542da6d ffff8800c9c4f538 ffffffffa3045a14
[ 2885.040825]  0000000041b58ab3 ffffffffae666b8b ffffffffa3045969 ffff880559cc06fd
[ 2885.040853]  ffffea0002e80000 00000000b542da6d ffff8800c9c4f538 0000000000100000
[ 2885.040854] Call Trace:
[ 2885.041027]  [<ffffffffa3045a14>] dump_stack+0xab/0x117
[ 2885.041034]  [<ffffffffa3045969>] ? _atomic_dec_and_lock+0xc9/0xc9
[ 2885.041067]  [<ffffffffa16258b5>] bad_page+0x295/0x350
[ 2885.041160]  [<ffffffffa1627c69>] free_pages_prepare+0x489/0x1650
[ 2885.041193]  [<ffffffffa162fd13>] __free_pages_ok+0x43/0x230
[ 2885.041197]  [<ffffffffa162ff92>] free_compound_page+0x92/0xa0
[ 2885.041207]  [<ffffffffa1777af7>] free_transhuge_page+0x87/0x90
[ 2885.041215]  [<ffffffffa164c4fc>] __put_compound_page+0xac/0xc0
[ 2885.041232]  [<ffffffffa164c5ae>] __put_page+0x9e/0xb0
[ 2885.041236]  [<ffffffffa177655b>] deferred_split_scan+0x7ab/0x7d0
[ 2885.041277]  [<ffffffffa16582df>] shrink_slab+0x4af/0x660
[ 2885.041298]  [<ffffffffa1665b4d>] shrink_zone+0x6bd/0xbf0
[ 2885.041320]  [<ffffffffa1668582>] balance_pgdat+0x7f2/0xc00
[ 2885.041398]  [<ffffffffa1669243>] kswapd+0x8b3/0xa10
[ 2885.041437]  [<ffffffffa13bf8ce>] kthread+0x31e/0x330
[ 2885.041453]  [<ffffffffabe1ad0f>] ret_from_fork+0x3f/0x70


Thanks,
Sasha

[-- Attachment #2: config-sasha.gz --]
[-- Type: application/gzip, Size: 43528 bytes --]


* Re: [PATCH 2/4] thp: fix regression in handling mlocked pages in __split_huge_pmd()
  2015-12-25  1:17       ` Sasha Levin
@ 2015-12-28 12:58         ` Kirill A. Shutemov
  0 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-28 12:58 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dan Williams, Kirill A. Shutemov, Andrew Morton, Linux MM

On Thu, Dec 24, 2015 at 08:17:15PM -0500, Sasha Levin wrote:
> On 12/24/2015 08:12 PM, Dan Williams wrote:
> > On Thu, Dec 24, 2015 at 5:10 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> >> > On 12/24/2015 06:51 AM, Kirill A. Shutemov wrote:
> >>> >> This patch fixes a regression caused by the patch
> >>> >>  "mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd"
> >>> >>
> >>> >> That patch moved the pmd_trans_huge() check and "page = pmd_page(*pmd)"
> >>> >> to after __split_huge_pmd_locked(). The check can never succeed there,
> >>> >> since by then the pmd already points to a page table, so the page never
> >>> >> gets munlocked.
> >>> >>
> >>> >> It causes crashes like this:
> >>> >>  http://lkml.kernel.org/r/5661FBB6.6050307@oracle.com
> >> >
> >> > So this patch didn't fix the issue for me. I've sent Kirill the trace
> >> > off-list, but it's essentially the same thing.
> > Can you send me the trace as well, and the reproducer?
> 
> I don't have a simple reproducer; it reproduces rather quickly when
> running under trinity within a KVM guest, using the kernel config I've
> attached.

Is there any chance of reproducing it with logs enabled in trinity?
I failed to reproduce it, and a code audit hasn't been fruitful so far.

-- 
 Kirill A. Shutemov


* Re: [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered
  2015-12-24 11:51 ` [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered Kirill A. Shutemov
  2015-12-25  1:09   ` Sasha Levin
@ 2015-12-28 23:22   ` Andrew Morton
  2015-12-29 11:27     ` Kirill A. Shutemov
  2016-01-05 10:18   ` Vlastimil Babka
  2 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2015-12-28 23:22 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Sasha Levin, linux-mm

On Thu, 24 Dec 2015 14:51:22 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> THP is properly handled in munlock_vma_pages_range().
> 
> It fixes crashes like this:
>  http://lkml.kernel.org/r/565C5C38.3040705@oracle.com
> 
> ...
>
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -393,6 +393,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>  		if (!page || page_zone_id(page) != zoneid)
>  			break;
>  
> +		/*
> +		 * Do not use pagevec for PTE-mapped THP,
> +		 * munlock_vma_pages_range() will handle them.
> +		 */
> +		if (PageTransCompound(page))
> +			break;
> +
>  		get_page(page);
>  		/*
>  		 * Increase the address that will be returned *before* the

I'm trying to work out approximately which patch this patch fixes, and
it ain't easy.  Help?


* Re: [PATCH 4/4] thp: increase split_huge_page() success rate
  2015-12-24 11:51 ` [PATCH 4/4] thp: increase split_huge_page() success rate Kirill A. Shutemov
@ 2015-12-28 23:30   ` Andrew Morton
  2015-12-29 20:57     ` Kirill A. Shutemov
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2015-12-28 23:30 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Sasha Levin, linux-mm

On Thu, 24 Dec 2015 14:51:23 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> During freeze_page(), we remove the page from rmap, which munlocks the
> page if it was mlocked. clear_page_mlock() uses the lru cache, which
> temporarily pins the page.
> 
> Let's drain the lru cache before checking the page's count vs. mapcount.
> With this change, a mlocked page is split on the first attempt, if it
> was not pinned by somebody else.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/huge_memory.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1a988d9b86ef..4c1c292b7ddd 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3417,6 +3417,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	freeze_page(anon_vma, head);
>  	VM_BUG_ON_PAGE(compound_mapcount(head), head);
>  
> +	/* Make sure the page is not on per-CPU pagevec as it takes pin */
> +	lru_add_drain();
> +
>  	/* Prevent deferred_split_scan() touching ->_count */
>  	spin_lock(&split_queue_lock);
>  	count = page_count(head);

Fair enough.

mlocked pages are rare and lru_add_drain() isn't free.  We could easily
and cheaply make page_remove_rmap() return "bool was_mlocked" (or,
better, "bool might_be_in_lru_cache") to skip this overhead.
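
Something along these lines, say (a hypothetical sketch only -- in this
tree page_remove_rmap() returns void, and the result would have to be
propagated from the rmap walk in freeze_page() back up to
split_huge_page_to_list()):

	/* hypothetical signature, not the actual API */
	bool page_remove_rmap(struct page *page, bool compound);

	/* split_huge_page_to_list() could then do: */
	if (might_be_in_lru_cache)
		lru_add_drain();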


* Re: [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered
  2015-12-28 23:22   ` Andrew Morton
@ 2015-12-29 11:27     ` Kirill A. Shutemov
  0 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-29 11:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kirill A. Shutemov, Sasha Levin, linux-mm

On Mon, Dec 28, 2015 at 03:22:35PM -0800, Andrew Morton wrote:
> On Thu, 24 Dec 2015 14:51:22 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > THP is properly handled in munlock_vma_pages_range().
> > 
> > It fixes crashes like this:
> >  http://lkml.kernel.org/r/565C5C38.3040705@oracle.com
> > 
> > ...
> >
> > --- a/mm/mlock.c
> > +++ b/mm/mlock.c
> > @@ -393,6 +393,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
> >  		if (!page || page_zone_id(page) != zoneid)
> >  			break;
> >  
> > +		/*
> > +		 * Do not use pagevec for PTE-mapped THP,
> > +		 * munlock_vma_pages_range() will handle them.
> > +		 */
> > +		if (PageTransCompound(page))
> > +			break;
> > +
> >  		get_page(page);
> >  		/*
> >  		 * Increase the address that will be returned *before* the
> 
> I'm trying to work out approximately which patch this patch fixes, and
> it ain't easy.  Help?

"thp: allow mlocked THP again", I think.

-- 
 Kirill A. Shutemov


* Re: [PATCH 4/4] thp: increase split_huge_page() success rate
  2015-12-28 23:30   ` Andrew Morton
@ 2015-12-29 20:57     ` Kirill A. Shutemov
  2016-01-05 10:22       ` Vlastimil Babka
  0 siblings, 1 reply; 19+ messages in thread
From: Kirill A. Shutemov @ 2015-12-29 20:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kirill A. Shutemov, Sasha Levin, linux-mm

On Mon, Dec 28, 2015 at 03:30:26PM -0800, Andrew Morton wrote:
> On Thu, 24 Dec 2015 14:51:23 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > During freeze_page(), we remove the page from rmap, which munlocks the
> > page if it was mlocked. clear_page_mlock() uses the lru cache, which
> > temporarily pins the page.
> > 
> > Let's drain the lru cache before checking the page's count vs. mapcount.
> > With this change, a mlocked page is split on the first attempt, if it
> > was not pinned by somebody else.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/huge_memory.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 1a988d9b86ef..4c1c292b7ddd 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3417,6 +3417,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> >  	freeze_page(anon_vma, head);
> >  	VM_BUG_ON_PAGE(compound_mapcount(head), head);
> >  
> > +	/* Make sure the page is not on per-CPU pagevec as it takes pin */
> > +	lru_add_drain();
> > +
> >  	/* Prevent deferred_split_scan() touching ->_count */
> >  	spin_lock(&split_queue_lock);
> >  	count = page_count(head);
> 
> Fair enough.
> 
> mlocked pages are rare and lru_add_drain() isn't free.  We could easily
> and cheaply make page_remove_rmap() return "bool was_mlocked" (or,
> better, "bool might_be_in_lru_cache") to skip this overhead.

Propagating it back is painful. What about this instead:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ecb4ed1a821a..edfa53eda9ca 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3385,6 +3385,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	struct page *head = compound_head(page);
 	struct anon_vma *anon_vma;
 	int count, mapcount, ret;
+	bool mlocked;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
 	VM_BUG_ON_PAGE(!PageAnon(page), page);
@@ -3415,11 +3416,13 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		goto out_unlock;
 	}
 
+	mlocked = PageMlocked(page);
 	freeze_page(anon_vma, head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
 	/* Make sure the page is not on per-CPU pagevec as it takes pin */
-	lru_add_drain();
+	if (mlocked)
+		lru_add_drain();
 
 	/* Prevent deferred_split_scan() touching ->_count */
 	spin_lock(&split_queue_lock);
-- 
 Kirill A. Shutemov


* Re: [PATCH 1/4] thp: add debugfs handle to split all huge pages
  2015-12-24 11:51 ` [PATCH 1/4] thp: add debugfs handle to split all huge pages Kirill A. Shutemov
@ 2016-01-05  9:44   ` Vlastimil Babka
  0 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2016-01-05  9:44 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: Sasha Levin, linux-mm

On 12/24/2015 12:51 PM, Kirill A. Shutemov wrote:
> Writing 1 into 'split_huge_pages' will try to find and split all huge
> pages in the system. This is useful for debugging.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

It's not a very optimized pfn scanner, but that shouldn't matter. I have
just one suggestion and one fix below.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/huge_memory.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 59 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a880f9addba5..99f2a0ecb621 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -27,6 +27,7 @@
>   #include <linux/userfaultfd_k.h>
>   #include <linux/page_idle.h>
>   #include <linux/swapops.h>
> +#include <linux/debugfs.h>
>
>   #include <asm/tlb.h>
>   #include <asm/pgalloc.h>
> @@ -3535,3 +3536,61 @@ static struct shrinker deferred_split_shrinker = {
>   	.scan_objects = deferred_split_scan,
>   	.seeks = DEFAULT_SEEKS,
>   };
> +
> +#ifdef CONFIG_DEBUG_FS
> +static int split_huge_pages_set(void *data, u64 val)
> +{
> +	struct zone *zone;
> +	struct page *page;
> +	unsigned long pfn, max_zone_pfn;
> +	unsigned long total = 0, split = 0;
> +
> +	if (val != 1)
> +		return -EINVAL;
> +
> +	for_each_populated_zone(zone) {
> +		max_zone_pfn = zone_end_pfn(zone);
> +		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
> +			if (!pfn_valid(pfn))
> +				continue;
> +
> +			page = pfn_to_page(pfn);
> +			if (!get_page_unless_zero(page))
> +				continue;
> +
> +			if (zone != page_zone(page))
> +				goto next;

I would do this check before get_page(...). Doesn't matter much, but 
looks odd.
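
Something like this, I mean (an untested sketch of the reordering; on a
zone mismatch there is no page reference to drop yet, so a plain
continue suffices):

			page = pfn_to_page(pfn);
			if (zone != page_zone(page))
				continue;

			if (!get_page_unless_zero(page))
				continue;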

> +
> +			if (!PageHead(page) || !PageAnon(page) ||
> +					PageHuge(page))
> +				goto next;
> +
> +			total++;
> +			lock_page(page);
> +			if (!split_huge_page(page))
> +				split++;
> +			unlock_page(page);
> +next:
> +			put_page(page);
> +		}
> +	}
> +
> +	pr_info("%lu of %lu THP split", split, total);
> +
> +	return 0;
> +}
> +DEFINE_SIMPLE_ATTRIBUTE(split_huge_pages_fops, NULL, split_huge_pages_set,
> +		"%llu\n");
> +
> +static int __init split_huge_pages_debugfs(void)
> +{
> +	void *ret;
> +
> +	ret = debugfs_create_file("split_huge_pages", 0644, NULL, NULL,
> +			&split_huge_pages_fops);
> +	if (!ret)
> +		pr_warn("Failed to create fault_around_bytes in debugfs");

s/fault_around_bytes/split_huge_pages/

> +	return 0;
> +}
> +late_initcall(split_huge_pages_debugfs);
> +#endif
>


* Re: [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered
  2015-12-24 11:51 ` [PATCH 3/4] mm: stop __munlock_pagevec_fill() if THP encountered Kirill A. Shutemov
  2015-12-25  1:09   ` Sasha Levin
  2015-12-28 23:22   ` Andrew Morton
@ 2016-01-05 10:18   ` Vlastimil Babka
  2 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2016-01-05 10:18 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: Sasha Levin, linux-mm

On 12/24/2015 12:51 PM, Kirill A. Shutemov wrote:
> THP is properly handled in munlock_vma_pages_range().
>
> It fixes crashes like this:
>   http://lkml.kernel.org/r/565C5C38.3040705@oracle.com
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Ack.

> ---
>   mm/mlock.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index af421d8bd6da..9197b6721a1e 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -393,6 +393,13 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>   		if (!page || page_zone_id(page) != zoneid)
>   			break;
>
> +		/*
> +		 * Do not use pagevec for PTE-mapped THP,
> +		 * munlock_vma_pages_range() will handle them.
> +		 */
> +		if (PageTransCompound(page))
> +			break;
> +
>   		get_page(page);
>   		/*
>   		 * Increase the address that will be returned *before* the
>


* Re: [PATCH 4/4] thp: increase split_huge_page() success rate
  2015-12-29 20:57     ` Kirill A. Shutemov
@ 2016-01-05 10:22       ` Vlastimil Babka
  0 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2016-01-05 10:22 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton
  Cc: Kirill A. Shutemov, Sasha Levin, linux-mm

On 12/29/2015 09:57 PM, Kirill A. Shutemov wrote:
> On Mon, Dec 28, 2015 at 03:30:26PM -0800, Andrew Morton wrote:
>> Fair enough.
>>
>> mlocked pages are rare and lru_add_drain() isn't free.  We could easily
>> and cheaply make page_remove_rmap() return "bool was_mlocked" (or,
>> better, "bool might_be_in_lru_cache") to skip this overhead.
>
> Propagating it back is painful. What about this instead:

Looks good.

> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ecb4ed1a821a..edfa53eda9ca 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3385,6 +3385,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>   	struct page *head = compound_head(page);
>   	struct anon_vma *anon_vma;
>   	int count, mapcount, ret;
> +	bool mlocked;
>
>   	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
>   	VM_BUG_ON_PAGE(!PageAnon(page), page);
> @@ -3415,11 +3416,13 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>   		goto out_unlock;
>   	}
>
> +	mlocked = PageMlocked(page);
>   	freeze_page(anon_vma, head);
>   	VM_BUG_ON_PAGE(compound_mapcount(head), head);
>
>   	/* Make sure the page is not on per-CPU pagevec as it takes pin */
> -	lru_add_drain();
> +	if (mlocked)
> +		lru_add_drain();
>
>   	/* Prevent deferred_split_scan() touching ->_count */
>   	spin_lock(&split_queue_lock);
>
