* BUG at mm/memory.c:1489!
@ 2014-05-28  8:32 ` Michael Ellerman
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2014-05-28  8:32 UTC
  To: Linux MM, LKML, trinity

Hey folks,

Anyone seen this before? Trinity hit it just now:

Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64

[watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
------------[ cut here ]------------
kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
    pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
    lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
    sp: c000000384eafbe0
   msr: 8000000000029032
  current = 0xc0000003c27e1bc0
  paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
    pid   = 20800, comm = trinity-c12
kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
enter ? for help
[c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
[c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 00003fff795f30a8
SP (3ffff958f290) is in userspace


I've left it in the debugger, can dig into it a bit more tomorrow if anyone has
any clues.

cheers




* Re: BUG at mm/memory.c:1489!
  2014-05-28  8:32 ` Michael Ellerman
@ 2014-05-29  0:33   ` Hugh Dickins
  -1 siblings, 0 replies; 28+ messages in thread
From: Hugh Dickins @ 2014-05-29  0:33 UTC
  To: Michael Ellerman
  Cc: Andrew Morton, Naoya Horiguchi, Benjamin Herrenschmidt,
	Tony Luck, linux-mm, linux-kernel, trinity

On Wed, 28 May 2014, Michael Ellerman wrote:
> Hey folks,
> 
> Anyone seen this before? Trinity hit it just now:
> 
> Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> 
> [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> ------------[ cut here ]------------
> kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
>     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
>     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
>     sp: c000000384eafbe0
>    msr: 8000000000029032
>   current = 0xc0000003c27e1bc0
>   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
>     pid   = 20800, comm = trinity-c12
> kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> enter ? for help
> [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> --- Exception: c01 (System Call) at 00003fff795f30a8
> SP (3ffff958f290) is in userspace
> 
> I've left it in the debugger, can dig into it a bit more tomorrow
> if anyone has any clues.

Thanks for leaving it overnight, but this one is quite obvious,
so go ahead and reboot whenever suits you.

Trinity didn't even need to do anything bizarre to get this: that
ordinary path simply didn't get tried on powerpc or ia64 before.

Here's a patch which should fix it for you, but I believe leaves
a race in common with other architectures.  I must turn away to
other things, and hope Naoya-san can fix up the locking separately
(or point out why it's already safe).

[PATCH] mm: fix move_pages follow_page huge_addr BUG

v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
move_pages()") is okay on most arches, but on follow_huge_addr-style
arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").

The point of the BUG_ON was that nothing needed FOLL_GET there at
the time, and it was not clear that we have sufficient locking to
use get_page() safely here on the outside - maybe the page found has
already been freed and even reused when follow_huge_addr() returns.

I suspect that e632a938d914's use of get_page() after return from
follow_huge_pmd() has the same problem: what prevents a racing
instance of move_pages() from already migrating away and freeing
that page by then?  A reference to the page should be taken while
holding suitable lock (huge_pte_lockptr?), to serialize against
concurrent migration.

But I'm not prepared to rework the hugetlb locking here myself;
so for now just supply a patch to copy e632a938d914's get_page()
after follow_huge_pmd() to after follow_huge_addr(): removing
the BUG_ON(flags & FOLL_GET), but probably leaving a race.

Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: stable@vger.kernel.org # 3.12+
---
Whether this is a patch that should go in without fixing the locking,
I don't know.  An unlikely race is better than a triggerable BUG?
Or perhaps I'm just wrong about there being any such race.

 mm/memory.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- 3.15-rc7/mm/memory.c	2014-04-27 23:55:53.608801152 -0700
+++ linux/mm/memory.c	2014-05-28 13:05:48.340124615 -0700
@@ -1486,7 +1486,17 @@ struct page *follow_page_mask(struct vm_
 
 	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
 	if (!IS_ERR(page)) {
-		BUG_ON(flags & FOLL_GET);
+		if (page && (flags & FOLL_GET)) {
+			/*
+			 * Refcounts on tail pages are not well-defined and
+			 * shouldn't be taken. The caller should handle a NULL
+			 * return when trying to follow tail pages.
+			 */
+			if (PageHead(page))
+				get_page(page);
+			else
+				page = NULL;
+		}
 		goto out;
 	}
 

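With the BUG_ON gone, a FOLL_GET caller has to treat a NULL return as
"nothing to pin here" rather than as an error.  Illustrative caller
pattern only (hypothetical, not part of the patch):

	struct page *page = follow_page(vma, addr, FOLL_GET);

	if (!page)
		return -ENOENT;	/* tail page, or nothing mapped: skip it */
	/* ... use the pinned page ... */
	put_page(page);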

* Re: BUG at mm/memory.c:1489!
  2014-05-29  0:33   ` Hugh Dickins
@ 2014-05-29  4:52   ` Naoya Horiguchi
  2014-05-29 20:50       ` Hugh Dickins
  -1 siblings, 1 reply; 28+ messages in thread
From: Naoya Horiguchi @ 2014-05-29  4:52 UTC
  To: Hugh Dickins
  Cc: mpe, Andrew Morton, benh, Tony Luck, linux-mm, linux-kernel, trinity

Hi Hugh,

On Wed, May 28, 2014 at 05:33:11PM -0700, Hugh Dickins wrote:
> On Wed, 28 May 2014, Michael Ellerman wrote:
> > Hey folks,
> > 
> > Anyone seen this before? Trinity hit it just now:
> > 
> > Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> > 
> > [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> > ------------[ cut here ]------------
> > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
> >     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
> >     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
> >     sp: c000000384eafbe0
> >    msr: 8000000000029032
> >   current = 0xc0000003c27e1bc0
> >   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
> >     pid   = 20800, comm = trinity-c12
> > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > enter ? for help
> > [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> > [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> > --- Exception: c01 (System Call) at 00003fff795f30a8
> > SP (3ffff958f290) is in userspace
> > 
> > I've left it in the debugger, can dig into it a bit more tomorrow
> > if anyone has any clues.
> 
> Thanks for leaving it overnight, but this one is quite obvious,
> so go ahead and reboot whenever suits you.
> 
> Trinity didn't even need to do anything bizarre to get this: that
> ordinary path simply didn't get tried on powerpc or ia64 before.
> 
> Here's a patch which should fix it for you, but I believe leaves
> a race in common with other architectures.  I must turn away to
> other things, and hope Naoya-san can fix up the locking separately
> (or point out why it's already safe).
> 
> [PATCH] mm: fix move_pages follow_page huge_addr BUG
> 
> v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
> move_pages()") is okay on most arches, but on follow_huge_addr-style
> arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
> from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").
> 
> The point of the BUG_ON was that nothing needed FOLL_GET there at
> the time, and it was not clear that we have sufficient locking to
> use get_page() safely here on the outside - maybe the page found has
> already been freed and even reused when follow_huge_addr() returns.
> 
> I suspect that e632a938d914's use of get_page() after return from
> follow_huge_pmd() has the same problem: what prevents a racing
> instance of move_pages() from already migrating away and freeing
> that page by then?  A reference to the page should be taken while
> holding suitable lock (huge_pte_lockptr?), to serialize against
> concurrent migration.

Right, we need to take huge_pte_lockptr() here, I think.
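
Roughly like this, I mean (untested sketch; it assumes the vma and pmd
are at hand at the call site, and leaves follow_huge_pmd() unchanged):

	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, (pte_t *)pmd);

	spin_lock(ptl);
	page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
	if (page && (flags & FOLL_GET))
		get_page(page);	/* ptl keeps migration from freeing the page first */
	spin_unlock(ptl);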

> But I'm not prepared to rework the hugetlb locking here myself;
> so for now just supply a patch to copy e632a938d914's get_page()
> after follow_huge_pmd() to after follow_huge_addr(): removing
> the BUG_ON(flags & FOLL_GET), but probably leaving a race.

This bug was introduced by me, so I'll fix this.
Thank you for reporting.

> Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org # 3.12+

This patch looks good to me.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
> Whether this is a patch that should go in without fixing the locking,
> I don't know.  An unlikely race is better than a triggerable BUG?
> Or perhaps I'm just wrong about there being any such race.
> 
>  mm/memory.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> --- 3.15-rc7/mm/memory.c	2014-04-27 23:55:53.608801152 -0700
> +++ linux/mm/memory.c	2014-05-28 13:05:48.340124615 -0700
> @@ -1486,7 +1486,17 @@ struct page *follow_page_mask(struct vm_
>  
>  	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
>  	if (!IS_ERR(page)) {
> -		BUG_ON(flags & FOLL_GET);
> +		if (page && (flags & FOLL_GET)) {
> +			/*
> +			 * Refcounts on tail pages are not well-defined and
> +			 * shouldn't be taken. The caller should handle a NULL
> +			 * return when trying to follow tail pages.
> +			 */
> +			if (PageHead(page))
> +				get_page(page);
> +			else
> +				page = NULL;
> +		}
>  		goto out;
>  	}
>  
> 


* Re: BUG at mm/memory.c:1489!
  2014-05-29  0:33   ` Hugh Dickins
@ 2014-05-29  8:59     ` Michael Ellerman
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2014-05-29  8:59 UTC
  To: Hugh Dickins
  Cc: Andrew Morton, Naoya Horiguchi, Benjamin Herrenschmidt,
	Tony Luck, linux-mm, linux-kernel, trinity

On Wed, 2014-05-28 at 17:33 -0700, Hugh Dickins wrote:
> On Wed, 28 May 2014, Michael Ellerman wrote:
> > Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> > 
> > [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> > ------------[ cut here ]------------
> > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
> >     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
> >     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
> >     sp: c000000384eafbe0
> >    msr: 8000000000029032
> >   current = 0xc0000003c27e1bc0
> >   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
> >     pid   = 20800, comm = trinity-c12
> > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > enter ? for help
> > [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> > [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> > --- Exception: c01 (System Call) at 00003fff795f30a8
> > SP (3ffff958f290) is in userspace
> > 
> > I've left it in the debugger, can dig into it a bit more tomorrow
> > if anyone has any clues.
> 
> Thanks for leaving it overnight, but this one is quite obvious,
> so go ahead and reboot whenever suits you.
> 
> Trinity didn't even need to do anything bizarre to get this: that
> ordinary path simply didn't get tried on powerpc or ia64 before.
> 
> Here's a patch which should fix it for you, but I believe leaves
> a race in common with other architectures.  I must turn away to
> other things, and hope Naoya-san can fix up the locking separately
> (or point out why it's already safe).
> 
> [PATCH] mm: fix move_pages follow_page huge_addr BUG
> 
> v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
> move_pages()") is okay on most arches, but on follow_huge_addr-style
> arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
> from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").
> 
> The point of the BUG_ON was that nothing needed FOLL_GET there at
> the time, and it was not clear that we have sufficient locking to
> use get_page() safely here on the outside - maybe the page found has
> already been freed and even reused when follow_huge_addr() returns.
> 
> I suspect that e632a938d914's use of get_page() after return from
> follow_huge_pmd() has the same problem: what prevents a racing
> instance of move_pages() from already migrating away and freeing
> that page by then?  A reference to the page should be taken while
> holding suitable lock (huge_pte_lockptr?), to serialize against
> concurrent migration.
> 
> But I'm not prepared to rework the hugetlb locking here myself;
> so for now just supply a patch to copy e632a938d914's get_page()
> after follow_huge_pmd() to after follow_huge_addr(): removing
> the BUG_ON(flags & FOLL_GET), but probably leaving a race.

Thanks for the detailed explanation Hugh.

Unfortunately I don't know our mm/hugetlb code well enough to give you a good
answer. Ben had a quick look at our follow_huge_addr() and thought it looked
"fishy". He suggested that something like what we do in gup_pte_range() with
page_cache_get_speculative() might be in order.
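
Schematically, the pattern he means is something like this (paraphrased
from gup_pte_range(), not a verbatim quote; the pte/ptep naming is assumed):

	struct page *page = pte_page(pte);

	if (!page_cache_get_speculative(page))
		return 0;	/* raced with a free: bail out */
	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
		/* the pte changed under us: the ref may be to a reused page */
		put_page(page);
		return 0;
	}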

Applying your patch and running trinity pretty much immediately results in the
following, which looks related (sys_move_pages() again)?

Unable to handle kernel paging request for data at address 0xf2000f80000000
Faulting instruction address: 0xc0000000001e29bc
cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
    pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
    lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
    sp: c0000003c70f7970
   msr: 8000000000009032
   dar: f2000f80000000
 dsisr: 40000000
  current = 0xc0000003f9045800
  paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
    pid   = 3585, comm = trinity-c27
enter ? for help
[c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
[c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
[c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
[c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
[c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 00003fff7b2b30a8
SP (3fffe09728a0) is in userspace
1b:mon> 

I've hit it twice in two runs:

Unable to handle kernel paging request for data at address 0xf2400f00000000
Faulting instruction address: 0xc0000000001e2a3c
cpu 0xd: Vector: 300 (Data Access) at [c00000038a4bf6f0]
    pc: c0000000001e2a3c: .remove_migration_pte+0x9c/0x320
    lr: c0000000001e2a38: .remove_migration_pte+0x98/0x320
    sp: c00000038a4bf970
   msr: 8000000000009032
   dar: f2400f00000000
 dsisr: 40000000
  current = 0xc0000003acd9e680
  paca    = 0xc000000001dc3400   softe: 0        irq_happened: 0x01
    pid   = 13334, comm = trinity-c13
enter ? for help
[c00000038a4bfa20] c0000000001bcf08 .rmap_walk+0x328/0x470
[c00000038a4bfae0] c0000000001e2984 .remove_migration_ptes+0x44/0x60
[c00000038a4bfb80] c0000000001e4d68 .migrate_pages+0x6d8/0xa00
[c00000038a4bfcc0] c0000000001e566c .SyS_move_pages+0x5dc/0x7d0
[c00000038a4bfe30] c00000000000a1d8 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 00003fff79df30a8
SP (3fffda95d500) is in userspace
d:mon> 


If I tell trinity to skip sys_move_pages() it runs for hours.

cheers




* [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
  2014-05-29  8:59     ` Michael Ellerman
@ 2014-05-29 18:34     ` Naoya Horiguchi
  2014-05-29 22:04         ` Hugh Dickins
  -1 siblings, 1 reply; 28+ messages in thread
From: Naoya Horiguchi @ 2014-05-29 18:34 UTC
  To: mpe
  Cc: Hugh Dickins, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Thu, May 29, 2014 at 06:59:43PM +1000, Michael Ellerman wrote:
> On Wed, 2014-05-28 at 17:33 -0700, Hugh Dickins wrote:
> > On Wed, 28 May 2014, Michael Ellerman wrote:
> > > Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> > >
> > > [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> > > ------------[ cut here ]------------
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
> > >     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
> > >     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
> > >     sp: c000000384eafbe0
> > >    msr: 8000000000029032
> > >   current = 0xc0000003c27e1bc0
> > >   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
> > >     pid   = 20800, comm = trinity-c12
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > enter ? for help
> > > [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> > > [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> > > --- Exception: c01 (System Call) at 00003fff795f30a8
> > > SP (3ffff958f290) is in userspace
> > >
> > > I've left it in the debugger, can dig into it a bit more tomorrow
> > > if anyone has any clues.
> >
> > Thanks for leaving it overnight, but this one is quite obvious,
> > so go ahead and reboot whenever suits you.
> >
> > Trinity didn't even need to do anything bizarre to get this: that
> > ordinary path simply didn't get tried on powerpc or ia64 before.
> >
> > Here's a patch which should fix it for you, but I believe leaves
> > a race in common with other architectures.  I must turn away to
> > other things, and hope Naoya-san can fix up the locking separately
> > (or point out why it's already safe).
> >
> > [PATCH] mm: fix move_pages follow_page huge_addr BUG
> >
> > v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
> > move_pages()") is okay on most arches, but on follow_huge_addr-style
> > arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
> > from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").
> >
> > The point of the BUG_ON was that nothing needed FOLL_GET there at
> > the time, and it was not clear that we have sufficient locking to
> > use get_page() safely here on the outside - maybe the page found has
> > already been freed and even reused when follow_huge_addr() returns.
> >
> > I suspect that e632a938d914's use of get_page() after return from
> > follow_huge_pmd() has the same problem: what prevents a racing
> > instance of move_pages() from already migrating away and freeing
> > that page by then?  A reference to the page should be taken while
> > holding suitable lock (huge_pte_lockptr?), to serialize against
> > concurrent migration.
> >
> > But I'm not prepared to rework the hugetlb locking here myself;
> > so for now just supply a patch to copy e632a938d914's get_page()
> > after follow_huge_pmd() to after follow_huge_addr(): removing
> > the BUG_ON(flags & FOLL_GET), but probably leaving a race.
>
> Thanks for the detailed explanation Hugh.
>
> Unfortunately I don't know our mm/hugetlb code well enough to give you a good
> answer. Ben had a quick look at our follow_huge_addr() and thought it looked
> "fishy". He suggested something like what we do in gup_pte_range() with
> page_cache_get_speculative() might be in order.
>
> Applying your patch and running trinity pretty immediately results in the
> following, which looks related (sys_move_pages() again) ?
>
> Unable to handle kernel paging request for data at address 0xf2000f80000000
> Faulting instruction address: 0xc0000000001e29bc
> cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
>     pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
>     lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
>     sp: c0000003c70f7970
>    msr: 8000000000009032
>    dar: f2000f80000000
>  dsisr: 40000000
>   current = 0xc0000003f9045800
>   paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
>     pid   = 3585, comm = trinity-c27
> enter ? for help
> [c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
> [c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
> [c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
> [c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
> [c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
> --- Exception: c01 (System Call) at 00003fff7b2b30a8
> SP (3fffe09728a0) is in userspace
> 1b:mon>

Sorry for the inconvenience in your testing.

Hugepage migration is enabled for archs which have pmd-level hugepages
(including ppc64), but it has not been tested except on x86_64.
hugepage_migration_support() controls this, so the following patch should
help you avoid the problem, I believe.
Could you try testing with it?
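
For reference, this is roughly where that gate takes effect (paraphrased
from unmap_and_move_huge_page() in mm/migrate.c, not a verbatim quote):

	if (!hugepage_migration_support(page_hstate(hpage))) {
		/* the arch is not known to migrate hugepages safely: refuse */
		put_page(hpage);
		return -ENOSYS;
	}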

Thanks,
Naoya Horiguchi
---
Date: Thu, 29 May 2014 12:51:37 -0400
Subject: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64

Currently hugepage migration is available for all archs which support pmd-level
hugepages, but testing is done only for x86_64 and there are bugs on other archs.
So to avoid breaking such archs, this patch limits the availability strictly to
x86_64 until developers of other archs get interested in enabling this feature.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: stable@vger.kernel.org # 3.12+
---
 arch/arm/mm/hugetlbpage.c     |  5 -----
 arch/arm64/mm/hugetlbpage.c   |  5 -----
 arch/ia64/mm/hugetlbpage.c    |  5 -----
 arch/metag/mm/hugetlbpage.c   |  5 -----
 arch/mips/mm/hugetlbpage.c    |  5 -----
 arch/powerpc/mm/hugetlbpage.c | 10 ----------
 arch/s390/mm/hugetlbpage.c    |  5 -----
 arch/sh/mm/hugetlbpage.c      |  5 -----
 arch/sparc/mm/hugetlbpage.c   |  5 -----
 arch/tile/mm/hugetlbpage.c    |  5 -----
 arch/x86/Kconfig              |  4 ++++
 arch/x86/mm/hugetlbpage.c     | 10 ----------
 include/linux/hugetlb.h       | 10 ++++++----
 mm/Kconfig                    |  3 +++
 14 files changed, 13 insertions(+), 69 deletions(-)

diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index 54ee6163c181..66781bf34077 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -56,8 +56,3 @@ int pmd_huge(pmd_t pmd)
 {
 	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 5e9aec358306..2fc8258bab2d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -54,11 +54,6 @@ int pud_huge(pud_t pud)
 	return !(pud_val(pud) & PUD_TABLE_BIT);
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 static __init int setup_hugepagesz(char *opt)
 {
 	unsigned long ps = memparse(opt, &opt);
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 68232db98baa..76069c18ee42 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -114,11 +114,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *
 follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
 {
diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
index 042431509b56..3c52fa6d0f8e 100644
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -110,11 +110,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index 77e0ae036e7c..4ec8ee10d371 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -84,11 +84,6 @@ int pud_huge(pud_t pud)
 	return (pud_val(pud) & _PAGE_HUGE) != 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd, int write)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index eb923654ba80..7e70ae968e5f 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -86,11 +86,6 @@ int pgd_huge(pgd_t pgd)
 	 */
 	return ((pgd_val(pgd) & 0x3) != 0x0);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
 #else
 int pmd_huge(pmd_t pmd)
 {
@@ -106,11 +101,6 @@ int pgd_huge(pgd_t pgd)
 {
 	return 0;
 }
-
-int pmd_huge_support(void)
-{
-	return 0;
-}
 #endif
 
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 0727a55d87d9..0ff66a7e29bb 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -220,11 +220,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmdp, int write)
 {
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index 0d676a41081e..d7762349ea48 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -83,11 +83,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 9bd9ce80bf77..d329537739c6 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -231,11 +231,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index 0cb3bbaa580c..e514899e1100 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -166,11 +166,6 @@ int pud_huge(pud_t pud)
 	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 25d2c6f7325e..0cf6a7d0a93e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1871,6 +1871,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
 	def_bool y
 	depends on X86_64 || X86_PAE
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+	def_bool y
+	depends on X86_64 && MIGRATION
+
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 8c9f647ff9e1..8b977ebf9388 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -58,11 +58,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 {
 	return NULL;
 }
-
-int pmd_huge_support(void)
-{
-	return 0;
-}
 #else
 
 struct page *
@@ -80,11 +75,6 @@ int pud_huge(pud_t pud)
 {
 	return !!(pud_val(pud) & _PAGE_PSE);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 63214868c5b2..61c2e349af64 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -385,15 +385,18 @@ static inline pgoff_t basepage_index(struct page *page)
 
 extern void dissolve_free_huge_pages(unsigned long start_pfn,
 				     unsigned long end_pfn);
-int pmd_huge_support(void);
 /*
- * Currently hugepage migration is enabled only for pmd-based hugepage.
+ * Currently hugepage migration is enabled only for x86_64.
  * This function will be updated when hugepage migration is more widely
  * supported.
  */
 static inline int hugepage_migration_support(struct hstate *h)
 {
-	return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+	return huge_page_shift(h) == PMD_SHIFT;
+#else
+	return 0;
+#endif
 }
 
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
@@ -443,7 +446,6 @@ static inline pgoff_t basepage_index(struct page *page)
 	return page->index;
 }
 #define dissolve_free_huge_pages(s, e)	do {} while (0)
-#define pmd_huge_support()	0
 #define hugepage_migration_support(h)	0
 
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
diff --git a/mm/Kconfig b/mm/Kconfig
index ebe5880c29d6..1e22701c972b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -264,6 +264,9 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+	boolean
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
-- 
1.9.3


* Re: BUG at mm/memory.c:1489!
  2014-05-29  4:52   ` Naoya Horiguchi
@ 2014-05-29 20:50       ` Hugh Dickins
  0 siblings, 0 replies; 28+ messages in thread
From: Hugh Dickins @ 2014-05-29 20:50 UTC
  To: Naoya Horiguchi
  Cc: Hugh Dickins, mpe, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Thu, 29 May 2014, Naoya Horiguchi wrote:
> On Wed, May 28, 2014 at 05:33:11PM -0700, Hugh Dickins wrote:
> > On Wed, 28 May 2014, Michael Ellerman wrote:
> > > Hey folks,
> > > 
> > > Anyone seen this before? Trinity hit it just now:
> > > 
> > > Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> > > 
> > > [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> > > ------------[ cut here ]------------
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
> > >     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
> > >     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
> > >     sp: c000000384eafbe0
> > >    msr: 8000000000029032
> > >   current = 0xc0000003c27e1bc0
> > >   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
> > >     pid   = 20800, comm = trinity-c12
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > enter ? for help
> > > [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> > > [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> > > --- Exception: c01 (System Call) at 00003fff795f30a8
> > > SP (3ffff958f290) is in userspace
> > > 
> > > I've left it in the debugger, can dig into it a bit more tomorrow
> > > if anyone has any clues.
> > 
> > Thanks for leaving it overnight, but this one is quite obvious,
> > so go ahead and reboot whenever suits you.
> > 
> > Trinity didn't even need to do anything bizarre to get this: that
> > ordinary path simply didn't get tried on powerpc or ia64 before.
> > 
> > Here's a patch which should fix it for you, but I believe leaves
> > a race in common with other architectures.  I must turn away to
> > other things, and hope Naoya-san can fix up the locking separately
> > (or point out why it's already safe).
> > 
> > [PATCH] mm: fix move_pages follow_page huge_addr BUG
> > 
> > v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
> > move_pages()") is okay on most arches, but on follow_huge_addr-style
> > arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
> > from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").
> > 
> > The point of the BUG_ON was that nothing needed FOLL_GET there at
> > the time, and it was not clear that we have sufficient locking to
> > use get_page() safely here on the outside - maybe the page found has
> > already been freed and even reused when follow_huge_addr() returns.
> > 
> > I suspect that e632a938d914's use of get_page() after return from
> > follow_huge_pmd() has the same problem: what prevents a racing
> > instance of move_pages() from already migrating away and freeing
> > that page by then?  A reference to the page should be taken while
> > holding suitable lock (huge_pte_lockptr?), to serialize against
> > concurrent migration.
> 
> Right, we need to take huge_pte_lockptr() here, I think.
> 
> > But I'm not prepared to rework the hugetlb locking here myself;
> > so for now just supply a patch to copy e632a938d914's get_page()
> > after follow_huge_pmd() to after follow_huge_addr(): removing
> > the BUG_ON(flags & FOLL_GET), but probably leaving a race.
> 
> This bug was introduced by me, so I'll fix this.
> Thank you for reporting.
> 
> > Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
> > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: stable@vger.kernel.org # 3.12+
> 
> This patch looks good to me.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks, but seeing Michael's further problems once this is in,
I think we are better off for now with your non-x86_64 disablement.
But that's not complete in itself, I'll comment there.

Hugh

> 
> > ---
> > Whether this is a patch that should go in without fixing the locking,
> > I don't know.  An unlikely race is better than a triggerable BUG?
> > Or perhaps I'm just wrong about there being any such race.
> > 
> >  mm/memory.c |   12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > --- 3.15-rc7/mm/memory.c	2014-04-27 23:55:53.608801152 -0700
> > +++ linux/mm/memory.c	2014-05-28 13:05:48.340124615 -0700
> > @@ -1486,7 +1486,17 @@ struct page *follow_page_mask(struct vm_
> >  
> >  	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
> >  	if (!IS_ERR(page)) {
> > -		BUG_ON(flags & FOLL_GET);
> > +		if (page && (flags & FOLL_GET)) {
> > +			/*
> > +			 * Refcounts on tail pages are not well-defined and
> > +			 * shouldn't be taken. The caller should handle a NULL
> > +			 * return when trying to follow tail pages.
> > +			 */
> > +			if (PageHead(page))
> > +				get_page(page);
> > +			else
> > +				page = NULL;
> > +		}
> >  		goto out;
> >  	}
> >  
> > 


* Re: BUG at mm/memory.c:1489!
  2014-05-29  8:59     ` Michael Ellerman
@ 2014-05-29 21:03       ` Hugh Dickins
  -1 siblings, 0 replies; 28+ messages in thread
From: Hugh Dickins @ 2014-05-29 21:03 UTC
  To: Michael Ellerman
  Cc: Hugh Dickins, Andrew Morton, Naoya Horiguchi,
	Benjamin Herrenschmidt, Tony Luck, linux-mm, linux-kernel,
	trinity

On Thu, 29 May 2014, Michael Ellerman wrote:
> 
> Unfortunately I don't know our mm/hugetlb code well enough to give you a good
> answer. Ben had a quick look at our follow_huge_addr() and thought it looked
> "fishy". He suggested something like what we do in gup_pte_range() with
> page_cache_get_speculative() might be in order.

Fishy indeed, ancient code that was only ever intended for stats-like
usage, not designed for actually getting a hold on the page.  But I
don't think there's a big problem in getting the locking right: just
hope it doesn't require a different strategy on each architecture -
often an irritation with hugetlb.  Naoya-san will sort it out in
due course (not 3.15) I expect, but will probably need testing help.
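
The pattern at issue, schematically (paraphrased, not the actual
powerpc code):

	pte_t *ptep = huge_pte_offset(mm, address);

	if (!ptep)
		return ERR_PTR(-EINVAL);
	/* page handed back with no reference taken and no ptlock held */
	return pte_page(*ptep);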

> 
> Applying your patch and running trinity pretty immediately results in the
> following, which looks related (sys_move_pages() again) ?
> 
> Unable to handle kernel paging request for data at address 0xf2000f80000000
> Faulting instruction address: 0xc0000000001e29bc
> cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
>     pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
>     lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
>     sp: c0000003c70f7970
>    msr: 8000000000009032
>    dar: f2000f80000000
>  dsisr: 40000000
>   current = 0xc0000003f9045800
>   paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
>     pid   = 3585, comm = trinity-c27
> enter ? for help
> [c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
> [c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
> [c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
> [c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
> [c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
> --- Exception: c01 (System Call) at 00003fff7b2b30a8
> SP (3fffe09728a0) is in userspace
> 1b:mon> 
> 
> I've hit it twice in two runs:
> 
> If I tell trinity to skip sys_move_pages() it runs for hours.

That's sad.  Sorry for wasting your time with my patch, thank you
for trying it.  What you see might be a consequence of the locking
deficiency I mentioned, given trinity's deviousness; though if it's
being clever like that, I would expect it to have already found the
equivalent issue on x86-64.  So probably not, probably another issue.

As I've said elsewhere, I think we need to go with disablement for now.

Hugh


* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
  2014-05-29 18:34     ` [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!) Naoya Horiguchi
@ 2014-05-29 22:04         ` Hugh Dickins
  0 siblings, 0 replies; 28+ messages in thread
From: Hugh Dickins @ 2014-05-29 22:04 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: mpe, Hugh Dickins, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Thu, 29 May 2014, Naoya Horiguchi wrote:
> 
> > Currently hugepage migration is available for all archs which support pmd-level
> > hugepage, but testing is done only for x86_64 and there are bugs for other archs.

And even for x86_64 I think: the follow_huge_pmd() locking issue I
mentioned.  But I agree that's a different kind of bug, and probably
not cause to disable the feature even on x86_64 at this stage - but
cause to fix it in a different patch Cc stable when you have a moment.
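
(For the follow_huge_pmd() side, I'd guess at taking the pmd ptl and
rechecking before any get_page() - a sketch only, the races and the
write argument not thought through:

	ptl = pmd_lockptr(mm, pmd);
	spin_lock(ptl);
	if (likely(pmd_huge(*pmd))) {
		page = pmd_page(*pmd) +
			((address & ~PMD_MASK) >> PAGE_SHIFT);
		get_page(page);
	}
	spin_unlock(ptl);

but that belongs in the later patch, not this one.)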

> So to avoid breaking such archs, this patch limits the availability strictly to
> x86_64 until developers of other archs get interested in enabling this feature.

Hmm, I don't like the sound of "until developers of other archs get
interested in enabling this feature".  Your choice, I suppose, but I
had been expecting you to give them a little more help than that, by
fixing up the follow_huge_addr() and locking as you expect it to be
(and whatever Michael's subsequent remove_migration_pte() crash comes
from - maybe obvious with a little thought, but I haven't), then
pinging those architectures to give it a try and enable if they wish.

Perhaps I'm expecting too much, and you haven't the time; doubt I have.

I believe your patch below is incomplete, or perhaps you were
expecting to layer it on top of my follow_huge_addr get_page one.
No, I think we should throw mine away if you're going to disable
the feature on most architectures for now (and once the locking is
corrected, my get_page after follow_huge_addr will be wrong anyway).

What I think you're missing is an adjustment to your 71ea2efb1e93
("mm: migrate: remove VM_HUGETLB from vma flag check in vma_migratable()"):
doesn't vma_migratable() need to test VM_HUGETLB when
!CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION?  Then we are saved from
reaching the follow_huge_addr() BUG; and avoid the weird preparation
for migrating HUGETLB pages on architectures which do not support it.
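
Something like

	#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
		if (vma->vm_flags & VM_HUGETLB)
			return 0;
	#endif

near the top of vma_migratable() is what I have in mind - untested,
but it should stop move_pages() before it ever reaches follow_page()
on a hugetlb vma.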

But yes, I think your disablement approach is the right thing for 3.15.

> 
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org # 3.12+
> ---
>  arch/arm/mm/hugetlbpage.c     |  5 -----
>  arch/arm64/mm/hugetlbpage.c   |  5 -----
>  arch/ia64/mm/hugetlbpage.c    |  5 -----
>  arch/metag/mm/hugetlbpage.c   |  5 -----
>  arch/mips/mm/hugetlbpage.c    |  5 -----
>  arch/powerpc/mm/hugetlbpage.c | 10 ----------
>  arch/s390/mm/hugetlbpage.c    |  5 -----
>  arch/sh/mm/hugetlbpage.c      |  5 -----
>  arch/sparc/mm/hugetlbpage.c   |  5 -----
>  arch/tile/mm/hugetlbpage.c    |  5 -----
>  arch/x86/Kconfig              |  4 ++++
>  arch/x86/mm/hugetlbpage.c     | 10 ----------
>  include/linux/hugetlb.h       | 10 ++++++----
>  mm/Kconfig                    |  3 +++
>  14 files changed, 13 insertions(+), 69 deletions(-)
> 
> diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> index 54ee6163c181..66781bf34077 100644
> --- a/arch/arm/mm/hugetlbpage.c
> +++ b/arch/arm/mm/hugetlbpage.c
> @@ -56,8 +56,3 @@ int pmd_huge(pmd_t pmd)
>  {
>  	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 5e9aec358306..2fc8258bab2d 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -54,11 +54,6 @@ int pud_huge(pud_t pud)
>  	return !(pud_val(pud) & PUD_TABLE_BIT);
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  static __init int setup_hugepagesz(char *opt)
>  {
>  	unsigned long ps = memparse(opt, &opt);
> diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
> index 68232db98baa..76069c18ee42 100644
> --- a/arch/ia64/mm/hugetlbpage.c
> +++ b/arch/ia64/mm/hugetlbpage.c
> @@ -114,11 +114,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
>  {
> diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
> index 042431509b56..3c52fa6d0f8e 100644
> --- a/arch/metag/mm/hugetlbpage.c
> +++ b/arch/metag/mm/hugetlbpage.c
> @@ -110,11 +110,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
> index 77e0ae036e7c..4ec8ee10d371 100644
> --- a/arch/mips/mm/hugetlbpage.c
> +++ b/arch/mips/mm/hugetlbpage.c
> @@ -84,11 +84,6 @@ int pud_huge(pud_t pud)
>  	return (pud_val(pud) & _PAGE_HUGE) != 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  		pmd_t *pmd, int write)
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index eb923654ba80..7e70ae968e5f 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -86,11 +86,6 @@ int pgd_huge(pgd_t pgd)
>  	 */
>  	return ((pgd_val(pgd) & 0x3) != 0x0);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
>  #else
>  int pmd_huge(pmd_t pmd)
>  {
> @@ -106,11 +101,6 @@ int pgd_huge(pgd_t pgd)
>  {
>  	return 0;
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
>  #endif
>  
>  pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> index 0727a55d87d9..0ff66a7e29bb 100644
> --- a/arch/s390/mm/hugetlbpage.c
> +++ b/arch/s390/mm/hugetlbpage.c
> @@ -220,11 +220,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmdp, int write)
>  {
> diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
> index 0d676a41081e..d7762349ea48 100644
> --- a/arch/sh/mm/hugetlbpage.c
> +++ b/arch/sh/mm/hugetlbpage.c
> @@ -83,11 +83,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
> index 9bd9ce80bf77..d329537739c6 100644
> --- a/arch/sparc/mm/hugetlbpage.c
> +++ b/arch/sparc/mm/hugetlbpage.c
> @@ -231,11 +231,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
> index 0cb3bbaa580c..e514899e1100 100644
> --- a/arch/tile/mm/hugetlbpage.c
> +++ b/arch/tile/mm/hugetlbpage.c
> @@ -166,11 +166,6 @@ int pud_huge(pud_t pud)
>  	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 25d2c6f7325e..0cf6a7d0a93e 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1871,6 +1871,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>  	def_bool y
>  	depends on X86_64 || X86_PAE
>  
> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	def_bool y
> +	depends on X86_64 || MIGRATION
> +

Should that be X86_64 && MIGRATION?  X86_64 && HUGETLB_PAGE && MIGRATION?
Maybe it doesn't matter.
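
(As written, def_bool y with "depends on X86_64 || MIGRATION" switches
the symbol on for every x86_64 build, even one without MIGRATION, and
also on 32-bit builds which happen to have MIGRATION; what I'd expect
is

	config ARCH_ENABLE_HUGEPAGE_MIGRATION
		def_bool y
		depends on X86_64 && HUGETLB_PAGE && MIGRATION

so the symbol only ever goes on where all three hold.)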

Yes, I agree a per-arch config option is better than all those
pmd_huge_support() functions, especially all the ones saying 0.

>  menu "Power management and ACPI options"
>  
>  config ARCH_HIBERNATION_HEADER
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 8c9f647ff9e1..8b977ebf9388 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -58,11 +58,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  {
>  	return NULL;
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
>  #else
>  
>  struct page *
> @@ -80,11 +75,6 @@ int pud_huge(pud_t pud)
>  {
>  	return !!(pud_val(pud) & _PAGE_PSE);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
>  #endif
>  
>  #ifdef CONFIG_HUGETLB_PAGE
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 63214868c5b2..61c2e349af64 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -385,15 +385,18 @@ static inline pgoff_t basepage_index(struct page *page)
>  
>  extern void dissolve_free_huge_pages(unsigned long start_pfn,
>  				     unsigned long end_pfn);
> -int pmd_huge_support(void);
>  /*
> - * Currently hugepage migration is enabled only for pmd-based hugepage.
> + * Currently hugepage migration is enabled only for x86_64.

You don't want to have to update that comment every time an architecture
opts in.  No need for any comment here, I think, the name is good enough
(though hugepage_migration_supported() would be better).

>   * This function will be updated when hugepage migration is more widely
>   * supported.
>   */
>  static inline int hugepage_migration_support(struct hstate *h)
>  {
> -	return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
> +#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	return huge_page_shift(h) == PMD_SHIFT;
> +#else
> +	return 0;
> +#endif
>  }
>  
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> @@ -443,7 +446,6 @@ static inline pgoff_t basepage_index(struct page *page)
>  	return page->index;
>  }
>  #define dissolve_free_huge_pages(s, e)	do {} while (0)
> -#define pmd_huge_support()	0
>  #define hugepage_migration_support(h)	0
>  
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index ebe5880c29d6..1e22701c972b 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -264,6 +264,9 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	boolean
> +

I don't remember how duplicated config entries work,
so cannot comment on that.
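
(If they work the way I'd guess - multiple definitions of one symbol
merged, their attributes accumulated - then this bare

	config ARCH_ENABLE_HUGEPAGE_MIGRATION
		boolean

declares the symbol for every architecture, defaulting it off, and
each opting-in arch layers its own def_bool y entry on top; an arch
which never mentions the symbol simply never sees it enabled.)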

>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
>  
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
       [not found]     ` <1401388474-mqnis5cp@n-horiguchi@ah.jp.nec.com>
@ 2014-05-30  1:35         ` Michael Ellerman
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2014-05-30  1:35 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Hugh Dickins, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Thu, 2014-05-29 at 14:34 -0400, Naoya Horiguchi wrote:
> On Thu, May 29, 2014 at 06:59:43PM +1000, Michael Ellerman wrote:
> > Applying your patch and running trinity pretty immediately results in the
> > following, which looks related (sys_move_pages() again) ?
> >
> > Unable to handle kernel paging request for data at address 0xf2000f80000000
> > Faulting instruction address: 0xc0000000001e29bc
> > cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
> >     pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
> >     lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
> >     sp: c0000003c70f7970
> >    msr: 8000000000009032
> >    dar: f2000f80000000
> >  dsisr: 40000000
> >   current = 0xc0000003f9045800
> >   paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
> >     pid   = 3585, comm = trinity-c27
> > enter ? for help
> > [c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
> > [c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
> > [c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
> > [c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
> > [c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
> > --- Exception: c01 (System Call) at 00003fff7b2b30a8
> > SP (3fffe09728a0) is in userspace
> > 1b:mon>
>
> Sorry for the inconvenience in your testing.
 
That's fine, it's good to find bugs :)

> Hugepage migration is enabled for archs which have pmd-level hugepage
> (including ppc64), but it is tested only on x86_64.
> hugepage_migration_support() controls this, so the following patch should
> help you avoid the problem, I believe.
> Could you try testing with it?

Sure. So this patch, in addition to Hugh's patch to remove the BUG_ON(), does
avoid the crash above (remove_migration_pte()).

I dropped Hugh's patch, as he has decided he doesn't like it, and added the
following hunk instead:

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 3c1b968..f230a97 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
 {
        if (vma->vm_flags & (VM_IO | VM_PFNMAP))
                return 0;
+
+#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+       if (vma->vm_flags & VM_HUGETLB)
+               return 0;
+#endif
+
        /*
         * Migration allocates pages in the highest zone. If we cannot
         * do so then migration (at least from node to node) is not


Which seems to be what Hugh was referring to in his mail - correct me if I'm
wrong Hugh.

With your patch and the above hunk I can run trinity happily for a while,
whereas without it crashes almost immediately.

So with the above hunk you can add my tested-by.

cheers



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
  2014-05-30  1:35         ` Michael Ellerman
@ 2014-05-30  1:52           ` Hugh Dickins
  -1 siblings, 0 replies; 28+ messages in thread
From: Hugh Dickins @ 2014-05-30  1:52 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Naoya Horiguchi, Hugh Dickins, Andrew Morton, benh, Tony Luck,
	linux-mm, linux-kernel, trinity

On Fri, 30 May 2014, Michael Ellerman wrote:
> 
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index 3c1b968..f230a97 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
>  {
>         if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>                 return 0;
> +
> +#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +       if (vma->vm_flags & VM_HUGETLB)
> +               return 0;
> +#endif
> +
>         /*
>          * Migration allocates pages in the highest zone. If we cannot
>          * do so then migration (at least from node to node) is not

That's right, thanks.

> 
> 
> Which seems to be what Hugh was referring to in his mail - correct me if I'm
> wrong Hugh.
> 
> With your patch and the above hunk I can run trinity happily for a while,
> whereas without it crashes almost immediately.
> 
> So with the above hunk you can add my tested-by.
> 
> cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
  2014-05-29 22:04         ` Hugh Dickins
  (?)
@ 2014-05-30  2:56         ` Naoya Horiguchi
  -1 siblings, 0 replies; 28+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  2:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: mpe, Andrew Morton, benh, Tony Luck, linux-mm, linux-kernel, trinity

On Thu, May 29, 2014 at 03:04:57PM -0700, Hugh Dickins wrote:
> On Thu, 29 May 2014, Naoya Horiguchi wrote:
> > 
> > Currently hugepage migration is available for all archs which support pmd-level
> > hugepage, but testing is done only for x86_64 and there are bugs for other archs.
> 
> And even for x86_64 I think: the follow_huge_pmd() locking issue I
> mentioned.  But I agree that's a different kind of bug, and probably
> not cause to disable the feature even on x86_64 at this stage - but
> cause to fix it in a different patch Cc stable when you have a moment.

Yes, I promise to do it.

> > So to avoid breaking such archs, this patch limits the availability strictly to
> > x86_64 until developers of other archs get interested in enabling this feature.
> 
> Hmm, I don't like the sound of "until developers of other archs get
> interested in enabling this feature".  Your choice, I suppose, but I
> had been expecting you to give them a little more help than that, by
> fixing up the follow_huge_addr() and locking as you expect it to be
> (and whatever Michael's subsequent remove_migration_pte() crash comes
> from - maybe obvious with a little thought, but I haven't), then
> pinging those architectures to give it a try and enable if they wish.

I agree, I'll do this (maybe starting by adding more comments to help enable
it on other archs) after fixing the locking problem.

> Perhaps I'm expecting too much, and you haven't the time; doubt I have.

I cannot say when, but I want to do it in a few months.

> I believe your patch below is incomplete, or perhaps you were
> expecting to layer it on top of my follow_huge_addr get_page one.
> No, I think we should throw mine away if you're going to disable
> the feature on most architectures for now (and once the locking is
> corrected, my get_page after follow_huge_addr will be wrong anyway).

OK, you dropped the patch, so follow_page(FOLL_GET) should not be called for
hugepages on non-x86_64 archs.

> What I think you're missing is an adjustment to your 71ea2efb1e93
> ("mm: migrate: remove VM_HUGETLB from vma flag check in vma_migratable()"):
> doesn't vma_migratable() need to test VM_HUGETLB when
> !CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION? 

Right, that looks to me like a good precheck to skip hugepages.

> Then we are saved from
> reaching the follow_huge_addr() BUG; and avoid the weird preparation
> for migrating HUGETLB pages on architectures which do not support it.
> 
> But yes, I think your disablement approach is the right thing for 3.15.
> 
> > 
> > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: stable@vger.kernel.org # 3.12+
> > ---
> >  arch/arm/mm/hugetlbpage.c     |  5 -----
> >  arch/arm64/mm/hugetlbpage.c   |  5 -----
> >  arch/ia64/mm/hugetlbpage.c    |  5 -----
> >  arch/metag/mm/hugetlbpage.c   |  5 -----
> >  arch/mips/mm/hugetlbpage.c    |  5 -----
> >  arch/powerpc/mm/hugetlbpage.c | 10 ----------
> >  arch/s390/mm/hugetlbpage.c    |  5 -----
> >  arch/sh/mm/hugetlbpage.c      |  5 -----
> >  arch/sparc/mm/hugetlbpage.c   |  5 -----
> >  arch/tile/mm/hugetlbpage.c    |  5 -----
> >  arch/x86/Kconfig              |  4 ++++
> >  arch/x86/mm/hugetlbpage.c     | 10 ----------
> >  include/linux/hugetlb.h       | 10 ++++++----
> >  mm/Kconfig                    |  3 +++
> >  14 files changed, 13 insertions(+), 69 deletions(-)
> > 
> > diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> > index 54ee6163c181..66781bf34077 100644
> > --- a/arch/arm/mm/hugetlbpage.c
> > +++ b/arch/arm/mm/hugetlbpage.c
> > @@ -56,8 +56,3 @@ int pmd_huge(pmd_t pmd)
> >  {
> >  	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
> >  }
> > -
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 5e9aec358306..2fc8258bab2d 100644
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -54,11 +54,6 @@ int pud_huge(pud_t pud)
> >  	return !(pud_val(pud) & PUD_TABLE_BIT);
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> > -
> >  static __init int setup_hugepagesz(char *opt)
> >  {
> >  	unsigned long ps = memparse(opt, &opt);
> > diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
> > index 68232db98baa..76069c18ee42 100644
> > --- a/arch/ia64/mm/hugetlbpage.c
> > +++ b/arch/ia64/mm/hugetlbpage.c
> > @@ -114,11 +114,6 @@ int pud_huge(pud_t pud)
> >  	return 0;
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 0;
> > -}
> > -
> >  struct page *
> >  follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
> >  {
> > diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
> > index 042431509b56..3c52fa6d0f8e 100644
> > --- a/arch/metag/mm/hugetlbpage.c
> > +++ b/arch/metag/mm/hugetlbpage.c
> > @@ -110,11 +110,6 @@ int pud_huge(pud_t pud)
> >  	return 0;
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> > -
> >  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  			     pmd_t *pmd, int write)
> >  {
> > diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
> > index 77e0ae036e7c..4ec8ee10d371 100644
> > --- a/arch/mips/mm/hugetlbpage.c
> > +++ b/arch/mips/mm/hugetlbpage.c
> > @@ -84,11 +84,6 @@ int pud_huge(pud_t pud)
> >  	return (pud_val(pud) & _PAGE_HUGE) != 0;
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> > -
> >  struct page *
> >  follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  		pmd_t *pmd, int write)
> > diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> > index eb923654ba80..7e70ae968e5f 100644
> > --- a/arch/powerpc/mm/hugetlbpage.c
> > +++ b/arch/powerpc/mm/hugetlbpage.c
> > @@ -86,11 +86,6 @@ int pgd_huge(pgd_t pgd)
> >  	 */
> >  	return ((pgd_val(pgd) & 0x3) != 0x0);
> >  }
> > -
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> >  #else
> >  int pmd_huge(pmd_t pmd)
> >  {
> > @@ -106,11 +101,6 @@ int pgd_huge(pgd_t pgd)
> >  {
> >  	return 0;
> >  }
> > -
> > -int pmd_huge_support(void)
> > -{
> > -	return 0;
> > -}
> >  #endif
> >  
> >  pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> > diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> > index 0727a55d87d9..0ff66a7e29bb 100644
> > --- a/arch/s390/mm/hugetlbpage.c
> > +++ b/arch/s390/mm/hugetlbpage.c
> > @@ -220,11 +220,6 @@ int pud_huge(pud_t pud)
> >  	return 0;
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> > -
> >  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  			     pmd_t *pmdp, int write)
> >  {
> > diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
> > index 0d676a41081e..d7762349ea48 100644
> > --- a/arch/sh/mm/hugetlbpage.c
> > +++ b/arch/sh/mm/hugetlbpage.c
> > @@ -83,11 +83,6 @@ int pud_huge(pud_t pud)
> >  	return 0;
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 0;
> > -}
> > -
> >  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  			     pmd_t *pmd, int write)
> >  {
> > diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
> > index 9bd9ce80bf77..d329537739c6 100644
> > --- a/arch/sparc/mm/hugetlbpage.c
> > +++ b/arch/sparc/mm/hugetlbpage.c
> > @@ -231,11 +231,6 @@ int pud_huge(pud_t pud)
> >  	return 0;
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 0;
> > -}
> > -
> >  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  			     pmd_t *pmd, int write)
> >  {
> > diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
> > index 0cb3bbaa580c..e514899e1100 100644
> > --- a/arch/tile/mm/hugetlbpage.c
> > +++ b/arch/tile/mm/hugetlbpage.c
> > @@ -166,11 +166,6 @@ int pud_huge(pud_t pud)
> >  	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
> >  }
> >  
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> > -
> >  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  			     pmd_t *pmd, int write)
> >  {
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 25d2c6f7325e..0cf6a7d0a93e 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1871,6 +1871,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> >  	def_bool y
> >  	depends on X86_64 || X86_PAE
> >  
> > +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> > +	def_bool y
> > +	depends on X86_64 || MIGRATION
> > +
> 
> Should that be X86_64 && MIGRATION?  X86_64 && HUGETLB_PAGE && MIGRATION?
> Maybe it doesn't matter.

Ouch, "&&" is what I meant, and HUGETLB_PAGE is also fine.

> Yes, I agree a per-arch config option is better than all those
> pmd_huge_support() functions, especially all the ones saying 0.
>
> >  menu "Power management and ACPI options"
> >  
> >  config ARCH_HIBERNATION_HEADER
> > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> > index 8c9f647ff9e1..8b977ebf9388 100644
> > --- a/arch/x86/mm/hugetlbpage.c
> > +++ b/arch/x86/mm/hugetlbpage.c
> > @@ -58,11 +58,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> >  {
> >  	return NULL;
> >  }
> > -
> > -int pmd_huge_support(void)
> > -{
> > -	return 0;
> > -}
> >  #else
> >  
> >  struct page *
> > @@ -80,11 +75,6 @@ int pud_huge(pud_t pud)
> >  {
> >  	return !!(pud_val(pud) & _PAGE_PSE);
> >  }
> > -
> > -int pmd_huge_support(void)
> > -{
> > -	return 1;
> > -}
> >  #endif
> >  
> >  #ifdef CONFIG_HUGETLB_PAGE
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 63214868c5b2..61c2e349af64 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -385,15 +385,18 @@ static inline pgoff_t basepage_index(struct page *page)
> >  
> >  extern void dissolve_free_huge_pages(unsigned long start_pfn,
> >  				     unsigned long end_pfn);
> > -int pmd_huge_support(void);
> >  /*
> > - * Currently hugepage migration is enabled only for pmd-based hugepage.
> > + * Currently hugepage migration is enabled only for x86_64.
> 
> You don't want to have to update that comment every time an architecture
> opts in.  No need for any comment here, I think, the name is good enough
> (though hugepage_migration_supported() would be better).

Ah, OK. I'll remove this comment.

Thanks,
Naoya Horiguchi

> >   * This function will be updated when hugepage migration is more widely
> >   * supported.
> >   */
> >  static inline int hugepage_migration_support(struct hstate *h)
> >  {
> > -	return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
> > +#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> > +	return huge_page_shift(h) == PMD_SHIFT;
> > +#else
> > +	return 0;
> > +#endif
> >  }
> >  
> >  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> > @@ -443,7 +446,6 @@ static inline pgoff_t basepage_index(struct page *page)
> >  	return page->index;
> >  }
> >  #define dissolve_free_huge_pages(s, e)	do {} while (0)
> > -#define pmd_huge_support()	0
> >  #define hugepage_migration_support(h)	0
> >  
> >  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index ebe5880c29d6..1e22701c972b 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -264,6 +264,9 @@ config MIGRATION
> >  	  pages as migration can relocate pages to satisfy a huge page
> >  	  allocation instead of reclaiming.
> >  
> > +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> > +	boolean
> > +
> 
> I don't remember how duplicated config entries work,
> so cannot comment on that.
> 
> >  config PHYS_ADDR_T_64BIT
> >  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
> >  
> > -- 
> > 1.9.3
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
  2014-05-30  1:35         ` Michael Ellerman
  (?)
  (?)
@ 2014-05-30  3:04         ` Naoya Horiguchi
  2014-05-30  4:13             ` Naoya Horiguchi
  -1 siblings, 2 replies; 28+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  3:04 UTC (permalink / raw)
  To: mpe
  Cc: Hugh Dickins, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Fri, May 30, 2014 at 11:35:16AM +1000, Michael Ellerman wrote:
> On Thu, 2014-05-29 at 14:34 -0400, Naoya Horiguchi wrote:
> > On Thu, May 29, 2014 at 06:59:43PM +1000, Michael Ellerman wrote:
> > > Applying your patch and running trinity pretty immediately results in the
> > > following, which looks related (sys_move_pages() again) ?
> > >
> > > Unable to handle kernel paging request for data at address 0xf2000f80000000
> > > Faulting instruction address: 0xc0000000001e29bc
> > > cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
> > >     pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
> > >     lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
> > >     sp: c0000003c70f7970
> > >    msr: 8000000000009032
> > >    dar: f2000f80000000
> > >  dsisr: 40000000
> > >   current = 0xc0000003f9045800
> > >   paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
> > >     pid   = 3585, comm = trinity-c27
> > > enter ? for help
> > > [c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
> > > [c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
> > > [c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
> > > [c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
> > > [c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
> > > --- Exception: c01 (System Call) at 00003fff7b2b30a8
> > > SP (3fffe09728a0) is in userspace
> > > 1b:mon>
> >
> > Sorry for the inconvenience in your testing.
>  
> That's fine, it's good to find bugs :)
> 
> > Hugepage migration is enabled for archs which have pmd-level hugepage
> > (including ppc64), but it is tested only on x86_64.
> > hugepage_migration_support() controls this, so the following patch should
> > help you avoid the problem, I believe.
> > Could you try testing with it?
> 
> Sure. So this patch, in addition to Hugh's patch to remove the BUG_ON(), does
> avoid the crash above (remove_migration_pte()).
> 
> I dropped Hugh's patch, as he has decided he doesn't like it, and added the
> following hunk instead:
> 
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index 3c1b968..f230a97 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
>  {
>         if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>                 return 0;
> +
> +#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +       if (vma->vm_flags & VM_HUGETLB)
> +               return 0;
> +#endif
> +
>         /*
>          * Migration allocates pages in the highest zone. If we cannot
>          * do so then migration (at least from node to node) is not
> 
> 
> Which seems to be what Hugh was referring to in his mail - correct me if I'm
> wrong Hugh.
> 
> With your patch and the above hunk I can run trinity happily for a while,
> whereas without it crashes almost immediately.

Great.

> So with the above hunk you can add my tested-by.

OK, thank you for your help.

I'll post the revised patch later.

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] hugetlb: restrict hugepage_migration_support() to x86_64
  2014-05-30  3:04         ` Naoya Horiguchi
@ 2014-05-30  4:13             ` Naoya Horiguchi
  1 sibling, 0 replies; 28+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  4:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Michael Ellerman, Benjamin Herrenschmidt,
	Tony Luck, linux-kernel, linux-mm, trinity

Currently hugepage migration is available for all archs which support pmd-level
hugepage, but testing is done only for x86_64 and there are bugs for other archs.
So to avoid breaking such archs, this patch limits the availability strictly to
x86_64 until developers of other archs get interested in enabling this feature.

Simply disabling hugepage migration on non-x86_64 archs is not enough to fix
the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is
supported in vma_migratable().

ChangeLog:
- add VM_HUGETLB check in vma_migratable()
- fix dependency in config ARCH_ENABLE_HUGEPAGE_MIGRATION
- remove comment on hugepage_migration_support()

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: stable@vger.kernel.org # 3.12+
---
 arch/arm/mm/hugetlbpage.c     |  5 -----
 arch/arm64/mm/hugetlbpage.c   |  5 -----
 arch/ia64/mm/hugetlbpage.c    |  5 -----
 arch/metag/mm/hugetlbpage.c   |  5 -----
 arch/mips/mm/hugetlbpage.c    |  5 -----
 arch/powerpc/mm/hugetlbpage.c | 10 ----------
 arch/s390/mm/hugetlbpage.c    |  5 -----
 arch/sh/mm/hugetlbpage.c      |  5 -----
 arch/sparc/mm/hugetlbpage.c   |  5 -----
 arch/tile/mm/hugetlbpage.c    |  5 -----
 arch/x86/Kconfig              |  4 ++++
 arch/x86/mm/hugetlbpage.c     | 10 ----------
 include/linux/hugetlb.h       | 13 +++++--------
 include/linux/mempolicy.h     |  6 ++++++
 mm/Kconfig                    |  3 +++
 15 files changed, 18 insertions(+), 73 deletions(-)

diff --git v3.15-rc5.orig/arch/arm/mm/hugetlbpage.c v3.15-rc5/arch/arm/mm/hugetlbpage.c
index 54ee6163c181..66781bf34077 100644
--- v3.15-rc5.orig/arch/arm/mm/hugetlbpage.c
+++ v3.15-rc5/arch/arm/mm/hugetlbpage.c
@@ -56,8 +56,3 @@ int pmd_huge(pmd_t pmd)
 {
 	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
diff --git v3.15-rc5.orig/arch/arm64/mm/hugetlbpage.c v3.15-rc5/arch/arm64/mm/hugetlbpage.c
index 5e9aec358306..2fc8258bab2d 100644
--- v3.15-rc5.orig/arch/arm64/mm/hugetlbpage.c
+++ v3.15-rc5/arch/arm64/mm/hugetlbpage.c
@@ -54,11 +54,6 @@ int pud_huge(pud_t pud)
 	return !(pud_val(pud) & PUD_TABLE_BIT);
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 static __init int setup_hugepagesz(char *opt)
 {
 	unsigned long ps = memparse(opt, &opt);
diff --git v3.15-rc5.orig/arch/ia64/mm/hugetlbpage.c v3.15-rc5/arch/ia64/mm/hugetlbpage.c
index 68232db98baa..76069c18ee42 100644
--- v3.15-rc5.orig/arch/ia64/mm/hugetlbpage.c
+++ v3.15-rc5/arch/ia64/mm/hugetlbpage.c
@@ -114,11 +114,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *
 follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/metag/mm/hugetlbpage.c v3.15-rc5/arch/metag/mm/hugetlbpage.c
index 042431509b56..3c52fa6d0f8e 100644
--- v3.15-rc5.orig/arch/metag/mm/hugetlbpage.c
+++ v3.15-rc5/arch/metag/mm/hugetlbpage.c
@@ -110,11 +110,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/mips/mm/hugetlbpage.c v3.15-rc5/arch/mips/mm/hugetlbpage.c
index 77e0ae036e7c..4ec8ee10d371 100644
--- v3.15-rc5.orig/arch/mips/mm/hugetlbpage.c
+++ v3.15-rc5/arch/mips/mm/hugetlbpage.c
@@ -84,11 +84,6 @@ int pud_huge(pud_t pud)
 	return (pud_val(pud) & _PAGE_HUGE) != 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd, int write)
diff --git v3.15-rc5.orig/arch/powerpc/mm/hugetlbpage.c v3.15-rc5/arch/powerpc/mm/hugetlbpage.c
index eb923654ba80..7e70ae968e5f 100644
--- v3.15-rc5.orig/arch/powerpc/mm/hugetlbpage.c
+++ v3.15-rc5/arch/powerpc/mm/hugetlbpage.c
@@ -86,11 +86,6 @@ int pgd_huge(pgd_t pgd)
 	 */
 	return ((pgd_val(pgd) & 0x3) != 0x0);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
 #else
 int pmd_huge(pmd_t pmd)
 {
@@ -106,11 +101,6 @@ int pgd_huge(pgd_t pgd)
 {
 	return 0;
 }
-
-int pmd_huge_support(void)
-{
-	return 0;
-}
 #endif
 
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
diff --git v3.15-rc5.orig/arch/s390/mm/hugetlbpage.c v3.15-rc5/arch/s390/mm/hugetlbpage.c
index 0727a55d87d9..0ff66a7e29bb 100644
--- v3.15-rc5.orig/arch/s390/mm/hugetlbpage.c
+++ v3.15-rc5/arch/s390/mm/hugetlbpage.c
@@ -220,11 +220,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmdp, int write)
 {
diff --git v3.15-rc5.orig/arch/sh/mm/hugetlbpage.c v3.15-rc5/arch/sh/mm/hugetlbpage.c
index 0d676a41081e..d7762349ea48 100644
--- v3.15-rc5.orig/arch/sh/mm/hugetlbpage.c
+++ v3.15-rc5/arch/sh/mm/hugetlbpage.c
@@ -83,11 +83,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/sparc/mm/hugetlbpage.c v3.15-rc5/arch/sparc/mm/hugetlbpage.c
index 9bd9ce80bf77..d329537739c6 100644
--- v3.15-rc5.orig/arch/sparc/mm/hugetlbpage.c
+++ v3.15-rc5/arch/sparc/mm/hugetlbpage.c
@@ -231,11 +231,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/tile/mm/hugetlbpage.c v3.15-rc5/arch/tile/mm/hugetlbpage.c
index 0cb3bbaa580c..e514899e1100 100644
--- v3.15-rc5.orig/arch/tile/mm/hugetlbpage.c
+++ v3.15-rc5/arch/tile/mm/hugetlbpage.c
@@ -166,11 +166,6 @@ int pud_huge(pud_t pud)
 	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/x86/Kconfig v3.15-rc5/arch/x86/Kconfig
index 25d2c6f7325e..6b8b429c832f 100644
--- v3.15-rc5.orig/arch/x86/Kconfig
+++ v3.15-rc5/arch/x86/Kconfig
@@ -1871,6 +1871,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
 	def_bool y
 	depends on X86_64 || X86_PAE
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+	def_bool y
+	depends on X86_64 && HUGETLB_PAGE && MIGRATION
+
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git v3.15-rc5.orig/arch/x86/mm/hugetlbpage.c v3.15-rc5/arch/x86/mm/hugetlbpage.c
index 8c9f647ff9e1..8b977ebf9388 100644
--- v3.15-rc5.orig/arch/x86/mm/hugetlbpage.c
+++ v3.15-rc5/arch/x86/mm/hugetlbpage.c
@@ -58,11 +58,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 {
 	return NULL;
 }
-
-int pmd_huge_support(void)
-{
-	return 0;
-}
 #else
 
 struct page *
@@ -80,11 +75,6 @@ int pud_huge(pud_t pud)
 {
 	return !!(pud_val(pud) & _PAGE_PSE);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git v3.15-rc5.orig/include/linux/hugetlb.h v3.15-rc5/include/linux/hugetlb.h
index 63214868c5b2..c9de64cf288d 100644
--- v3.15-rc5.orig/include/linux/hugetlb.h
+++ v3.15-rc5/include/linux/hugetlb.h
@@ -385,15 +385,13 @@ static inline pgoff_t basepage_index(struct page *page)
 
 extern void dissolve_free_huge_pages(unsigned long start_pfn,
 				     unsigned long end_pfn);
-int pmd_huge_support(void);
-/*
- * Currently hugepage migration is enabled only for pmd-based hugepage.
- * This function will be updated when hugepage migration is more widely
- * supported.
- */
 static inline int hugepage_migration_support(struct hstate *h)
 {
-	return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+	return huge_page_shift(h) == PMD_SHIFT;
+#else
+	return 0;
+#endif
 }
 
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
@@ -443,7 +441,6 @@ static inline pgoff_t basepage_index(struct page *page)
 	return page->index;
 }
 #define dissolve_free_huge_pages(s, e)	do {} while (0)
-#define pmd_huge_support()	0
 #define hugepage_migration_support(h)	0
 
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
diff --git v3.15-rc5.orig/include/linux/mempolicy.h v3.15-rc5/include/linux/mempolicy.h
index 3c1b968da0ca..f230a978e6ba 100644
--- v3.15-rc5.orig/include/linux/mempolicy.h
+++ v3.15-rc5/include/linux/mempolicy.h
@@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
 		return 0;
+
+#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+	if (vma->vm_flags & VM_HUGETLB)
+		return 0;
+#endif
+
 	/*
 	 * Migration allocates pages in the highest zone. If we cannot
 	 * do so then migration (at least from node to node) is not
diff --git v3.15-rc5.orig/mm/Kconfig v3.15-rc5/mm/Kconfig
index ebe5880c29d6..1e22701c972b 100644
--- v3.15-rc5.orig/mm/Kconfig
+++ v3.15-rc5/mm/Kconfig
@@ -264,6 +264,9 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+	boolean
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 1/2] hugetlb: restrict hugepage_migration_support() to x86_64
@ 2014-05-30  4:13             ` Naoya Horiguchi
  0 siblings, 0 replies; 28+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  4:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Michael Ellerman, Benjamin Herrenschmidt,
	Tony Luck, linux-kernel, linux-mm, trinity

Currently hugepage migration is available for all archs which support pmd-level
hugepages, but it has been tested only on x86_64, and there are bugs on other
archs. So to avoid breaking those archs, this patch restricts the availability
strictly to x86_64 until developers of other archs get interested in enabling
this feature.
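
For illustration, the gate is a hidden Kconfig symbol which only x86_64
sets for now; an arch that later wants the feature would add a stanza
mirroring the x86 one below to its own Kconfig (hypothetical example
using arm64, not part of this patch):

config ARCH_ENABLE_HUGEPAGE_MIGRATION
	def_bool y
	depends on ARM64 && HUGETLB_PAGE && MIGRATION

No common-code change would be needed beyond that.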

Simply disabling hugepage migration on non-x86_64 archs is not enough to fix
the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so also make vma_migratable() check whether hugepage
migration is supported.
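
For reference, vma_migratable() effectively becomes the following (sketch
only, with the unchanged highest-zone check elided; the real change is the
include/linux/mempolicy.h hunk below):

static inline int vma_migratable(struct vm_area_struct *vma)
{
	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
		return 0;

#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
	/*
	 * Without arch support, report hugetlb VMAs as non-migratable
	 * so that sys_move_pages() never reaches follow_page(FOLL_GET)
	 * on a hugepage.
	 */
	if (vma->vm_flags & VM_HUGETLB)
		return 0;
#endif
	/* ... existing highest-zone check, unchanged ... */
	return 1;
}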

ChangeLog:
- add VM_HUGETLB check in vma_migratable()
- fix dependency in config ARCH_ENABLE_HUGEPAGE_MIGRATION
- remove comment on hugepage_migration_support()

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: stable@vger.kernel.org # 3.12+
---
 arch/arm/mm/hugetlbpage.c     |  5 -----
 arch/arm64/mm/hugetlbpage.c   |  5 -----
 arch/ia64/mm/hugetlbpage.c    |  5 -----
 arch/metag/mm/hugetlbpage.c   |  5 -----
 arch/mips/mm/hugetlbpage.c    |  5 -----
 arch/powerpc/mm/hugetlbpage.c | 10 ----------
 arch/s390/mm/hugetlbpage.c    |  5 -----
 arch/sh/mm/hugetlbpage.c      |  5 -----
 arch/sparc/mm/hugetlbpage.c   |  5 -----
 arch/tile/mm/hugetlbpage.c    |  5 -----
 arch/x86/Kconfig              |  4 ++++
 arch/x86/mm/hugetlbpage.c     | 10 ----------
 include/linux/hugetlb.h       | 13 +++++--------
 include/linux/mempolicy.h     |  6 ++++++
 mm/Kconfig                    |  3 +++
 15 files changed, 18 insertions(+), 73 deletions(-)

diff --git v3.15-rc5.orig/arch/arm/mm/hugetlbpage.c v3.15-rc5/arch/arm/mm/hugetlbpage.c
index 54ee6163c181..66781bf34077 100644
--- v3.15-rc5.orig/arch/arm/mm/hugetlbpage.c
+++ v3.15-rc5/arch/arm/mm/hugetlbpage.c
@@ -56,8 +56,3 @@ int pmd_huge(pmd_t pmd)
 {
 	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
diff --git v3.15-rc5.orig/arch/arm64/mm/hugetlbpage.c v3.15-rc5/arch/arm64/mm/hugetlbpage.c
index 5e9aec358306..2fc8258bab2d 100644
--- v3.15-rc5.orig/arch/arm64/mm/hugetlbpage.c
+++ v3.15-rc5/arch/arm64/mm/hugetlbpage.c
@@ -54,11 +54,6 @@ int pud_huge(pud_t pud)
 	return !(pud_val(pud) & PUD_TABLE_BIT);
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 static __init int setup_hugepagesz(char *opt)
 {
 	unsigned long ps = memparse(opt, &opt);
diff --git v3.15-rc5.orig/arch/ia64/mm/hugetlbpage.c v3.15-rc5/arch/ia64/mm/hugetlbpage.c
index 68232db98baa..76069c18ee42 100644
--- v3.15-rc5.orig/arch/ia64/mm/hugetlbpage.c
+++ v3.15-rc5/arch/ia64/mm/hugetlbpage.c
@@ -114,11 +114,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *
 follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/metag/mm/hugetlbpage.c v3.15-rc5/arch/metag/mm/hugetlbpage.c
index 042431509b56..3c52fa6d0f8e 100644
--- v3.15-rc5.orig/arch/metag/mm/hugetlbpage.c
+++ v3.15-rc5/arch/metag/mm/hugetlbpage.c
@@ -110,11 +110,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/mips/mm/hugetlbpage.c v3.15-rc5/arch/mips/mm/hugetlbpage.c
index 77e0ae036e7c..4ec8ee10d371 100644
--- v3.15-rc5.orig/arch/mips/mm/hugetlbpage.c
+++ v3.15-rc5/arch/mips/mm/hugetlbpage.c
@@ -84,11 +84,6 @@ int pud_huge(pud_t pud)
 	return (pud_val(pud) & _PAGE_HUGE) != 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd, int write)
diff --git v3.15-rc5.orig/arch/powerpc/mm/hugetlbpage.c v3.15-rc5/arch/powerpc/mm/hugetlbpage.c
index eb923654ba80..7e70ae968e5f 100644
--- v3.15-rc5.orig/arch/powerpc/mm/hugetlbpage.c
+++ v3.15-rc5/arch/powerpc/mm/hugetlbpage.c
@@ -86,11 +86,6 @@ int pgd_huge(pgd_t pgd)
 	 */
 	return ((pgd_val(pgd) & 0x3) != 0x0);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
 #else
 int pmd_huge(pmd_t pmd)
 {
@@ -106,11 +101,6 @@ int pgd_huge(pgd_t pgd)
 {
 	return 0;
 }
-
-int pmd_huge_support(void)
-{
-	return 0;
-}
 #endif
 
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
diff --git v3.15-rc5.orig/arch/s390/mm/hugetlbpage.c v3.15-rc5/arch/s390/mm/hugetlbpage.c
index 0727a55d87d9..0ff66a7e29bb 100644
--- v3.15-rc5.orig/arch/s390/mm/hugetlbpage.c
+++ v3.15-rc5/arch/s390/mm/hugetlbpage.c
@@ -220,11 +220,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmdp, int write)
 {
diff --git v3.15-rc5.orig/arch/sh/mm/hugetlbpage.c v3.15-rc5/arch/sh/mm/hugetlbpage.c
index 0d676a41081e..d7762349ea48 100644
--- v3.15-rc5.orig/arch/sh/mm/hugetlbpage.c
+++ v3.15-rc5/arch/sh/mm/hugetlbpage.c
@@ -83,11 +83,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/sparc/mm/hugetlbpage.c v3.15-rc5/arch/sparc/mm/hugetlbpage.c
index 9bd9ce80bf77..d329537739c6 100644
--- v3.15-rc5.orig/arch/sparc/mm/hugetlbpage.c
+++ v3.15-rc5/arch/sparc/mm/hugetlbpage.c
@@ -231,11 +231,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-int pmd_huge_support(void)
-{
-	return 0;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/tile/mm/hugetlbpage.c v3.15-rc5/arch/tile/mm/hugetlbpage.c
index 0cb3bbaa580c..e514899e1100 100644
--- v3.15-rc5.orig/arch/tile/mm/hugetlbpage.c
+++ v3.15-rc5/arch/tile/mm/hugetlbpage.c
@@ -166,11 +166,6 @@ int pud_huge(pud_t pud)
 	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
 }
 
-int pmd_huge_support(void)
-{
-	return 1;
-}
-
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 			     pmd_t *pmd, int write)
 {
diff --git v3.15-rc5.orig/arch/x86/Kconfig v3.15-rc5/arch/x86/Kconfig
index 25d2c6f7325e..6b8b429c832f 100644
--- v3.15-rc5.orig/arch/x86/Kconfig
+++ v3.15-rc5/arch/x86/Kconfig
@@ -1871,6 +1871,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
 	def_bool y
 	depends on X86_64 || X86_PAE
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+	def_bool y
+	depends on X86_64 && HUGETLB_PAGE && MIGRATION
+
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git v3.15-rc5.orig/arch/x86/mm/hugetlbpage.c v3.15-rc5/arch/x86/mm/hugetlbpage.c
index 8c9f647ff9e1..8b977ebf9388 100644
--- v3.15-rc5.orig/arch/x86/mm/hugetlbpage.c
+++ v3.15-rc5/arch/x86/mm/hugetlbpage.c
@@ -58,11 +58,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 {
 	return NULL;
 }
-
-int pmd_huge_support(void)
-{
-	return 0;
-}
 #else
 
 struct page *
@@ -80,11 +75,6 @@ int pud_huge(pud_t pud)
 {
 	return !!(pud_val(pud) & _PAGE_PSE);
 }
-
-int pmd_huge_support(void)
-{
-	return 1;
-}
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git v3.15-rc5.orig/include/linux/hugetlb.h v3.15-rc5/include/linux/hugetlb.h
index 63214868c5b2..c9de64cf288d 100644
--- v3.15-rc5.orig/include/linux/hugetlb.h
+++ v3.15-rc5/include/linux/hugetlb.h
@@ -385,15 +385,13 @@ static inline pgoff_t basepage_index(struct page *page)
 
 extern void dissolve_free_huge_pages(unsigned long start_pfn,
 				     unsigned long end_pfn);
-int pmd_huge_support(void);
-/*
- * Currently hugepage migration is enabled only for pmd-based hugepage.
- * This function will be updated when hugepage migration is more widely
- * supported.
- */
 static inline int hugepage_migration_support(struct hstate *h)
 {
-	return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+	return huge_page_shift(h) == PMD_SHIFT;
+#else
+	return 0;
+#endif
 }
 
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
@@ -443,7 +441,6 @@ static inline pgoff_t basepage_index(struct page *page)
 	return page->index;
 }
 #define dissolve_free_huge_pages(s, e)	do {} while (0)
-#define pmd_huge_support()	0
 #define hugepage_migration_support(h)	0
 
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
diff --git v3.15-rc5.orig/include/linux/mempolicy.h v3.15-rc5/include/linux/mempolicy.h
index 3c1b968da0ca..f230a978e6ba 100644
--- v3.15-rc5.orig/include/linux/mempolicy.h
+++ v3.15-rc5/include/linux/mempolicy.h
@@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
 		return 0;
+
+#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+	if (vma->vm_flags & VM_HUGETLB)
+		return 0;
+#endif
+
 	/*
 	 * Migration allocates pages in the highest zone. If we cannot
 	 * do so then migration (at least from node to node) is not
diff --git v3.15-rc5.orig/mm/Kconfig v3.15-rc5/mm/Kconfig
index ebe5880c29d6..1e22701c972b 100644
--- v3.15-rc5.orig/mm/Kconfig
+++ v3.15-rc5/mm/Kconfig
@@ -264,6 +264,9 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+	boolean
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] hugetlb: rename hugepage_migration_support() to ..._supported()
  2014-05-30  3:04         ` Naoya Horiguchi
@ 2014-05-30  4:13             ` Naoya Horiguchi
  2014-05-30  4:13             ` Naoya Horiguchi
  1 sibling, 0 replies; 28+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  4:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Michael Ellerman, Benjamin Herrenschmidt,
	Tony Luck, linux-kernel, linux-mm, trinity

We already have a function named hugepage_supported(), and the similar
name hugepage_migration_support() is a bit uncomfortable, so let's rename
it to hugepage_migration_supported().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/hugetlb.h | 4 ++--
 mm/hugetlb.c            | 2 +-
 mm/migrate.c            | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git v3.15-rc5.orig/include/linux/hugetlb.h v3.15-rc5/include/linux/hugetlb.h
index c9de64cf288d..9d35e514312b 100644
--- v3.15-rc5.orig/include/linux/hugetlb.h
+++ v3.15-rc5/include/linux/hugetlb.h
@@ -385,7 +385,7 @@ static inline pgoff_t basepage_index(struct page *page)
 
 extern void dissolve_free_huge_pages(unsigned long start_pfn,
 				     unsigned long end_pfn);
-static inline int hugepage_migration_support(struct hstate *h)
+static inline int hugepage_migration_supported(struct hstate *h)
 {
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
 	return huge_page_shift(h) == PMD_SHIFT;
@@ -441,7 +441,7 @@ static inline pgoff_t basepage_index(struct page *page)
 	return page->index;
 }
 #define dissolve_free_huge_pages(s, e)	do {} while (0)
-#define hugepage_migration_support(h)	0
+#define hugepage_migration_supported(h)	0
 
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
 					   struct mm_struct *mm, pte_t *pte)
diff --git v3.15-rc5.orig/mm/hugetlb.c v3.15-rc5/mm/hugetlb.c
index ea42b584661a..83d936d12c1d 100644
--- v3.15-rc5.orig/mm/hugetlb.c
+++ v3.15-rc5/mm/hugetlb.c
@@ -545,7 +545,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
-	if (hugepages_treat_as_movable || hugepage_migration_support(h))
+	if (hugepages_treat_as_movable || hugepage_migration_supported(h))
 		return GFP_HIGHUSER_MOVABLE;
 	else
 		return GFP_HIGHUSER;
diff --git v3.15-rc5.orig/mm/migrate.c v3.15-rc5/mm/migrate.c
index bed48809e5d0..15b589ae6aaf 100644
--- v3.15-rc5.orig/mm/migrate.c
+++ v3.15-rc5/mm/migrate.c
@@ -1031,7 +1031,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	 * tables or check whether the hugepage is pmd-based or not before
 	 * kicking migration.
 	 */
-	if (!hugepage_migration_support(page_hstate(hpage))) {
+	if (!hugepage_migration_supported(page_hstate(hpage))) {
 		putback_active_hugepage(hpage);
 		return -ENOSYS;
 	}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] hugetlb: restrict hugepage_migration_support() to x86_64
  2014-05-30  4:13             ` Naoya Horiguchi
@ 2014-05-30 12:00               ` Hugh Dickins
  -1 siblings, 0 replies; 28+ messages in thread
From: Hugh Dickins @ 2014-05-30 12:00 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Hugh Dickins, Michael Ellerman,
	Benjamin Herrenschmidt, Tony Luck, linux-kernel, linux-mm,
	trinity

On Fri, 30 May 2014, Naoya Horiguchi wrote:

> Currently hugepage migration is available for all archs which support pmd-level
> hugepages, but it has been tested only on x86_64, and there are bugs on other
> archs. So to avoid breaking those archs, this patch restricts the availability
> strictly to x86_64 until developers of other archs get interested in enabling
> this feature.
> 
> Simply disabling hugepage migration on non-x86_64 archs is not enough to fix
> the reported problem where sys_move_pages() hits the BUG_ON() in
> follow_page(FOLL_GET), so also make vma_migratable() check whether hugepage
> migration is supported.
> 
> ChangeLog:
> - add VM_HUGETLB check in vma_migratable()
> - fix dependency in config ARCH_ENABLE_HUGEPAGE_MIGRATION
> - remove comment on hugepage_migration_support()
> 
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Tested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org # 3.12+

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  arch/arm/mm/hugetlbpage.c     |  5 -----
>  arch/arm64/mm/hugetlbpage.c   |  5 -----
>  arch/ia64/mm/hugetlbpage.c    |  5 -----
>  arch/metag/mm/hugetlbpage.c   |  5 -----
>  arch/mips/mm/hugetlbpage.c    |  5 -----
>  arch/powerpc/mm/hugetlbpage.c | 10 ----------
>  arch/s390/mm/hugetlbpage.c    |  5 -----
>  arch/sh/mm/hugetlbpage.c      |  5 -----
>  arch/sparc/mm/hugetlbpage.c   |  5 -----
>  arch/tile/mm/hugetlbpage.c    |  5 -----
>  arch/x86/Kconfig              |  4 ++++
>  arch/x86/mm/hugetlbpage.c     | 10 ----------
>  include/linux/hugetlb.h       | 13 +++++--------
>  include/linux/mempolicy.h     |  6 ++++++
>  mm/Kconfig                    |  3 +++
>  15 files changed, 18 insertions(+), 73 deletions(-)
> 
> diff --git v3.15-rc5.orig/arch/arm/mm/hugetlbpage.c v3.15-rc5/arch/arm/mm/hugetlbpage.c
> index 54ee6163c181..66781bf34077 100644
> --- v3.15-rc5.orig/arch/arm/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/arm/mm/hugetlbpage.c
> @@ -56,8 +56,3 @@ int pmd_huge(pmd_t pmd)
>  {
>  	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> diff --git v3.15-rc5.orig/arch/arm64/mm/hugetlbpage.c v3.15-rc5/arch/arm64/mm/hugetlbpage.c
> index 5e9aec358306..2fc8258bab2d 100644
> --- v3.15-rc5.orig/arch/arm64/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/arm64/mm/hugetlbpage.c
> @@ -54,11 +54,6 @@ int pud_huge(pud_t pud)
>  	return !(pud_val(pud) & PUD_TABLE_BIT);
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  static __init int setup_hugepagesz(char *opt)
>  {
>  	unsigned long ps = memparse(opt, &opt);
> diff --git v3.15-rc5.orig/arch/ia64/mm/hugetlbpage.c v3.15-rc5/arch/ia64/mm/hugetlbpage.c
> index 68232db98baa..76069c18ee42 100644
> --- v3.15-rc5.orig/arch/ia64/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/ia64/mm/hugetlbpage.c
> @@ -114,11 +114,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
>  {
> diff --git v3.15-rc5.orig/arch/metag/mm/hugetlbpage.c v3.15-rc5/arch/metag/mm/hugetlbpage.c
> index 042431509b56..3c52fa6d0f8e 100644
> --- v3.15-rc5.orig/arch/metag/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/metag/mm/hugetlbpage.c
> @@ -110,11 +110,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git v3.15-rc5.orig/arch/mips/mm/hugetlbpage.c v3.15-rc5/arch/mips/mm/hugetlbpage.c
> index 77e0ae036e7c..4ec8ee10d371 100644
> --- v3.15-rc5.orig/arch/mips/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/mips/mm/hugetlbpage.c
> @@ -84,11 +84,6 @@ int pud_huge(pud_t pud)
>  	return (pud_val(pud) & _PAGE_HUGE) != 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  		pmd_t *pmd, int write)
> diff --git v3.15-rc5.orig/arch/powerpc/mm/hugetlbpage.c v3.15-rc5/arch/powerpc/mm/hugetlbpage.c
> index eb923654ba80..7e70ae968e5f 100644
> --- v3.15-rc5.orig/arch/powerpc/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/powerpc/mm/hugetlbpage.c
> @@ -86,11 +86,6 @@ int pgd_huge(pgd_t pgd)
>  	 */
>  	return ((pgd_val(pgd) & 0x3) != 0x0);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
>  #else
>  int pmd_huge(pmd_t pmd)
>  {
> @@ -106,11 +101,6 @@ int pgd_huge(pgd_t pgd)
>  {
>  	return 0;
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
>  #endif
>  
>  pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> diff --git v3.15-rc5.orig/arch/s390/mm/hugetlbpage.c v3.15-rc5/arch/s390/mm/hugetlbpage.c
> index 0727a55d87d9..0ff66a7e29bb 100644
> --- v3.15-rc5.orig/arch/s390/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/s390/mm/hugetlbpage.c
> @@ -220,11 +220,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmdp, int write)
>  {
> diff --git v3.15-rc5.orig/arch/sh/mm/hugetlbpage.c v3.15-rc5/arch/sh/mm/hugetlbpage.c
> index 0d676a41081e..d7762349ea48 100644
> --- v3.15-rc5.orig/arch/sh/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/sh/mm/hugetlbpage.c
> @@ -83,11 +83,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git v3.15-rc5.orig/arch/sparc/mm/hugetlbpage.c v3.15-rc5/arch/sparc/mm/hugetlbpage.c
> index 9bd9ce80bf77..d329537739c6 100644
> --- v3.15-rc5.orig/arch/sparc/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/sparc/mm/hugetlbpage.c
> @@ -231,11 +231,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git v3.15-rc5.orig/arch/tile/mm/hugetlbpage.c v3.15-rc5/arch/tile/mm/hugetlbpage.c
> index 0cb3bbaa580c..e514899e1100 100644
> --- v3.15-rc5.orig/arch/tile/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/tile/mm/hugetlbpage.c
> @@ -166,11 +166,6 @@ int pud_huge(pud_t pud)
>  	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git v3.15-rc5.orig/arch/x86/Kconfig v3.15-rc5/arch/x86/Kconfig
> index 25d2c6f7325e..6b8b429c832f 100644
> --- v3.15-rc5.orig/arch/x86/Kconfig
> +++ v3.15-rc5/arch/x86/Kconfig
> @@ -1871,6 +1871,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>  	def_bool y
>  	depends on X86_64 || X86_PAE
>  
> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	def_bool y
> +	depends on X86_64 && HUGETLB_PAGE && MIGRATION
> +
>  menu "Power management and ACPI options"
>  
>  config ARCH_HIBERNATION_HEADER
> diff --git v3.15-rc5.orig/arch/x86/mm/hugetlbpage.c v3.15-rc5/arch/x86/mm/hugetlbpage.c
> index 8c9f647ff9e1..8b977ebf9388 100644
> --- v3.15-rc5.orig/arch/x86/mm/hugetlbpage.c
> +++ v3.15-rc5/arch/x86/mm/hugetlbpage.c
> @@ -58,11 +58,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  {
>  	return NULL;
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
>  #else
>  
>  struct page *
> @@ -80,11 +75,6 @@ int pud_huge(pud_t pud)
>  {
>  	return !!(pud_val(pud) & _PAGE_PSE);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
>  #endif
>  
>  #ifdef CONFIG_HUGETLB_PAGE
> diff --git v3.15-rc5.orig/include/linux/hugetlb.h v3.15-rc5/include/linux/hugetlb.h
> index 63214868c5b2..c9de64cf288d 100644
> --- v3.15-rc5.orig/include/linux/hugetlb.h
> +++ v3.15-rc5/include/linux/hugetlb.h
> @@ -385,15 +385,13 @@ static inline pgoff_t basepage_index(struct page *page)
>  
>  extern void dissolve_free_huge_pages(unsigned long start_pfn,
>  				     unsigned long end_pfn);
> -int pmd_huge_support(void);
> -/*
> - * Currently hugepage migration is enabled only for pmd-based hugepage.
> - * This function will be updated when hugepage migration is more widely
> - * supported.
> - */
>  static inline int hugepage_migration_support(struct hstate *h)
>  {
> -	return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
> +#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	return huge_page_shift(h) == PMD_SHIFT;
> +#else
> +	return 0;
> +#endif
>  }
>  
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> @@ -443,7 +441,6 @@ static inline pgoff_t basepage_index(struct page *page)
>  	return page->index;
>  }
>  #define dissolve_free_huge_pages(s, e)	do {} while (0)
> -#define pmd_huge_support()	0
>  #define hugepage_migration_support(h)	0
>  
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> diff --git v3.15-rc5.orig/include/linux/mempolicy.h v3.15-rc5/include/linux/mempolicy.h
> index 3c1b968da0ca..f230a978e6ba 100644
> --- v3.15-rc5.orig/include/linux/mempolicy.h
> +++ v3.15-rc5/include/linux/mempolicy.h
> @@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
>  {
>  	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>  		return 0;
> +
> +#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	if (vma->vm_flags & VM_HUGETLB)
> +		return 0;
> +#endif
> +
>  	/*
>  	 * Migration allocates pages in the highest zone. If we cannot
>  	 * do so then migration (at least from node to node) is not
> diff --git v3.15-rc5.orig/mm/Kconfig v3.15-rc5/mm/Kconfig
> index ebe5880c29d6..1e22701c972b 100644
> --- v3.15-rc5.orig/mm/Kconfig
> +++ v3.15-rc5/mm/Kconfig
> @@ -264,6 +264,9 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	boolean
> +
>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
>  
> -- 
> 1.9.3
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] hugetlb: rename hugepage_migration_support() to ..._supported()
  2014-05-30  4:13             ` Naoya Horiguchi
@ 2014-05-30 12:02               ` Hugh Dickins
  -1 siblings, 0 replies; 28+ messages in thread
From: Hugh Dickins @ 2014-05-30 12:02 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Hugh Dickins, Michael Ellerman,
	Benjamin Herrenschmidt, Tony Luck, linux-kernel, linux-mm,
	trinity

On Fri, 30 May 2014, Naoya Horiguchi wrote:

> We already have a function named hugepage_supported(), and the similar

hugepages_supported()

> name hugepage_migration_support() is a bit uncomfortable, so let's rename
> it to hugepage_migration_supported().
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  include/linux/hugetlb.h | 4 ++--
>  mm/hugetlb.c            | 2 +-
>  mm/migrate.c            | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git v3.15-rc5.orig/include/linux/hugetlb.h v3.15-rc5/include/linux/hugetlb.h
> index c9de64cf288d..9d35e514312b 100644
> --- v3.15-rc5.orig/include/linux/hugetlb.h
> +++ v3.15-rc5/include/linux/hugetlb.h
> @@ -385,7 +385,7 @@ static inline pgoff_t basepage_index(struct page *page)
>  
>  extern void dissolve_free_huge_pages(unsigned long start_pfn,
>  				     unsigned long end_pfn);
> -static inline int hugepage_migration_support(struct hstate *h)
> +static inline int hugepage_migration_supported(struct hstate *h)
>  {
>  #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>  	return huge_page_shift(h) == PMD_SHIFT;
> @@ -441,7 +441,7 @@ static inline pgoff_t basepage_index(struct page *page)
>  	return page->index;
>  }
>  #define dissolve_free_huge_pages(s, e)	do {} while (0)
> -#define hugepage_migration_support(h)	0
> +#define hugepage_migration_supported(h)	0
>  
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  					   struct mm_struct *mm, pte_t *pte)
> diff --git v3.15-rc5.orig/mm/hugetlb.c v3.15-rc5/mm/hugetlb.c
> index ea42b584661a..83d936d12c1d 100644
> --- v3.15-rc5.orig/mm/hugetlb.c
> +++ v3.15-rc5/mm/hugetlb.c
> @@ -545,7 +545,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>  /* Movability of hugepages depends on migration support. */
>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
> -	if (hugepages_treat_as_movable || hugepage_migration_support(h))
> +	if (hugepages_treat_as_movable || hugepage_migration_supported(h))
>  		return GFP_HIGHUSER_MOVABLE;
>  	else
>  		return GFP_HIGHUSER;
> diff --git v3.15-rc5.orig/mm/migrate.c v3.15-rc5/mm/migrate.c
> index bed48809e5d0..15b589ae6aaf 100644
> --- v3.15-rc5.orig/mm/migrate.c
> +++ v3.15-rc5/mm/migrate.c
> @@ -1031,7 +1031,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	 * tables or check whether the hugepage is pmd-based or not before
>  	 * kicking migration.
>  	 */
> -	if (!hugepage_migration_support(page_hstate(hpage))) {
> +	if (!hugepage_migration_supported(page_hstate(hpage))) {
>  		putback_active_hugepage(hpage);
>  		return -ENOSYS;
>  	}
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-05-30 12:03 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-28  8:32 BUG at mm/memory.c:1489! Michael Ellerman
2014-05-28  8:32 ` Michael Ellerman
2014-05-29  0:33 ` Hugh Dickins
2014-05-29  0:33   ` Hugh Dickins
2014-05-29  4:52   ` Naoya Horiguchi
2014-05-29 20:50     ` Hugh Dickins
2014-05-29 20:50       ` Hugh Dickins
2014-05-29  8:59   ` Michael Ellerman
2014-05-29  8:59     ` Michael Ellerman
2014-05-29 18:34     ` [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!) Naoya Horiguchi
2014-05-29 22:04       ` Hugh Dickins
2014-05-29 22:04         ` Hugh Dickins
2014-05-30  2:56         ` Naoya Horiguchi
2014-05-29 21:03     ` BUG at mm/memory.c:1489! Hugh Dickins
2014-05-29 21:03       ` Hugh Dickins
     [not found]     ` <1401388474-mqnis5cp@n-horiguchi@ah.jp.nec.com>
2014-05-30  1:35       ` [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!) Michael Ellerman
2014-05-30  1:35         ` Michael Ellerman
2014-05-30  1:52         ` Hugh Dickins
2014-05-30  1:52           ` Hugh Dickins
2014-05-30  3:04         ` Naoya Horiguchi
2014-05-30  4:13           ` [PATCH 1/2] hugetlb: restrict hugepage_migration_support() to x86_64 Naoya Horiguchi
2014-05-30  4:13             ` Naoya Horiguchi
2014-05-30 12:00             ` Hugh Dickins
2014-05-30 12:00               ` Hugh Dickins
2014-05-30  4:13           ` [PATCH 2/2] hugetlb: rename hugepage_migration_support() to ..._supported() Naoya Horiguchi
2014-05-30  4:13             ` Naoya Horiguchi
2014-05-30 12:02             ` Hugh Dickins
2014-05-30 12:02               ` Hugh Dickins
