linux-kernel.vger.kernel.org archive mirror
* BUG at mm/memory.c:1489!
@ 2014-05-28  8:32 Michael Ellerman
  2014-05-29  0:33 ` Hugh Dickins
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2014-05-28  8:32 UTC (permalink / raw)
  To: Linux MM, LKML, trinity

Hey folks,

Anyone seen this before? Trinity hit it just now:

Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64

[watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
------------[ cut here ]------------
kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
    pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
    lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
    sp: c000000384eafbe0
   msr: 8000000000029032
  current = 0xc0000003c27e1bc0
  paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
    pid   = 20800, comm = trinity-c12
kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
enter ? for help
[c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
[c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 00003fff795f30a8
SP (3ffff958f290) is in userspace


I've left it in the debugger, can dig into it a bit more tomorrow if anyone has
any clues.

cheers




* Re: BUG at mm/memory.c:1489!
  2014-05-28  8:32 BUG at mm/memory.c:1489! Michael Ellerman
@ 2014-05-29  0:33 ` Hugh Dickins
  2014-05-29  8:59   ` Michael Ellerman
       [not found]   ` <5386bd2e.6478c20a.5839.5e40SMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Hugh Dickins @ 2014-05-29  0:33 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andrew Morton, Naoya Horiguchi, Benjamin Herrenschmidt,
	Tony Luck, linux-mm, linux-kernel, trinity

On Wed, 28 May 2014, Michael Ellerman wrote:
> Hey folks,
> 
> Anyone seen this before? Trinity hit it just now:
> 
> Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> 
> [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> ------------[ cut here ]------------
> kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
>     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
>     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
>     sp: c000000384eafbe0
>    msr: 8000000000029032
>   current = 0xc0000003c27e1bc0
>   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
>     pid   = 20800, comm = trinity-c12
> kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> enter ? for help
> [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> --- Exception: c01 (System Call) at 00003fff795f30a8
> SP (3ffff958f290) is in userspace
> 
> I've left it in the debugger, can dig into it a bit more tomorrow
> if anyone has any clues.

Thanks for leaving it overnight, but this one is quite obvious,
so go ahead and reboot whenever suits you.

Trinity didn't even need to do anything bizarre to get this: that
ordinary path simply didn't get tried on powerpc or ia64 before.

Here's a patch which should fix it for you, but I believe leaves
a race in common with other architectures.  I must turn away to
other things, and hope Naoya-san can fix up the locking separately
(or point out why it's already safe).

[PATCH] mm: fix move_pages follow_page huge_addr BUG

v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
move_pages()") is okay on most arches, but on follow_huge_addr-style
arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").

The point of the BUG_ON was that nothing needed FOLL_GET there at
the time, and it was not clear that we have sufficient locking to
use get_page() safely here on the outside - maybe the page found has
already been freed and even reused when follow_huge_addr() returns.

I suspect that e632a938d914's use of get_page() after return from
follow_huge_pmd() has the same problem: what prevents a racing
instance of move_pages() from already migrating away and freeing
that page by then?  A reference to the page should be taken while
holding suitable lock (huge_pte_lockptr?), to serialize against
concurrent migration.

But I'm not prepared to rework the hugetlb locking here myself;
so for now just supply a patch to copy e632a938d914's get_page()
after follow_huge_pmd() to after follow_huge_addr(): removing
the BUG_ON(flags & FOLL_GET), but probably leaving a race.

Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: stable@vger.kernel.org # 3.12+
---
Whether this is a patch that should go in without fixing the locking,
I don't know.  An unlikely race is better than a triggerable BUG?
Or perhaps I'm just wrong about there being any such race.

 mm/memory.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- 3.15-rc7/mm/memory.c	2014-04-27 23:55:53.608801152 -0700
+++ linux/mm/memory.c	2014-05-28 13:05:48.340124615 -0700
@@ -1486,7 +1486,17 @@ struct page *follow_page_mask(struct vm_
 
 	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
 	if (!IS_ERR(page)) {
-		BUG_ON(flags & FOLL_GET);
+		if (page && (flags & FOLL_GET)) {
+			/*
+			 * Refcounts on tail pages are not well-defined and
+			 * shouldn't be taken. The caller should handle a NULL
+			 * return when trying to follow tail pages.
+			 */
+			if (PageHead(page))
+				get_page(page);
+			else
+				page = NULL;
+		}
 		goto out;
 	}
 


* Re: BUG at mm/memory.c:1489!
  2014-05-29  0:33 ` Hugh Dickins
@ 2014-05-29  8:59   ` Michael Ellerman
  2014-05-29 21:03     ` Hugh Dickins
                       ` (2 more replies)
       [not found]   ` <5386bd2e.6478c20a.5839.5e40SMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 3 replies; 8+ messages in thread
From: Michael Ellerman @ 2014-05-29  8:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Naoya Horiguchi, Benjamin Herrenschmidt,
	Tony Luck, linux-mm, linux-kernel, trinity

On Wed, 2014-05-28 at 17:33 -0700, Hugh Dickins wrote:
> On Wed, 28 May 2014, Michael Ellerman wrote:
> > Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> > 
> > [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> > ------------[ cut here ]------------
> > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
> >     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
> >     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
> >     sp: c000000384eafbe0
> >    msr: 8000000000029032
> >   current = 0xc0000003c27e1bc0
> >   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
> >     pid   = 20800, comm = trinity-c12
> > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > enter ? for help
> > [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> > [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> > --- Exception: c01 (System Call) at 00003fff795f30a8
> > SP (3ffff958f290) is in userspace
> > 
> > I've left it in the debugger, can dig into it a bit more tomorrow
> > if anyone has any clues.
> 
> Thanks for leaving it overnight, but this one is quite obvious,
> so go ahead and reboot whenever suits you.
> 
> Trinity didn't even need to do anything bizarre to get this: that
> ordinary path simply didn't get tried on powerpc or ia64 before.
> 
> Here's a patch which should fix it for you, but I believe leaves
> a race in common with other architectures.  I must turn away to
> other things, and hope Naoya-san can fix up the locking separately
> (or point out why it's already safe).
> 
> [PATCH] mm: fix move_pages follow_page huge_addr BUG
> 
> v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
> move_pages()") is okay on most arches, but on follow_huge_addr-style
> arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
> from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").
> 
> The point of the BUG_ON was that nothing needed FOLL_GET there at
> the time, and it was not clear that we have sufficient locking to
> use get_page() safely here on the outside - maybe the page found has
> already been freed and even reused when follow_huge_addr() returns.
> 
> I suspect that e632a938d914's use of get_page() after return from
> follow_huge_pmd() has the same problem: what prevents a racing
> instance of move_pages() from already migrating away and freeing
> that page by then?  A reference to the page should be taken while
> holding suitable lock (huge_pte_lockptr?), to serialize against
> concurrent migration.
> 
> But I'm not prepared to rework the hugetlb locking here myself;
> so for now just supply a patch to copy e632a938d914's get_page()
> after follow_huge_pmd() to after follow_huge_addr(): removing
> the BUG_ON(flags & FOLL_GET), but probably leaving a race.

Thanks for the detailed explanation Hugh.

Unfortunately I don't know our mm/hugetlb code well enough to give you a good
answer. Ben had a quick look at our follow_huge_addr() and thought it looked
"fishy". He suggested something like what we do in gup_pte_range() with
page_cache_get_speculative() might be in order.

Applying your patch and running trinity almost immediately results in the
following, which looks related (sys_move_pages() again)?

Unable to handle kernel paging request for data at address 0xf2000f80000000
Faulting instruction address: 0xc0000000001e29bc
cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
    pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
    lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
    sp: c0000003c70f7970
   msr: 8000000000009032
   dar: f2000f80000000
 dsisr: 40000000
  current = 0xc0000003f9045800
  paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
    pid   = 3585, comm = trinity-c27
enter ? for help
[c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
[c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
[c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
[c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
[c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 00003fff7b2b30a8
SP (3fffe09728a0) is in userspace
1b:mon> 

I've hit it twice in two runs:

Unable to handle kernel paging request for data at address 0xf2400f00000000
Faulting instruction address: 0xc0000000001e2a3c
cpu 0xd: Vector: 300 (Data Access) at [c00000038a4bf6f0]
    pc: c0000000001e2a3c: .remove_migration_pte+0x9c/0x320
    lr: c0000000001e2a38: .remove_migration_pte+0x98/0x320
    sp: c00000038a4bf970
   msr: 8000000000009032
   dar: f2400f00000000
 dsisr: 40000000
  current = 0xc0000003acd9e680
  paca    = 0xc000000001dc3400   softe: 0        irq_happened: 0x01
    pid   = 13334, comm = trinity-c13
enter ? for help
[c00000038a4bfa20] c0000000001bcf08 .rmap_walk+0x328/0x470
[c00000038a4bfae0] c0000000001e2984 .remove_migration_ptes+0x44/0x60
[c00000038a4bfb80] c0000000001e4d68 .migrate_pages+0x6d8/0xa00
[c00000038a4bfcc0] c0000000001e566c .SyS_move_pages+0x5dc/0x7d0
[c00000038a4bfe30] c00000000000a1d8 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 00003fff79df30a8
SP (3fffda95d500) is in userspace
d:mon> 


If I tell trinity to skip sys_move_pages() it runs for hours.

cheers




* Re: BUG at mm/memory.c:1489!
       [not found]   ` <5386bd2e.6478c20a.5839.5e40SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-05-29 20:50     ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2014-05-29 20:50 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Hugh Dickins, mpe, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Thu, 29 May 2014, Naoya Horiguchi wrote:
> On Wed, May 28, 2014 at 05:33:11PM -0700, Hugh Dickins wrote:
> > On Wed, 28 May 2014, Michael Ellerman wrote:
> > > Hey folks,
> > > 
> > > Anyone seen this before? Trinity hit it just now:
> > > 
> > > Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> > > 
> > > [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> > > ------------[ cut here ]------------
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
> > >     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
> > >     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
> > >     sp: c000000384eafbe0
> > >    msr: 8000000000029032
> > >   current = 0xc0000003c27e1bc0
> > >   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
> > >     pid   = 20800, comm = trinity-c12
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > enter ? for help
> > > [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> > > [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> > > --- Exception: c01 (System Call) at 00003fff795f30a8
> > > SP (3ffff958f290) is in userspace
> > > 
> > > I've left it in the debugger, can dig into it a bit more tomorrow
> > > if anyone has any clues.
> > 
> > Thanks for leaving it overnight, but this one is quite obvious,
> > so go ahead and reboot whenever suits you.
> > 
> > Trinity didn't even need to do anything bizarre to get this: that
> > ordinary path simply didn't get tried on powerpc or ia64 before.
> > 
> > Here's a patch which should fix it for you, but I believe leaves
> > a race in common with other architectures.  I must turn away to
> > other things, and hope Naoya-san can fix up the locking separately
> > (or point out why it's already safe).
> > 
> > [PATCH] mm: fix move_pages follow_page huge_addr BUG
> > 
> > v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
> > move_pages()") is okay on most arches, but on follow_huge_addr-style
> > arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
> > from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").
> > 
> > The point of the BUG_ON was that nothing needed FOLL_GET there at
> > the time, and it was not clear that we have sufficient locking to
> > use get_page() safely here on the outside - maybe the page found has
> > already been freed and even reused when follow_huge_addr() returns.
> > 
> > I suspect that e632a938d914's use of get_page() after return from
> > follow_huge_pmd() has the same problem: what prevents a racing
> > instance of move_pages() from already migrating away and freeing
> > that page by then?  A reference to the page should be taken while
> > holding suitable lock (huge_pte_lockptr?), to serialize against
> > concurrent migration.
> 
> Right, we need take huge_pte_lockptr() here, I think.
> 
> > But I'm not prepared to rework the hugetlb locking here myself;
> > so for now just supply a patch to copy e632a938d914's get_page()
> > after follow_huge_pmd() to after follow_huge_addr(): removing
> > the BUG_ON(flags & FOLL_GET), but probably leaving a race.
> 
> This bug was introduced by me, so I'll fix this.
> Thank you for reporting.
> 
> > Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
> > Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: stable@vger.kernel.org # 3.12+
> 
> This patch looks good to me.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks, but seeing Michael's further problems once this is in,
I think we are better off for now with your non-x86_64 disablement.
But that's not complete in itself, I'll comment there.

Hugh

> 
> > ---
> > Whether this is a patch that should go in without fixing the locking,
> > I don't know.  An unlikely race is better than a triggerable BUG?
> > Or perhaps I'm just wrong about there being any such race.
> > 
> >  mm/memory.c |   12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > --- 3.15-rc7/mm/memory.c	2014-04-27 23:55:53.608801152 -0700
> > +++ linux/mm/memory.c	2014-05-28 13:05:48.340124615 -0700
> > @@ -1486,7 +1486,17 @@ struct page *follow_page_mask(struct vm_
> >  
> >  	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
> >  	if (!IS_ERR(page)) {
> > -		BUG_ON(flags & FOLL_GET);
> > +		if (page && (flags & FOLL_GET)) {
> > +			/*
> > +			 * Refcounts on tail pages are not well-defined and
> > +			 * shouldn't be taken. The caller should handle a NULL
> > +			 * return when trying to follow tail pages.
> > +			 */
> > +			if (PageHead(page))
> > +				get_page(page);
> > +			else
> > +				page = NULL;
> > +		}
> >  		goto out;
> >  	}
> >  
> > 


* Re: BUG at mm/memory.c:1489!
  2014-05-29  8:59   ` Michael Ellerman
@ 2014-05-29 21:03     ` Hugh Dickins
       [not found]     ` <53877dd1.0350c20a.2dde.ffff99d7SMTPIN_ADDED_BROKEN@mx.google.com>
       [not found]     ` <1401388474-mqnis5cp@n-horiguchi@ah.jp.nec.com>
  2 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2014-05-29 21:03 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Hugh Dickins, Andrew Morton, Naoya Horiguchi,
	Benjamin Herrenschmidt, Tony Luck, linux-mm, linux-kernel,
	trinity

On Thu, 29 May 2014, Michael Ellerman wrote:
> 
> Unfortunately I don't know our mm/hugetlb code well enough to give you a good
> answer. Ben had a quick look at our follow_huge_addr() and thought it looked
> "fishy". He suggested something like what we do in gup_pte_range() with
> page_cache_get_speculative() might be in order.

Fishy indeed, ancient code that was only ever intended for stats-like
usage, not designed for actually getting a hold on the page.  But I
don't think there's a big problem to getting the locking right: just
hope it doesn't require a different strategy on each architecture -
often an irritation with hugetlb.  Naoya-san will sort it out in
due course (not 3.15) I expect, but will probably need testing help.

> 
> Applying your patch and running trinity almost immediately results in the
> following, which looks related (sys_move_pages() again)?
> 
> Unable to handle kernel paging request for data at address 0xf2000f80000000
> Faulting instruction address: 0xc0000000001e29bc
> cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
>     pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
>     lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
>     sp: c0000003c70f7970
>    msr: 8000000000009032
>    dar: f2000f80000000
>  dsisr: 40000000
>   current = 0xc0000003f9045800
>   paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
>     pid   = 3585, comm = trinity-c27
> enter ? for help
> [c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
> [c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
> [c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
> [c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
> [c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
> --- Exception: c01 (System Call) at 00003fff7b2b30a8
> SP (3fffe09728a0) is in userspace
> 1b:mon> 
> 
> I've hit it twice in two runs:
> 
> If I tell trinity to skip sys_move_pages() it runs for hours.

That's sad.  Sorry for wasting your time with my patch, thank you
for trying it.  What you see might be a consequence of the locking
deficiency I mentioned, given trinity's deviousness; though if it's
being clever like that, I would expect it to have already found the
equivalent issue on x86-64.  So probably not, probably another issue.

As I've said elsewhere, I think we need to go with disablement for now.

Hugh


* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
       [not found]     ` <53877dd1.0350c20a.2dde.ffff99d7SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-05-29 22:04       ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2014-05-29 22:04 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: mpe, Hugh Dickins, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Thu, 29 May 2014, Naoya Horiguchi wrote:
> 
> Currently hugepage migration is available for all archs which support pmd-level
> hugepage, but testing is done only for x86_64 and there are bugs on other archs.

And even for x86_64 I think: the follow_huge_pmd() locking issue I
mentioned.  But I agree that's a different kind of bug, and probably
not cause to disable the feature even on x86_64 at this stage - but
cause to fix it in a different patch Cc stable when you have a moment.

> So to avoid breaking such archs, this patch limits the availability strictly to
> x86_64 until developers of other archs get interested in enabling this feature.

Hmm, I don't like the sound of "until developers of other archs get
interested in enabling this feature".  Your choice, I suppose, but I
had been expecting you to give them a little more help than that, by
fixing up the follow_huge_addr() and locking as you expect it to be
(and whatever Michael's subsequent remove_migration_pte() crash comes
from - maybe obvious with a little thought, but I haven't), then
pinging those architectures to give it a try and enable if they wish.

Perhaps I'm expecting too much, and you haven't the time; doubt I have.

I believe your patch below is incomplete, or perhaps you were
expecting to layer it on top of my follow_huge_addr get_page one.
No, I think we should throw mine away if you're going to disable
the feature on most architectures for now (and once the locking is
corrected, my get_page after follow_huge_addr will be wrong anyway).

What I think you're missing is an adjustment to your 71ea2efb1e93
("mm: migrate: remove VM_HUGETLB from vma flag check in vma_migratable()"):
doesn't vma_migratable() need to test VM_HUGETLB when
!CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION?  Then we are saved from
reaching the follow_huge_addr() BUG; and avoid the weird preparation
for migrating HUGETLB pages on architectures which do not support it.

But yes, I think your disablement approach is the right thing for 3.15.

> 
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org # 3.12+
> ---
>  arch/arm/mm/hugetlbpage.c     |  5 -----
>  arch/arm64/mm/hugetlbpage.c   |  5 -----
>  arch/ia64/mm/hugetlbpage.c    |  5 -----
>  arch/metag/mm/hugetlbpage.c   |  5 -----
>  arch/mips/mm/hugetlbpage.c    |  5 -----
>  arch/powerpc/mm/hugetlbpage.c | 10 ----------
>  arch/s390/mm/hugetlbpage.c    |  5 -----
>  arch/sh/mm/hugetlbpage.c      |  5 -----
>  arch/sparc/mm/hugetlbpage.c   |  5 -----
>  arch/tile/mm/hugetlbpage.c    |  5 -----
>  arch/x86/Kconfig              |  4 ++++
>  arch/x86/mm/hugetlbpage.c     | 10 ----------
>  include/linux/hugetlb.h       | 10 ++++++----
>  mm/Kconfig                    |  3 +++
>  14 files changed, 13 insertions(+), 69 deletions(-)
> 
> diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> index 54ee6163c181..66781bf34077 100644
> --- a/arch/arm/mm/hugetlbpage.c
> +++ b/arch/arm/mm/hugetlbpage.c
> @@ -56,8 +56,3 @@ int pmd_huge(pmd_t pmd)
>  {
>  	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 5e9aec358306..2fc8258bab2d 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -54,11 +54,6 @@ int pud_huge(pud_t pud)
>  	return !(pud_val(pud) & PUD_TABLE_BIT);
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  static __init int setup_hugepagesz(char *opt)
>  {
>  	unsigned long ps = memparse(opt, &opt);
> diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
> index 68232db98baa..76069c18ee42 100644
> --- a/arch/ia64/mm/hugetlbpage.c
> +++ b/arch/ia64/mm/hugetlbpage.c
> @@ -114,11 +114,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
>  {
> diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
> index 042431509b56..3c52fa6d0f8e 100644
> --- a/arch/metag/mm/hugetlbpage.c
> +++ b/arch/metag/mm/hugetlbpage.c
> @@ -110,11 +110,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
> index 77e0ae036e7c..4ec8ee10d371 100644
> --- a/arch/mips/mm/hugetlbpage.c
> +++ b/arch/mips/mm/hugetlbpage.c
> @@ -84,11 +84,6 @@ int pud_huge(pud_t pud)
>  	return (pud_val(pud) & _PAGE_HUGE) != 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  		pmd_t *pmd, int write)
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index eb923654ba80..7e70ae968e5f 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -86,11 +86,6 @@ int pgd_huge(pgd_t pgd)
>  	 */
>  	return ((pgd_val(pgd) & 0x3) != 0x0);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
>  #else
>  int pmd_huge(pmd_t pmd)
>  {
> @@ -106,11 +101,6 @@ int pgd_huge(pgd_t pgd)
>  {
>  	return 0;
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
>  #endif
>  
>  pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> index 0727a55d87d9..0ff66a7e29bb 100644
> --- a/arch/s390/mm/hugetlbpage.c
> +++ b/arch/s390/mm/hugetlbpage.c
> @@ -220,11 +220,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmdp, int write)
>  {
> diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
> index 0d676a41081e..d7762349ea48 100644
> --- a/arch/sh/mm/hugetlbpage.c
> +++ b/arch/sh/mm/hugetlbpage.c
> @@ -83,11 +83,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
> index 9bd9ce80bf77..d329537739c6 100644
> --- a/arch/sparc/mm/hugetlbpage.c
> +++ b/arch/sparc/mm/hugetlbpage.c
> @@ -231,11 +231,6 @@ int pud_huge(pud_t pud)
>  	return 0;
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
> index 0cb3bbaa580c..e514899e1100 100644
> --- a/arch/tile/mm/hugetlbpage.c
> +++ b/arch/tile/mm/hugetlbpage.c
> @@ -166,11 +166,6 @@ int pud_huge(pud_t pud)
>  	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
>  }
>  
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
> -
>  struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  			     pmd_t *pmd, int write)
>  {
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 25d2c6f7325e..0cf6a7d0a93e 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1871,6 +1871,10 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>  	def_bool y
>  	depends on X86_64 || X86_PAE
>  
> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	def_bool y
> +	depends on X86_64 || MIGRATION
> +

Should that be X86_64 && MIGRATION?  X86_64 && HUGETLB_PAGE && MIGRATION?
Maybe it doesn't matter.

Yes, I agree a per-arch config option is better than all those
pmd_huge_support() functions, especially all the ones saying 0.

>  menu "Power management and ACPI options"
>  
>  config ARCH_HIBERNATION_HEADER
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 8c9f647ff9e1..8b977ebf9388 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -58,11 +58,6 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  {
>  	return NULL;
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 0;
> -}
>  #else
>  
>  struct page *
> @@ -80,11 +75,6 @@ int pud_huge(pud_t pud)
>  {
>  	return !!(pud_val(pud) & _PAGE_PSE);
>  }
> -
> -int pmd_huge_support(void)
> -{
> -	return 1;
> -}
>  #endif
>  
>  #ifdef CONFIG_HUGETLB_PAGE
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 63214868c5b2..61c2e349af64 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -385,15 +385,18 @@ static inline pgoff_t basepage_index(struct page *page)
>  
>  extern void dissolve_free_huge_pages(unsigned long start_pfn,
>  				     unsigned long end_pfn);
> -int pmd_huge_support(void);
>  /*
> - * Currently hugepage migration is enabled only for pmd-based hugepage.
> + * Currently hugepage migration is enabled only for x86_64.

You don't want to have to update that comment every time an architecture
opts in.  No need for any comment here, I think, the name is good enough
(though hugepage_migration_supported() would be better).

>   * This function will be updated when hugepage migration is more widely
>   * supported.
>   */
>  static inline int hugepage_migration_support(struct hstate *h)
>  {
> -	return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
> +#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	return huge_page_shift(h) == PMD_SHIFT;
> +#else
> +	return 0;
> +#endif
>  }
>  
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> @@ -443,7 +446,6 @@ static inline pgoff_t basepage_index(struct page *page)
>  	return page->index;
>  }
>  #define dissolve_free_huge_pages(s, e)	do {} while (0)
> -#define pmd_huge_support()	0
>  #define hugepage_migration_support(h)	0
>  
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index ebe5880c29d6..1e22701c972b 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -264,6 +264,9 @@ config MIGRATION
>  	  pages as migration can relocate pages to satisfy a huge page
>  	  allocation instead of reclaiming.
>  
> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
> +	boolean
> +

I don't remember how duplicated config entries work,
so cannot comment on that.

>  config PHYS_ADDR_T_64BIT
>  	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
>  
> -- 
> 1.9.3


* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
       [not found]     ` <1401388474-mqnis5cp@n-horiguchi@ah.jp.nec.com>
@ 2014-05-30  1:35       ` Michael Ellerman
  2014-05-30  1:52         ` Hugh Dickins
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2014-05-30  1:35 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Hugh Dickins, Andrew Morton, benh, Tony Luck, linux-mm,
	linux-kernel, trinity

On Thu, 2014-05-29 at 14:34 -0400, Naoya Horiguchi wrote:
> On Thu, May 29, 2014 at 06:59:43PM +1000, Michael Ellerman wrote:
> > Applying your patch and running trinity pretty immediately results in the
> > following, which looks related (sys_move_pages() again) ?
> >
> > Unable to handle kernel paging request for data at address 0xf2000f80000000
> > Faulting instruction address: 0xc0000000001e29bc
> > cpu 0x1b: Vector: 300 (Data Access) at [c0000003c70f76f0]
> >     pc: c0000000001e29bc: .remove_migration_pte+0x9c/0x320
> >     lr: c0000000001e29b8: .remove_migration_pte+0x98/0x320
> >     sp: c0000003c70f7970
> >    msr: 8000000000009032
> >    dar: f2000f80000000
> >  dsisr: 40000000
> >   current = 0xc0000003f9045800
> >   paca    = 0xc000000001dc6c00   softe: 0        irq_happened: 0x01
> >     pid   = 3585, comm = trinity-c27
> > enter ? for help
> > [c0000003c70f7a20] c0000000001bce88 .rmap_walk+0x328/0x470
> > [c0000003c70f7ae0] c0000000001e2904 .remove_migration_ptes+0x44/0x60
> > [c0000003c70f7b80] c0000000001e4ce8 .migrate_pages+0x6d8/0xa00
> > [c0000003c70f7cc0] c0000000001e55ec .SyS_move_pages+0x5dc/0x7d0
> > [c0000003c70f7e30] c00000000000a1d8 syscall_exit+0x0/0x98
> > --- Exception: c01 (System Call) at 00003fff7b2b30a8
> > SP (3fffe09728a0) is in userspace
> > 1b:mon>
>
> Sorry for the inconvenience with your testing.
 
That's fine, it's good to find bugs :)

> Hugepage migration is enabled for archs which have pmd-level hugepages
> (including ppc64), but it has only been tested on x86_64.
> hugepage_migration_support() controls this, so the following patch should
> help you avoid the problem, I believe.
> Could you test with it?

Sure. So this patch, in addition to Hugh's patch to remove the BUG_ON(), does
avoid the crash above (remove_migration_pte()).

I dropped Hugh's patch, as he has decided he doesn't like it, and added the
following hunk instead:

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 3c1b968..f230a97 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
 {
        if (vma->vm_flags & (VM_IO | VM_PFNMAP))
                return 0;
+
+#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+       if (vma->vm_flags & VM_HUGETLB)
+               return 0;
+#endif
+
        /*
         * Migration allocates pages in the highest zone. If we cannot
         * do so then migration (at least from node to node) is not


Which seems to be what Hugh was referring to in his mail - correct me if I'm
wrong, Hugh.

With your patch and the above hunk I can run trinity happily for a while,
whereas without it crashes almost immediately.

So with the above hunk you can add my tested-by.

cheers




* Re: [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!)
  2014-05-30  1:35       ` Michael Ellerman
@ 2014-05-30  1:52         ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2014-05-30  1:52 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Naoya Horiguchi, Hugh Dickins, Andrew Morton, benh, Tony Luck,
	linux-mm, linux-kernel, trinity

On Fri, 30 May 2014, Michael Ellerman wrote:
> 
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index 3c1b968..f230a97 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -175,6 +175,12 @@ static inline int vma_migratable(struct vm_area_struct *vma)
>  {
>         if (vma->vm_flags & (VM_IO | VM_PFNMAP))
>                 return 0;
> +
> +#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +       if (vma->vm_flags & VM_HUGETLB)
> +               return 0;
> +#endif
> +
>         /*
>          * Migration allocates pages in the highest zone. If we cannot
>          * do so then migration (at least from node to node) is not

That's right, thanks.

> 
> 
> Which seems to be what Hugh was referring to in his mail - correct me if I'm
> wrong Hugh.
> 
> With your patch and the above hunk I can run trinity happily for a while,
> whereas without it crashes almost immediately.
> 
> So with the above hunk you can add my tested-by.
> 
> cheers



Thread overview: 8+ messages
2014-05-28  8:32 BUG at mm/memory.c:1489! Michael Ellerman
2014-05-29  0:33 ` Hugh Dickins
2014-05-29  8:59   ` Michael Ellerman
2014-05-29 21:03     ` Hugh Dickins
     [not found]     ` <53877dd1.0350c20a.2dde.ffff99d7SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-29 22:04       ` [PATCH] hugetlb: restrict hugepage_migration_support() to x86_64 (Re: BUG at mm/memory.c:1489!) Hugh Dickins
     [not found]     ` <1401388474-mqnis5cp@n-horiguchi@ah.jp.nec.com>
2014-05-30  1:35       ` Michael Ellerman
2014-05-30  1:52         ` Hugh Dickins
     [not found]   ` <5386bd2e.6478c20a.5839.5e40SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-29 20:50     ` BUG at mm/memory.c:1489! Hugh Dickins
