* Re: hunting an IO hang
       [not found]           ` <1295229722-sup-6494@think>
@ 2011-01-17  2:30             ` Andrew Morton
  2011-01-17  2:41               ` Chris Mason
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2011-01-17  2:30 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Jens Axboe, linux-mm, KAMEZAWA Hiroyuki,
	Mel Gorman, Andrea Arcangeli

(lots of cc's added)

On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:

> Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > .. except I actually didn't add Andrew to the cc after all.
> > 
> > NOW I did.
> > 
> > Oh, and if you can repeat this and bisect it, it would obviously be
> > great. But that sounds rather painful.
> 
> Ok, so I've got 3 different problems in 3 totally different areas.
> I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> Linus' current git it goes boom in exotic ways, this time it was only on
> ext3, btrfs code never loaded.
> 
> Linus, if you're planning on rc1 tonight I'll send my pull request out
> the door.  Otherwise I'd prefer to fix this and send my pull after
> actually getting a long btrfs run on the current code.
> 
> Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> 
> WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> Hardware name: Bochs
> list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> Modules linked in:
> Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> Call Trace:
>  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
>  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
>  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
>  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
>  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
>  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
>  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
>  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
>  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
>  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
>  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
>  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
>  [<ffffffff81089c66>] ? kthread+0x0/0x8a
>  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> ---[ end trace 5c6b7933d16b301f ]---

uh-oh.  Does disabling CONFIG_COMPACTION make this go away?  (That requires
disabling CONFIG_TRANSPARENT_HUGEPAGE first.)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: hunting an IO hang
  2011-01-17  2:30             ` hunting an IO hang Andrew Morton
@ 2011-01-17  2:41               ` Chris Mason
  2011-01-17  5:11                 ` Andrea Arcangeli
  2011-01-17 10:27                 ` Mel Gorman
  0 siblings, 2 replies; 24+ messages in thread
From: Chris Mason @ 2011-01-17  2:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Jens Axboe, linux-mm, KAMEZAWA Hiroyuki,
	Mel Gorman, Andrea Arcangeli

Excerpts from Andrew Morton's message of 2011-01-16 21:30:00 -0500:
> (lots of cc's added)
> 
> On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> 
> > Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > > .. except I actually didn't add Andrew to the cc after all.
> > > 
> > > NOW I did.
> > > 
> > > Oh, and if you can repeat this and bisect it, it would obviously be
> > > great. But that sounds rather painful.
> > 
> > Ok, so I've got 3 different problems in 3 totally different areas.
> > I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> > Linus' current git it goes boom in exotic ways, this time it was only on
> > ext3, btrfs code never loaded.
> > 
> > Linus, if you're planning on rc1 tonight I'll send my pull request out
> > the door.  Otherwise I'd prefer to fix this and send my pull after
> > actually getting a long btrfs run on the current code.
> > 
> > Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> > 
> > WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> > Hardware name: Bochs
> > list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> > Modules linked in:
> > Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> > Call Trace:
> >  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
> >  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
> >  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
> >  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
> >  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
> >  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
> >  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
> >  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
> >  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
> >  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
> >  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
> >  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
> >  [<ffffffff81089c66>] ? kthread+0x0/0x8a
> >  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> > ---[ end trace 5c6b7933d16b301f ]---
> 
> uh-oh.  Does disabling CONFIG_COMPACTION make this go away (requires
> disabling CONFIG_TRANSPARENT_HUGEPAGE first).

We'll see.  I gave THP this same run of tests back in November, and it
passed without any problems (after fixing the related btrfs migration
bug).  All of the crashes I've seen this weekend had this in the
.config:

# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y

-chris


* Re: hunting an IO hang
  2011-01-17  2:41               ` Chris Mason
@ 2011-01-17  5:11                 ` Andrea Arcangeli
  2011-01-17 13:48                   ` Minchan Kim
  2011-01-17 14:10                   ` Chris Mason
  2011-01-17 10:27                 ` Mel Gorman
  1 sibling, 2 replies; 24+ messages in thread
From: Andrea Arcangeli @ 2011-01-17  5:11 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Mel Gorman, Minchan Kim

On Sun, Jan 16, 2011 at 09:41:41PM -0500, Chris Mason wrote:
> Excerpts from Andrew Morton's message of 2011-01-16 21:30:00 -0500:
> > (lots of cc's added)
> > 
> > On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> > 
> > > Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > > > .. except I actually didn't add Andrew to the cc after all.
> > > > 
> > > > NOW I did.
> > > > 
> > > > Oh, and if you can repeat this and bisect it, it would obviously be
> > > > great. But that sounds rather painful.
> > > 
> > > Ok, so I've got 3 different problems in 3 totally different areas.
> > > I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> > > Linus' current git it goes boom in exotic ways, this time it was only on
> > > ext3, btrfs code never loaded.
> > > 
> > > Linus, if you're planning on rc1 tonight I'll send my pull request out
> > > the door.  Otherwise I'd prefer to fix this and send my pull after
> > > actually getting a long btrfs run on the current code.
> > > 
> > > Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> > > 
> > > WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> > > Hardware name: Bochs
> > > list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> > > Modules linked in:
> > > Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> > > Call Trace:
> > >  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
> > >  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
> > >  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
> > >  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
> > >  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
> > >  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
> > >  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
> > >  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
> > >  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
> > >  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
> > >  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
> > >  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
> > >  [<ffffffff81089c66>] ? kthread+0x0/0x8a
> > >  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> > > ---[ end trace 5c6b7933d16b301f ]---
> > 
> > uh-oh.  Does disabling CONFIG_COMPACTION make this go away (requires
> > disabling CONFIG_TRANSPARENT_HUGEPAGE first).
> 
> We'll see.  I gave THP this same run of tests back in November, it
> passed without any problems (after fixing the related btrfs migration
> bug).  All of the crashes I've seen this weekend had this in the
> .config:
> 
> # CONFIG_TRANSPARENT_HUGEPAGE is not set
> CONFIG_COMPACTION=y
> CONFIG_MIGRATION=y

I think it's unrelated, but reading commit
cf608ac19c95804dc2df43b1f4f9e068aa9034ab: if page_count(page) == 1 we
leave the page in the lru but we return 0, so the caller of
migrate_pages won't call putback_lru_pages to actually free the page.
(Compaction would still free it, because it checks whether the list is
empty and ignores the migrate_pages retval.) And in
mm/memory-failure.c:1419, nobody is calling putback_lru_pages (it
seems to be a bit missing from that older patch). These look like just
two memleaks, unrelated to the above, though.

NOTE: with the last changes compaction is used for all order > 0 and
even from kswapd, so you will now be able to trigger bugs in
compaction or migration even with THP off. However I'm surprised that
you have issues with compaction...

I'm posting this for Minchan to review (not meant for merging, untested).

======
Subject: when migrate_pages returns 0, all pages must have been released

From: Andrea Arcangeli <aarcange@redhat.com>

In some cases migrate_pages could return zero while still leaving a
few pages in the pagelist (and some caller wouldn't notice it has to
call putback_lru_pages).

Add one missing putback_lru_pages not added by commit
cf608ac19c95804dc2df43b1f4f9e068aa9034ab.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 548fbd7..75398b0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1419,6 +1419,7 @@ int soft_offline_page(struct page *page, int flags)
 		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
 								0, true);
 		if (ret) {
+			putback_lru_pages(&pagelist);
 			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 				pfn, ret, page->flags);
 			if (ret > 0)
diff --git a/mm/migrate.c b/mm/migrate.c
index 46fe8cc..bea2a34 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -611,6 +611,14 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 	return rc;
 }
 
+static void unmap_and_move_release_page(struct page *page)
+{
+	list_del(&page->lru);
+	dec_zone_page_state(page, NR_ISOLATED_ANON +
+			    page_is_file_cache(page));
+	putback_lru_page(page);
+}
+
 /*
  * Obtain the lock on page, remove all ptes and migrate the page
  * to the newly allocated page in newpage.
@@ -631,11 +639,14 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 
 	if (page_count(page) == 1) {
 		/* page was freed from under us. So we are done. */
+		unmap_and_move_release_page(page);
 		goto move_newpage;
 	}
 	if (unlikely(PageTransHuge(page)))
-		if (unlikely(split_huge_page(page)))
+		if (unlikely(split_huge_page(page))) {
+			unmap_and_move_release_page(page);
 			goto move_newpage;
+		}
 
 	/* prepare cgroup just returns 0 or -ENOMEM */
 	rc = -EAGAIN;
@@ -779,10 +790,7 @@ unlock:
  		 * migrated will have kepts its references and be
  		 * restored.
  		 */
- 		list_del(&page->lru);
-		dec_zone_page_state(page, NR_ISOLATED_ANON +
-				page_is_file_cache(page));
-		putback_lru_page(page);
+		unmap_and_move_release_page(page);
 	}
 
 move_newpage:


* Re: hunting an IO hang
  2011-01-17  2:41               ` Chris Mason
  2011-01-17  5:11                 ` Andrea Arcangeli
@ 2011-01-17 10:27                 ` Mel Gorman
  2011-01-17 13:21                   ` Chris Mason
  1 sibling, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2011-01-17 10:27 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli

On Sun, Jan 16, 2011 at 09:41:41PM -0500, Chris Mason wrote:
> Excerpts from Andrew Morton's message of 2011-01-16 21:30:00 -0500:
> > (lots of cc's added)
> > 
> > On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> > 
> > > Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > > > .. except I actually didn't add Andrew to the cc after all.
> > > > 
> > > > NOW I did.
> > > > 
> > > > Oh, and if you can repeat this and bisect it, it would obviously be
> > > > great. But that sounds rather painful.
> > > 
> > > Ok, so I've got 3 different problems in 3 totally different areas.
> > > I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> > > Linus' current git it goes boom in exotic ways, this time it was only on
> > > ext3, btrfs code never loaded.
> > > 
> > > Linus, if you're planning on rc1 tonight I'll send my pull request out
> > > the door.  Otherwise I'd prefer to fix this and send my pull after
> > > actually getting a long btrfs run on the current code.
> > > 
> > > Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> > > 
> > > WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> > > Hardware name: Bochs
> > > list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> > > Modules linked in:
> > > Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> > > Call Trace:
> > >  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
> > >  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
> > >  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
> > >  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
> > >  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
> > >  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
> > >  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
> > >  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
> > >  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
> > >  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
> > >  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
> > >  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
> > >  [<ffffffff81089c66>] ? kthread+0x0/0x8a
> > >  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> > > ---[ end trace 5c6b7933d16b301f ]---
> > 
> > uh-oh.  Does disabling CONFIG_COMPACTION make this go away (requires
> > disabling CONFIG_TRANSPARENT_HUGEPAGE first).
> 
> We'll see.  I gave THP this same run of tests back in November, it
> passed without any problems (after fixing the related btrfs migration
> bug).  All of the crashes I've seen this weekend had this in the
> .config:
> 

I can't find the rest of the thread on any mailing list and am trying
to reproduce the problem locally. What workload were you running?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: hunting an IO hang
  2011-01-17 10:27                 ` Mel Gorman
@ 2011-01-17 13:21                   ` Chris Mason
  2011-01-17 13:50                     ` Mel Gorman
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Mason @ 2011-01-17 13:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli

Excerpts from Mel Gorman's message of 2011-01-17 05:27:44 -0500:
> On Sun, Jan 16, 2011 at 09:41:41PM -0500, Chris Mason wrote:
> > Excerpts from Andrew Morton's message of 2011-01-16 21:30:00 -0500:
> > > (lots of cc's added)
> > > 
> > > On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> > > 
> > > > Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > > > > .. except I actually didn't add Andrew to the cc after all.
> > > > > 
> > > > > NOW I did.
> > > > > 
> > > > > Oh, and if you can repeat this and bisect it, it would obviously be
> > > > > great. But that sounds rather painful.
> > > > 
> > > > Ok, so I've got 3 different problems in 3 totally different areas.
> > > > I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> > > > Linus' current git it goes boom in exotic ways, this time it was only on
> > > > ext3, btrfs code never loaded.
> > > > 
> > > > Linus, if you're planning on rc1 tonight I'll send my pull request out
> > > > the door.  Otherwise I'd prefer to fix this and send my pull after
> > > > actually getting a long btrfs run on the current code.
> > > > 
> > > > Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> > > > 
> > > > WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> > > > Hardware name: Bochs
> > > > list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> > > > Modules linked in:
> > > > Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> > > > Call Trace:
> > > >  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
> > > >  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
> > > >  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
> > > >  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
> > > >  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
> > > >  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
> > > >  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
> > > >  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
> > > >  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
> > > >  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
> > > >  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
> > > >  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
> > > >  [<ffffffff81089c66>] ? kthread+0x0/0x8a
> > > >  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> > > > ---[ end trace 5c6b7933d16b301f ]---
> > > 
> > > uh-oh.  Does disabling CONFIG_COMPACTION make this go away (requires
> > > disabling CONFIG_TRANSPARENT_HUGEPAGE first).
> > 
> > We'll see.  I gave THP this same run of tests back in November, it
> > passed without any problems (after fixing the related btrfs migration
> > bug).  All of the crashes I've seen this weekend had this in the
> > .config:
> > 
> 
> I can't find the rest of the thread on any mailing list and am trying
> to reproduce the problem locally. What workload were you running?

I'm running a very basic IO stress test:

http://oss.oracle.com/~mason/stress.sh

The command line is stress.sh -n 50 -c /mnt/linux-2.6 /mnt

Which starts 50 processes that do cp -a /mnt/linux-2.6
/mnt/stress/$$.  Then they verify the result was correct and then they
delete it, forever in a loop.  In this case my linux-2.6 directory is a
full git tree with sources checked out.  No obj files though.

This was my crash from an overnight run with CONFIG_COMPACTION off:

# CONFIG_COMPACTION is not set
CONFIG_MIGRATION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PAGEALLOC=y

I do have an NFS mount active during the run, but it isn't part of the
test at all.

I've also managed to get all the procs in the system stuck waiting for
IO requests.  It is possible these are two different bugs.  This
list_del oops hits faster if I run the test with a good deal of memory
pressure via an external memory hog.

------------[ cut here ]------------
WARNING: at lib/list_debug.c:54 list_del+0x97/0xed()
Hardware name: Bochs
list_del corruption. prev->next should be ffffea000116d478, but was ffffea00014ba2c8
Modules linked in: btrfs lzo_compress
Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #182
Call Trace:
 [<ffffffff8106edc1>] ? warn_slowpath_common+0x85/0x9d
 [<ffffffff8106ee7c>] ? warn_slowpath_fmt+0x46/0x48
 [<ffffffff81262e27>] ? list_del+0x97/0xed
 [<ffffffff810dbc59>] ? putback_lru_pages+0x7c/0x1eb
 [<ffffffff810dc070>] ? shrink_inactive_list+0x2a8/0x342
 [<ffffffff810dc676>] ? shrink_zone+0x327/0x3d6
 [<ffffffff8119a17a>] ? nfs_access_cache_shrinker+0x179/0x1a0
 [<ffffffff815c302e>] ? _raw_spin_unlock+0xe/0x10
 [<ffffffff810d2055>] ? zone_watermark_ok_safe+0xa9/0xb8
 [<ffffffff810dd26c>] ? kswapd+0x509/0x876
 [<ffffffff810dcd63>] ? kswapd+0x0/0x876
 [<ffffffff81089e40>] ? kthread+0x82/0x8a
 [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff81089dbe>] ? kthread+0x0/0x8a
 [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10

-chris


* Re: hunting an IO hang
  2011-01-17  5:11                 ` Andrea Arcangeli
@ 2011-01-17 13:48                   ` Minchan Kim
  2011-01-17 14:10                   ` Chris Mason
  1 sibling, 0 replies; 24+ messages in thread
From: Minchan Kim @ 2011-01-17 13:48 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Mason, Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Mel Gorman

Hi Andrea,

On Mon, Jan 17, 2011 at 06:11:35AM +0100, Andrea Arcangeli wrote:
> On Sun, Jan 16, 2011 at 09:41:41PM -0500, Chris Mason wrote:
> > Excerpts from Andrew Morton's message of 2011-01-16 21:30:00 -0500:
> > > (lots of cc's added)
> > > 
> > > On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> > > 
> > > > Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > > > > .. except I actually didn't add Andrew to the cc after all.
> > > > > 
> > > > > NOW I did.
> > > > > 
> > > > > Oh, and if you can repeat this and bisect it, it would obviously be
> > > > > great. But that sounds rather painful.
> > > > 
> > > > Ok, so I've got 3 different problems in 3 totally different areas.
> > > > I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> > > > Linus' current git it goes boom in exotic ways, this time it was only on
> > > > ext3, btrfs code never loaded.
> > > > 
> > > > Linus, if you're planning on rc1 tonight I'll send my pull request out
> > > > the door.  Otherwise I'd prefer to fix this and send my pull after
> > > > actually getting a long btrfs run on the current code.
> > > > 
> > > > Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> > > > 
> > > > WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> > > > Hardware name: Bochs
> > > > list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> > > > Modules linked in:
> > > > Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> > > > Call Trace:
> > > >  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
> > > >  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
> > > >  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
> > > >  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
> > > >  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
> > > >  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
> > > >  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
> > > >  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
> > > >  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
> > > >  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
> > > >  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
> > > >  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
> > > >  [<ffffffff81089c66>] ? kthread+0x0/0x8a
> > > >  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> > > > ---[ end trace 5c6b7933d16b301f ]---
> > > 
> > > uh-oh.  Does disabling CONFIG_COMPACTION make this go away (requires
> > > disabling CONFIG_TRANSPARENT_HUGEPAGE first).
> > 
> > We'll see.  I gave THP this same run of tests back in November, it
> > passed without any problems (after fixing the related btrfs migration
> > bug).  All of the crashes I've seen this weekend had this in the
> > .config:
> > 
> > # CONFIG_TRANSPARENT_HUGEPAGE is not set
> > CONFIG_COMPACTION=y
> > CONFIG_MIGRATION=y
> 
> I think it's unrelated but reading commit
> cf608ac19c95804dc2df43b1f4f9e068aa9034ab if page_count(page) == 1 we
> leave the page in the lru but we return 0 (so the caller of
> migrate_pages won't call putback_lru_pages to actually free the page,

Good catch. It's totally my fault. I turned Linux into a memory hog. :(
Sorry for that.

> however compaction would free it because it checks if the list is
> empty and it ignores the migrate_pages retval). And in

I want to change it to check the retval, for consistency.

> mm/memory-failure.c:1419, nobody is calling putback_lru_pages (it
> seems a missing bit from that older patch). They seem just two memleak
> unrelated to the above though.

Good point. Thanks for letting me know.
memory-failure.c:1419 is strange. I modified it at that time
but didn't modify migrate_huge_pages. (At that time I hadn't noticed hugepage
migration, so I missed it, but I want to change it to work like migrate_pages.)
You can see my final patch at https://lkml.org/lkml/2010/8/24/248. It was changed.
Hmm. The putback_lru_pages in soft_offline_huge_page causes a critical BUG (early free),
so we should fix that, too.

We need 3 patches.

1. Fix migrate_huge_pages (take the put_page out of migrate_huge_pages) for the page corruption.
2. Your patch for the memory leak.
3. A compaction retval check before putback_lru_pages, for consistency.

Hmm. Could I make the patches against any kernel tree?
Unfortunately I can only work on this outside the office, so it may take at least a day.
If it is urgent, could you make the patches for me?

Thanks, Andrea.
> 
> NOTE: with the last changes compaction is used for all order > 0 and
> even from kswapd, so you will now be able to trigger bugs in
> compaction or migration even with THP off. However I'm surprised that
> you have issues with compaction...
> 
> I'm posting this for Minchan to review (not meant for merging, untested).
> 
> ======
> Subject: when migrate_pages returns 0, all pages must have been released
> 
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> In some cases migrate_pages could return zero while still leaving a
> few pages in the pagelist (and some caller wouldn't notice it has to
> call putback_lru_pages).
> 
> Add one missing putback_lru_pages not added by commit
> cf608ac19c95804dc2df43b1f4f9e068aa9034ab.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 548fbd7..75398b0 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1419,6 +1419,7 @@ int soft_offline_page(struct page *page, int flags)
>  		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
>  								0, true);
>  		if (ret) {
> +			putback_lru_pages(&pagelist);
>  			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>  				pfn, ret, page->flags);
>  			if (ret > 0)
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 46fe8cc..bea2a34 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -611,6 +611,14 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>  	return rc;
>  }
>  
> +static void unmap_and_move_release_page(struct page *page)
> +{
> +	list_del(&page->lru);
> +	dec_zone_page_state(page, NR_ISOLATED_ANON +
> +			    page_is_file_cache(page));
> +	putback_lru_page(page);
> +}
> +
>  /*
>   * Obtain the lock on page, remove all ptes and migrate the page
>   * to the newly allocated page in newpage.
> @@ -631,11 +639,14 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
>  
>  	if (page_count(page) == 1) {
>  		/* page was freed from under us. So we are done. */
> +		unmap_and_move_release_page(page);
>  		goto move_newpage;
>  	}
>  	if (unlikely(PageTransHuge(page)))
> -		if (unlikely(split_huge_page(page)))
> +		if (unlikely(split_huge_page(page))) {
> +			unmap_and_move_release_page(page);
>  			goto move_newpage;
> +		}
>  
>  	/* prepare cgroup just returns 0 or -ENOMEM */
>  	rc = -EAGAIN;
> @@ -779,10 +790,7 @@ unlock:
>   		 * migrated will have kepts its references and be
>   		 * restored.
>   		 */
> - 		list_del(&page->lru);
> -		dec_zone_page_state(page, NR_ISOLATED_ANON +
> -				page_is_file_cache(page));
> -		putback_lru_page(page);
> +		unmap_and_move_release_page(page);
>  	}
>  
>  move_newpage:

-- 
Kind regards,
Minchan Kim


* Re: hunting an IO hang
  2011-01-17 13:21                   ` Chris Mason
@ 2011-01-17 13:50                     ` Mel Gorman
  2011-01-17 14:07                       ` Chris Mason
  0 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2011-01-17 13:50 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli

On Mon, Jan 17, 2011 at 08:21:41AM -0500, Chris Mason wrote:
> Excerpts from Mel Gorman's message of 2011-01-17 05:27:44 -0500:
> > On Sun, Jan 16, 2011 at 09:41:41PM -0500, Chris Mason wrote:
> > > Excerpts from Andrew Morton's message of 2011-01-16 21:30:00 -0500:
> > > > (lots of cc's added)
> > > > 
> > > > On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> > > > 
> > > > > Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > > > > > .. except I actually didn't add Andrew to the cc after all.
> > > > > > 
> > > > > > NOW I did.
> > > > > > 
> > > > > > Oh, and if you can repeat this and bisect it, it would obviously be
> > > > > > great. But that sounds rather painful.
> > > > > 
> > > > > Ok, so I've got 3 different problems in 3 totally different areas.
> > > > > I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> > > > > Linus' current git it goes boom in exotic ways, this time it was only on
> > > > > ext3, btrfs code never loaded.
> > > > > 
> > > > > Linus, if you're planning on rc1 tonight I'll send my pull request out
> > > > > the door.  Otherwise I'd prefer to fix this and send my pull after
> > > > > actually getting a long btrfs run on the current code.
> > > > > 
> > > > > Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> > > > > 
> > > > > WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> > > > > Hardware name: Bochs
> > > > > list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> > > > > Modules linked in:
> > > > > Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> > > > > Call Trace:
> > > > >  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
> > > > >  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
> > > > >  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
> > > > >  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
> > > > >  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
> > > > >  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
> > > > >  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
> > > > >  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
> > > > >  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
> > > > >  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
> > > > >  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
> > > > >  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
> > > > >  [<ffffffff81089c66>] ? kthread+0x0/0x8a
> > > > >  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> > > > > ---[ end trace 5c6b7933d16b301f ]---
> > > > 
> > > > uh-oh.  Does disabling CONFIG_COMPACTION make this go away (requires
> > > > disabling CONFIG_TRANSPARENT_HUGEPAGE first).
> > > 
> > > We'll see.  I gave THP this same run of tests back in November, it
> > > passed without any problems (after fixing the related btrfs migration
> > > bug).  All of the crashes I've seen this weekend had this in the
> > > .config:
> > > 
> > 
> > I can't find the rest of the thread on any mailing list and am trying
> > to reproduce the problem locally. What workload were you running?
> 
> I'm running a very basic IO stress test:
> 
> http://oss.oracle.com/~mason/stress.sh
> 
> The command line is stress.sh -n 50 -c /mnt/linux-2.6 /mnt
> 

Good to have for future reference. I also successfully reproduced it by
having a lot of dd instances running with fsmark running at the same time -
basically anything that pounds a filesystem when memory is low.  I'm checking
through parts of the tree to see if I can pin down where it goes wrong.

A bisect in this case is problematic. Until commit
c5a73c3d55be1faadba35b41a862e036a3b12ddb, compaction was not used very
heavily but is used more frequently after that. Hence, "Good" results before
that can simply be because compaction is not being used.  Fortunately, commit
1ce82b69e96c838d007f316b8347b911fdfa9842 looks good so I don't think it's
new breakage introduced to migration or compaction.

> Which starts 50 processes that do cp -a /mnt/linux-2.6
> /mnt/stress/$$.  Then they verify the result was correct and then they
> delete it, forever in a loop.  In this case my linux-2.6 directory is a
> full git tree with sources checked out.  No obj files though.
> 
> This was my crash from an overnight run with CONFIG_COMPACTION off:
> 
> # CONFIG_COMPACTION is not set
> CONFIG_MIGRATION=y
> CONFIG_DEBUG_PAGEALLOC=y
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
> CONFIG_DEBUG_VM=y
> CONFIG_DEBUG_MEMORY_INIT=y
> CONFIG_DEBUG_LIST=y
> CONFIG_DEBUG_PAGEALLOC=y
> 
> I do have an NFS mount active during the run, but it isn't part of the
> test at all.
> 
> I've also managed to get all the procs in the system stuck waiting for
> IO requests.  It is possible these are two different bugs.  This
> list_del oops hits faster if I run the test with a good deal of memory
> pressure via an external memory hog.
> 
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:54 list_del+0x97/0xed()
> Hardware name: Bochs
> list_del corruption. prev->next should be ffffea000116d478, but was ffffea00014ba2c8
> Modules linked in: btrfs lzo_compress
> Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #182

Oddly I'm not seeing the same list corruption but it is locking up so I
still hope we're seeing the same problem. I'm still a bit away from
pinning down where things are going wrong, but I notice that
"vfs-scale-working" was merged some time after the last "good" point in
the tree.

> Call Trace:
>  [<ffffffff8106edc1>] ? warn_slowpath_common+0x85/0x9d
>  [<ffffffff8106ee7c>] ? warn_slowpath_fmt+0x46/0x48
>  [<ffffffff81262e27>] ? list_del+0x97/0xed
>  [<ffffffff810dbc59>] ? putback_lru_pages+0x7c/0x1eb
>  [<ffffffff810dc070>] ? shrink_inactive_list+0x2a8/0x342
>  [<ffffffff810dc676>] ? shrink_zone+0x327/0x3d6
>  [<ffffffff8119a17a>] ? nfs_access_cache_shrinker+0x179/0x1a0
>  [<ffffffff815c302e>] ? _raw_spin_unlock+0xe/0x10
>  [<ffffffff810d2055>] ? zone_watermark_ok_safe+0xa9/0xb8
>  [<ffffffff810dd26c>] ? kswapd+0x509/0x876
>  [<ffffffff810dcd63>] ? kswapd+0x0/0x876
>  [<ffffffff81089e40>] ? kthread+0x82/0x8a
>  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
>  [<ffffffff81089dbe>] ? kthread+0x0/0x8a
>  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> 
> -chris
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: hunting an IO hang
  2011-01-17 13:50                     ` Mel Gorman
@ 2011-01-17 14:07                       ` Chris Mason
  2011-01-17 15:02                         ` Chris Mason
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Mason @ 2011-01-17 14:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli

Excerpts from Mel Gorman's message of 2011-01-17 08:50:59 -0500:
> On Mon, Jan 17, 2011 at 08:21:41AM -0500, Chris Mason wrote:
> > Excerpts from Mel Gorman's message of 2011-01-17 05:27:44 -0500:
> > > On Sun, Jan 16, 2011 at 09:41:41PM -0500, Chris Mason wrote:
> > > > Excerpts from Andrew Morton's message of 2011-01-16 21:30:00 -0500:
> > > > > (lots of cc's added)
> > > > > 
> > > > > On Sun, 16 Jan 2011 21:07:40 -0500 Chris Mason <chris.mason@oracle.com> wrote:
> > > > > 
> > > > > > Excerpts from Linus Torvalds's message of 2011-01-16 20:53:04 -0500:
> > > > > > > .. except I actually didn't add Andrew to the cc after all.
> > > > > > > 
> > > > > > > NOW I did.
> > > > > > > 
> > > > > > > Oh, and if you can repeat this and bisect it, it would obviously be
> > > > > > > great. But that sounds rather painful.
> > > > > > 
> > > > > > Ok, so I've got 3 different problems in 3 totally different areas.
> > > > > > I'm running w/kvm, but this VM is very stable with 2.6.37.  Running
> > > > > > Linus' current git it goes boom in exotic ways, this time it was only on
> > > > > > ext3, btrfs code never loaded.
> > > > > > 
> > > > > > Linus, if you're planning on rc1 tonight I'll send my pull request out
> > > > > > the door.  Otherwise I'd prefer to fix this and send my pull after
> > > > > > actually getting a long btrfs run on the current code.
> > > > > > 
> > > > > > Next up, CONFIG_DEBUG*, always an adventure on rc1 kernels ;)
> > > > > > 
> > > > > > WARNING: at lib/list_debug.c:57 list_del+0xc0/0xed()
> > > > > > Hardware name: Bochs
> > > > > > list_del corruption. next->prev should be ffffea000010cde0, but was ffff88007cff6bc8
> > > > > > Modules linked in:
> > > > > > Pid: 524, comm: kswapd0 Not tainted 2.6.37-josef+ #180
> > > > > > Call Trace:
> > > > > >  [<ffffffff8106ec94>] ? warn_slowpath_common+0x85/0x9d
> > > > > >  [<ffffffff8106ed4f>] ? warn_slowpath_fmt+0x46/0x48
> > > > > >  [<ffffffff81263d6c>] ? list_del+0xc0/0xed
> > > > > >  [<ffffffff81106d9d>] ? migrate_pages+0x26f/0x357
> > > > > >  [<ffffffff81100e18>] ? compaction_alloc+0x0/0x2dc
> > > > > >  [<ffffffff8110150d>] ? compact_zone+0x391/0x5c4
> > > > > >  [<ffffffff81101905>] ? compact_zone_order+0xc2/0xd1
> > > > > >  [<ffffffff815c321e>] ? _raw_spin_unlock+0xe/0x10
> > > > > >  [<ffffffff810dc446>] ? kswapd+0x5c8/0x88f
> > > > > >  [<ffffffff810dbe7e>] ? kswapd+0x0/0x88f
> > > > > >  [<ffffffff81089ce8>] ? kthread+0x82/0x8a
> > > > > >  [<ffffffff810347d4>] ? kernel_thread_helper+0x4/0x10
> > > > > >  [<ffffffff81089c66>] ? kthread+0x0/0x8a
> > > > > >  [<ffffffff810347d0>] ? kernel_thread_helper+0x0/0x10
> > > > > > ---[ end trace 5c6b7933d16b301f ]---
> > > > > 
> > > > > uh-oh.  Does disabling CONFIG_COMPACTION make this go away (requires
> > > > > disabling CONFIG_TRANSPARENT_HUGEPAGE first).
> > > > 
> > > > We'll see.  I gave THP this same run of tests back in November, it
> > > > passed without any problems (after fixing the related btrfs migration
> > > > bug).  All of the crashes I've seen this weekend had this in the
> > > > .config:
> > > > 
> > > 
> > > I can't find the rest of the thread on any mailing list and am trying
> > > to reproduce the problem locally. What workload were you running?
> > 
> > I'm running a very basic IO stress test:
> > 
> > http://oss.oracle.com/~mason/stress.sh
> > 
> > The command line is stress.sh -n 50 -c /mnt/linux-2.6 /mnt
> > 
> 
> Good to have for future reference. I also successfully reproduced it by
> having a lot of dd instances running with fsmark running at the same time -
> basically anything that pounds a filesystem when memory is low.

That's the idea.  The only thing that stress.sh adds is hitting
on the VFS and on reading back the results from the FS.  Good to hear we
don't need those extra steps in this case.

> I'm checking
> through parts of the tree to see if I can pin down where it goes wrong.
> 
> A bisect in this case is problematic. Until commit
> c5a73c3d55be1faadba35b41a862e036a3b12ddb, compaction was not used very
> heavily but is used more frequently after that. Hence, "Good" results before
> that can simply be because compaction is not being used.  Fortunately, commit
> 1ce82b69e96c838d007f316b8347b911fdfa9842 looks good so I don't think it's
> new breakage introduced to migration or compaction.

I did have CONFIG_COMPACTION off for my latest reproduce.  The last two
have been corruption on the page->lru lists, maybe that'll help narrow
our bisect pool down.

-chris

* Re: hunting an IO hang
  2011-01-17  5:11                 ` Andrea Arcangeli
  2011-01-17 13:48                   ` Minchan Kim
@ 2011-01-17 14:10                   ` Chris Mason
  2011-01-17 14:26                     ` Andrea Arcangeli
  1 sibling, 1 reply; 24+ messages in thread
From: Chris Mason @ 2011-01-17 14:10 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Mel Gorman, Minchan Kim

Excerpts from Andrea Arcangeli's message of 2011-01-17 00:11:35 -0500:

[ crashes under load ]

> 
> NOTE: with the last changes compaction is used for all order > 0 and
> even from kswapd, so you will now be able to trigger bugs in
> compaction or migration even with THP off. However I'm surprised that
> you have issues with compaction...

I know I mentioned this in another email, but it is kind of buried in
other context.  I reproduced my crash with CONFIG_COMPACTION and
CONFIG_MIGRATION off.

-chris

* Re: hunting an IO hang
  2011-01-17 14:10                   ` Chris Mason
@ 2011-01-17 14:26                     ` Andrea Arcangeli
  2011-01-17 14:47                       ` Minchan Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Andrea Arcangeli @ 2011-01-17 14:26 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Mel Gorman, Minchan Kim

On Mon, Jan 17, 2011 at 09:10:15AM -0500, Chris Mason wrote:
> Excerpts from Andrea Arcangeli's message of 2011-01-17 00:11:35 -0500:
> 
> [ crashes under load ]
> 
> > 
> > NOTE: with the last changes compaction is used for all order > 0 and
> > even from kswapd, so you will now be able to trigger bugs in
> > compaction or migration even with THP off. However I'm surprised that
> > you have issues with compaction...
> 
> I know I mentioned this in another email, but it is kind of buried in
> other context.  I reproduced my crash with CONFIG_COMPACTION and
> CONFIG_MIGRATION off.

Ok, then it was an accident the page->lru got corrupted during
migration and it has nothing to do with migration/compaction/thp. This
makes sense because we should have noticed long ago if something
wasn't stable there.

I reworked the fix for the two memleaks I found while reviewing
migration code for this bug (unrelated) introduced by the commit
cf608ac19c95804dc2df43b1f4f9e068aa9034ab. It was enough to move the
goto to fix this without having to add a new function (it's
functionally identical to the one I sent before). It also wouldn't
leak memory if it was compaction invoking migrate_pages (only other
callers checking the retval of migrate_pages instead of list_empty,
could leak memory). As said before, this couldn't explain your
problem, and this is only a code review fix, I never triggered this.
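
As a rough illustration of the retval-versus-list_empty point above, here
is a small user-space model. It is only a sketch under stated assumptions:
struct pagelist, model_migrate() and model_putback() are hypothetical
stand-ins, not the kernel's migrate_pages() or putback_lru_pages(), and the
page list is reduced to a simple counter.

```c
#include <assert.h>

/* Hypothetical model: the caller's page list reduced to a counter. */
struct pagelist {
	int remaining;	/* pages still sitting on the caller's list */
};

/*
 * Stand-in for a migration call showing the behaviour described above:
 * it reports success (0) even though 'stuck' pages were left on the list.
 */
static int model_migrate(struct pagelist *list, int stuck)
{
	list->remaining = stuck;
	return 0;
}

/* Stand-in for putting leftover pages back where they came from. */
static void model_putback(struct pagelist *list)
{
	list->remaining = 0;
}

/* Caller that trusts the return value. Returns the number of pages leaked. */
static int caller_checks_retval(int pages, int stuck)
{
	struct pagelist list = { .remaining = pages };

	if (model_migrate(&list, stuck))
		model_putback(&list);
	return list.remaining;
}

/* Caller that checks the list itself. Returns the number of pages leaked. */
static int caller_checks_list(int pages, int stuck)
{
	struct pagelist list = { .remaining = pages };

	model_migrate(&list, stuck);
	if (list.remaining != 0)
		model_putback(&list);
	return list.remaining;
}
```

In this model, caller_checks_retval(8, 2) leaves two pages behind while
caller_checks_list(8, 2) leaves none, which is the difference between the
two caller styles mentioned above.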

This is still only for review for Minchan, not meant for inclusion
yet.

===
Subject: when migrate_pages returns 0, all pages must have been released

From: Andrea Arcangeli <aarcange@redhat.com>

In some cases migrate_pages could return zero while still leaving a
few pages in the pagelist (and some caller wouldn't notice it has to
call putback_lru_pages after commit
cf608ac19c95804dc2df43b1f4f9e068aa9034ab).

Add one missing putback_lru_pages not added by commit
cf608ac19c95804dc2df43b1f4f9e068aa9034ab.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 548fbd7..75398b0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1419,6 +1419,7 @@ int soft_offline_page(struct page *page, int flags)
 		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
 								0, true);
 		if (ret) {
+			putback_lru_pages(&pagelist);
 			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 				pfn, ret, page->flags);
 			if (ret > 0)
diff --git a/mm/migrate.c b/mm/migrate.c
index 46fe8cc..7d34237 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -772,6 +772,7 @@ uncharge:
 unlock:
 	unlock_page(page);
 
+move_newpage:
 	if (rc != -EAGAIN) {
  		/*
  		 * A page that has been migrated has all references
@@ -785,8 +786,6 @@ unlock:
 		putback_lru_page(page);
 	}
 
-move_newpage:
-
 	/*
 	 * Move the new page to the LRU. If migration was not successful
 	 * then this will free the page.

* Re: hunting an IO hang
  2011-01-17 14:26                     ` Andrea Arcangeli
@ 2011-01-17 14:47                       ` Minchan Kim
  2011-01-17 15:09                         ` Minchan Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Minchan Kim @ 2011-01-17 14:47 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Mason, Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Mel Gorman

On Mon, Jan 17, 2011 at 03:26:15PM +0100, Andrea Arcangeli wrote:
> On Mon, Jan 17, 2011 at 09:10:15AM -0500, Chris Mason wrote:
> > Excerpts from Andrea Arcangeli's message of 2011-01-17 00:11:35 -0500:
> > 
> > [ crashes under load ]
> > 
> > > 
> > > NOTE: with the last changes compaction is used for all order > 0 and
> > > even from kswapd, so you will now be able to trigger bugs in
> > > compaction or migration even with THP off. However I'm surprised that
> > > you have issues with compaction...
> > 
> > I know I mentioned this in another email, but it is kind of buried in
> > other context.  I reproduced my crash with CONFIG_COMPACTION and
> > CONFIG_MIGRATION off.
> 
> Ok, then it was an accident the page->lru got corrupted during
> migration and it has nothing to do with migration/compaction/thp. This
> makes sense because we should have noticed long ago if something
> wasn't stable there.
> 
> I reworked the fix for the two memleaks I found while reviewing
> migration code for this bug (unrelated) introduced by the commit
> cf608ac19c95804dc2df43b1f4f9e068aa9034ab. It was enough to move the
> goto to fix this without having to add a new function (it's
> functionally identical to the one I sent before). It also wouldn't
> leak memory if it was compaction invoking migrate_pages (only other
> callers checking the retval of migrate_pages instead of list_empty,
> could leak memory). As said before, this couldn't explain your
> problem, and this is only a code review fix, I never triggered this.
> 
> This is still only for review for Minchan, not meant for inclusion
> yet.
> 
> ===
> Subject: when migrate_pages returns 0, all pages must have been released
> 
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> In some cases migrate_pages could return zero while still leaving a
> few pages in the pagelist (and some caller wouldn't notice it has to
> call putback_lru_pages after commit
> cf608ac19c95804dc2df43b1f4f9e068aa9034ab).
> 
> Add one missing putback_lru_pages not added by commit
> cf608ac19c95804dc2df43b1f4f9e068aa9034ab.

It would be better to have another patch.

> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Thanks, Andrea.

-- 
Kind regards,
Minchan Kim

* Re: hunting an IO hang
  2011-01-17 14:07                       ` Chris Mason
@ 2011-01-17 15:02                         ` Chris Mason
  2011-01-17 16:32                           ` Johannes Weiner
  2011-01-17 17:09                           ` Mel Gorman
  0 siblings, 2 replies; 24+ messages in thread
From: Chris Mason @ 2011-01-17 15:02 UTC (permalink / raw)
  To: Chris Mason
  Cc: Mel Gorman, Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli

Excerpts from Chris Mason's message of 2011-01-17 09:07:40 -0500:

[ various crashes under load with current git ]

> 
> I did have CONFIG_COMPACTION off for my latest reproduce.  The last two
> have been corruption on the page->lru lists, maybe that'll help narrow
> our bisect pool down.

I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
than any runs in the past.

I'll give this a few hours but they seem the most related to my various
crashes so far.

-chris

* Re: hunting an IO hang
  2011-01-17 14:47                       ` Minchan Kim
@ 2011-01-17 15:09                         ` Minchan Kim
  2011-01-17 20:39                           ` Andrea Arcangeli
  0 siblings, 1 reply; 24+ messages in thread
From: Minchan Kim @ 2011-01-17 15:09 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Mason, Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Mel Gorman

On Mon, Jan 17, 2011 at 11:47:46PM +0900, Minchan Kim wrote:
> On Mon, Jan 17, 2011 at 03:26:15PM +0100, Andrea Arcangeli wrote:
> > On Mon, Jan 17, 2011 at 09:10:15AM -0500, Chris Mason wrote:
> > > Excerpts from Andrea Arcangeli's message of 2011-01-17 00:11:35 -0500:
> > > 
> > > [ crashes under load ]
> > > 
> > > > 
> > > > NOTE: with the last changes compaction is used for all order > 0 and
> > > > even from kswapd, so you will now be able to trigger bugs in
> > > > compaction or migration even with THP off. However I'm surprised that
> > > > you have issues with compaction...
> > > 
> > > I know I mentioned this in another email, but it is kind of buried in
> > > other context.  I reproduced my crash with CONFIG_COMPACTION and
> > > CONFIG_MIGRATION off.
> > 
> > Ok, then it was an accident the page->lru got corrupted during
> > migration and it has nothing to do with migration/compaction/thp. This
> > makes sense because we should have noticed long ago if something
> > wasn't stable there.
> > 
> > I reworked the fix for the two memleaks I found while reviewing
> > migration code for this bug (unrelated) introduced by the commit
> > cf608ac19c95804dc2df43b1f4f9e068aa9034ab. It was enough to move the
> > goto to fix this without having to add a new function (it's
> > functionally identical to the one I sent before). It also wouldn't
> > leak memory if it was compaction invoking migrate_pages (only other
> > callers checking the retval of migrate_pages instead of list_empty,
> > could leak memory). As said before, this couldn't explain your
> > problem, and this is only a code review fix, I never triggered this.
> > 
> > This is still only for review for Minchan, not meant for inclusion
> > yet.
> > 
> > ===
> > Subject: when migrate_pages returns 0, all pages must have been released
> > 
> > From: Andrea Arcangeli <aarcange@redhat.com>
> > 
> > In some cases migrate_pages could return zero while still leaving a
> > few pages in the pagelist (and some caller wouldn't notice it has to
> > call putback_lru_pages after commit
> > cf608ac19c95804dc2df43b1f4f9e068aa9034ab).
> > 
> > Add one missing putback_lru_pages not added by commit
> > cf608ac19c95804dc2df43b1f4f9e068aa9034ab.
> 
> It would be better to have another patch.
> 
> > 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

And don't we need this patch, either?

* Re: hunting an IO hang
  2011-01-17 15:02                         ` Chris Mason
@ 2011-01-17 16:32                           ` Johannes Weiner
  2011-01-17 18:10                             ` Mel Gorman
  2011-01-17 17:09                           ` Mel Gorman
  1 sibling, 1 reply; 24+ messages in thread
From: Johannes Weiner @ 2011-01-17 16:32 UTC (permalink / raw)
  To: Chris Mason
  Cc: Mel Gorman, Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

On Mon, Jan 17, 2011 at 10:02:47AM -0500, Chris Mason wrote:
> Excerpts from Chris Mason's message of 2011-01-17 09:07:40 -0500:
> 
> [ various crashes under load with current git ]
> 
> > 
> > I did have CONFIG_COMPACTION off for my latest reproduce.  The last two
> > have been corruption on the page->lru lists, maybe that'll help narrow
> > our bisect pool down.
> 
> I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> than any runs in the past.
> 
> I'll give this a few hours but they seem the most related to my various
> crashes so far.

I went through the new batched activation code.  Shaohua, can you
explain to me why the following sequence is not possible?

1. CPU A and B schedule activation of a page (PG_lru && !PG_active)
2. CPU A flushes the page to the active list (PG_lru && PG_active)
3. CPU A isolates the page for scanning/migration and
   puts it on private list (!PG_lru && PG_active)
4. CPU B flushes the page to the active list (!PG_lru && PG_active),
   the deferred activation code now assumes putback mode and adds the page
   to the active list, thus corrupting the link to the private list of CPU A
5. CPU A does list_del() from the private list (like unmap_and_move() does)
   and trips up on the corruption
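
Steps 3 to 5 can be replayed in user space with a stripped-down list
implementation. This is a sketch under assumptions, not kernel code:
struct list_head and list_add() are minimal re-implementations, and
list_del_would_warn() mirrors the two checks that CONFIG_DEBUG_LIST makes
in list_del() (prev->next and next->prev must both point back at the
entry). The names replay_double_add and struct replay_result are made up
for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'entry' right after 'head', with no sanity checks at all. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* The two consistency checks a debugging list_del() would perform. */
static bool list_del_would_warn(const struct list_head *entry)
{
	return entry->prev->next != entry || entry->next->prev != entry;
}

struct replay_result {
	bool page_warns;	/* the page that was added twice */
	bool before_warns;	/* its old predecessor on the private list */
	bool after_warns;	/* its old successor on the private list */
};

static struct replay_result replay_double_add(void)
{
	struct list_head active, private_a, before, page, after;
	struct replay_result r;

	list_init(&active);
	list_init(&private_a);

	/* Step 3: CPU A's private list is private_a -> before -> page -> after. */
	list_add(&after, &private_a);
	list_add(&page, &private_a);
	list_add(&before, &private_a);

	/* Step 4: CPU B links the same page into the active list, no unlink. */
	list_add(&page, &active);

	/* Step 5: what would list_del() see for each entry? */
	r.page_warns = list_del_would_warn(&page);
	r.before_warns = list_del_would_warn(&before);
	r.after_warns = list_del_would_warn(&after);
	return r;
}
```

In this simplified model the re-added page looks consistent on the active
list, and it is its former neighbours on the private list that fail the
checks, because they still point at a page whose links now lead elsewhere.
The real sequence in the kernel may well differ in which list_del() trips
first.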

	Hannes

* Re: hunting an IO hang
  2011-01-17 15:02                         ` Chris Mason
  2011-01-17 16:32                           ` Johannes Weiner
@ 2011-01-17 17:09                           ` Mel Gorman
  2011-01-17 17:40                             ` Chris Mason
  1 sibling, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2011-01-17 17:09 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

On Mon, Jan 17, 2011 at 10:02:47AM -0500, Chris Mason wrote:
> Excerpts from Chris Mason's message of 2011-01-17 09:07:40 -0500:
> 
> [ various crashes under load with current git ]
> 
> > 
> > I did have CONFIG_COMPACTION off for my latest reproduce.  The last two
> > have been corruption on the page->lru lists, maybe that'll help narrow
> > our bisect pool down.
> 
> I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> than any runs in the past.
> 

Confirmed that reverting these patches makes the problem unreproducible
for the many_dd's + fsmark for at least an hour here.

> I'll give this a few hours but they seem the most related to my various
> crashes so far.
> 
> -chris
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: hunting an IO hang
  2011-01-17 17:09                           ` Mel Gorman
@ 2011-01-17 17:40                             ` Chris Mason
  2011-01-17 18:24                               ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Mason @ 2011-01-17 17:40 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

Excerpts from Mel Gorman's message of 2011-01-17 12:09:07 -0500:
> On Mon, Jan 17, 2011 at 10:02:47AM -0500, Chris Mason wrote:
> > Excerpts from Chris Mason's message of 2011-01-17 09:07:40 -0500:
> > 
> > [ various crashes under load with current git ]
> > 
> > > 
> > > I did have CONFIG_COMPACTION off for my latest reproduce.  The last two
> > > have been corruption on the page->lru lists, maybe that'll help narrow
> > > our bisect pool down.
> > 
> > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> > than any runs in the past.
> > 
> 
> Confirmed that reverting these patches makes the problem unreproducible
> for the many_dd's + fsmark for at least an hour here.

After 2+ hours I'm still running with those two commits gone.  I'm
confident they are the cause of the crashes.  I also haven't triggered
the cfq stalls without them.

I basically picked them out of a hat:

git log -p v2.6.37..HEAD mm

And looked for anything that messed with page->lru.  The list of suspects
outside of THP and compaction was pretty short, and Shaohua's changelog
made it easy to guess they were involved.  Thanks for that, it saved
many hours of git rebasing ;)

-chris

* Re: hunting an IO hang
  2011-01-17 16:32                           ` Johannes Weiner
@ 2011-01-17 18:10                             ` Mel Gorman
  0 siblings, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2011-01-17 18:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Chris Mason, Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

On Mon, Jan 17, 2011 at 05:32:22PM +0100, Johannes Weiner wrote:
> On Mon, Jan 17, 2011 at 10:02:47AM -0500, Chris Mason wrote:
> > Excerpts from Chris Mason's message of 2011-01-17 09:07:40 -0500:
> > 
> > [ various crashes under load with current git ]
> > 
> > > 
> > > I did have CONFIG_COMPACTION off for my latest reproduce.  The last two
> > > have been corruption on the page->lru lists, maybe that'll help narrow
> > > our bisect pool down.
> > 
> > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> > than any runs in the past.
> > 
> > I'll give this a few hours but they seem the most related to my various
> > crashes so far.
> 
> I went through the new batched activation code.  Shaohua, can you
> explain to me why the following sequence is not possible?
> 
> 1. CPU A and B schedule activation of a page (PG_lru && !PG_active)
> 2. CPU A flushes the page to the active list (PG_lru && PG_active)
> 3. CPU A isolates the page for scanning/migration and
>    puts it on private list (!PG_lru && PG_active)
> 4. CPU B flushes the page to the active list (!PG_lru && PG_active),
>    the deferred activation code now assumes putback mode and adds the page
>    to the active list, thus corrupting the link to the private list of CPU A
> 5. CPU A does list_del() from the private list (like unmap_and_move() does)
>    and trips up on the corruption
> 

In addition, PageLRU is a bad test in __activate_page for deciding whether
the page needs to be unlinked. When a page is on a pagevec, it's not an LRU
page and it's not on a linked list. When a page is on a private linked list,
it's not an LRU page but it has to be removed from the private list before
adding to the LRU to avoid list corruption.
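
The ambiguity can be written down as a tiny truth table. The following is
a hedged sketch only: enum page_location and the helper names are invented
for the example, the three states are exactly the ones described above,
and flag_says_unlink() is a stand-in for any test that looks at the LRU
flag alone, not a copy of what __activate_page does.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified view of where a page waiting for activation can be. */
enum page_location {
	ON_LRU_LIST,		/* LRU flag set, linked into an LRU list */
	ON_PAGEVEC,		/* LRU flag clear, held in an array, not linked */
	ON_PRIVATE_LIST,	/* LRU flag clear, linked into a private list */
};

/* What the LRU flag reports for each location. */
static bool page_lru_flag(enum page_location where)
{
	return where == ON_LRU_LIST;
}

/* Whether the page is actually linked into some list. */
static bool page_is_linked(enum page_location where)
{
	return where != ON_PAGEVEC;
}

/* A decision based on the flag alone: unlink only if the flag is set. */
static bool flag_says_unlink(enum page_location where)
{
	return page_lru_flag(where);
}

/* True when the flag-only decision matches what is really required. */
static bool flag_test_is_safe(enum page_location where)
{
	return flag_says_unlink(where) == page_is_linked(where);
}
```

The flag-only test gives the right answer for the LRU and pagevec cases
but not for ON_PRIVATE_LIST, where the flag is clear and the page still
has to be unlinked before it is added anywhere else.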

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: hunting an IO hang
  2011-01-17 17:40                             ` Chris Mason
@ 2011-01-17 18:24                               ` Linus Torvalds
  2011-01-17 21:23                                 ` Chris Mason
  2011-01-17 23:02                                 ` Linus Torvalds
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2011-01-17 18:24 UTC (permalink / raw)
  To: Chris Mason
  Cc: Mel Gorman, Andrew Morton, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

On Mon, Jan 17, 2011 at 9:40 AM, Chris Mason <chris.mason@oracle.com> wrote:
>> >
>> > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
>> > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
>> > than any runs in the past.
>> >
>>
>> Confirmed that reverting these patches makes the problem unreproducible
>> for the many_dd's + fsmark for at least an hour here.
>
> After 2+ hours I'm still running with those two commits gone.  I'm
> confident they are the cause of the crashes.  I also haven't triggered
> the cfq stalls without them.

Ok, so the question is how to proceed from here.

I can easily revert them, and since I was planning on doing -rc1
tonight, I probably will. But I promised Chris to delay until tomorrow
if he needed time to chase this down, and while it's now apparently
chased down, I'll certainly also be open to delaying until tomorrow if
somebody has a patch to fix it.

So right now my plan is:
 - I will revert those two later today and then release -rc1 in the evening
UNLESS
 - somebody posts a patch for the problem in the next few hours and
Chris/others are willing to give it a good test overnight (or whatever
people feel is "sufficient" based on how easily they can trigger the
issue), in which case I'd do -rc1 tomorrow (either with the reverts or
the patch, depending on how testing works out)

Sounds like a plan?

(Also, I'm really happy it didn't turn out to be the lock-less RCU
lookup. I didn't really think it would be based on the symptoms, but
I'm still happy. Reverting a few random MM patches is _sooo_ much
easier than having to worry about some subtle locking issue with the
totally changed VFS name lookup)

                       Linus


* Re: hunting an IO hang
  2011-01-17 15:09                         ` Minchan Kim
@ 2011-01-17 20:39                           ` Andrea Arcangeli
  0 siblings, 0 replies; 24+ messages in thread
From: Andrea Arcangeli @ 2011-01-17 20:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Chris Mason, Andrew Morton, Linus Torvalds, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Mel Gorman

On Tue, Jan 18, 2011 at 12:09:54AM +0900, Minchan Kim wrote:
> And don't we need this patch, either?

I think we need your fix too.

I thought about that but I wasn't sure (I was focusing on Chris's bug
that had no hugetlbfs involvement), but your patch makes it more
obvious. At least that place wouldn't risk breaking silently ;). I
guess hugepage migration from memory failure wasn't much tested yet...

Maybe it'd be cleaner to add a putback_lru_huge_pages but I don't mind
because it seems nothing but memory-failure will ever attempt to
migrate a hugepage.


* Re: hunting an IO hang
  2011-01-17 18:24                               ` Linus Torvalds
@ 2011-01-17 21:23                                 ` Chris Mason
  2011-01-17 23:03                                   ` Mel Gorman
  2011-01-17 23:02                                 ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Chris Mason @ 2011-01-17 21:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mel Gorman, Andrew Morton, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

Excerpts from Linus Torvalds's message of 2011-01-17 13:24:55 -0500:
> On Mon, Jan 17, 2011 at 9:40 AM, Chris Mason <chris.mason@oracle.com> wrote:
> >> >
> >> > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> >> > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> >> > than any runs in the past.
> >> >
> >>
> >> Confirmed that reverting these patches makes the problem unreproducible
> >> for the many_dd's + fsmark for at least an hour here.
> >
> > > After 2+ hours I'm still running with those two commits gone.  I'm
> > > confident they are the cause of the crashes.  I also haven't triggered
> > the cfq stalls without them.
> 
> Ok, so the question is how to proceed from here.
> 
> I can easily revert them, and since I was planning on doing -rc1
> tonight, I probably will. But I promised Chris to delay until tomorrow
> if he needed time to chase this down, and while it's now apparently
> chased down, I'll certainly also be open to delaying until tomorrow if
> somebody has a patch to fix it.
> 
> So right now my plan is:
>  - I will revert those two later today and then release -rc1 in the evening
> UNLESS
>  - somebody posts a patch for the problem in the next few hours and
> Chris/others are willing to give it a good test overnight (or whatever
> people feel is "sufficient" based on how easily they can trigger the
> issue), in which case I'd do -rc1 tomorrow (either with the reverts or
> the patch, depending on how testing works out)

If a patch does come in, I'm happy to test it.  Mel had a test that
triggered within 1-2 minutes, mine took 30 or so, which means I'd want a
2 hour run to convince myself it was really fixed.  But, I'll give Mel's
fs_mark + dd workload a try on the buggy kernel.

-chris


* Re: hunting an IO hang
  2011-01-17 18:24                               ` Linus Torvalds
  2011-01-17 21:23                                 ` Chris Mason
@ 2011-01-17 23:02                                 ` Linus Torvalds
  2011-01-17 23:13                                   ` Minchan Kim
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2011-01-17 23:02 UTC (permalink / raw)
  To: Chris Mason
  Cc: Mel Gorman, Andrew Morton, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li, Minchan Kim

On Mon, Jan 17, 2011 at 10:24 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So right now my plan is:
>  - I will revert those two later today and then release -rc1 in the evening
> UNLESS
>  - somebody posts a patch for the problem in the next few hours [..]

Ok, so nothing obvious popped up, and I reverted the two patches.

I've also seen two other patches floating around here in this thread
(one by Andrea, one by Minchan), but didn't apply them as it wasn't
entirely clear what the status of those patches was. My current plan
is to do -rc1 tonight, and hopefully with the two reverts it will be
reasonably stable. We obviously will have several weeks for polishing.

                          Linus


* Re: hunting an IO hang
  2011-01-17 21:23                                 ` Chris Mason
@ 2011-01-17 23:03                                   ` Mel Gorman
  2011-01-18  0:30                                     ` Shaohua Li
  0 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2011-01-17 23:03 UTC (permalink / raw)
  To: Chris Mason
  Cc: Linus Torvalds, Andrew Morton, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

On Mon, Jan 17, 2011 at 04:23:56PM -0500, Chris Mason wrote:
> Excerpts from Linus Torvalds's message of 2011-01-17 13:24:55 -0500:
> > On Mon, Jan 17, 2011 at 9:40 AM, Chris Mason <chris.mason@oracle.com> wrote:
> > >> >
> > >> > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> > >> > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> > >> > than any runs in the past.
> > >> >
> > >>
> > >> Confirmed that reverting these patches makes the problem unreproducible
> > >> for the many_dd's + fsmark for at least an hour here.
> > >
> > > After 2+ hours I'm still running with those two commits gone.  I'm
> > > confident they are the cause of the crashes.  I also haven't triggered
> > > the cfq stalls without them.
> > 
> > Ok, so the question is how to proceed from here.
> > 
> > I can easily revert them, and since I was planning on doing -rc1
> > tonight, I probably will. But I promised Chris to delay until tomorrow
> > if he needed time to chase this down, and while it's now apparently
> > chased down, I'll certainly also be open to delaying until tomorrow if
> > somebody has a patch to fix it.
> > 
> > So right now my plan is:
> >  - I will revert those two later today and then release -rc1 in the evening
> > UNLESS
> >  - somebody posts a patch for the problem in the next few hours and
> > Chris/others are willing to give it a good test overnight (or whatever
> > people feel is "sufficient" based on how easily they can trigger the
> > issue), in which case I'd do -rc1 tomorrow (either with the reverts or
> > the patch, depending on how testing works out)
> 
> If a patch does come in, I'm happy to test it.  Mel had a test that
> triggered within 1-2 minutes, mine took 30 or so, which means I'd want a
> 2 hour run to convince myself it was really fixed.  But, I'll give Mel's
> fs_mark + dd workload a try on the buggy kernel.
> 

I spent a while seeing if there was a simple patch but it's not trivially
fixable. __activate_page() is getting called in too many different situations
to be fully sure the function is doing the right thing in all cases. I also
couldn't convince myself that the accounting was correct in all cases. I
think the idea of batching updates from mark_page_accessed() in particular
is a good idea but the patch needs a do-over.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: hunting an IO hang
  2011-01-17 23:02                                 ` Linus Torvalds
@ 2011-01-17 23:13                                   ` Minchan Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Minchan Kim @ 2011-01-17 23:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Mason, Mel Gorman, Andrew Morton, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Shaohua Li

On Tue, Jan 18, 2011 at 8:02 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jan 17, 2011 at 10:24 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So right now my plan is:
>>  - I will revert those two later today and then release -rc1 in the evening
>> UNLESS
>>  - somebody posts a patch for the problem in the next few hours [..]
>
> Ok, so nothing obvious popped up, and I reverted the two patches.
>
> I've also seen two other patches floating around here in this thread
> (one by Andrea, one by Minchan), but didn't apply them as it wasn't
> entirely clear what the status of those patches were. My current plan
> is to do -rc1 tonight, and hopefully with the two reverts it will be
> reasonably stable. We obviously will have several weeks for polishing.

Andrea's patch fixes a memory leak (except in compaction) and mine fixes
page corruption when memory-failure happens on a hugepage (a very rare
case).  The problems are apparent but not critical considering the current
status (sooner or later, you should release rc1). So I will resend it
after the rc1 release.

Thanks.

>
>                          Linus
>



-- 
Kind regards,
Minchan Kim


* Re: hunting an IO hang
  2011-01-17 23:03                                   ` Mel Gorman
@ 2011-01-18  0:30                                     ` Shaohua Li
  0 siblings, 0 replies; 24+ messages in thread
From: Shaohua Li @ 2011-01-18  0:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Mason, Linus Torvalds, Andrew Morton, Jens Axboe, linux-mm,
	KAMEZAWA Hiroyuki, Andrea Arcangeli

On Tue, 2011-01-18 at 07:03 +0800, Mel Gorman wrote:
> On Mon, Jan 17, 2011 at 04:23:56PM -0500, Chris Mason wrote:
> > Excerpts from Linus Torvalds's message of 2011-01-17 13:24:55 -0500:
> > > On Mon, Jan 17, 2011 at 9:40 AM, Chris Mason <chris.mason@oracle.com> wrote:
> > > >> >
> > > >> > I've reverted 744ed1442757767ffede5008bb13e0805085902e, and
> > > >> > d8505dee1a87b8d41b9c4ee1325cd72258226fbc and the run has lasted longer
> > > >> > than any runs in the past.
> > > >> >
> > > >>
> > > >> Confirmed that reverting these patches makes the problem unreproducible
> > > >> for the many_dd's + fsmark for at least an hour here.
> > > >
> > > > After 2+ hours I'm still running with those two commits gone.  I'm
> > > > confident they are the cause of the crashes.  I also haven't triggered
> > > > the cfq stalls without them.
> > > 
> > > Ok, so the question is how to proceed from here.
> > > 
> > > I can easily revert them, and since I was planning on doing -rc1
> > > tonight, I probably will. But I promised Chris to delay until tomorrow
> > > if he needed time to chase this down, and while it's now apparently
> > > chased down, I'll certainly also be open to delaying until tomorrow if
> > > somebody has a patch to fix it.
> > > 
> > > So right now my plan is:
> > >  - I will revert those two later today and then release -rc1 in the evening
> > > UNLESS
> > >  - somebody posts a patch for the problem in the next few hours and
> > > Chris/others are willing to give it a good test overnight (or whatever
> > > people feel is "sufficient" based on how easily they can trigger the
> > > issue), in which case I'd do -rc1 tomorrow (either with the reverts or
> > > the patch, depending on how testing works out)
> > 
> > If a patch does come in, I'm happy to test it.  Mel had a test that
> > triggered within 1-2 minutes, mine took 30 or so, which means I'd want a
> > 2 hour run to convince myself it was really fixed.  But, I'll give Mel's
> > fs_mark + dd workload a try on the buggy kernel.
> > 
> 
> I spent a while seeing if there was a simple patch but it's not trivially
> fixable. __activate_page() is getting called in too many different situations
> to be fully sure the function is doing the right thing in all cases. I also
> couldn't convince myself that the accounting was correct in all cases. I
> think the idea of batching updates from mark_page_accessed() in particular
> is a good idea but the patch needs a do-over.
Sorry for the trouble. I'll look at it.

Thanks,
Shaohua



end of thread, other threads:[~2011-01-18  0:30 UTC | newest]

Thread overview: 24+ messages
     [not found] <1295225684-sup-7168@think>
     [not found] ` <AANLkTikBamG2NG6j-z9fyTx=mk6NXFEE7LpB5z9s6ufr@mail.gmail.com>
     [not found]   ` <4D339C87.30100@fusionio.com>
     [not found]     ` <1295228148-sup-7379@think>
     [not found]       ` <AANLkTimp6ef0W_=ijW=CfH6iC1mQzW3gLr1LZivJ5Bmd@mail.gmail.com>
     [not found]         ` <AANLkTimr3hN8SDmbwv98hkcVfWoh9tioYg4M+0yanzpb@mail.gmail.com>
     [not found]           ` <1295229722-sup-6494@think>
2011-01-17  2:30             ` hunting an IO hang Andrew Morton
2011-01-17  2:41               ` Chris Mason
2011-01-17  5:11                 ` Andrea Arcangeli
2011-01-17 13:48                   ` Minchan Kim
2011-01-17 14:10                   ` Chris Mason
2011-01-17 14:26                     ` Andrea Arcangeli
2011-01-17 14:47                       ` Minchan Kim
2011-01-17 15:09                         ` Minchan Kim
2011-01-17 20:39                           ` Andrea Arcangeli
2011-01-17 10:27                 ` Mel Gorman
2011-01-17 13:21                   ` Chris Mason
2011-01-17 13:50                     ` Mel Gorman
2011-01-17 14:07                       ` Chris Mason
2011-01-17 15:02                         ` Chris Mason
2011-01-17 16:32                           ` Johannes Weiner
2011-01-17 18:10                             ` Mel Gorman
2011-01-17 17:09                           ` Mel Gorman
2011-01-17 17:40                             ` Chris Mason
2011-01-17 18:24                               ` Linus Torvalds
2011-01-17 21:23                                 ` Chris Mason
2011-01-17 23:03                                   ` Mel Gorman
2011-01-18  0:30                                     ` Shaohua Li
2011-01-17 23:02                                 ` Linus Torvalds
2011-01-17 23:13                                   ` Minchan Kim
