[2.6.35-rc1, bug] mm: minute-long livelocks in memory reclaim
From: Dave Chinner @ 2010-08-22 23:48 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm
Folks,
I've been testing parallel create workloads over the weekend, and
I've seen this a couple of times now under 8 thread parallel creates
with XFS. I'm running on an 8p VM with 4GB RAM and a fast disk
subsystem. Basically I am seeing the create rate drop to zero
with all 8 CPUs stuck spinning for up to 2 minutes. 'echo t >
/proc/sysrq-trigger' while this is occurring gives the following
trace for all the fs-mark processes:
[49506.624018] fs_mark R running task 0 8376 7917 0x00000008
[49506.624018] 0000000000000000 ffffffff81b94590 00000000000008fc 0000000000000002
[49506.624018] 0000000000000000 0000000000000286 0000000000000297 ffffffffffffff10
[49506.624018] ffffffff810b3d02 0000000000000010 0000000000000202 ffff88011df777a8
[49506.624018] Call Trace:
[49506.624018] [<ffffffff810b3d02>] ? smp_call_function_many+0x1a2/0x210
[49506.624018] [<ffffffff810b3ce5>] ? smp_call_function_many+0x185/0x210
[49506.624018] [<ffffffff81109170>] ? drain_local_pages+0x0/0x20
[49506.624018] [<ffffffff810b3d92>] ? smp_call_function+0x22/0x30
[49506.624018] [<ffffffff810849a4>] ? on_each_cpu+0x24/0x50
[49506.624018] [<ffffffff81107bec>] ? drain_all_pages+0x1c/0x20
[49506.624018] [<ffffffff8110825a>] ? __alloc_pages_nodemask+0x57a/0x730
[49506.624018] [<ffffffff8113c6d2>] ? kmem_getpages+0x62/0x160
[49506.624018] [<ffffffff8113d2b2>] ? fallback_alloc+0x192/0x240
[49506.624018] [<ffffffff8113cce1>] ? cache_grow+0x2d1/0x300
[49506.624018] [<ffffffff8113d04a>] ? ____cache_alloc_node+0x9a/0x170
[49506.624018] [<ffffffff8113cf6c>] ? cache_alloc_refill+0x25c/0x2a0
[49506.624018] [<ffffffff8113ddb3>] ? __kmalloc+0x193/0x230
[49506.624018] [<ffffffff812f59af>] ? kmem_alloc+0x8f/0xe0
[49506.624018] [<ffffffff812f59af>] ? kmem_alloc+0x8f/0xe0
[49506.624018] [<ffffffff812f5a9e>] ? kmem_zalloc+0x1e/0x50
[49506.624018] [<ffffffff812e2f4d>] ? xfs_log_commit_cil+0x9d/0x440
[49506.624018] [<ffffffff812eeec6>] ? _xfs_trans_commit+0x1e6/0x2b0
[49506.624018] [<ffffffff812f2b6f>] ? xfs_create+0x51f/0x690
[49506.624018] [<ffffffff812ffdb7>] ? xfs_vn_mknod+0xa7/0x1c0
[49506.624018] [<ffffffff812fff00>] ? xfs_vn_create+0x10/0x20
[49506.624018] [<ffffffff811510b8>] ? vfs_create+0xb8/0xf0
[49506.624018] [<ffffffff81151d2c>] ? do_last+0x4dc/0x5d0
[49506.624018] [<ffffffff81153bd7>] ? do_filp_open+0x207/0x5e0
[49506.624018] [<ffffffff8105fc58>] ? pvclock_clocksource_read+0x58/0xd0
[49506.624018] [<ffffffff8115eaca>] ? alloc_fd+0x10a/0x150
[49506.624018] [<ffffffff81144005>] ? do_sys_open+0x65/0x130
[49506.624018] [<ffffffff81144110>] ? sys_open+0x20/0x30
[49506.624018] [<ffffffff81036072>] ? system_call_fastpath+0x16/0x1b
Eventually the problem goes away, and the system goes back to
performing at the normal rate. Any ideas on how to avoid this
problem? I'm using CONFIG_SLAB=y, if that is relevant.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Re: [2.6.35-rc1, bug] mm: minute-long livelocks in memory reclaim
From: Wu Fengguang @ 2010-08-23 6:58 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-kernel, linux-mm
On Mon, Aug 23, 2010 at 09:48:11AM +1000, Dave Chinner wrote:
> Folks,
>
> I've been testing parallel create workloads over the weekend, and
> I've seen this a couple of times now under 8 thread parallel creates
> with XFS. I'm running on an 8p VM with 4GB RAM and a fast disk
> subsystem. Basically I am seeing the create rate drop to zero
> with all 8 CPUs stuck spinning for up to 2 minutes. 'echo t >
> /proc/sysrq-trigger' while this is occurring gives the following
> trace for all the fs-mark processes:
>
> [49506.624018] fs_mark R running task 0 8376 7917 0x00000008
> [49506.624018] 0000000000000000 ffffffff81b94590 00000000000008fc 0000000000000002
> [49506.624018] 0000000000000000 0000000000000286 0000000000000297 ffffffffffffff10
> [49506.624018] ffffffff810b3d02 0000000000000010 0000000000000202 ffff88011df777a8
> [49506.624018] Call Trace:
> [49506.624018] [<ffffffff810b3d02>] ? smp_call_function_many+0x1a2/0x210
> [49506.624018] [<ffffffff810b3ce5>] ? smp_call_function_many+0x185/0x210
> [49506.624018] [<ffffffff81109170>] ? drain_local_pages+0x0/0x20
> [49506.624018] [<ffffffff810b3d92>] ? smp_call_function+0x22/0x30
> [49506.624018] [<ffffffff810849a4>] ? on_each_cpu+0x24/0x50
> [49506.624018] [<ffffffff81107bec>] ? drain_all_pages+0x1c/0x20
> [49506.624018] [<ffffffff8110825a>] ? __alloc_pages_nodemask+0x57a/0x730
> [49506.624018] [<ffffffff8113c6d2>] ? kmem_getpages+0x62/0x160
> [49506.624018] [<ffffffff8113d2b2>] ? fallback_alloc+0x192/0x240
> [49506.624018] [<ffffffff8113cce1>] ? cache_grow+0x2d1/0x300
> [49506.624018] [<ffffffff8113d04a>] ? ____cache_alloc_node+0x9a/0x170
> [49506.624018] [<ffffffff8113cf6c>] ? cache_alloc_refill+0x25c/0x2a0
> [49506.624018] [<ffffffff8113ddb3>] ? __kmalloc+0x193/0x230
> [49506.624018] [<ffffffff812f59af>] ? kmem_alloc+0x8f/0xe0
> [49506.624018] [<ffffffff812f59af>] ? kmem_alloc+0x8f/0xe0
> [49506.624018] [<ffffffff812f5a9e>] ? kmem_zalloc+0x1e/0x50
> [49506.624018] [<ffffffff812e2f4d>] ? xfs_log_commit_cil+0x9d/0x440
> [49506.624018] [<ffffffff812eeec6>] ? _xfs_trans_commit+0x1e6/0x2b0
> [49506.624018] [<ffffffff812f2b6f>] ? xfs_create+0x51f/0x690
> [49506.624018] [<ffffffff812ffdb7>] ? xfs_vn_mknod+0xa7/0x1c0
> [49506.624018] [<ffffffff812fff00>] ? xfs_vn_create+0x10/0x20
> [49506.624018] [<ffffffff811510b8>] ? vfs_create+0xb8/0xf0
> [49506.624018] [<ffffffff81151d2c>] ? do_last+0x4dc/0x5d0
> [49506.624018] [<ffffffff81153bd7>] ? do_filp_open+0x207/0x5e0
> [49506.624018] [<ffffffff8105fc58>] ? pvclock_clocksource_read+0x58/0xd0
> [49506.624018] [<ffffffff8115eaca>] ? alloc_fd+0x10a/0x150
> [49506.624018] [<ffffffff81144005>] ? do_sys_open+0x65/0x130
> [49506.624018] [<ffffffff81144110>] ? sys_open+0x20/0x30
> [49506.624018] [<ffffffff81036072>] ? system_call_fastpath+0x16/0x1b
>
> Eventually the problem goes away, and the system goes back to
> performing at the normal rate. Any ideas on how to avoid this
> problem? I'm using CONFIG_SLAB=y, if that is relevant.
zone->lock contention? Try ripping out the following two lines. The change
might be a bit aggressive though :)
Thanks,
Fengguang
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1bb327a..c08b8d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1864,9 +1864,6 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
cond_resched();
- if (order != 0)
- drain_all_pages();
-
if (likely(*did_some_progress))
page = get_page_from_freelist(gfp_mask, nodemask, order,
zonelist, high_zoneidx,
Re: [2.6.35-rc1, bug] mm: minute-long livelocks in memory reclaim
From: David Rientjes @ 2010-08-23 9:23 UTC (permalink / raw)
To: Wu Fengguang; +Cc: Dave Chinner, Mel Gorman, linux-kernel, linux-mm
On Mon, 23 Aug 2010, Wu Fengguang wrote:
> > I've been testing parallel create workloads over the weekend, and
> > I've seen this a couple of times now under 8 thread parallel creates
> > with XFS. I'm running on an 8p VM with 4GB RAM and a fast disk
> > subsystem. Basically I am seeing the create rate drop to zero
> > with all 8 CPUs stuck spinning for up to 2 minutes. 'echo t >
> > /proc/sysrq-trigger' while this is occurring gives the following
> > trace for all the fs-mark processes:
> >
> > [49506.624018] fs_mark R running task 0 8376 7917 0x00000008
> > [49506.624018] 0000000000000000 ffffffff81b94590 00000000000008fc 0000000000000002
> > [49506.624018] 0000000000000000 0000000000000286 0000000000000297 ffffffffffffff10
> > [49506.624018] ffffffff810b3d02 0000000000000010 0000000000000202 ffff88011df777a8
> > [49506.624018] Call Trace:
> > [49506.624018] [<ffffffff810b3d02>] ? smp_call_function_many+0x1a2/0x210
> > [49506.624018] [<ffffffff810b3ce5>] ? smp_call_function_many+0x185/0x210
> > [49506.624018] [<ffffffff81109170>] ? drain_local_pages+0x0/0x20
> > [49506.624018] [<ffffffff810b3d92>] ? smp_call_function+0x22/0x30
> > [49506.624018] [<ffffffff810849a4>] ? on_each_cpu+0x24/0x50
> > [49506.624018] [<ffffffff81107bec>] ? drain_all_pages+0x1c/0x20
> > [49506.624018] [<ffffffff8110825a>] ? __alloc_pages_nodemask+0x57a/0x730
> > [49506.624018] [<ffffffff8113c6d2>] ? kmem_getpages+0x62/0x160
> > [49506.624018] [<ffffffff8113d2b2>] ? fallback_alloc+0x192/0x240
> > [49506.624018] [<ffffffff8113cce1>] ? cache_grow+0x2d1/0x300
> > [49506.624018] [<ffffffff8113d04a>] ? ____cache_alloc_node+0x9a/0x170
> > [49506.624018] [<ffffffff8113cf6c>] ? cache_alloc_refill+0x25c/0x2a0
> > [49506.624018] [<ffffffff8113ddb3>] ? __kmalloc+0x193/0x230
> > [49506.624018] [<ffffffff812f59af>] ? kmem_alloc+0x8f/0xe0
> > [49506.624018] [<ffffffff812f59af>] ? kmem_alloc+0x8f/0xe0
> > [49506.624018] [<ffffffff812f5a9e>] ? kmem_zalloc+0x1e/0x50
> > [49506.624018] [<ffffffff812e2f4d>] ? xfs_log_commit_cil+0x9d/0x440
> > [49506.624018] [<ffffffff812eeec6>] ? _xfs_trans_commit+0x1e6/0x2b0
> > [49506.624018] [<ffffffff812f2b6f>] ? xfs_create+0x51f/0x690
> > [49506.624018] [<ffffffff812ffdb7>] ? xfs_vn_mknod+0xa7/0x1c0
> > [49506.624018] [<ffffffff812fff00>] ? xfs_vn_create+0x10/0x20
> > [49506.624018] [<ffffffff811510b8>] ? vfs_create+0xb8/0xf0
> > [49506.624018] [<ffffffff81151d2c>] ? do_last+0x4dc/0x5d0
> > [49506.624018] [<ffffffff81153bd7>] ? do_filp_open+0x207/0x5e0
> > [49506.624018] [<ffffffff8105fc58>] ? pvclock_clocksource_read+0x58/0xd0
> > [49506.624018] [<ffffffff8115eaca>] ? alloc_fd+0x10a/0x150
> > [49506.624018] [<ffffffff81144005>] ? do_sys_open+0x65/0x130
> > [49506.624018] [<ffffffff81144110>] ? sys_open+0x20/0x30
> > [49506.624018] [<ffffffff81036072>] ? system_call_fastpath+0x16/0x1b
> >
> > Eventually the problem goes away, and the system goes back to
> > performing at the normal rate. Any ideas on how to avoid this
> > problem? I'm using CONFIG_SLAB=y, if that is relevant.
>
> zone->lock contention? Try ripping out the following two lines. The change
> might be a bit aggressive though :)
>
> Thanks,
> Fengguang
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1bb327a..c08b8d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1864,9 +1864,6 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>
> cond_resched();
>
> - if (order != 0)
> - drain_all_pages();
> -
> if (likely(*did_some_progress))
> page = get_page_from_freelist(gfp_mask, nodemask, order,
> zonelist, high_zoneidx,
You may be interested in Mel's patchset that he just proposed for -mm
which identifies watermark variations on machines with high cpu counts
(perhaps even eight, as in this report). The last patch actually reworks
this hunk of the code as well.
http://marc.info/?l=linux-mm&m=128255044912938
http://marc.info/?l=linux-mm&m=128255045312950
http://marc.info/?l=linux-mm&m=128255045012942
http://marc.info/?l=linux-mm&m=128255045612954
Dave, it would be interesting to see if this fixes your problem.
Re: [2.6.35-rc1, bug] mm: minute-long livelocks in memory reclaim
From: Dave Chinner @ 2010-08-23 12:33 UTC (permalink / raw)
To: David Rientjes; +Cc: Wu Fengguang, Mel Gorman, linux-kernel, linux-mm
On Mon, Aug 23, 2010 at 02:23:27AM -0700, David Rientjes wrote:
> On Mon, 23 Aug 2010, Wu Fengguang wrote:
>
> > > I've been testing parallel create workloads over the weekend, and
> > > I've seen this a couple of times now under 8 thread parallel creates
> > > with XFS. I'm running on an 8p VM with 4GB RAM and a fast disk
> > > subsystem. Basically I am seeing the create rate drop to zero
> > > with all 8 CPUs stuck spinning for up to 2 minutes. 'echo t >
> > > /proc/sysrq-trigger' while this is occurring gives the following
> > > trace for all the fs-mark processes:
.....
>
> You may be interested in Mel's patchset that he just proposed for -mm
> which identifies watermark variations on machines with high cpu counts
> (perhaps even eight, as in this report). The last patch actually reworks
> this hunk of the code as well.
>
> http://marc.info/?l=linux-mm&m=128255044912938
> http://marc.info/?l=linux-mm&m=128255045312950
> http://marc.info/?l=linux-mm&m=128255045012942
> http://marc.info/?l=linux-mm&m=128255045612954
>
> Dave, it would be interesting to see if this fixes your problem.
That looks promising - I'll give it a shot, though my test case is
not really what you'd call reproducible(*), so it might take a
couple of days before I can say whether the issue has gone away or
not.
Cheers,
Dave.
(*) create 100 million inodes in parallel using fsmark, collect and
watch behavioural metrics via PCP/pmchart for stuff out of the
ordinary, and dump stack traces, etc. when something strange occurs.
--
Dave Chinner
david@fromorbit.com
Thread overview: 4 messages
2010-08-22 23:48 [2.6.35-rc1, bug] mm: minute-long livelocks in memory reclaim Dave Chinner
2010-08-23  6:58 ` Wu Fengguang
2010-08-23  9:23 ` David Rientjes
2010-08-23 12:33 ` Dave Chinner