* [PATCH RFC] use WQ_MEM_RECLAIM for m_log_workqueue
@ 2015-11-04 18:51 Chris Mason
  2015-11-05 12:10 ` Chris Mason
  2015-11-10 18:01 ` [PATCH v2] xfs: " Chris Mason
  0 siblings, 2 replies; 4+ messages in thread
From: Chris Mason @ 2015-11-04 18:51 UTC (permalink / raw)
  To: Dave Chinner, xfs, Tejun Heo

We're consistently hitting deadlocks here with XFS on recent kernels.
After some digging through the crash files, it looks like everyone in
the system is waiting for XFS to reclaim memory.

Something like this:

PID: 2733434  TASK: ffff8808cd242800  CPU: 19  COMMAND: "java"
 #0 [ffff880019c53588] __schedule at ffffffff818c4df2
 #1 [ffff880019c535d8] schedule at ffffffff818c5517
 #2 [ffff880019c535f8] _xfs_log_force_lsn at ffffffff81316348
 #3 [ffff880019c53688] xfs_log_force_lsn at ffffffff813164fb
 #4 [ffff880019c536b8] xfs_iunpin_wait at ffffffff8130835e
 #5 [ffff880019c53728] xfs_reclaim_inode at ffffffff812fd453
 #6 [ffff880019c53778] xfs_reclaim_inodes_ag at ffffffff812fd8c7
 #7 [ffff880019c53928] xfs_reclaim_inodes_nr at ffffffff812fe433
 #8 [ffff880019c53958] xfs_fs_free_cached_objects at ffffffff8130d3b9
 #9 [ffff880019c53968] super_cache_scan at ffffffff811a6f73
#10 [ffff880019c539c8] shrink_slab at ffffffff811460e6
#11 [ffff880019c53aa8] shrink_zone at ffffffff8114a53f
#12 [ffff880019c53b48] do_try_to_free_pages at ffffffff8114a8ba
#13 [ffff880019c53be8] try_to_free_pages at ffffffff8114ad5a
#14 [ffff880019c53c78] __alloc_pages_nodemask at ffffffff8113e1b8
#15 [ffff880019c53d88] alloc_kmem_pages_node at ffffffff8113e671
#16 [ffff880019c53dd8] copy_process at ffffffff8104f781
#17 [ffff880019c53ec8] do_fork at ffffffff8105129c
#18 [ffff880019c53f38] sys_clone at ffffffff810515b6
#19 [ffff880019c53f48] stub_clone at ffffffff818c8e4d

xfs_log_force_lsn is waiting for the log to be cleaned, which is waiting
for IO, which is waiting for workers to complete that IO, which in turn
is waiting on worker threads that don't exist yet:

PID: 2752451  TASK: ffff880bd6bdda00  CPU: 37  COMMAND: "kworker/37:1"
 #0 [ffff8808d20abbb0] __schedule at ffffffff818c4df2
 #1 [ffff8808d20abc00] schedule at ffffffff818c5517
 #2 [ffff8808d20abc20] schedule_timeout at ffffffff818c7c6c
 #3 [ffff8808d20abcc0] wait_for_completion_killable at ffffffff818c6495
 #4 [ffff8808d20abd30] kthread_create_on_node at ffffffff8106ec82
 #5 [ffff8808d20abdf0] create_worker at ffffffff8106752f
 #6 [ffff8808d20abe40] worker_thread at ffffffff810699be
 #7 [ffff8808d20abec0] kthread at ffffffff8106ef59
 #8 [ffff8808d20abf50] ret_from_fork at ffffffff818c8ac8

I think we should be using WQ_MEM_RECLAIM to make sure this thread pool
makes progress when we're not able to allocate new workers.
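
For anyone not familiar with the flag: WQ_MEM_RECLAIM makes
alloc_workqueue() create a dedicated rescuer thread up front, so work
queued on that workqueue can still run even when memory pressure
prevents new kworker threads from being forked.  A minimal sketch of a
user (illustrative only -- the "example_*" names below are made up and
not part of this patch):

#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;
static struct work_struct example_work;

/* e.g. complete the IO that memory reclaim is waiting on */
static void example_work_fn(struct work_struct *work)
{
}

static int example_setup(void)
{
	/* WQ_MEM_RECLAIM guarantees a rescuer thread for this queue */
	example_wq = alloc_workqueue("example-wq", WQ_MEM_RECLAIM, 0);
	if (!example_wq)
		return -ENOMEM;

	INIT_WORK(&example_work, example_work_fn);
	queue_work(example_wq, &example_work);
	return 0;
}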

Outside of extensive analysis from gcc, this patch is untested.  If it
looks good, we'll try it here and report back about the results.

Signed-off-by: Chris Mason <clm@fb.com>

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 904f637..40276bd 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -843,7 +843,8 @@ xfs_init_mount_workqueues(
 		goto out_destroy_cil;
 
 	mp->m_log_workqueue = alloc_workqueue("xfs-log/%s",
-			WQ_FREEZABLE|WQ_HIGHPRI, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM|WQ_FREEZABLE|WQ_HIGHPRI,
+			0, mp->m_fsname);
 	if (!mp->m_log_workqueue)
 		goto out_destroy_reclaim;
 

* Re: [PATCH RFC] use WQ_MEM_RECLAIM for m_log_workqueue
  2015-11-04 18:51 [PATCH RFC] use WQ_MEM_RECLAIM for m_log_workqueue Chris Mason
@ 2015-11-05 12:10 ` Chris Mason
  2015-11-08 21:16   ` Dave Chinner
  2015-11-10 18:01 ` [PATCH v2] xfs: " Chris Mason
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Mason @ 2015-11-05 12:10 UTC (permalink / raw)
  To: Dave Chinner, xfs, Tejun Heo

On Wed, Nov 04, 2015 at 01:51:03PM -0500, Chris Mason wrote:
> I think we should be using WQ_MEM_RECLAIM to make sure this thread pool
> makes progress when we're not able to allocate new workers.

Thinking harder, it's probably best to just flag them all
WQ_MEM_RECLAIM.  This is what btrfs does, and it saves you from painful
discoveries about how different queues depend on each other.

Tejun did verify in the dump that progress on m_log_workqueue was stuck
waiting for more threads.

I'll start testing and send a v2.

-chris

* Re: [PATCH RFC] use WQ_MEM_RECLAIM for m_log_workqueue
  2015-11-05 12:10 ` Chris Mason
@ 2015-11-08 21:16   ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2015-11-08 21:16 UTC (permalink / raw)
  To: Chris Mason; +Cc: Tejun Heo, xfs

On Thu, Nov 05, 2015 at 07:10:59AM -0500, Chris Mason wrote:
> On Wed, Nov 04, 2015 at 01:51:03PM -0500, Chris Mason wrote:
> > I think we should be using WQ_MEM_RECLAIM to make sure this thread pool
> > makes progress when we're not able to allocate new workers.
> 
> Thinking harder, it's probably best to just flag them all
> WQ_MEM_RECLAIM.  This is what btrfs does, and it saves you from painful
> discoveries about how different queues depend on each other.

Makes sense, we missed this one because the original use of the
workqueue was just for a periodic, non-critical function. Then we
moved log IO completion to it in 3.19 in commit b29c70f ("xfs:
split metadata and log buffer completion to separate workqueues").

> I'll start testing and send a v2.

Seems like a no-brainer to me...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* [PATCH v2] xfs: use WQ_MEM_RECLAIM for m_log_workqueue
  2015-11-04 18:51 [PATCH RFC] use WQ_MEM_RECLAIM for m_log_workqueue Chris Mason
  2015-11-05 12:10 ` Chris Mason
@ 2015-11-10 18:01 ` Chris Mason
  1 sibling, 0 replies; 4+ messages in thread
From: Chris Mason @ 2015-11-10 18:01 UTC (permalink / raw)
  To: Dave Chinner, xfs, Tejun Heo

We're consistently hitting deadlocks here with XFS on recent kernels.
After some digging through the crash files, it looks like everyone in
the system is waiting for XFS to reclaim memory.

Something like this:

PID: 2733434  TASK: ffff8808cd242800  CPU: 19  COMMAND: "java"
0 [ffff88082d093368] __schedule at ffffffff818c4df2
1 [ffff88082d0933b8] schedule at ffffffff818c5517
2 [ffff88082d0933d8] schedule_timeout at ffffffff818c7c6c
3 [ffff88082d093478] wait_for_completion at ffffffff818c6125
4 [ffff88082d0934d8] flush_work at ffffffff8106867c
5 [ffff88082d093548] xlog_cil_force_lsn at ffffffff81317b00
6 [ffff88082d0935f8] _xfs_log_force_lsn at ffffffff81316249
7 [ffff88082d093688] xfs_log_force_lsn at ffffffff813164fb
8 [ffff88082d0936b8] xfs_iunpin_wait at ffffffff8130835e
9 [ffff88082d093728] xfs_reclaim_inode at ffffffff812fd453
10 [ffff88082d093778] xfs_reclaim_inodes_ag at ffffffff812fd8c7
11 [ffff88082d093928] xfs_reclaim_inodes_nr at ffffffff812fe433
12 [ffff88082d093958] xfs_fs_free_cached_objects at ffffffff8130d3b9
13 [ffff88082d093968] super_cache_scan at ffffffff811a6f73
14 [ffff88082d0939c8] shrink_slab at ffffffff811460e6
15 [ffff88082d093aa8] shrink_zone at ffffffff8114a53f
16 [ffff88082d093b48] do_try_to_free_pages at ffffffff8114a8ba
17 [ffff88082d093be8] try_to_free_pages at ffffffff8114ad5a
18 [ffff88082d093c78] __alloc_pages_nodemask at ffffffff8113e1b8
19 [ffff88082d093d88] alloc_kmem_pages_node at ffffffff8113e671
20 [ffff88082d093dd8] copy_process at ffffffff8104f781
21 [ffff88082d093ec8] do_fork at ffffffff8105129c
22 [ffff88082d093f38] sys_clone at ffffffff810515b6
23 [ffff88082d093f48] stub_clone at ffffffff818c8e4d

xfs_log_force_lsn is waiting for the log to be cleaned, which is waiting
for IO, which is waiting for workers to complete that IO, which in turn
is waiting on worker threads that don't exist yet:

PID: 2752451  TASK: ffff880bd6bdda00  CPU: 37  COMMAND: "kworker/37:1"
0 [ffff88000394fbb0] __schedule at ffffffff818c4df2
1 [ffff88000394fc00] schedule at ffffffff818c5517
2 [ffff88000394fc20] schedule_timeout at ffffffff818c7c6c
3 [ffff88000394fcc0] wait_for_completion_killable at ffffffff818c6495
4 [ffff88000394fd30] kthread_create_on_node at ffffffff8106ec82
5 [ffff88000394fdf0] create_worker at ffffffff8106752f
6 [ffff88000394fe40] worker_thread at ffffffff810699be
7 [ffff88000394fec0] kthread at ffffffff8106ef59
8 [ffff88000394ff50] ret_from_fork at ffffffff818c8ac8

I think we should be using WQ_MEM_RECLAIM to make sure this thread pool
makes progress when we're not able to allocate new workers.

While we are here, just assume that everything in the FS will eventually
wait on one of these workqueues, and flag them all for reclaim.

Signed-off-by: Chris Mason <clm@fb.com>
---
 v1 -> v2: add WQ_MEM_RECLAIM to xfs-reclaim and xfs-eofblocks too,
 and fix the commit text so git commit doesn't eat the stack traces

 fs/xfs/xfs_super.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 904f637..d74d7e1 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -838,17 +838,18 @@ xfs_init_mount_workqueues(
 		goto out_destroy_unwritten;
 
 	mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
-			WQ_FREEZABLE, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
 	if (!mp->m_reclaim_workqueue)
 		goto out_destroy_cil;
 
 	mp->m_log_workqueue = alloc_workqueue("xfs-log/%s",
-			WQ_FREEZABLE|WQ_HIGHPRI, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM|WQ_FREEZABLE|WQ_HIGHPRI,
+			0, mp->m_fsname);
 	if (!mp->m_log_workqueue)
 		goto out_destroy_reclaim;
 
 	mp->m_eofblocks_workqueue = alloc_workqueue("xfs-eofblocks/%s",
-			WQ_FREEZABLE, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
 	if (!mp->m_eofblocks_workqueue)
 		goto out_destroy_log;
 
-- 
2.4.6
