* XFS: 3-way deadlock with xfs_dquot, xfs_buf and xfs_inode
@ 2018-12-15  5:34 张本龙
  2018-12-17 23:33 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: 张本龙 @ 2018-12-15  5:34 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner, Brian Foster

Hi XFS developers,

There seems to be a deadlock involving 3 threads: 1) the fsync thread
has acquired the project quota lock, and is trying to get the xfs_buf
(it's an AGF buffer); 2) the xfs_buf is attached to a transaction, and
xfs_end_io is trying to get the xfs_inode ilock; 3) the write thread
has acquired the xfs_inode ilock, and tries to get the xfs_dquot.
Below are the traces.

INFO: task xxx-super:14692 blocked for more than 120 seconds.
---------------------------------------
Call Trace:
 schedule+0x29/0x70
 schedule_timeout+0x239/0x2c0
 ? kmem_cache_alloc+0x1ba/0x1e0
 ? kmem_zone_alloc+0x97/0x130 [xfs]
 ? kmem_zone_alloc+0x97/0x130 [xfs]
 __down_common+0x108/0x154
 ? i40e_xmit_frame_ring+0x3f0/0x12d0 [i40e]
 ? _xfs_buf_find+0x176/0x340 [xfs]
 __down+0x1d/0x1f
 down+0x41/0x50
 xfs_buf_lock+0x3c/0xd0 [xfs]
 _xfs_buf_find+0x176/0x340 [xfs]
 xfs_buf_get_map+0x2a/0x240 [xfs]
 xfs_buf_read_map+0x30/0x160 [xfs]
 xfs_trans_read_buf_map+0x211/0x400 [xfs]
 xfs_read_agf+0x93/0x110 [xfs]
 xfs_alloc_read_agf+0x4b/0x110 [xfs]
 xfs_alloc_fix_freelist+0x34b/0x410 [xfs]
 ? xfs_bmap_add_extent_hole_delay+0xe0/0x5e0 [xfs]
 ? radix_tree_lookup+0xd/0x10
 ? xfs_perag_get+0x2a/0xb0 [xfs]
 ? radix_tree_lookup+0xd/0x10
 ? xfs_perag_get+0x2a/0xb0 [xfs]
 xfs_alloc_vextent+0x294/0x5f0 [xfs]
 xfs_bmap_btalloc+0x3f3/0x780 [xfs]
 xfs_bmap_alloc+0xe/0x10 [xfs]
 xfs_bmapi_write+0x499/0xab0 [xfs]
 xfs_iomap_write_allocate+0x177/0x390 [xfs] (xfs_qm_dqattach)
 xfs_map_blocks+0x1a6/0x210 [xfs]
 xfs_do_writepage+0x17b/0x550 [xfs]
 write_cache_pages+0x251/0x4d0
 ? xfs_aops_discard_page+0x150/0x150 [xfs]
 ? try_to_wake_up+0x1c8/0x320
 xfs_vm_writepages+0xc5/0xe0 [xfs]
 do_writepages+0x1e/0x40
 __filemap_fdatawrite_range+0x65/0x80
 filemap_write_and_wait_range+0x41/0x90
 xfs_file_fsync+0x66/0x1e0 [xfs]
 do_fsync+0x65/0xa0
 ? SyS_write+0x9f/0xe0
 SyS_fsync+0x10/0x20
 system_call_fastpath+0x16/0x1b

Workqueue: xfs-data/md1 xfs_end_io
-------------------------------------
Call Trace:
 schedule+0x29/0x70
 rwsem_down_write_failed+0x115/0x220
 ? load_balance+0x1e2/0x990
 ? xfs_setfilesize+0x2d/0x100 [xfs]
 call_rwsem_down_write_failed+0x17/0x30
 down_write+0x2d/0x30
 xfs_ilock+0xc1/0x120 [xfs]
 xfs_setfilesize+0x2d/0x100 [xfs]
 xfs_setfilesize_ioend+0x4a/0x60 [xfs]
 xfs_end_io+0x43/0x80 [xfs]
 process_one_work+0x17b/0x470
 worker_thread+0x126/0x410
 ? rescuer_thread+0x460/0x460
 kthread+0xcf/0xe0
 ? kthread_create_on_node+0x140/0x140
 ret_from_fork+0x58/0x90

INFO: task java:39107 blocked for more than 120 seconds.
-------------------------------------
 Call Trace:
 schedule_preempt_disabled+0x29/0x70
 __mutex_lock_slowpath+0xc5/0x1c0
 mutex_lock+0x1f/0x2f
 xfs_trans_dqresv+0x44/0x470 [xfs]
 xfs_trans_reserve_quota_bydquots+0x11e/0x180 [xfs]
 xfs_trans_reserve_quota_nblks+0x5f/0x70 [xfs]
 xfs_bmapi_reserve_delalloc+0x87/0x1f0 [xfs]
 xfs_bmapi_delay+0x12b/0x2a0 [xfs]
 xfs_iomap_write_delay+0x178/0x2e0 [xfs]
 __xfs_get_blocks+0x4c3/0x7d0 [xfs] (xfs_ilock)
 xfs_get_blocks+0x14/0x20 [xfs]
 __block_write_begin+0x1a7/0x490
 ? __xfs_get_blocks+0x7d0/0x7d0 [xfs]
 ? grab_cache_page_write_begin+0x9b/0xd0
 xfs_vm_write_begin+0x51/0xe0 [xfs]
 ? xfs_vm_write_end+0x29/0x80 [xfs]
 generic_file_buffered_write+0x11e/0x2a0
 xfs_file_buffered_aio_write+0x10b/0x260 [xfs]
 xfs_file_aio_write+0x18d/0x1a0 [xfs]
 do_sync_write+0x8d/0xd0
 vfs_write+0xbd/0x1e0
 SyS_write+0x7f/0xe0
 tracesys+0xdd/0xe2

Once they lock up, kworkers are blocked on the xfs_dquot, leading to
dirty pages piling up in the memory cgroup. Then a bunch of threads
can't get pages in this path:
__alloc_pages_nodemask
  mem_cgroup_reclaim
    shrink_zone
      shrink_page_list
        wait_on_page_writeback

It's the 3.10.0-514.16.1.el7.x86_64 kernel, seen about 10-20 times a
week on several hundred servers.

Actually I'm not quite sure about the scenario, or whether it has been
fixed in mainline.

Thank you very much,
Benlong


* Re: XFS: 3-way deadlock with xfs_dquot, xfs_buf and xfs_inode
  2018-12-15  5:34 XFS: 3-way deadlock with xfs_dquot, xfs_buf and xfs_inode 张本龙
@ 2018-12-17 23:33 ` Dave Chinner
  2018-12-18  2:41   ` 张本龙
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2018-12-17 23:33 UTC (permalink / raw)
  To: 张本龙; +Cc: linux-xfs, Brian Foster

On Sat, Dec 15, 2018 at 01:34:33PM +0800, 张本龙 wrote:
> Hi XFS developers,
> 
> There seems to be a deadlock involving 3 threads: 1) the fsync thread
> has acquired the project quota lock, and is trying to get the xfs_buf
> (it's an AGF buffer); 2) the xfs_buf is attached to a transaction, and
> xfs_end_io is trying to get the xfs_inode ilock; 3) the write thread
> has acquired the xfs_inode ilock, and tries to get the xfs_dquot.
> Below are the traces.

I don't see a deadlock here. What's holding the AGF lock and
preventing progress from being made?

i.e. we have:

process 	1		2		3
	fsync()
	  ilock A
	  dqlock P
	    agf lock
	    <blocks>
				xfs_end_io
				  ilock A
				  <blocks>
						write()
						  ilock B
						  dqlock P
						  <blocks>

So, basically, everything is waiting for the AGF lock, which
is held by something other than these three threads. When the AGF
lock is released, the fsync() will make progress, then release
both dqlock P and ilock A, and the other two threads will make
progress again.

> It's the 3.10.0-514.16.1.el7.x86_64 kernel, seen about 10-20 times a
> week on several hundred servers.

That's not a mainline kernel and so we can't really help diagnose
this much further. You should really report it to your distro
support channels so they can help you with further diagnosis and
fixes...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS: 3-way deadlock with xfs_dquot, xfs_buf and xfs_inode
  2018-12-17 23:33 ` Dave Chinner
@ 2018-12-18  2:41   ` 张本龙
  2018-12-18  4:36     ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: 张本龙 @ 2018-12-18  2:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Brian Foster

Dave Chinner <david@fromorbit.com> wrote on Tue, 18 Dec 2018 at 07:33:
>
> On Sat, Dec 15, 2018 at 01:34:33PM +0800, 张本龙 wrote:
> > Hi XFS developers,
> >
> > There seems to be a deadlock involving 3 threads: 1) the fsync thread
> > has acquired the project quota lock, and is trying to get the xfs_buf
> > (it's an AGF buffer); 2) the xfs_buf is attached to a transaction, and
> > xfs_end_io is trying to get the xfs_inode ilock; 3) the write thread
> > has acquired the xfs_inode ilock, and tries to get the xfs_dquot.
> > Below are the traces.
>
> I don't see a deadlock here. What's holding the AGF lock and
> preventing progress from being made?
>

Oh, I was thinking the AGF is attached to a transaction. So between
xfs_trans_bjoin() and xfs_trans_commit(), the buf cannot be used by
others, right? Then it would be released by xfs_end_io() in
xfs_trans_commit(), and the deadlock would look like:

Thread             1                  2                  3
                   fsync()
                   dqlock P
                   agf lock
                   <blocks>
                                      xfs_end_io
                                      (agf locked by transaction)
                                      ilock A
                                      <blocks>
                                      unlock agf in trans commit
                                                         write()
                                                         ilock A
                                                         dqlock P
                                                         <blocks>

> i.e. we have:
>
> process         1               2               3
>         fsync()
>           ilock A
>           dqlock P
>             agf lock
>             <blocks>
>                                 xfs_end_io
>                                   ilock A
>                                   <blocks>
>                                                 write()
>                                                   ilock B
>                                                   dqlock P
>                                                   <blocks>
>
> So, basically, everything is waiting for the AGF lock, which
> is held by something other than these three threads. When the AGF
> lock is released, the fsync() will make progress, then release
> both dqlock P and ilock A, and the other two threads will make
> progress again.
>

Absolutely possible. fsync() indeed acquired the ilock in
xfs_iomap_write_allocate(). Yes, the key is who holds the AGF in this
scenario.

But either way, it seems the AGF is being held by a transaction that
is blocked in xfs_end_io() by the ilock.

> > It's the 3.10.0-514.16.1.el7.x86_64 kernel, seen about 10-20 times a
> > week on several hundred servers.
>
> That's not a mainline kernel and so we can't really help diagnose
> this much further. You should really report it to your distro
> support channels so they can help you with further diagnosis and
> fixes...

Oh, sure, thank you for pointing that out. We don't have a support
channel since we use CentOS...

I really appreciate your reply!
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com


* Re: XFS: 3-way deadlock with xfs_dquot, xfs_buf and xfs_inode
  2018-12-18  2:41   ` 张本龙
@ 2018-12-18  4:36     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2018-12-18  4:36 UTC (permalink / raw)
  To: 张本龙; +Cc: linux-xfs, Brian Foster

On Tue, Dec 18, 2018 at 10:41:48AM +0800, 张本龙 wrote:
> Dave Chinner <david@fromorbit.com> wrote on Tue, 18 Dec 2018 at 07:33:
> >
> > On Sat, Dec 15, 2018 at 01:34:33PM +0800, 张本龙 wrote:
> > > Hi XFS developers,
> > >
> > > There seems to be a deadlock involving 3 threads: 1) the fsync thread
> > > has acquired the project quota lock, and is trying to get the xfs_buf
> > > (it's an AGF buffer); 2) the xfs_buf is attached to a transaction, and
> > > xfs_end_io is trying to get the xfs_inode ilock; 3) the write thread
> > > has acquired the xfs_inode ilock, and tries to get the xfs_dquot.
> > > Below are the traces.
> >
> > I don't see a deadlock here. What's holding the AGF lock and
> > preventing progress from being made?
> >
> 
> Oh, I was thinking the AGF is attached to a transaction.

It may be, but it has to be locked to be joined to a transaction.

> So between
> xfs_trans_bjoin() and xfs_trans_commit(), a buf cannot be used by
> others, right? Then it should be released by xfs_end_io() in
> xfs_trans_commit(),

No, because that transaction doesn't hold the AGF.

> and the deadlock is like:
> 
> Thread             1                  2                  3
>                    fsync()
>                    dqlock P
>                    agf lock
>                    <blocks>
>                                       xfs_end_io
>                                       (agf locked by transaction)
>                                       ilock A
>                                       <blocks>
>                                       unlock agf in trans commit
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
                                        This is wrong.

There is no AGF held in the ioend transaction in progress.
xfs_setfilesize() only needs to lock the inode, as that is all it
modifies. It's also completely independent of the transaction being
run in the fsync context unless they have to modify the same
metadata (which they don't).

Use 'echo w > /proc/sysrq-trigger' to list all the blocked
processes. Maybe one of them is holding the AGF locked and is
waiting on something else...
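If the dump is large, a small filter can help summarize it. The log
lines and task names below are invented for illustration (real 'echo w'
output varies by kernel version); the awk one-liner just pairs each
blocked task with its first [xfs] stack frame, to help spot who is
sitting on the AGF buffer lock:

```shell
# Sample (fabricated) blocked-task dump, as captured from dmesg after
# 'echo w > /proc/sysrq-trigger' run as root with sysrq enabled
# (echo 1 > /proc/sys/kernel/sysrq).
dump='task: xxx-super state:D pid:14692
 xfs_buf_lock+0x3c/0xd0 [xfs]
task: kworker/3:1 state:D pid:3310
 xfs_ilock+0xc1/0x120 [xfs]'

# Remember the task name from each "task:" line, then print it next to
# the first XFS frame that follows.
printf '%s\n' "$dump" | awk '/^task:/ {name=$2} /\[xfs\]/ {print name, $1}'
```

This prints one "task frame" pair per blocked task, e.g.
"xxx-super xfs_buf_lock+0x3c/0xd0".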

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

