* NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07  3:50 NeilBrown
From: NeilBrown @ 2014-04-07  3:50 UTC
  To: Trond Myklebust, Alexander Viro; +Cc: NFS

Hi,
 I've just hit a deadlock in NFS that seems very strange.
The kernel is 3.14-rc8 with some local changes which shouldn't affect the
deadlocking code.

Shortly after umounting the NFS filesystem with "umount -f" (though I don't
think the -f is important), I ran "sync".

The sync is now stuck in

[<ffffffff81197fc1>] sync_inodes_sb+0xa1/0x1c0
[<ffffffff8119cd99>] sync_inodes_one_sb+0x19/0x20
[<ffffffff81173372>] iterate_supers+0xb2/0x110
[<ffffffff8119cfd0>] sys_sync+0x30/0x90
[<ffffffff81aa4622>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


while kworker/u16:1 is stuck:

[<ffffffff815420b3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff81172889>] deactivate_super+0x39/0x60
[<ffffffff812d56f1>] nfs_sb_deactive+0x21/0x30
[<ffffffff812d2ef9>] __put_nfs_open_context+0xc9/0x100
[<ffffffff812d2f3b>] put_nfs_open_context+0xb/0x10
[<ffffffff812ddd14>] nfs_commitdata_release+0x14/0x30
[<ffffffff812ddd4a>] nfs_commit_release+0x1a/0x20
[<ffffffff81a45a05>] rpc_free_task+0x25/0x70
[<ffffffff81a45fd8>] rpc_do_put_task+0x78/0x80
[<ffffffff81a45feb>] rpc_put_task+0xb/0x10
[<ffffffff812de3fe>] nfs_initiate_commit+0xce/0x110
[<ffffffff812df112>] nfs_commit_list+0x62/0x90
[<ffffffff812dfd26>] nfs_commit_inode+0xa6/0x170
[<ffffffff812dfe4d>] nfs_write_inode+0x5d/0xa0
[<ffffffff81300d69>] nfs4_write_inode+0x9/0x10
[<ffffffff811978ec>] __writeback_single_inode+0x10c/0x2c0
[<ffffffff811987ea>] writeback_sb_inodes+0x2ca/0x450
[<ffffffff81198b2c>] wb_writeback+0xec/0x320
[<ffffffff81199365>] bdi_writeback_workfn+0x115/0x4c0
[<ffffffff810a595b>] process_one_work+0x16b/0x430
[<ffffffff810a6619>] worker_thread+0x119/0x3a0
[<ffffffff810ac2bd>] kthread+0xcd/0xf0
[<ffffffff81aa457c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff


So sync is holding sb->s_umount, has queued some bdi work on the filesystem and
is waiting for it to complete.
Meanwhile, that work has (I think) submitted a 'commit' (via ->write_inode)
and that commit wants to deactivate_super and so needs to get ->s_umount.

I suspect this could happen even more easily with a lazy unmount.

It seems that this commit request is the last thing that is keeping
->s_active elevated and it deadlocks trying to drop the last s_active.

I have no idea how to fix it....  help?

NeilBrown

* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07 14:10 ` Trond Myklebust
From: Trond Myklebust @ 2014-04-07 14:10 UTC
  To: Brown Neil, Jan Kara; +Cc: Viro Alexander, NFS


On Apr 6, 2014, at 23:50, NeilBrown <neilb@suse.de> wrote:

> So sync is holding sb->s_umount, has queued some bdi work on the filesystem and
> is waiting for it to complete.
> Meanwhile, that work has (I think) submitted a 'commit' (via ->write_inode)
> and that commit wants to deactivate_super and so needs to get ->s_umount.
> 
> I suspect this could happen even more easily with a lazy unmount.
> 
> It seems that this commit request is the last thing that is keeping
> ->s_active elevated and it deadlocks trying to drop the last s_active.
> 
> I have no idea how to fix it....  help?
> 

The problem seems to be the use of iterate_supers(), which grabs a passive reference, and conflicts with our use of an active reference in the open context.
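The distinction between the two kinds of reference can be shown with a simplified, single-threaded model. The struct and helpers below are hypothetical: s_count plays the passive reference that iterate_supers() takes, and s_active the active reference held through the NFS open context. The point is that a passive reference does not stop the final active put from entering the shutdown path, which in the kernel is where deactivate_super() needs s_umount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of struct super_block's two reference counts. */
struct sb_model {
    int s_count;    /* passive: keeps the structure in memory */
    int s_active;   /* active: keeps the mounted instance alive */
    bool shut_down; /* set when the last active reference is dropped */
};

/* Passive reference, as iterate_supers() takes one. */
static void sb_get_passive(struct sb_model *sb)
{
    sb->s_count++;
}

/*
 * Active put, as deactivate_super() does.  In the kernel the final put
 * takes s_umount for write before tearing the filesystem down.
 */
static void sb_put_active(struct sb_model *sb)
{
    assert(sb->s_active > 0);
    if (--sb->s_active == 0)
        sb->shut_down = true;
}

/*
 * After umount only the open context holds s_active; sync(2) then pins
 * the superblock passively and the commit drops the last active ref.
 * Returns true if shutdown was reached despite the passive reference.
 */
bool last_put_under_passive_ref(void)
{
    struct sb_model sb = { .s_count = 1, .s_active = 1, .shut_down = false };

    sb_get_passive(&sb);   /* sync(2) via iterate_supers() */
    sb_put_active(&sb);    /* commit release -> nfs_sb_deactive() */
    return sb.shut_down && sb.s_count == 2;
}
```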

Jan, any suggestions?

Cheers
  Trond

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07 20:27   ` Jan Kara
From: Jan Kara @ 2014-04-07 20:27 UTC
  To: Trond Myklebust; +Cc: Brown Neil, Jan Kara, Viro Alexander, NFS

  Hello,

On Mon 07-04-14 10:10:27, Trond Myklebust wrote:
> On Apr 6, 2014, at 23:50, NeilBrown <neilb@suse.de> wrote:
> > So sync is holding sb->s_umount, has queued some bdi work on the filesystem
> > and is waiting for it to complete.  Meanwhile, that work has (I think)
> > submitted a 'commit' (via ->write_inode) and that commit wants to
> > deactivate_super and so needs to get ->s_umount.
> > 
> > I suspect this could happen even more easily with a lazy unmount.
> > 
> > It seems that this commit request is the last thing that is keeping
> > ->s_active elevated and it deadlocks trying to drop the last s_active.
> > 
> > I have no idea how to fix it....  help?
> > 
> 
> The problem seems to be the use of iterate_supers(), which grabs a
> passive reference, and conflicts with our use of an active reference in
> the open context.
  Yeah, we cannot really do otherwise in iterate_supers() - we have to grab
some superblock reference and we don't really want to get an active one
since that would result in spurious EBUSY returns from umount.

Can't we just punt the deactivate_super() call to a workqueue to avoid
this deadlock? It's a bit ugly but it should do the trick. Or is
nfs_sb_deactive() called so often that we'd see adverse effects from
that? We could also offload it to a workqueue only in the special case where
sb->s_active == 1. That should be really rare, but it's a bit ugly to be
poking into VFS internals. BTW why is nfs_sb_active() safe doing just
atomic_inc() on s_active? grab_super() is more careful...
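The "s_active == 1" special case can be detected without a lock using the semantics of the kernel's atomic_add_unless(v, a, u): add a to *v unless *v == u, and report whether the add happened. deactivate_super() already uses it for its own fast path. Below is a user-space sketch of the same idea with C11 atomics; the helper names add_unless() and put_active_fast() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add 'a' to *v unless *v == u; return true if the add happened.
 * Same semantics as the kernel's atomic_add_unless(), built on a
 * compare-and-swap loop.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    while (old != u) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;
    }
    return false;
}

/*
 * Fast path for dropping an active reference: succeeds for every put
 * except the last one, which the caller must handle separately (take
 * s_umount inline, or defer the work to a workqueue).
 */
bool put_active_fast(atomic_int *s_active)
{
    return add_unless(s_active, -1, 1);
}
```

With a count of 2, the first put_active_fast() decrements to 1 and returns true; a second call returns false and leaves the count at 1, which is exactly the rare case that would need to be offloaded.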

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07 22:02     ` Trond Myklebust
From: Trond Myklebust @ 2014-04-07 22:02 UTC
  To: Jan Kara; +Cc: Brown Neil, Viro Alexander, NFS

On Mon, 2014-04-07 at 22:27 +0200, Jan Kara wrote:
>   Hello,
> 
> On Mon 07-04-14 10:10:27, Trond Myklebust wrote:
> > The problem seems to be the use of iterate_supers(), which grabs a
> > passive reference, and conflicts with our use of an active reference in
> > the open context.
>   Yeah, we cannot really do otherwise in iterate_supers() - we have to grab
> some superblock reference and we don't really want to get an active one
> since that would result in spurious EBUSY returns from umount.
> 
> Can't we just punt the deactivate_super() call to a workqueue to avoid
> this deadlock? It's a bit ugly but it should do the trick. Or is
> nfs_sb_deactive() called so often that we'd see adverse effects from
> that? We could also offload it to a workqueue only in the special case where
> sb->s_active == 1. That should be really rare, but it's a bit ugly to be
> poking into VFS internals.

The activate/deactivate super is basically there to save our bacon when
NFS file state extends beyond the usual VFS path walk, open() and
close(). Examples include sillyrename and NFSv4 delegations. Even
ordinary read and write state can extend beyond close() if the user
decides to 'kill -9' in the wrong places.
In most of these situations, we need to keep a dentry around until we're
finished, which means that we want to keep the super block alive too.

>  BTW why is nfs_sb_active() safe doing just
> atomic_inc() on s_active? grab_super() is more careful...

We're never grabbing a reference without already holding a reference via
a struct path or a struct file passed down to us by the VFS. However we
may want to keep the struct dentry around for longer than the lifetime
of that struct path/file.
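The question comes down to two ways of taking a reference. A plain increment is only safe when the caller already owns a reference, so the count cannot be zero; an increment-unless-zero is needed when starting from an object that may already be dying, which is why grab_super() uses atomic_inc_not_zero() on s_active. A C11 sketch of both; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Plain increment: only valid when the caller already holds a reference
 * (e.g. through a struct path or struct file), so the count is known to
 * be non-zero and cannot drop to zero underneath us.
 */
void get_ref_held(atomic_int *count)
{
    int old = atomic_fetch_add(count, 1);

    assert(old > 0);   /* the caller's own reference guarantees this */
    (void)old;
}

/*
 * Increment unless zero, in the style of atomic_inc_not_zero(): needed
 * when the object may already be on its way out.  Returns false, taking
 * no reference, if the count was zero.
 */
bool get_ref_unless_zero(atomic_int *count)
{
    int old = atomic_load(count);

    while (old != 0) {
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}
```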

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com



* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07 22:09     ` Trond Myklebust
From: Trond Myklebust @ 2014-04-07 22:09 UTC
  To: Jan Kara; +Cc: Brown Neil, Viro Alexander, NFS

On Mon, 2014-04-07 at 22:27 +0200, Jan Kara wrote:
>   Hello,
> 
> On Mon 07-04-14 10:10:27, Trond Myklebust wrote:

> > The problem seems to be the use of iterate_supers(), which grabs a
> > passive reference, and conflicts with our use of an active reference in
> > the open context.
>   Yeah, we cannot really do otherwise in iterate_supers() - we have to grab
> some superblock reference and we don't really want to get an active one
> since that would result in spurious EBUSY returns from umount.

BTW: By what mechanism does an active reference lead to EBUSY issues
here?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com



* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07 22:32       ` Jan Kara
From: Jan Kara @ 2014-04-07 22:32 UTC
  To: Trond Myklebust; +Cc: Jan Kara, Brown Neil, Viro Alexander, NFS

On Mon 07-04-14 18:09:02, Trond Myklebust wrote:
> On Mon, 2014-04-07 at 22:27 +0200, Jan Kara wrote:
> >   Hello,
> > 
> > On Mon 07-04-14 10:10:27, Trond Myklebust wrote:
> 
> > > The problem seems to be the use of iterate_supers(), which grabs a
> > > passive reference, and conflicts with our use of an active reference in
> > > the open context.
> >   Yeah, we cannot really do otherwise in iterate_supers() - we have to grab
> > some superblock reference and we don't really want to get an active one
> > since that would result in spurious EBUSY returns from umount.
> 
> BTW: By what mechanism does an active reference lead to EBUSY issues
> here?
  Ah, sorry. I was wrong. We use the mount use count for this. But using an
active reference in iterate_supers() still seems wrong, as that could result
in having to handle the destruction of the superblock from within it...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07 22:35       ` Jan Kara
From: Jan Kara @ 2014-04-07 22:35 UTC
  To: Trond Myklebust; +Cc: Jan Kara, Brown Neil, Viro Alexander, NFS

On Mon 07-04-14 18:02:16, Trond Myklebust wrote:
> The activate/deactivate super is basically there to save our bacon when
> NFS file state extends beyond the usual VFS path walk, open() and
> close(). Examples include sillyrename and NFSv4 delegations. Even
> ordinary read and write state can extend beyond close() if the user
> decides to 'kill -9' in the wrong places.
> In most of these situations, we need to keep a dentry around until we're
> finished, which means that we want to keep the super block alive too.
  Yeah, that makes sense. But offloading the dropping of the sb reference to a
workqueue would work then, wouldn't it?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-07 23:07         ` Trond Myklebust
From: Trond Myklebust @ 2014-04-07 23:07 UTC
  To: Jan Kara; +Cc: Brown Neil, Viro Alexander, NFS


On Apr 7, 2014, at 18:35, Jan Kara <jack@suse.cz> wrote:

> On Mon 07-04-14 18:02:16, Trond Myklebust wrote:
>> The activate/deactivate super is basically there to save our bacon when
>> NFS file state extends beyond the usual VFS path walk, open() and
>> close(). Examples include sillyrename and NFSv4 delegations. Even
>> ordinary read and write state can extend beyond close() if the user
>> decides to 'kill -9' in the wrong places.
>> In most of these situations, we need to keep a dentry around until we're
>> finished, which means that we want to keep the super block alive too.
>  Yeah, that makes sense. But offloading the dropping of the sb reference to a
> workqueue would work then, wouldn't it?

Could we perhaps have a helper in the VFS that can optimise away the case where s->s_active > 1?

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


* Re: NFS deadlock between 'sync' and commit after unmount....
@ 2014-04-10 21:25           ` Jan Kara
From: Jan Kara @ 2014-04-10 21:25 UTC
  To: Trond Myklebust; +Cc: Jan Kara, Brown Neil, Viro Alexander, NFS

On Mon 07-04-14 19:07:35, Trond Myklebust wrote:
> Could we perhaps have a helper in the VFS that can optimise away the case
> where s->s_active > 1?
  I'm not sure how you'd imagine the optimisation in VFS. But what I had in
mind was something like:

void nfs_deactivate_super(struct super_block *sb)
{
	if (!atomic_add_unless(&sb->s_active, -1, 1)) {
		/*
		 * Postpone deactivation to workqueue to avoid deadlocking
		 * on s_umount semaphore - we can get here when trying to
		 * complete sync(2) request for forcefully unmounted
		 * filesystem.
		 */
		schedule_work(&NFS_SB(sb)->deactivate_work);
	}
}

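/*
 * Runs from the system workqueue (queued by schedule_work()), outside the
 * writeback work that sync(2) waits for, so blocking on s_umount in
 * deactivate_super() here cannot deadlock against it.
 */
static void nfs_deactivate_sb_work(struct work_struct *work)
{
	struct super_block *sb = container_of(work, struct nfs_server,
					      deactivate_work)->super;

	deactivate_super(sb);
}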

in nfs_initialise_sb() (deactivate_work being a new struct work_struct
member added to struct nfs_server):
	INIT_WORK(&NFS_SB(sb)->deactivate_work, nfs_deactivate_sb_work);

and then use nfs_deactivate_super() instead of deactivate_super(). That
should do the trick and do the offloading only if we are really dropping
the last reference.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
