* XFS race on umount
@ 2017-03-10 12:04 Quentin Casasnovas
  2017-03-10 14:05 ` Brian Foster
From: Quentin Casasnovas @ 2017-03-10 12:04 UTC
  To: linux-xfs; +Cc: Darrick J. Wong

Hi Guys,

We've been using XFS recently on our build system because we found that it
scales pretty well and we have good use for the reflink feature :)

I think our setup is relatively unusual in that, on each of our build
servers, we mount hundreds of XFS filesystems from NBD devices in
parallel, where our build environments are stored on qcow2 images and
connected with qemu-nbd, then umount them when the build is finished.
Those qcow2 images are stored on an NFS mount, which leads to some
(expected) hiccups when reading/writing blocks, where sometimes the NBD
layer will return errors to the block layer, which in turn passes them
on to XFS.  It could be due to network contention, very high load on the
server, or any transient error really, and in those cases XFS will
normally force shut down the filesystem and wait for a umount.

All of this is fine and is exactly the behaviour we'd expect, though it
turns out that we keep hitting what I think is a race condition between
umount and a force shutdown from XFS itself, where I have a umount process
completely stuck in xfs_ail_push_all_sync():

  [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
  [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
  [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
  [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
  [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
  [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
  [<ffffffff811ccc19>] deactivate_super+0x49/0x70
  [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
  [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
  [<ffffffff810e1b39>] task_work_run+0x79/0xa0
  [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
  [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
  [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
  [<ffffffffffffffff>] 0xffffffffffffffff

This is on a v4.10.1 kernel.  I've had a look at xfs_ail_push_all_sync()
and I wonder if there isn't a potential lost-wakeup problem: I can't see
where we re-test the condition after setting the current process to
TASK_UNINTERRUPTIBLE and before calling schedule() (though I know nothing
about XFS internals...).
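
To illustrate the kind of pattern I'm worried about, here is a generic
kernel-style sketch (not the actual fs/xfs/xfs_trans_ail.c code; 'wq' and
'condition' are placeholder names):

  /*
   * Racy sleeper: the condition is only tested while the task is still
   * TASK_RUNNING, so a wake_up_process() arriving between the test and
   * set_current_state() sees a runnable task, does nothing, and the
   * schedule() below then sleeps with nobody left to wake it.
   */
  if (!condition) {
          set_current_state(TASK_UNINTERRUPTIBLE);
          schedule();
          __set_current_state(TASK_RUNNING);
  }

  /*
   * Safe sleeper: the task is queued on the waitqueue and its state set
   * *before* the condition is re-tested, so a concurrent "condition =
   * true; wake_up(&wq);" either finds us on the queue (and wakes us) or
   * happens early enough that the re-test observes it and we never sleep.
   */
  DEFINE_WAIT(wait);
  for (;;) {
          prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
          if (condition)
                  break;
          schedule();
  }
  finish_wait(&wq, &wait);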

Here's an excerpt of the relevant dmesg messages that very likely happened at
the same time the unmount process was started:

  [29961.767707] block nbd74: Other side returned error (22)
  [29961.837518] XFS (nbd74): metadata I/O error: block 0x6471ba0 ("xfs_trans_read_buf_map") error 5 numblks 32
  [29961.838172] block nbd74: Other side returned error (22)
  [29961.838179] block nbd74: Other side returned error (22)
  [29961.838184] block nbd74: Other side returned error (22)
  [29961.838203] block nbd74: Other side returned error (22)
  [29961.838208] block nbd74: Other side returned error (22)
  [29962.259551] XFS (nbd74): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
  [29962.356376] XFS (nbd74): xfs_do_force_shutdown(0x8) called from line 3454 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff813bf471
  [29962.503003] XFS (nbd74): Corruption of in-memory data detected.  Shutting down filesystem
  [29963.166314] XFS (nbd74): Please umount the filesystem and rectify the problem(s)

I'm pretty sure the process isn't deadlocking on the spinlock because it
doesn't burn any CPU and is really out of the scheduler pool.  It should be
noted that when I noticed the hung umount process, I manually tried to
unmount the corresponding XFS mountpoint and that was fine, though it
obviously didn't "unhang" the stuck umount process.

Any help would be appreciated :)

Thanks,
Quentin

* Re: XFS race on umount
  2017-03-10 12:04 XFS race on umount Quentin Casasnovas
@ 2017-03-10 14:05 ` Brian Foster
  2017-03-10 14:38   ` Quentin Casasnovas
From: Brian Foster @ 2017-03-10 14:05 UTC
  To: Quentin Casasnovas; +Cc: linux-xfs, Darrick J. Wong

On Fri, Mar 10, 2017 at 01:04:06PM +0100, Quentin Casasnovas wrote:
> Hi Guys,
> 
> We've been using XFS recently on our build system because we found that it
> scales pretty well and we have good use for the reflink feature :)
> 
> I think our setup is relatively unusual in that, on each of our build
> servers, we mount hundreds of XFS filesystems from NBD devices in
> parallel, where our build environments are stored on qcow2 images and
> connected with qemu-nbd, then umount them when the build is finished.
> Those qcow2 images are stored on an NFS mount, which leads to some
> (expected) hiccups when reading/writing blocks, where sometimes the NBD
> layer will return errors to the block layer, which in turn passes them
> on to XFS.  It could be due to network contention, very high load on the
> server, or any transient error really, and in those cases XFS will
> normally force shut down the filesystem and wait for a umount.
> 
> All of this is fine and is exactly the behaviour we'd expect, though it
> turns out that we keep hitting what I think is a race condition between
> umount and a force shutdown from XFS itself, where I have a umount process
> completely stuck in xfs_ail_push_all_sync():
> 
>   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
>   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
>   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
>   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
>   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
>   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
>   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
>   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
>   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
>   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
>   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
>   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
>   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
>   [<ffffffffffffffff>] 0xffffffffffffffff
> 
> This is on a v4.10.1 kernel.  I've had a look at xfs_ail_push_all_sync()
> and I wonder if there isn't a potential lost-wakeup problem: I can't see
> where we re-test the condition after setting the current process to
> TASK_UNINTERRUPTIBLE and before calling schedule() (though I know nothing
> about XFS internals...).
> 
> Here's an excerpt of the relevant dmesg messages that very likely happened at
> the same time the unmount process was started:
> 
>   [29961.767707] block nbd74: Other side returned error (22)
>   [29961.837518] XFS (nbd74): metadata I/O error: block 0x6471ba0 ("xfs_trans_read_buf_map") error 5 numblks 32
>   [29961.838172] block nbd74: Other side returned error (22)
>   [29961.838179] block nbd74: Other side returned error (22)
>   [29961.838184] block nbd74: Other side returned error (22)
>   [29961.838203] block nbd74: Other side returned error (22)
>   [29961.838208] block nbd74: Other side returned error (22)
>   [29962.259551] XFS (nbd74): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>   [29962.356376] XFS (nbd74): xfs_do_force_shutdown(0x8) called from line 3454 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff813bf471
>   [29962.503003] XFS (nbd74): Corruption of in-memory data detected.  Shutting down filesystem
>   [29963.166314] XFS (nbd74): Please umount the filesystem and rectify the problem(s)
> 
> I'm pretty sure the process isn't deadlocking on the spinlock because it
> doesn't burn any CPU and is really out of the scheduler pool.  It should be
> noted that when I noticed the hung umount process, I manually tried to
> unmount the corresponding XFS mountpoint and that was fine, though it
> obviously didn't "unhang" the stuck umount process.
> 

I'm not parsing the last bit here.. you were able to manually unmount
the hung unmount..?

That aside, could you post a snippet of the tracepoint output
('trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe')
when the problem occurs? Also, how about the stack of the xfsaild thread
for that specific mount ('ps aux | grep xfsaild; cat
/proc/<pid>/stack').

Brian

> Any help would be appreciated :)
> 
> Thanks,
> Quentin



* Re: XFS race on umount
  2017-03-10 14:05 ` Brian Foster
@ 2017-03-10 14:38   ` Quentin Casasnovas
  2017-03-10 14:52     ` Brian Foster
From: Quentin Casasnovas @ 2017-03-10 14:38 UTC
  To: Brian Foster; +Cc: Quentin Casasnovas, linux-xfs, Darrick J. Wong

On Fri, Mar 10, 2017 at 09:05:35AM -0500, Brian Foster wrote:
> On Fri, Mar 10, 2017 at 01:04:06PM +0100, Quentin Casasnovas wrote:
> > Hi Guys,
> > 
> > We've been using XFS recently on our build system because we found that it
> > scales pretty well and we have good use for the reflink feature :)
> > 
> > I think our setup is relatively unusual in that, on each of our build
> > servers, we mount hundreds of XFS filesystems from NBD devices in
> > parallel, where our build environments are stored on qcow2 images and
> > connected with qemu-nbd, then umount them when the build is finished.
> > Those qcow2 images are stored on an NFS mount, which leads to some
> > (expected) hiccups when reading/writing blocks, where sometimes the NBD
> > layer will return errors to the block layer, which in turn passes them
> > on to XFS.  It could be due to network contention, very high load on the
> > server, or any transient error really, and in those cases XFS will
> > normally force shut down the filesystem and wait for a umount.
> > 
> > All of this is fine and is exactly the behaviour we'd expect, though it
> > turns out that we keep hitting what I think is a race condition between
> > umount and a force shutdown from XFS itself, where I have a umount process
> > completely stuck in xfs_ail_push_all_sync():
> > 
> >   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
> >   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
> >   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
> >   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
> >   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
> >   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
> >   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
> >   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
> >   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
> >   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
> >   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
> >   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
> >   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
> >   [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > This is on a v4.10.1 kernel.  I've had a look at xfs_ail_push_all_sync()
> > and I wonder if there isn't a potential lost-wakeup problem: I can't see
> > where we re-test the condition after setting the current process to
> > TASK_UNINTERRUPTIBLE and before calling schedule() (though I know nothing
> > about XFS internals...).
> > 
> > Here's an excerpt of the relevant dmesg messages that very likely happened at
> > the same time the unmount process was started:
> > 
> >   [29961.767707] block nbd74: Other side returned error (22)
> >   [29961.837518] XFS (nbd74): metadata I/O error: block 0x6471ba0 ("xfs_trans_read_buf_map") error 5 numblks 32
> >   [29961.838172] block nbd74: Other side returned error (22)
> >   [29961.838179] block nbd74: Other side returned error (22)
> >   [29961.838184] block nbd74: Other side returned error (22)
> >   [29961.838203] block nbd74: Other side returned error (22)
> >   [29961.838208] block nbd74: Other side returned error (22)
> >   [29962.259551] XFS (nbd74): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> >   [29962.356376] XFS (nbd74): xfs_do_force_shutdown(0x8) called from line 3454 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff813bf471
> >   [29962.503003] XFS (nbd74): Corruption of in-memory data detected.  Shutting down filesystem
> >   [29963.166314] XFS (nbd74): Please umount the filesystem and rectify the problem(s)
> > 
> > I'm pretty sure the process isn't deadlocking on the spinlock because it
> > doesn't burn any CPU and is really out of the scheduler pool.  It should be
> > noted that when I noticed the hung umount process, I manually tried to
> > unmount the corresponding XFS mountpoint and that was fine, though it
> > obviously didn't "unhang" the stuck umount process.
> > 
> 
> I'm not parsing the last bit here.. you were able to manually unmount
> the hung unmount..?

Sorry if I'm not very clear...  There is a 'umount' process hung, with the
stack trace I've pasted above.  I then tried to run another umount on the
mount point, and that seems to have succeeded.  I'll need to reproduce the
hang to confirm that's what happened.

> 
> That aside, could you post a snippet of the tracepoint output
> ('trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe')
> when the problem occurs? Also, how about the stack of the xfsaild thread
> for that specific mount ('ps aux | grep xfsaild; cat
> /proc/<pid>/stack').
> 

I'll try to collect the requested trace output next time it reproduces.  In
the meantime, the xfsaild stack trace isn't telling us much, afaict:

  $ ps aux | grep umount | grep -v grep
  root      63425  0.0  0.0  18688  2552 ?        DN   Mar09   0:00 umount -n /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt

  $ sudo cat /proc/63425/stack
  [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
  [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
  [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
  [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
  [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
  [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
  [<ffffffff811ccc19>] deactivate_super+0x49/0x70
  [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
  [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
  [<ffffffff810e1b39>] task_work_run+0x79/0xa0
  [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
  [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
  [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
  [<ffffffffffffffff>] 0xffffffffffffffff

  $ ps aux | grep xfsaild/nbd116 | grep -v grep
  root      15256  6.8  0.0      0     0 ?        S    Mar09 110:13 [xfsaild/nbd116]

  $ sudo cat /proc/15256/stack
  [<ffffffff813d920c>] xfsaild+0xcc/0x5b0
  [<ffffffff810e3e27>] kthread+0xf7/0x130
  [<ffffffff81a2cdb9>] ret_from_fork+0x29/0x40
  [<ffffffffffffffff>] 0xffffffffffffffff

  $ sudo umount /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt
  umount: /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt: not mounted


Thanks for the help!

Cheers,
Quentin


* Re: XFS race on umount
  2017-03-10 14:38   ` Quentin Casasnovas
@ 2017-03-10 14:52     ` Brian Foster
  2017-03-20 12:33       ` Carlos Maiolino
From: Brian Foster @ 2017-03-10 14:52 UTC
  To: Quentin Casasnovas; +Cc: linux-xfs, Darrick J. Wong

On Fri, Mar 10, 2017 at 03:38:46PM +0100, Quentin Casasnovas wrote:
> On Fri, Mar 10, 2017 at 09:05:35AM -0500, Brian Foster wrote:
> > On Fri, Mar 10, 2017 at 01:04:06PM +0100, Quentin Casasnovas wrote:
> > > Hi Guys,
> > > 
> > > We've been using XFS recently on our build system because we found that it
> > > scales pretty well and we have good use for the reflink feature :)
> > > 
> > > I think our setup is relatively unusual in that, on each of our build
> > > servers, we mount hundreds of XFS filesystems from NBD devices in
> > > parallel, where our build environments are stored on qcow2 images and
> > > connected with qemu-nbd, then umount them when the build is finished.
> > > Those qcow2 images are stored on an NFS mount, which leads to some
> > > (expected) hiccups when reading/writing blocks, where sometimes the NBD
> > > layer will return errors to the block layer, which in turn passes them
> > > on to XFS.  It could be due to network contention, very high load on the
> > > server, or any transient error really, and in those cases XFS will
> > > normally force shut down the filesystem and wait for a umount.
> > > 
> > > All of this is fine and is exactly the behaviour we'd expect, though it
> > > turns out that we keep hitting what I think is a race condition between
> > > umount and a force shutdown from XFS itself, where I have a umount process
> > > completely stuck in xfs_ail_push_all_sync():
> > > 
> > >   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
> > >   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
> > >   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
> > >   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
> > >   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
> > >   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
> > >   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
> > >   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
> > >   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
> > >   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
> > >   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
> > >   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
> > >   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
> > >   [<ffffffffffffffff>] 0xffffffffffffffff
> > > 
> > > This is on a v4.10.1 kernel.  I've had a look at xfs_ail_push_all_sync()
> > > and I wonder if there isn't a potential lost-wakeup problem: I can't see
> > > where we re-test the condition after setting the current process to
> > > TASK_UNINTERRUPTIBLE and before calling schedule() (though I know nothing
> > > about XFS internals...).
> > > 
> > > Here's an excerpt of the relevant dmesg messages that very likely happened at
> > > the same time the unmount process was started:
> > > 
> > >   [29961.767707] block nbd74: Other side returned error (22)
> > >   [29961.837518] XFS (nbd74): metadata I/O error: block 0x6471ba0 ("xfs_trans_read_buf_map") error 5 numblks 32
> > >   [29961.838172] block nbd74: Other side returned error (22)
> > >   [29961.838179] block nbd74: Other side returned error (22)
> > >   [29961.838184] block nbd74: Other side returned error (22)
> > >   [29961.838203] block nbd74: Other side returned error (22)
> > >   [29961.838208] block nbd74: Other side returned error (22)
> > >   [29962.259551] XFS (nbd74): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > >   [29962.356376] XFS (nbd74): xfs_do_force_shutdown(0x8) called from line 3454 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff813bf471
> > >   [29962.503003] XFS (nbd74): Corruption of in-memory data detected.  Shutting down filesystem
> > >   [29963.166314] XFS (nbd74): Please umount the filesystem and rectify the problem(s)
> > > 
> > > I'm pretty sure the process isn't deadlocking on the spinlock because it
> > > doesn't burn any CPU and is really out of the scheduler pool.  It should be
> > > noted that when I noticed the hung umount process, I manually tried to
> > > unmount the corresponding XFS mountpoint and that was fine, though it
> > > obviously didn't "unhang" the stuck umount process.
> > > 
> > 
> > I'm not parsing the last bit here.. you were able to manually unmount
> > the hung unmount..?
> 
> Sorry if I'm not very clear...  There is a 'umount' process hung, with the
> stack trace I've pasted above.  I then tried to run another umount on the
> mount point, and that seems to have succeeded.  I'll need to reproduce the
> hang to confirm that's what happened.
> 

Oh, Ok. What is probably happening there is that the mountpoint has
actually been removed from the directory tree by the first unmount
attempt (and just got stuck before it completed breaking down the fs),
and so the second doesn't actually do anything. You can confirm whether
anything is actually there when you reproduce, though.
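
For example, with the mountpoint path from your earlier mail, something
like this should show whether it is still attached anywhere:

  $ findmnt /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt || echo detached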

> > 
> > That aside, could you post a snippet of the tracepoint output
> > ('trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe')
> > when the problem occurs? Also, how about the stack of the xfsaild thread
> > for that specific mount ('ps aux | grep xfsaild; cat
> > /proc/<pid>/stack').
> > 
> 
> I'll try to collect the requested trace output next time it reproduces.  In
> the meantime, the xfsaild stack trace isn't telling us much, afaict:
> 
>   $ ps aux | grep umount | grep -v grep
>   root      63425  0.0  0.0  18688  2552 ?        DN   Mar09   0:00 umount -n /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt
> 
>   $ sudo cat /proc/63425/stack
>   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
>   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
>   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
>   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
>   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
>   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
>   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
>   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
>   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
>   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
>   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
>   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
>   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
>   [<ffffffffffffffff>] 0xffffffffffffffff
> 
>   $ ps aux | grep xfsaild/nbd116 | grep -v grep
>   root      15256  6.8  0.0      0     0 ?        S    Mar09 110:13 [xfsaild/nbd116]
> 
>   $ sudo cat /proc/15256/stack
>   [<ffffffff813d920c>] xfsaild+0xcc/0x5b0
>   [<ffffffff810e3e27>] kthread+0xf7/0x130
>   [<ffffffff81a2cdb9>] ret_from_fork+0x29/0x40
>   [<ffffffffffffffff>] 0xffffffffffffffff
> 

Indeed. This probably means xfsaild is spinning (not blocked). The
tracepoint output should hopefully give us an idea of what it is doing.
BTW, please do filter it against the particular device, if possible.
Most XFS tracepoints start with something like "dev <major>:<minor>"
that is easily grep'able to distinguish one fs from another.
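
For example, assuming nbd74 maps to major:minor 43:74 on your box (lsblk
or /proc/partitions will give you the real numbers):

  $ trace-cmd start -e "xfs:*"
  $ grep 'dev 43:74' /sys/kernel/debug/tracing/trace_pipe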

>   $ sudo umount /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt
>   umount: /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt: not mounted
> 

Yeah, it looks like the situation mentioned above. The mountpoint is no
longer accessible, but still not fully unmounted internally.

Brian

> 
> Thanks for the help!
> 
> Cheers,
> Quentin



* Re: XFS race on umount
  2017-03-10 14:52     ` Brian Foster
@ 2017-03-20 12:33       ` Carlos Maiolino
  2017-03-24 12:13         ` Quentin Casasnovas
From: Carlos Maiolino @ 2017-03-20 12:33 UTC
  To: Brian Foster; +Cc: Quentin Casasnovas, linux-xfs, Darrick J. Wong

On Fri, Mar 10, 2017 at 09:52:54AM -0500, Brian Foster wrote:
> On Fri, Mar 10, 2017 at 03:38:46PM +0100, Quentin Casasnovas wrote:
> > On Fri, Mar 10, 2017 at 09:05:35AM -0500, Brian Foster wrote:
> > > On Fri, Mar 10, 2017 at 01:04:06PM +0100, Quentin Casasnovas wrote:
> > > > Hi Guys,
> > > > 
> > > > We've been using XFS recently on our build system because we found that it
> > > > scales pretty well and we have good use for the reflink feature :)
> > > > 
> > > > I think our setup is relatively unusual in that, on each of our build
> > > > servers, we mount hundreds of XFS filesystems from NBD devices in
> > > > parallel, where our build environments are stored on qcow2 images and
> > > > connected with qemu-nbd, then umount them when the build is finished.
> > > > Those qcow2 images are stored on an NFS mount, which leads to some
> > > > (expected) hiccups when reading/writing blocks, where sometimes the NBD
> > > > layer will return errors to the block layer, which in turn passes them
> > > > on to XFS.  It could be due to network contention, very high load on the
> > > > server, or any transient error really, and in those cases XFS will
> > > > normally force shut down the filesystem and wait for a umount.
> > > > 
> > > > All of this is fine and is exactly the behaviour we'd expect, though it
> > > > turns out that we keep hitting what I think is a race condition between
> > > > umount and a force shutdown from XFS itself, where I have a umount process
> > > > completely stuck in xfs_ail_push_all_sync():
> > > > 
> > > >   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
> > > >   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
> > > >   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
> > > >   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
> > > >   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
> > > >   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
> > > >   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
> > > >   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
> > > >   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
> > > >   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
> > > >   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
> > > >   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
> > > >   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
> > > >   [<ffffffffffffffff>] 0xffffffffffffffff
> > > > 
> > > > This is on a v4.10.1 kernel.  I've had a look at xfs_ail_push_all_sync()
> > > > and I wonder if there isn't a potential lost-wakeup problem: I can't see
> > > > where we re-test the condition after setting the current process to
> > > > TASK_UNINTERRUPTIBLE and before calling schedule() (though I know nothing
> > > > about XFS internals...).
> > > > 
> > > > Here's an excerpt of the relevant dmesg messages that very likely happened at
> > > > the same time the unmount process was started:
> > > > 
> > > >   [29961.767707] block nbd74: Other side returned error (22)
> > > >   [29961.837518] XFS (nbd74): metadata I/O error: block 0x6471ba0 ("xfs_trans_read_buf_map") error 5 numblks 32
> > > >   [29961.838172] block nbd74: Other side returned error (22)
> > > >   [29961.838179] block nbd74: Other side returned error (22)
> > > >   [29961.838184] block nbd74: Other side returned error (22)
> > > >   [29961.838203] block nbd74: Other side returned error (22)
> > > >   [29961.838208] block nbd74: Other side returned error (22)
> > > >   [29962.259551] XFS (nbd74): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > > >   [29962.356376] XFS (nbd74): xfs_do_force_shutdown(0x8) called from line 3454 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff813bf471
> > > >   [29962.503003] XFS (nbd74): Corruption of in-memory data detected.  Shutting down filesystem
> > > >   [29963.166314] XFS (nbd74): Please umount the filesystem and rectify the problem(s)
> > > > 
> > > > I'm pretty sure the process isn't deadlocking on the spinlock because it
> > > > doesn't burn any CPU and is really out of the scheduler pool.  It should be
> > > > noted that when I noticed the hung umount process, I manually tried to
> > > > unmount the corresponding XFS mountpoint and that was fine, though it
> > > > obviously didn't "unhang" the stuck umount process.
> > > > 
> > > 
> > > I'm not parsing the last bit here.. you were able to manually unmount
> > > the hung unmount..?
> > 
> > Sorry if I'm not very clear...  There is a 'umount' process hung, with the
> > stack trace I've pasted above.  I then tried to run another umount on the
> > mount point, and that seems to have succeeded.  I'll need to reproduce the
> > hang to confirm that's what happened.
> > 
> 
> Oh, Ok. What is probably happening there is that the mountpoint has
> actually been removed from the directory tree by the first unmount
> attempt (and just got stuck before it completed breaking down the fs),
> and so the second doesn't actually do anything. You can confirm whether
> anything is actually there when you reproduce, though.
> 
> > > 
> > > That aside, could you post a snippet of the tracepoint output
> > > ('trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe')
> > > when the problem occurs? Also, how about the stack of the xfsaild thread
> > > for that specific mount ('ps aux | grep xfsaild; cat
> > > /proc/<pid>/stack').
> > > 
> > 
> > I'll try to collect the requested trace output next time it reproduces.  In
> > the meantime, the xfsaild stack trace isn't telling us much, afaict:
> > 
> >   $ ps aux | grep umount | grep -v grep
> >   root      63425  0.0  0.0  18688  2552 ?        DN   Mar09   0:00 umount -n /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt
> > 
> >   $ sudo cat /proc/63425/stack
> >   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
> >   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
> >   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
> >   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
> >   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
> >   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
> >   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
> >   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
> >   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
> >   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
> >   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
> >   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
> >   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
> >   [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> >   $ ps aux | grep xfsaild/nbd116 | grep -v grep
> >   root      15256  6.8  0.0      0     0 ?        S    Mar09 110:13 [xfsaild/nbd116]
> > 
> >   $ sudo cat /proc/15256/stack
> >   [<ffffffff813d920c>] xfsaild+0xcc/0x5b0
> >   [<ffffffff810e3e27>] kthread+0xf7/0x130
> >   [<ffffffff81a2cdb9>] ret_from_fork+0x29/0x40
> >   [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> 
> Indeed. This probably means xfsaild is spinning (not blocked). The
> tracepoint output should hopefully give us an idea of what it is doing.
> BTW, please do filter it against the particular device, if possible.
> Most XFS tracepoints start with something like "dev <major>:<minor>"
> that is easily grep'able to distinguish one fs from another.
> 
> >   $ sudo umount /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt
> >   umount: /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt: not mounted
> > 
> 
> Yeah, it looks like the situation mentioned above. The mountpoint is no
> longer accessible, but still not fully unmounted internally.
> 

This looks very much like the problem I've been working on, or the earlier
one for which we introduced the fail_at_unmount sysfs configuration to
avoid exactly this kind of hang.

Can you confirm whether fail_at_unmount is enabled, and whether it prevents
the problem above?  If it doesn't, then I'm almost 100% sure it's the same
problem I've been working on, with AIL items not being retried.  FWIW, this
only happens if some sort of I/O error occurred previously, which looks to
be your case too.
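
fail_at_unmount lives under sysfs, so for your nbd74 case it should be
something like this (path from memory, so double-check it on your kernel):

  $ echo 1 > /sys/fs/xfs/nbd74/error/fail_at_unmount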

Cheers.

> Brian
> 
> > 
> > Thanks for the help!
> > 
> > Cheers,
> > Quentin
> 
> 

-- 
Carlos

* Re: XFS race on umount
  2017-03-20 12:33       ` Carlos Maiolino
@ 2017-03-24 12:13         ` Quentin Casasnovas
  2017-04-03  8:15           ` Carlos Maiolino
From: Quentin Casasnovas @ 2017-03-24 12:13 UTC
  To: Brian Foster, Quentin Casasnovas, linux-xfs, Darrick J. Wong

On Mon, Mar 20, 2017 at 01:33:50PM +0100, Carlos Maiolino wrote:
> On Fri, Mar 10, 2017 at 09:52:54AM -0500, Brian Foster wrote:
> > On Fri, Mar 10, 2017 at 03:38:46PM +0100, Quentin Casasnovas wrote:
> > > On Fri, Mar 10, 2017 at 09:05:35AM -0500, Brian Foster wrote:
> > > > On Fri, Mar 10, 2017 at 01:04:06PM +0100, Quentin Casasnovas wrote:
> > > > > Hi Guys,
> > > > > 
> > > > > We've been using XFS recently on our build system because we found that it
> > > > > scales pretty well and we have good use for the reflink feature :)
> > > > > 
> > > > > I think our setup is relatively unusual in that, on each of our build
> > > > > servers, we mount hundreds of XFS filesystems from NBD devices in
> > > > > parallel, where our build environments are stored on qcow2 images and
> > > > > connected with qemu-nbd, then umount them when the build is finished.
> > > > > Those qcow2 images are stored on an NFS mount, which leads to some
> > > > > (expected) hiccups when reading/writing blocks, where sometimes the NBD
> > > > > layer will return errors to the block layer, which in turn passes them
> > > > > on to XFS.  It could be due to network contention, very high load on the
> > > > > server, or any transient error really, and in those cases XFS will
> > > > > normally force shut down the filesystem and wait for a umount.
> > > > > 
> > > > > All of this is fine and is exactly the behaviour we'd expect, though it
> > > > > turns out that we keep hitting what I think is a race condition between
> > > > > umount and a force shutdown from XFS itself, where I have a umount process
> > > > > completely stuck in xfs_ail_push_all_sync():
> > > > > 
> > > > >   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
> > > > >   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
> > > > >   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
> > > > >   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
> > > > >   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
> > > > >   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
> > > > >   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
> > > > >   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
> > > > >   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
> > > > >   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
> > > > >   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
> > > > >   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
> > > > >   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
> > > > >   [<ffffffffffffffff>] 0xffffffffffffffff
> > > > > 
> 
> This looks very much like the problem I've been working on, or the earlier
> one for which we introduced the fail_at_unmount sysfs configuration to
> avoid exactly this kind of hang.
> 
> Can you confirm whether fail_at_unmount is enabled, and whether it prevents
> the problem above?  If it doesn't, then I'm almost 100% sure it's the same
> problem I've been working on, with AIL items not being retried.  FWIW, this
> only happens if some sort of I/O error occurred previously, which looks to
> be your case too.
> 

I have not tried fail_at_unmount yet, but I could reproduce similar umount
hangs using NBD and NFS:

  # Create an image with an XFS filesystem on it
  qemu-img create -f qcow2 test-img.qcow2 10G
  qemu-nbd -c /dev/nbd0 test-img.qcow2
  mkfs.xfs /dev/nbd0
  qemu-nbd -d /dev/nbd0

Now, serve the image over NFSv3, which doesn't support delete-on-last-close:

  cp test-img.qcow2 /path/to/nfs_mountpoint/
  qemu-nbd -c /dev/nbd0 /path/to/nfs_mountpoint/test-img.qcow2
  mount /dev/nbd0 /mnt

Trigger some IO on the mount point:

  cp -r ~/linux-2.6/ /mnt/

While there is ongoing IO, overwrite the image served over NFS with your
original blank image:

  cp test-img.qcow2 /path/to/nfs_mountpoint/

Interrupt the IO if it hasn't already failed with IO errors, then try to
unmount; this should result in the umount process getting stuck.
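
If the reproducer worked, the stuck umount should be sitting in D state
with the same xfs_ail_push_all_sync() stack as before:

  $ cat /proc/$(pgrep -x umount)/stack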

Quentin



* Re: XFS race on umount
  2017-03-24 12:13         ` Quentin Casasnovas
@ 2017-04-03  8:15           ` Carlos Maiolino
From: Carlos Maiolino @ 2017-04-03  8:15 UTC
  To: Quentin Casasnovas; +Cc: linux-xfs

> > > > > > 
> > > > > > We've been using XFS recently on our build system because we found that it
> > > > > > scales pretty well and we have good use for the reflink feature :)
> > > > > > 
> > > > > > I think our setup is relatively unusual in that, on each of our build
> > > > > > servers, we mount hundreds of XFS filesystems from NBD devices in
> > > > > > parallel, where our build environments are stored on qcow2 images and
> > > > > > connected with qemu-nbd, then umount them when the build is finished.
> > > > > > Those qcow2 images are stored on an NFS mount, which leads to some
> > > > > > (expected) hiccups when reading/writing blocks, where sometimes the NBD
> > > > > > layer will return errors to the block layer, which in turn passes them
> > > > > > on to XFS.  It could be due to network contention, very high load on the
> > > > > > server, or any transient error really, and in those cases XFS will
> > > > > > normally force shut down the filesystem and wait for a umount.
> > > > > > 
> > > > > > All of this is fine and is exactly the behaviour we'd expect, though it
> > > > > > turns out that we keep hitting what I think is a race condition between
> > > > > > umount and a force shutdown from XFS itself, where I have a umount process
> > > > > > completely stuck in xfs_ail_push_all_sync():
> > > > > > 
> > > > > >   [<ffffffff813d987e>] xfs_ail_push_all_sync+0x9e/0xe0
> > > > > >   [<ffffffff813c20c7>] xfs_unmountfs+0x67/0x150
> > > > > >   [<ffffffff813c5540>] xfs_fs_put_super+0x20/0x70
> > > > > >   [<ffffffff811cba7a>] generic_shutdown_super+0x6a/0xf0
> > > > > >   [<ffffffff811cbb2b>] kill_block_super+0x2b/0x80
> > > > > >   [<ffffffff811cc067>] deactivate_locked_super+0x47/0x80
> > > > > >   [<ffffffff811ccc19>] deactivate_super+0x49/0x70
> > > > > >   [<ffffffff811e7b3e>] cleanup_mnt+0x3e/0x90
> > > > > >   [<ffffffff811e7bdd>] __cleanup_mnt+0xd/0x10
> > > > > >   [<ffffffff810e1b39>] task_work_run+0x79/0xa0
> > > > > >   [<ffffffff810c2df7>] exit_to_usermode_loop+0x4f/0x75
> > > > > >   [<ffffffff8100134b>] syscall_return_slowpath+0x5b/0x70
> > > > > >   [<ffffffff81a2cbe3>] entry_SYSCALL_64_fastpath+0x96/0x98
> > > > > >   [<ffffffffffffffff>] 0xffffffffffffffff
> > > > > > 
> > 
> > This looks very much like the problem I've been working on, or the earlier
> > one for which we introduced the fail_at_unmount sysfs configuration to
> > avoid exactly this kind of hang.
> > 
> > Can you confirm whether fail_at_unmount is enabled, and whether it prevents
> > the problem above?  If it doesn't, then I'm almost 100% sure it's the same
> > problem I've been working on, with AIL items not being retried.  FWIW, this
> > only happens if some sort of I/O error occurred previously, which looks to
> > be your case too.
> > 
> 
> I have not tried fail_at_unmount yet, but I could reproduce similar umount
> hangs using NBD and NFS:
> 

Do you have the whole stack dump from the system where you triggered this
problem?  Also, please try fail_at_unmount; it was designed to prevent
exactly these cases, where we get EIOs and the filesystem can end up hung
during the unmount process.
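
If you don't have one handy, sysrq should be able to produce it (assuming
sysrq is enabled on those machines):

  $ echo t > /proc/sysrq-trigger    # dumps every task's stack to dmesg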

Which kernel version are you using btw?


>   # Create an image with an XFS filesystem on it
>   qemu-img create -f qcow2 test-img.qcow2 10G
>   qemu-nbd -c /dev/nbd0 test-img.qcow2
>   mkfs.xfs /dev/nbd0
>   qemu-nbd -d /dev/nbd0
> 
> Quentin
> 



-- 
Carlos
