* XFS blocking suspend
From: Jan Kara @ 2016-12-01  8:47 UTC
  To: linux-xfs

Hi,

I've received a report of xfsaild blocking system suspend in 4.8.7 (in openSUSE
Tumbleweed, which is our rolling distro):

Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
xfsaild/sdb3    D 0000000000019680     0 918      2 0x00000080
 ffff9e685409fb88 0000000000000000 ffff9e67beaea080 ffff9e68504c6000
 ffff9e6677226b80 ffff9e68540a0000 ffff9e676068c6d8 ffff9e68504c6000
 ffff9e685e48dc00 ffff9e676068c600 ffff9e685409fba0 ffffffffb66cfbac
Call Trace:
 [<ffffffffb66cfbac>] schedule+0x3c/0x90
 [<ffffffffb66d2f1e>] schedule_timeout+0x22e/0x410
 [<ffffffffb66d0f4a>] wait_for_completion+0x9a/0x100
 [<ffffffffc0f0689e>] xfs_buf_submit_wait+0x7e/0x250 [xfs]
 [<ffffffffc0f06ba8>] xfs_buf_read_map+0x108/0x190 [xfs]
 [<ffffffffc0f340c0>] xfs_trans_read_buf_map+0x100/0x370 [xfs]
 [<ffffffffc0ef631e>] xfs_imap_to_bp+0x5e/0xd0 [xfs]
 [<ffffffffc0f1ac6a>] xfs_iflush+0xca/0x220 [xfs]
 [<ffffffffc0f2b21b>] xfs_inode_item_push+0xcb/0x120 [xfs]
 [<ffffffffc0f32e8e>] xfsaild+0x30e/0x770 [xfs]
 [<ffffffffb609c5ed>] kthread+0xbd/0xe0
 [<ffffffffb66d459f>] ret_from_fork+0x1f/0x40
DWARF2 unwinder stuck at ret_from_fork+0x1f/0x40

Leftover inexact backtrace:
 [<ffffffffb609c530>] ?  kthread_worker_fn+0x170/0x170

What I think has happened is that b_ioend_wq had already been frozen during
suspend, so the submitted read could not complete (if I'm reading the code
right, all buffer IO completions now happen from the workqueue). As a result
xfsaild never finished waiting for the IO, so it never got to
try_to_freeze() where it would have been frozen.
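A toy, single-threaded C model of the suspected hang may make the ordering easier to see. Everything here is hypothetical: toy_wq stands in for b_ioend_wq, toy_buf for the buffer's completion, and toy_wait_for_io() for the wait in xfs_buf_submit_wait(). Real kernel code uses struct completion and real workqueues rather than this polling loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: a freezable workqueue that would run the
 * completion work for a buffer read. */
struct toy_wq {
	bool frozen;   /* set by the freezer */
	int  pending;  /* completion items queued but not yet run */
};

struct toy_buf {
	bool io_done;  /* stands in for the buffer's struct completion */
};

/* The device finished the read; completion processing is deferred to
 * the workqueue instead of being done right here. */
static void toy_bio_end_io(struct toy_wq *wq)
{
	wq->pending++;
}

/* One pass of the workqueue: pending items only run if not frozen. */
static void toy_wq_run(struct toy_wq *wq, struct toy_buf *bp)
{
	if (wq->frozen)
		return;
	while (wq->pending > 0) {
		wq->pending--;
		bp->io_done = true;  /* complete() in real code */
	}
}

/* True if the waiter (xfsaild in the report) gets its IO back within
 * max_passes workqueue passes and could then reach try_to_freeze(). */
static bool toy_wait_for_io(struct toy_wq *wq, struct toy_buf *bp,
			    int max_passes)
{
	assert(wq != NULL && bp != NULL);
	for (int i = 0; i < max_passes; i++) {
		toy_wq_run(wq, bp);
		if (bp->io_done)
			return true;
	}
	return false;  /* still in D state: the freezer times out */
}
```

With the queue running, the wait succeeds; if the queue is frozen before the completion item runs, the item stays pending and the waiter never reaches try_to_freeze(), which is the shape of the trace above.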

I'm not sure how best to fix this, since I don't think we can easily express
suspend dependencies between different execution contexts... We could
possibly complete buffer IO directly from softirq context (which should also
reduce IO latency somewhat) when the buffer has no ->iodone callback, but
maybe there's some problem with that I'm missing.
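The softirq idea can be sketched as a branch in the end-of-IO path. This is a toy model with made-up names (toy_buf_ioend(), toy_inode_iodone()), not the actual XFS buffer completion code; whether completing ->iodone-less buffers from softirq is actually safe is exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_buf;
typedef void (*toy_iodone_t)(struct toy_buf *);

struct toy_buf {
	toy_iodone_t iodone;  /* NULL for a plain synchronous read */
	bool         done;
};

struct toy_wq {
	bool frozen;
	int  pending;         /* deferred completion items */
};

/* Hypothetical end-of-IO path, imagined as running in softirq context.
 * Buffers without an ->iodone callback are completed immediately, so a
 * frozen workqueue cannot hold them up; the rest are still deferred. */
static void toy_buf_ioend(struct toy_buf *bp, struct toy_wq *wq)
{
	assert(bp != NULL && wq != NULL);
	if (bp->iodone == NULL) {
		bp->done = true;  /* wake the waiter directly */
		return;
	}
	wq->pending++;            /* callback runs later from the workqueue */
}

/* Example callback; in this model it just marks the buffer done. */
static void toy_inode_iodone(struct toy_buf *bp)
{
	bp->done = true;
}
```

A plain read with no callback completes even while the workqueue is frozen; a buffer with a callback is still queued, presumably because callback processing may need process context.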

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: XFS blocking suspend
From: Brian Foster @ 2016-12-01 13:44 UTC
  To: Jan Kara; +Cc: linux-xfs

On Thu, Dec 01, 2016 at 09:47:57AM +0100, Jan Kara wrote:
> Hi,
> 
> I've got a report of xfs_aild blocking system suspend in 4.8.7 (in openSUSE
> Tumbleweed which is our rolling distro):
> 
> [...]
> 
> What I think has happened is that b_ioend_wq got already frozen during
> suspend and thus submitted read could not be completed (all buffer IO
> completions seem to be happening from workqueue now if I'm reading the code
> right) and thus xfs_aild never finished waiting for IO so that it could be
> frozen in try_to_freeze().
> 

Hmm, I'm not terribly familiar with the freezer, but shouldn't xfsaild()
end up frozen before the associated workqueues? Skimming through the
code, perhaps the freezer pokes xfsaild(), but if it doesn't actually
wait for the freeze (and xfsaild() is busy doing work), it moves on to
other tasks, and potentially to the workqueue if the workqueue happens
not to be busy at just the right time. Is that what you are thinking?

If so, perhaps we need some way to pin the workqueue as busy as long as
xfsaild() is active..? I was also wondering how necessary it is for this
workqueue to be freezable, but that goes back to 8018ec083c ("xfs: mark
all internal workqueues as freezable"), which apparently added necessary
serialization to avoid reported corruptions.
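For illustration only, a "pin" could look like a counter that the freezer consults before freezing the queue. None of this is an existing kernel API: struct toy_pinned_wq, toy_wq_pin(), toy_wq_unpin() and toy_try_freeze_wq() are hypothetical names in a userspace model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pin counter, NOT an existing kernel API. While a producer
 * such as xfsaild holds a pin, the freezer treats the workqueue as busy
 * instead of freezing it underneath the producer. */
struct toy_pinned_wq {
	int  pins;    /* producers still expecting completions */
	int  active;  /* work items currently executing */
	bool frozen;
};

static void toy_wq_pin(struct toy_pinned_wq *wq)
{
	wq->pins++;
}

static void toy_wq_unpin(struct toy_pinned_wq *wq)
{
	assert(wq->pins > 0);
	wq->pins--;
}

/* Freezer-side busy check, similar in spirit to freeze_workqueues_busy(). */
static bool toy_wq_busy(const struct toy_pinned_wq *wq)
{
	return wq->active > 0 || wq->pins > 0;
}

/* In this model the queue is only frozen once nothing pins it. */
static bool toy_try_freeze_wq(struct toy_pinned_wq *wq)
{
	if (toy_wq_busy(wq))
		return false;
	wq->frozen = true;
	return true;
}
```

The catch is that the pin only encodes this one xfsaild-to-workqueue dependency by hand; every other producer would need the same treatment.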

Brian

> I'm not sure how to best fix this since I don't think we can easily have
> suspend dependencies between different execution contexts... We could
> possibly complete buffer IO already from softirq (which should also reduce
> IO latency somewhat) if it does not have ->iodone callback but maybe there's
> some problem with it I'm missing.

* Re: XFS blocking suspend
From: Jan Kara @ 2016-12-01 14:09 UTC
  To: Brian Foster; +Cc: Jan Kara, linux-xfs, jkosina

On Thu 01-12-16 08:44:52, Brian Foster wrote:
> On Thu, Dec 01, 2016 at 09:47:57AM +0100, Jan Kara wrote:
> > Hi,
> > 
> > I've got a report of xfs_aild blocking system suspend in 4.8.7 (in openSUSE
> > Tumbleweed which is our rolling distro):
> > 
> > [...]
> > 
> > What I think has happened is that b_ioend_wq got already frozen during
> > suspend and thus submitted read could not be completed (all buffer IO
> > completions seem to be happening from workqueue now if I'm reading the code
> > right) and thus xfs_aild never finished waiting for IO so that it could be
> > frozen in try_to_freeze().
> > 
> 
> Hmm, I'm not terribly familiar with the freezer, but shouldn't xfsaild()
> end up frozen before the associated workqueues? Skimming through the
> code, perhaps it is possible for the freezer to poke xfsaild(), but if
> it doesn't actually wait for the freeze (and xfsaild() is busy doing
> work), it goes ahead onto other tasks and potentially the workqueue if
> it happens to not be busy at just the right time. Is that what you are
> thinking?

Yes. Look at try_to_freeze_tasks() in kernel/power/process.c. We actually
first call freeze_workqueues_begin(), which basically makes sure we do not
start processing new work items for freezable workqueues, and then walk
over all processes and try to freeze them. So while xfsaild may still be
happily submitting IO, the IO completion workqueue is already frozen...
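The ordering can be modelled roughly as follows. This is a simplified userspace sketch, not a copy of try_to_freeze_tasks(): the retry count stands in for the 20 second timeout, and wq_frozen_first is a hypothetical knob that lets the same code show what would happen if the completion workqueue were frozen after the task walk instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task: freezes when asked, unless it is stuck waiting for IO
 * whose completion is handled by a freezable workqueue. */
struct toy_task {
	const char *name;
	bool blocked_on_ioend;
	bool frozen;
};

/* Returns the number of tasks still refusing to freeze after
 * max_retries passes over the task list. */
static int toy_try_to_freeze_tasks(struct toy_task *tasks, size_t n,
				   bool wq_frozen_first, int max_retries)
{
	int todo = 0;

	assert(max_retries > 0);
	for (int tries = 0; tries < max_retries; tries++) {
		todo = 0;
		for (size_t i = 0; i < n; i++) {
			if (tasks[i].frozen)
				continue;
			/* With the completion wq already frozen, a task
			 * waiting for IO never gets back to try_to_freeze(). */
			if (tasks[i].blocked_on_ioend && wq_frozen_first) {
				todo++;
				continue;
			}
			tasks[i].blocked_on_ioend = false; /* IO completed */
			tasks[i].frozen = true;            /* hit try_to_freeze() */
		}
		if (todo == 0)
			break;
	}
	return todo;  /* > 0 means "N tasks refusing to freeze" */
}
```

One possible reading of the wq_busy=0 in the report is that the pending completion is held back by the frozen queue rather than counted as in-flight work; the model does not try to reproduce that counter.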

> If so, perhaps we need some kind of way to pin the workqueue as busy so
> long as xfsaild() is active..? I was also wondering how necessary it is
> for this workqueue to be freezable, but that goes back to 8018ec083c
> ("xfs: mark all internal workqueues as freezable") which apparently
> added necessary serialization to avoid reported corruptions.

Yeah, so currently there's no way to "pin the workqueue as busy" as you
suggest. That would require a new suspend primitive, and essentially you
would just be modelling suspend dependencies with it.

WRT the workqueue being freezable: I think it is freezable because IO
completion for unwritten extents leads to extent conversion, which can
generate new IO. I can't really tell whether there is a better way for XFS
to plug this IO source.
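A toy model of why completion work can itself create IO: finishing a write into an unwritten (preallocated) extent requires converting the extent, which eventually leads to further log and metadata writes. The names are hypothetical, and the model reduces all of that follow-up work to a single counted IO.

```c
#include <assert.h>

/* Toy model: extent state as seen by a write completion handler. */
enum toy_extent_state { TOY_EXT_NORM, TOY_EXT_UNWRITTEN };

struct toy_fs {
	int metadata_ios_queued;  /* IO generated by completion processing */
};

/* Hypothetical ioend handler that would run from the completion
 * workqueue. Only unwritten extents need conversion. */
static void toy_write_ioend(struct toy_fs *fs, enum toy_extent_state *ext)
{
	assert(fs != 0 && ext != 0);
	if (*ext == TOY_EXT_UNWRITTEN) {
		*ext = TOY_EXT_NORM;        /* extent conversion transaction */
		fs->metadata_ios_queued++;  /* follow-up log/metadata write */
	}
}
```

Running the handler twice on the same extent converts it once and queues one follow-up IO; freezing the completion workqueue stops this source of new IO at the cost of also stalling plain completions.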

Ultimately, the correct solution is to use filesystem freezing during
suspend to quiesce the filesystem. However, that requires more work on the
suspend side; I've added Jiri to CC, who promised to look into it some
time ago ;).
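There is no suspend-side code for this in the thread. For a feel of what filesystem freezing is, the existing userspace interface is the FIFREEZE/FITHAW ioctl pair from <linux/fs.h>, which is what fsfreeze(8) uses. The sketch below is a minimal illustration of that interface only, not of the in-kernel suspend integration being discussed; toy_set_fs_frozen() is a made-up helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>     /* FIFREEZE, FITHAW */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Freeze (freeze != 0) or thaw the filesystem mounted at mnt.
 * Needs CAP_SYS_ADMIN. Returns 0 on success, -1 on error. */
static int toy_set_fs_frozen(const char *mnt, int freeze)
{
	assert(mnt != NULL);

	int fd = open(mnt, O_RDONLY);
	if (fd < 0)
		return -1;

	int ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);
	if (ret < 0)
		perror(freeze ? "FIFREEZE" : "FITHAW");
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

While a filesystem is frozen, new modifications block until it is thawed, which is the quiesced state the suspend path would want instead of relying on sys_sync() and the task freezer.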

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: XFS blocking suspend
From: Dave Chinner @ 2016-12-01 20:12 UTC
  To: Jan Kara; +Cc: Brian Foster, linux-xfs, jkosina

On Thu, Dec 01, 2016 at 03:09:59PM +0100, Jan Kara wrote:
> On Thu 01-12-16 08:44:52, Brian Foster wrote:
> > On Thu, Dec 01, 2016 at 09:47:57AM +0100, Jan Kara wrote:
> > > Hi,
> > > 
> > > I've got a report of xfs_aild blocking system suspend in 4.8.7 (in openSUSE
> > > Tumbleweed which is our rolling distro):
> > > 
> > > [...]
> > > 
> > > What I think has happened is that b_ioend_wq got already frozen during
> > > suspend and thus submitted read could not be completed (all buffer IO
> > > completions seem to be happening from workqueue now if I'm reading the code
> > > right) and thus xfs_aild never finished waiting for IO so that it could be
> > > frozen in try_to_freeze().
> > > 
> > 
> > Hmm, I'm not terribly familiar with the freezer, but shouldn't xfsaild()
> > end up frozen before the associated workqueues? Skimming through the
> > code, perhaps it is possible for the freezer to poke xfsaild(), but if
> > it doesn't actually wait for the freeze (and xfsaild() is busy doing
> > work), it goes ahead onto other tasks and potentially the workqueue if
> > it happens to not be busy at just the right time. Is that what you are
> > thinking?
> 
> Yes. Look at try_to_freeze_tasks() in kernel/power/process.c. We actually
> first do freeze_workqueues_begin() - which basically makes sure we do not
> start processing new workqueue items for freezable workqueues - and then
> walk over all processes and try to freeze them. So while xfs_aild may still
> be happily submitting IO, the IO completion workqueue is already frozen...

Right - kernel threads are not frozen until the hibernation snapshot
is taken later on. The hibernate code does:

	sys_sync()
	freeze_processes()
	  -> freezes workqueues
	hibernate_snapshot()
	  -> freezes kernel threads

I've been saying for close on 10 years now that this sys_sync()
doesn't "freeze" journalling filesystems that can submit internal
metadata IO from kernel threads asynchronously after sync is run. As
such, freezing the filesystem kernel threads and workqueues while it
is operating is always going to be racy and dangerous.
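The difference between sync and freeze can be illustrated with a toy journalling filesystem in which sync writes back data and forces the log, but logged metadata is written in place later by a background pusher (the role xfsaild plays in XFS). All names are hypothetical and the IO accounting is deliberately crude.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_jfs {
	int  dirty_data;
	int  logged_metadata;  /* in the log, not yet written in place */
	bool frozen;
	int  ios_issued;
};

/* sync: write back data and force the log; metadata is left for the
 * background pusher. */
static void toy_sync(struct toy_jfs *fs)
{
	fs->ios_issued += fs->dirty_data;
	fs->dirty_data = 0;
	fs->ios_issued++;  /* log force */
}

/* One tick of the background metadata pusher (a kernel thread). */
static void toy_pusher_tick(struct toy_jfs *fs)
{
	if (fs->frozen || fs->logged_metadata == 0)
		return;
	fs->logged_metadata--;
	fs->ios_issued++;  /* IO issued after sync has returned */
}

/* freeze: sync, write back all logged metadata, then stop new work. */
static void toy_freeze(struct toy_jfs *fs)
{
	assert(!fs->frozen);
	toy_sync(fs);
	fs->ios_issued += fs->logged_metadata;
	fs->logged_metadata = 0;
	fs->frozen = true;
}
```

In the model, the pusher still issues IO after toy_sync() returns, but not after toy_freeze(), which writes everything back first and then stops it.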

> > If so, perhaps we need some kind of way to pin the workqueue as busy so
> > long as xfsaild() is active..? I was also wondering how necessary it is
> > for this workqueue to be freezable, but that goes back to 8018ec083c
> > ("xfs: mark all internal workqueues as freezable") which apparently
> > added necessary serialization to avoid reported corruptions.
> 
> Yeah, so currently there's no way to "pin the workqueue as busy" as you
> suggest. That would require new suspending primitive. And essentially you
> are just modelling suspend dependencies with this.
> 
> WRT workqueue being freezable - I think it is freezable because IO
> completion for unwritten extents leads to extent conversion which can
> generate new IO. Whether there isn't a better way for XFS to plug this IO
> source I cannot really tell.

Well, that's one problem. The bigger problem was that when workqueue
processing of periodic work (e.g. EOF block trimming) ran during the
hibernate snapshot, the memory image would end up inconsistent, and so
on resume the in-memory state of the filesystem would not match what
was on disk. That pretty much guarantees corruption will occur at some
point, so we have to suspend all the work queues before the snapshot is
taken.
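A crude model of the snapshot problem: the hibernation image captures in-memory state, but if periodic work runs after the copy and writes to disk, resuming from the image brings back stale in-memory state. The eofblocks_trimmed counter and all function names are hypothetical stand-ins for whatever state such work changes.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_fs_state {
	long eofblocks_trimmed;  /* state changed by periodic work */
};

struct toy_system {
	struct toy_fs_state memory;
	struct toy_fs_state disk;
};

/* Periodic background work: updates memory and writes it to disk. */
static void toy_periodic_work(struct toy_system *s)
{
	s->memory.eofblocks_trimmed++;
	s->disk = s->memory;
}

/* Take the image; if work is not frozen, it may run after the copy. */
static struct toy_fs_state toy_hibernate(struct toy_system *s,
					 bool work_frozen)
{
	struct toy_fs_state image = s->memory;
	if (!work_frozen)
		toy_periodic_work(s);
	return image;
}

/* Resume restores memory from the image; true if it matches disk. */
static bool toy_resume_consistent(struct toy_system *s,
				  struct toy_fs_state image)
{
	s->memory = image;
	return s->memory.eofblocks_trimmed == s->disk.eofblocks_trimmed;
}
```

With the work frozen, the restored memory matches the disk; without it, the disk is one step ahead of the resumed in-memory state.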

I'll also point out that if we only had work queues (i.e. xfsaild was a
work queue), we'd still have this same problem: the xfsaild work would
block waiting for IO completion queued to a different workqueue, so it
would always report "busy" and hence trigger the suspend failure.
Similarly, if everything were kernel threads, we'd have the same problem
if the IO completion threads were frozen first...

> Ultimately, the correct solution is to use filesystem freezing during
> suspend to quiesce the filesystem. However that requires more work on the
> suspend side - added Jiri to CC who promised to look into it some time ago
> ;).

I've been saying that for 10 years, too, so I'm not going to hold my
breath waiting for someone to fix this problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: XFS blocking suspend
From: Jiri Kosina @ 2016-12-02 13:47 UTC
  To: Dave Chinner; +Cc: Jan Kara, Brian Foster, linux-xfs

On Fri, 2 Dec 2016, Dave Chinner wrote:

> > Ultimately, the correct solution is to use filesystem freezing during
> > suspend to quiesce the filesystem. However that requires more work on the
> > suspend side - added Jiri to CC who promised to look into it some time ago
> > ;).
> 
> I've been saying that for 10 years, too, so I'm not going to hold my
> breath waiting for someone to fix this problem.

Thanks for bringing up yet another example of why the kthread freezer is
such a mess.

I already have a sort-of-working implementation that gets rid of the
kthread freezer, but there is still quite a lot of work to do before it's
mainline-ready. Namely, we first have to get rid of all the spurious
kthread freezer usage we have in the tree so far.

The main part of this effort consists of these commits:

https://git.kernel.org/cgit/linux/kernel/git/jikos/jikos.git/commit/?h=might-rebase/get-rid-of-kthread-freezer&id=394aa67810abefde6d79ea96a90e5d41a7df99f4
https://git.kernel.org/cgit/linux/kernel/git/jikos/jikos.git/commit/?h=might-rebase/get-rid-of-kthread-freezer&id=3f0d7690cbf813ce497f2ca816d8086afe490271
https://git.kernel.org/cgit/linux/kernel/git/jikos/jikos.git/commit/?h=might-rebase/get-rid-of-kthread-freezer&id=a637de712f9ca60bea6e93b13d09534170ec29fa

but the second commit especially needs a lot more care, as I am pretty
sure it would break some obscure corner cases, and it needs to be
investigated more. Also, NFS needs to be taught about fs freezing before
we can proceed this way.

Thanks,

-- 
Jiri Kosina
SUSE Labs


Thread overview: 5+ messages
2016-12-01  8:47 XFS blocking suspend Jan Kara
2016-12-01 13:44 ` Brian Foster
2016-12-01 14:09   ` Jan Kara
2016-12-01 20:12     ` Dave Chinner
2016-12-02 13:47       ` Jiri Kosina
