All of lore.kernel.org
 help / color / mirror / Atom feed
* Suspend fails when xfs is involved?
@ 2017-02-03  1:04 Darrick J. Wong
  2017-02-03  2:18 ` Carlos E. R.
  2017-02-03 22:31 ` Dave Chinner
  0 siblings, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2017-02-03  1:04 UTC (permalink / raw)
  To: xfs

Hi list,

So I've noticed that my laptop consistently fails to suspend with:

[1183625.726800] atkbd serio0: Unknown key pressed (translated set 2, code 0xd8 on isa0060/serio0).
[1183625.726804] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
[1183625.727492] atkbd serio0: Unknown key released (translated set 2, code 0xd8 on isa0060/serio0).
[1183625.727497] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
[1183626.203928] e1000e: enp0s25 NIC Link is Down
[1183626.422720] PM: Syncing filesystems ... done.
[1183626.450348] Freezing user space processes ... (elapsed 0.002 seconds) done.
[1183626.452995] Freezing remaining freezable tasks ... 
[1183632.657243] atkbd serio0: Unknown key pressed (translated set 2, code 0xd9 on isa0060/serio0).
[1183632.657247] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
[1183632.657814] atkbd serio0: Unknown key released (translated set 2, code 0xd9 on isa0060/serio0).
[1183632.657817] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
[1183646.459310] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
[1183646.459348] xfsaild/dm-1    D    0  1767      2 0x00000000
[1183646.459354]  ffffa1628c2eb980 ffffa1629e317940 ffffa162883b7840 ffffa162887ab980
[1183646.459359]  ffffa1629e317940 ffffaa1640213b18 ffffffffb5a43682 ffffffffb543452b
[1183646.459362]  ffffa1626deb5420 0000000000000000 ffffa162887a4018 ffffa162887ab980
[1183646.459366] Call Trace:
[1183646.459377]  [<ffffffffb5a43682>] ? __schedule+0x212/0x6e0
[1183646.459383]  [<ffffffffb543452b>] ? queue_unplugged+0x3b/0xc0
[1183646.459386]  [<ffffffffb5a43b8d>] schedule+0x3d/0x90
[1183646.459390]  [<ffffffffb5a47339>] schedule_timeout+0x239/0x420
[1183646.459392]  [<ffffffffb54398bc>] ? blk_finish_plug+0x2c/0x40
[1183646.459398]  [<ffffffffb5399b8f>] ? _xfs_buf_ioapply+0x33f/0x490
[1183646.459401]  [<ffffffffb5a450e6>] wait_for_completion+0xa6/0x120
[1183646.459405]  [<ffffffffb511ba60>] ? wake_up_q+0x70/0x70
[1183646.459411]  [<ffffffffb53eb270>] ? xfs_trans_read_buf_map+0x110/0x370
[1183646.459414]  [<ffffffffb539bc33>] ? _xfs_buf_read+0x23/0x30
[1183646.459460]  [<ffffffffb539ba0f>] xfs_buf_submit_wait+0x7f/0x280
[1183646.459463]  [<ffffffffb53eb270>] ? xfs_trans_read_buf_map+0x110/0x370
[1183646.459466]  [<ffffffffb539bc33>] _xfs_buf_read+0x23/0x30
[1183646.459470]  [<ffffffffb539bd64>] xfs_buf_read_map+0x124/0x1b0
[1183646.459473]  [<ffffffffb53eb270>] xfs_trans_read_buf_map+0x110/0x370
[1183646.459478]  [<ffffffffb538417e>] xfs_imap_to_bp+0x6e/0xe0
[1183646.459481]  [<ffffffffb53b3883>] xfs_iflush+0xd3/0x230
[1183646.459486]  [<ffffffffb53e0ab4>] xfs_inode_item_push+0xf4/0x150
[1183646.459489]  [<ffffffffb53e9cdf>] xfsaild+0x2df/0x740
[1183646.459492]  [<ffffffffb53e9a00>] ? xfs_trans_ail_cursor_first+0x90/0x90
[1183646.459495]  [<ffffffffb53e9a00>] ? xfs_trans_ail_cursor_first+0x90/0x90
[1183646.459500]  [<ffffffffb51101f9>] kthread+0xd9/0xf0
[1183646.459503]  [<ffffffffb5110120>] ? kthread_park+0x60/0x60
[1183646.459507]  [<ffffffffb5a49122>] ret_from_fork+0x22/0x30
[1183646.459696] Restarting kernel threads ... done.
[1183646.460145] Restarting tasks ... done.
[1183646.525519] PM: Syncing filesystems ... done.
[1183646.552616] Freezing user space processes ... (elapsed 0.002 seconds) done.
[1183646.554700] Freezing remaining freezable tasks ... 
[1183666.560888] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
[1183666.560944] xfsaild/dm-9    D    0  3044      2 0x00000000
[1183666.560950]  ffffa1628c2e8000 ffffa1629e397940 ffffa15e8418ed00 ffffa1627f418000
[1183666.560954]  ffffa1629e397940 ffffaa1641c2bb18 ffffffffb5a43682 ffffffffb543452b
[1183666.560957]  ffffa16245dfb2b0 0000000000000000 ffffa162887a4018 ffffa1627f418000
[1183666.560961] Call Trace:
[1183666.560971]  [<ffffffffb5a43682>] ? __schedule+0x212/0x6e0
[1183666.560977]  [<ffffffffb543452b>] ? queue_unplugged+0x3b/0xc0
[1183666.560980]  [<ffffffffb5a43b8d>] schedule+0x3d/0x90
[1183666.560984]  [<ffffffffb5a47339>] schedule_timeout+0x239/0x420
[1183666.560986]  [<ffffffffb54398bc>] ? blk_finish_plug+0x2c/0x40
[1183666.560992]  [<ffffffffb5399b8f>] ? _xfs_buf_ioapply+0x33f/0x490
[1183666.560995]  [<ffffffffb5a450e6>] wait_for_completion+0xa6/0x120
[1183666.560999]  [<ffffffffb511ba60>] ? wake_up_q+0x70/0x70
[1183666.561004]  [<ffffffffb53eb270>] ? xfs_trans_read_buf_map+0x110/0x370
[1183666.561008]  [<ffffffffb539bc33>] ? _xfs_buf_read+0x23/0x30
[1183666.561011]  [<ffffffffb539ba0f>] xfs_buf_submit_wait+0x7f/0x280
[1183666.561014]  [<ffffffffb53eb270>] ? xfs_trans_read_buf_map+0x110/0x370
[1183666.561017]  [<ffffffffb539bc33>] _xfs_buf_read+0x23/0x30
[1183666.561020]  [<ffffffffb539bd64>] xfs_buf_read_map+0x124/0x1b0
[1183666.561023]  [<ffffffffb53eb270>] xfs_trans_read_buf_map+0x110/0x370
[1183666.561028]  [<ffffffffb538417e>] xfs_imap_to_bp+0x6e/0xe0
[1183666.561031]  [<ffffffffb53b3883>] xfs_iflush+0xd3/0x230
[1183666.561036]  [<ffffffffb53e0ab4>] xfs_inode_item_push+0xf4/0x150
[1183666.561039]  [<ffffffffb53e9cdf>] xfsaild+0x2df/0x740
[1183666.561042]  [<ffffffffb53e9a00>] ? xfs_trans_ail_cursor_first+0x90/0x90
[1183666.561045]  [<ffffffffb53e9a00>] ? xfs_trans_ail_cursor_first+0x90/0x90
[1183666.561050]  [<ffffffffb51101f9>] kthread+0xd9/0xf0
[1183666.561053]  [<ffffffffb5110120>] ? kthread_park+0x60/0x60
[1183666.561057]  [<ffffffffb5a49122>] ret_from_fork+0x22/0x30
[1183666.561224] Restarting kernel threads ... done.
[1183666.561689] Restarting tasks ... done.

ISTR Dave or someone grumbling about this being some artifact of the log
trying to read in some buffer or other as part of flushing the log prior
to suspend, but the io completion ends up tied to a workqueue that's
already been put to sleep, so xfs gets stuck forever.

Look familiar to anyone before I try to debug this tomorrow?

(Oddly my other systems don't do this, only the laptop.)

--D

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Suspend fails when xfs is involved?
  2017-02-03  1:04 Suspend fails when xfs is involved? Darrick J. Wong
@ 2017-02-03  2:18 ` Carlos E. R.
  2017-02-03 22:31 ` Dave Chinner
  1 sibling, 0 replies; 7+ messages in thread
From: Carlos E. R. @ 2017-02-03  2:18 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 5458 bytes --]

On 2017-02-03 02:04, Darrick J. Wong wrote:
> Hi list,
> 
> So I've noticed that my laptop consistently fails to suspend with:
> 

> [1183646.459310] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
> [1183646.459348] xfsaild/dm-1    D    0  1767      2 0x00000000

I see what to me seem similar events on my desktop machine, occasionally, but a second attempt succeeds; it is a different function, but xfs is involved:

<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748246.597459] PM: Syncing filesystems ... done.
<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748247.261177] Freezing user space processes ... (elapsed 0.002 seconds) done.
<0.7> 2017-01-14 19:50:38 Telcontar kernel - - - [748247.263791] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
<0.7> 2017-01-14 19:50:38 Telcontar kernel - - - [748247.263797] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<0.7> 2017-01-14 19:50:38 Telcontar kernel - - - [748247.263800] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<0.7> 2017-01-14 19:50:38 Telcontar kernel - - - [748247.264421] PM: Basic memory bitmaps created
<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748247.264453] PM: Preallocating image memory... done (allocated 822838 pages)
<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748249.186840] PM: Allocated 3291352 kbytes in 1.92 seconds (1714.24 MB/s)
<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748249.186841] Freezing remaining freezable tasks ... 
<0.3> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189941] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189944] kswapd0         D ffff88023fd95c40     0    44      2 0x00000000
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189948]  ffff880235ed3848 ffff88017ee58340 ffff880235eccbc0 ffff880235ed4000
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189952]  ffff880235eccbc0 ffff8801fa1a2498 ffff8800ba8c5340 ffff8801fa1a23c0
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189956]  ffff880235ed3860 ffffffff8160680c 7fffffffffffffff ffff880235ed3908
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189959] Call Trace:
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189970]  [<ffffffff8160680c>] schedule+0x3c/0x90
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189975]  [<ffffffff816095fa>] schedule_timeout+0x25a/0x2f0
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.189978]  [<ffffffff816071f6>] wait_for_completion+0xc6/0x140
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190019]  [<ffffffffa0edef9c>] xfs_buf_submit_wait+0x7c/0x250 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190051]  [<ffffffffa0edf2b1>] xfs_buf_read_map+0x111/0x1a0 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190075]  [<ffffffffa0f0c6ed>] xfs_trans_read_buf_map+0xfd/0x330 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190097]  [<ffffffffa0ed1aae>] xfs_imap_to_bp+0x5e/0xd0 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190120]  [<ffffffffa0ef3e0d>] xfs_iflush+0xcd/0x230 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190142]  [<ffffffffa0ee7653>] xfs_reclaim_inode+0x53/0x370 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190164]  [<ffffffffa0ee7b69>] xfs_reclaim_inodes_ag+0x1f9/0x2f0 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190186]  [<ffffffffa0ee8871>] xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190191]  [<ffffffff81208a36>] super_cache_scan+0x176/0x180
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190195]  [<ffffffff8119cb35>] shrink_slab.part.40+0x205/0x3e0
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190199]  [<ffffffff811a1368>] shrink_zone+0x268/0x280
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190202]  [<ffffffff811a2433>] kswapd+0x4a3/0x960
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190206]  [<ffffffff8109d308>] kthread+0xc8/0xe0
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.190209]  [<ffffffff8160ac8f>] ret_from_fork+0x3f/0x70
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.191726] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.191727] 
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.191728] Leftover inexact backtrace:
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.191728] 
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.191730]  [<ffffffff8109d240>] ? kthread_park+0x50/0x50
<0.4> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.191774] 
<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.192073] Restarting kernel threads ... done.
<0.7> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.298537] PM: Basic memory bitmaps freed
<0.6> 2017-01-14 19:50:38 Telcontar kernel - - - [748269.300160] Restarting tasks ... done.


The kernel was 4.4.36-8-default #1 SMP Fri Dec 9
Distribution is openSUSE Leap 42.2

I have not reported this yet in Bugzilla, but seeing this post I thought I might as well tell.
You will know better than me what to make of it.

-- 
Cheers / Saludos,
               
		Carlos E. R.

  (from 42.2 x86_64 "Malachite" (Minas Tirith))


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Suspend fails when xfs is involved?
  2017-02-03  1:04 Suspend fails when xfs is involved? Darrick J. Wong
  2017-02-03  2:18 ` Carlos E. R.
@ 2017-02-03 22:31 ` Dave Chinner
  2017-03-27 20:46   ` Darrick J. Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2017-02-03 22:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Thu, Feb 02, 2017 at 05:04:01PM -0800, Darrick J. Wong wrote:
> Hi list,
> 
> So I've noticed that my laptop consistently fails to suspend with:
> 
> [1183625.726800] atkbd serio0: Unknown key pressed (translated set 2, code 0xd8 on isa0060/serio0).
> [1183625.726804] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
> [1183625.727492] atkbd serio0: Unknown key released (translated set 2, code 0xd8 on isa0060/serio0).
> [1183625.727497] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
> [1183626.203928] e1000e: enp0s25 NIC Link is Down
> [1183626.422720] PM: Syncing filesystems ... done.
> [1183626.450348] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [1183626.452995] Freezing remaining freezable tasks ... 
> [1183632.657243] atkbd serio0: Unknown key pressed (translated set 2, code 0xd9 on isa0060/serio0).
> [1183632.657247] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
> [1183632.657814] atkbd serio0: Unknown key released (translated set 2, code 0xd9 on isa0060/serio0).
> [1183632.657817] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
> [1183646.459310] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
> [1183646.459348] xfsaild/dm-1    D    0  1767      2 0x00000000

Yes, this can happen because suspend thinks that "sync" is
sufficient to quiesce a filesystem into an idle state. 

> [1183646.459366] Call Trace:
> [1183646.459386]  [<ffffffffb5a43b8d>] schedule+0x3d/0x90
> [1183646.459390]  [<ffffffffb5a47339>] schedule_timeout+0x239/0x420
> [1183646.459401]  [<ffffffffb5a450e6>] wait_for_completion+0xa6/0x120
> [1183646.459460]  [<ffffffffb539ba0f>] xfs_buf_submit_wait+0x7f/0x280
> [1183646.459466]  [<ffffffffb539bc33>] _xfs_buf_read+0x23/0x30
> [1183646.459470]  [<ffffffffb539bd64>] xfs_buf_read_map+0x124/0x1b0
> [1183646.459473]  [<ffffffffb53eb270>] xfs_trans_read_buf_map+0x110/0x370
> [1183646.459478]  [<ffffffffb538417e>] xfs_imap_to_bp+0x6e/0xe0
> [1183646.459481]  [<ffffffffb53b3883>] xfs_iflush+0xd3/0x230
> [1183646.459486]  [<ffffffffb53e0ab4>] xfs_inode_item_push+0xf4/0x150
> [1183646.459489]  [<ffffffffb53e9cdf>] xfsaild+0x2df/0x740
> [1183646.459500]  [<ffffffffb51101f9>] kthread+0xd9/0xf0

That's inode writeback when the underlying inode buffer has been
reclaimed before the dirty cached inode has been written. So the
xfsaild is doing read/modify/write cycles to write back dirty
inodes. i.e. you're running in active memory reclaim conditions
prior to suspend...

> ISTR Dave or someone grumbling about this being some artifact of the log
> trying to read in some buffer or other as part of flushing the log prior
> to suspend, but the io completion ends up tied to a workqueue that's
> already been put to sleep, so xfs gets stuck forever.

Yup, suspend is just completely fucked, has been for more than 10
years. It needs to freeze filesystems so they are quiesced sanely,
not left to run while random parts of the kernel infrastructure they
rely on are shut down behind the filesystem's back.

> Look familiar to anyone before I try to debug this tomorrow?

See this as a recent starting point.

https://lwn.net/Articles/705269/

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Suspend fails when xfs is involved?
  2017-02-03 22:31 ` Dave Chinner
@ 2017-03-27 20:46   ` Darrick J. Wong
  2017-03-27 22:30     ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2017-03-27 20:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-pm

[cc linux-pm since this intersects with suspend...]

On Sat, Feb 04, 2017 at 09:31:27AM +1100, Dave Chinner wrote:
> On Thu, Feb 02, 2017 at 05:04:01PM -0800, Darrick J. Wong wrote:
> > Hi list,
> > 
> > So I've noticed that my laptop consistently fails to suspend with:
> > 
> > [1183625.726800] atkbd serio0: Unknown key pressed (translated set 2, code 0xd8 on isa0060/serio0).
> > [1183625.726804] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
> > [1183625.727492] atkbd serio0: Unknown key released (translated set 2, code 0xd8 on isa0060/serio0).
> > [1183625.727497] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
> > [1183626.203928] e1000e: enp0s25 NIC Link is Down
> > [1183626.422720] PM: Syncing filesystems ... done.
> > [1183626.450348] Freezing user space processes ... (elapsed 0.002 seconds) done.
> > [1183626.452995] Freezing remaining freezable tasks ... 
> > [1183632.657243] atkbd serio0: Unknown key pressed (translated set 2, code 0xd9 on isa0060/serio0).
> > [1183632.657247] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
> > [1183632.657814] atkbd serio0: Unknown key released (translated set 2, code 0xd9 on isa0060/serio0).
> > [1183632.657817] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
> > [1183646.459310] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
> > [1183646.459348] xfsaild/dm-1    D    0  1767      2 0x00000000
> 
> Yes, this can happen because suspend thinks that "sync" is
> sufficient to quiesce a filesystem into an idle state. 
> 
> > [1183646.459366] Call Trace:
> > [1183646.459386]  [<ffffffffb5a43b8d>] schedule+0x3d/0x90
> > [1183646.459390]  [<ffffffffb5a47339>] schedule_timeout+0x239/0x420
> > [1183646.459401]  [<ffffffffb5a450e6>] wait_for_completion+0xa6/0x120
> > [1183646.459460]  [<ffffffffb539ba0f>] xfs_buf_submit_wait+0x7f/0x280
> > [1183646.459466]  [<ffffffffb539bc33>] _xfs_buf_read+0x23/0x30
> > [1183646.459470]  [<ffffffffb539bd64>] xfs_buf_read_map+0x124/0x1b0
> > [1183646.459473]  [<ffffffffb53eb270>] xfs_trans_read_buf_map+0x110/0x370
> > [1183646.459478]  [<ffffffffb538417e>] xfs_imap_to_bp+0x6e/0xe0
> > [1183646.459481]  [<ffffffffb53b3883>] xfs_iflush+0xd3/0x230
> > [1183646.459486]  [<ffffffffb53e0ab4>] xfs_inode_item_push+0xf4/0x150
> > [1183646.459489]  [<ffffffffb53e9cdf>] xfsaild+0x2df/0x740
> > [1183646.459500]  [<ffffffffb51101f9>] kthread+0xd9/0xf0
> 
> That's inode writeback when the underlying inode buffer has been
> reclaimed before the dirty cached inode has been written. So the
> xfsaild is doing read/modify/write cycles to write back dirty
> inodes. i.e. you're running in active memory reclaim conditions
> prior to suspend...

So I wrote up a patch that removes WQ_FREEZABLE from the xfs_buf thread,
and since then I haven't had any problems suspending my laptop.  Last
week at LSF I inquired about whether it was proper to be freezing IO
helper threads as part of suspend, and was told in response "Are you
convinced that use of WQ_FREEZABLE is even correct?"  TBH I can't see
why you'd want to freeze IO helper workqueues at all.

So, I'm going to email that patch out as an RFC and if anyone wants to
follow up the discussion, let's do it there.  I get it, suspend really
should just fsfreeze, but the question I really want to know is, why
does XFS freeze its own threads?  They seem to go to sleep just fine
after we're done doing all the IO we want.

> > ISTR Dave or someone grumbling about this being some artifact of the log
> > trying to read in some buffer or other as part of flushing the log prior
> > to suspend, but the io completion ends up tied to a workqueue that's
> > already been put to sleep, so xfs gets stuck forever.
> 
> Yup, suspend is just completely fucked, has been for more than 10
> years. It needs to freeze filesystems so they are quiesced sanely,
> not left to run while random parts of the kernel infrastructure they
> rely on are shut down behind the filesystem's back.
> 
> > Look familiar to anyone before I try to debug this tomorrow?
> 
> See this as a recent starting point.
> 
> https://lwn.net/Articles/705269/

I wonder if they've done any work on freezing filesystems...

--D

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Suspend fails when xfs is involved?
  2017-03-27 20:46   ` Darrick J. Wong
@ 2017-03-27 22:30     ` Rafael J. Wysocki
  2017-03-27 23:14       ` Luis R. Rodriguez
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2017-03-27 22:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, xfs, linux-pm

On Monday, March 27, 2017 01:46:07 PM Darrick J. Wong wrote:
> [cc linux-pm since this intersects with suspend...]
> 
> On Sat, Feb 04, 2017 at 09:31:27AM +1100, Dave Chinner wrote:
> > On Thu, Feb 02, 2017 at 05:04:01PM -0800, Darrick J. Wong wrote:
> > > Hi list,
> > > 
> > > So I've noticed that my laptop consistently fails to suspend with:
> > > 
> > > [1183625.726800] atkbd serio0: Unknown key pressed (translated set 2, code 0xd8 on isa0060/serio0).
> > > [1183625.726804] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
> > > [1183625.727492] atkbd serio0: Unknown key released (translated set 2, code 0xd8 on isa0060/serio0).
> > > [1183625.727497] atkbd serio0: Use 'setkeycodes e058 <keycode>' to make it known.
> > > [1183626.203928] e1000e: enp0s25 NIC Link is Down
> > > [1183626.422720] PM: Syncing filesystems ... done.
> > > [1183626.450348] Freezing user space processes ... (elapsed 0.002 seconds) done.
> > > [1183626.452995] Freezing remaining freezable tasks ... 
> > > [1183632.657243] atkbd serio0: Unknown key pressed (translated set 2, code 0xd9 on isa0060/serio0).
> > > [1183632.657247] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
> > > [1183632.657814] atkbd serio0: Unknown key released (translated set 2, code 0xd9 on isa0060/serio0).
> > > [1183632.657817] atkbd serio0: Use 'setkeycodes e059 <keycode>' to make it known.
> > > [1183646.459310] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
> > > [1183646.459348] xfsaild/dm-1    D    0  1767      2 0x00000000
> > 
> > Yes, this can happen because suspend thinks that "sync" is
> > sufficient to quiesce a filesystem into an idle state. 
> > 
> > > [1183646.459366] Call Trace:
> > > [1183646.459386]  [<ffffffffb5a43b8d>] schedule+0x3d/0x90
> > > [1183646.459390]  [<ffffffffb5a47339>] schedule_timeout+0x239/0x420
> > > [1183646.459401]  [<ffffffffb5a450e6>] wait_for_completion+0xa6/0x120
> > > [1183646.459460]  [<ffffffffb539ba0f>] xfs_buf_submit_wait+0x7f/0x280
> > > [1183646.459466]  [<ffffffffb539bc33>] _xfs_buf_read+0x23/0x30
> > > [1183646.459470]  [<ffffffffb539bd64>] xfs_buf_read_map+0x124/0x1b0
> > > [1183646.459473]  [<ffffffffb53eb270>] xfs_trans_read_buf_map+0x110/0x370
> > > [1183646.459478]  [<ffffffffb538417e>] xfs_imap_to_bp+0x6e/0xe0
> > > [1183646.459481]  [<ffffffffb53b3883>] xfs_iflush+0xd3/0x230
> > > [1183646.459486]  [<ffffffffb53e0ab4>] xfs_inode_item_push+0xf4/0x150
> > > [1183646.459489]  [<ffffffffb53e9cdf>] xfsaild+0x2df/0x740
> > > [1183646.459500]  [<ffffffffb51101f9>] kthread+0xd9/0xf0
> > 
> > That's inode writeback when the underlying inode buffer has been
> > reclaimed before the dirty cached inode has been written. So the
> > xfsaild is doing read/modify/write cycles to write back dirty
> > inodes. i.e. you're running in active memory reclaim conditions
> > prior to suspend...
> 
> So I wrote up a patch that removes WQ_FREEZABLE from the xfs_buf thread,
> and since then I haven't had any problems suspending my laptop.  Last
> week at LSF I inquired about whether it was proper to be freezing IO
> helper threads as part of suspend, and was told in response "Are you
> convinced that use of WQ_FREEZABLE is even correct?"  TBH I can't see
> why you'd want to freeze IO helper workqueues at all.
> 
> So, I'm going to email that patch out as an RFC and if anyone wants to
> follow up the discussion, let's do it there.

Yes, please!

> I get it, suspend really
> should just fsfreeze, but the question I really want to know is, why
> does XFS freeze its own threads?  They seem to go to sleep just fine
> after we're done doing all the IO we want.

That, quite frankly, is what I would expect.

> > > ISTR Dave or someone grumbling about this being some artifact of the log
> > > trying to read in some buffer or other as part of flushing the log prior
> > > to suspend, but the io completion ends up tied to a workqueue that's
> > > already been put to sleep, so xfs gets stuck forever.
> > 
> > Yup, suspend is just completely fucked, has been for more than 10
> > years. It needs to freeze filesystems so they are quiesced sanely,
> > not left to run while random parts of the kernel infrastructure they
> > rely on are shut down behind the filesystem's back.
> > 
> > > Look familiar to anyone before I try to debug this tomorrow?
> > 
> > See this as a recent starting point.
> > 
> > https://lwn.net/Articles/705269/
> 
> I wonder if they've done any work on freezing filesystems...

Not that I know of.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Suspend fails when xfs is involved?
  2017-03-27 22:30     ` Rafael J. Wysocki
@ 2017-03-27 23:14       ` Luis R. Rodriguez
  2017-03-28 16:33         ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Luis R. Rodriguez @ 2017-03-27 23:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Darrick J. Wong, Jiri Kosina, Petr Mladek, Dave Chinner, xfs, linux-pm

On Tue, Mar 28, 2017 at 12:30:40AM +0200, Rafael J. Wysocki wrote:
> On Monday, March 27, 2017 01:46:07 PM Darrick J. Wong wrote:
> > [cc linux-pm since this intersects with suspend...]
> > > 
> > > That's inode writeback when the underlying inode buffer has been
> > > reclaimed before the dirty cached inode has been written. So the
> > > xfsaild is doing read/modify/write cycles to write back dirty
> > > inodes. i.e. you're running in active memory reclaim conditions
> > > prior to suspend...
> > 
> > So I wrote up a patch that removes WQ_FREEZABLE from the xfs_buf thread,
> > and since then I haven't had any problems suspending my laptop.  Last
> > week at LSF I inquired about whether it was proper to be freezing IO
> > helper threads as part of suspend, and was told in response "Are you
> > convinced that use of WQ_FREEZABLE is even correct?"  TBH I can't see
> > why you'd want to freeze IO helper workqueues at all.
> > 
> > So, I'm going to email that patch out as an RFC and if anyone wants to
> > follow up the discussion, let's do it there.
> 
> Yes, please!

Do we any respective xfstests for this sort of stuff ? FWIW I've been
starting to try to draw up some tests on some other fs lookup stuff and
had to cook up scripts using resume on qemu, to do that, in case its
helpful use:

QEMU_CTRL="/dev/pts/7"
echo system_wakeup | socat - $QEMU_CTRL,raw,echo=0,crnl

Sadly, the above only works *sometimes*, so you have to smack qemu
a few times with it seems.

> > I get it, suspend really
> > should just fsfreeze, but the question I really want to know is, why
> > does XFS freeze its own threads?  They seem to go to sleep just fine
> > after we're done doing all the IO we want.
> 
> That, quite frankly, is what I would expect.

Do we have a deterministic requirement for how long this IO wait should / can
last for ? FWIW if we have tasks we *know* need to complete prior to suspend
we can do these with the pm ops. If something not done on pm ops is blocking
already though do we want a limit ? Why did XFS start freezing its own threads
to begin with ?

> > > > ISTR Dave or someone grumbling about this being some artifact of the log
> > > > trying to read in some buffer or other as part of flushing the log prior
> > > > to suspend, but the io completion ends up tied to a workqueue that's
> > > > already been put to sleep, so xfs gets stuck forever.
> > > 
> > > Yup, suspend is just completely fucked, has been for more than 10
> > > years. It needs to freeze filesystems so they are quiesced sanely,
> > > not left to run while random parts of the kernel infrastructure they
> > > rely on are shut down behind the filesystem's back.
> > > 
> > > > Look familiar to anyone before I try to debug this tomorrow?
> > > 
> > > See this as a recent starting point.
> > > 
> > > https://lwn.net/Articles/705269/
> > 
> > I wonder if they've done any work on freezing filesystems...
> 
> Not that I know of.

https://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git/commit/?h=might-rebase/get-rid-of-kthread-freezer&id=394aa67810abefde6d79ea96a90e5d41a7df99f4

freeze_all_supers() ?

  Luis

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Suspend fails when xfs is involved?
  2017-03-27 23:14       ` Luis R. Rodriguez
@ 2017-03-28 16:33         ` Rafael J. Wysocki
  0 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2017-03-28 16:33 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Darrick J. Wong, Jiri Kosina, Petr Mladek, Dave Chinner, xfs, linux-pm

On Tuesday, March 28, 2017 01:14:26 AM Luis R. Rodriguez wrote:
> On Tue, Mar 28, 2017 at 12:30:40AM +0200, Rafael J. Wysocki wrote:
> > On Monday, March 27, 2017 01:46:07 PM Darrick J. Wong wrote:
> > > [cc linux-pm since this intersects with suspend...]
> > > > 
> > > > That's inode writeback when the underlying inode buffer has been
> > > > reclaimed before the dirty cached inode has been written. So the
> > > > xfsaild is doing read/modify/write cycles to write back dirty
> > > > inodes. i.e. you're running in active memory reclaim conditions
> > > > prior to suspend...
> > > 
> > > So I wrote up a patch that removes WQ_FREEZABLE from the xfs_buf thread,
> > > and since then I haven't had any problems suspending my laptop.  Last
> > > week at LSF I inquired about whether it was proper to be freezing IO
> > > helper threads as part of suspend, and was told in response "Are you
> > > convinced that use of WQ_FREEZABLE is even correct?"  TBH I can't see
> > > why you'd want to freeze IO helper workqueues at all.
> > > 
> > > So, I'm going to email that patch out as an RFC and if anyone wants to
> > > follow up the discussion, let's do it there.
> > 
> > Yes, please!
> 
> Do we any respective xfstests for this sort of stuff ? FWIW I've been
> starting to try to draw up some tests on some other fs lookup stuff and
> had to cook up scripts using resume on qemu, to do that, in case its
> helpful use:
> 
> QEMU_CTRL="/dev/pts/7"
> echo system_wakeup | socat - $QEMU_CTRL,raw,echo=0,crnl
> 
> Sadly, the above only works *sometimes*, so you have to smack qemu
> a few times with it seems.
> 
> > > I get it, suspend really
> > > should just fsfreeze, but the question I really want to know is, why
> > > does XFS freeze its own threads?  They seem to go to sleep just fine
> > > after we're done doing all the IO we want.
> > 
> > That, quite frankly, is what I would expect.
> 
> Do we have a deterministic requirement for how long this IO wait should / can
> last for ? FWIW if we have tasks we *know* need to complete prior to suspend
> we can do these with the pm ops. If something not done on pm ops is blocking
> already though do we want a limit ? Why did XFS start freezing its own threads
> to begin with ?

Please refer to the Dave's response here: http://marc.info/?l=linux-pm&m=149065484212704&w=2

> > > > > ISTR Dave or someone grumbling about this being some artifact of the log
> > > > > trying to read in some buffer or other as part of flushing the log prior
> > > > > to suspend, but the io completion ends up tied to a workqueue that's
> > > > > already been put to sleep, so xfs gets stuck forever.
> > > > 
> > > > Yup, suspend is just completely fucked, has been for more than 10
> > > > years. It needs to freeze filesystems so they are quiesced sanely,
> > > > not left to run while random parts of the kernel infrastructure they
> > > > rely on are shut down behind the filesystem's back.
> > > > 
> > > > > Look familiar to anyone before I try to debug this tomorrow?
> > > > 
> > > > See this as a recent starting point.
> > > > 
> > > > https://lwn.net/Articles/705269/
> > > 
> > > I wonder if they've done any work on freezing filesystems...
> > 
> > Not that I know of.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git/commit/?h=might-rebase/get-rid-of-kthread-freezer&id=394aa67810abefde6d79ea96a90e5d41a7df99f4
> 
> freeze_all_supers() ?

Yup.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-28 16:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-03  1:04 Suspend fails when xfs is involved? Darrick J. Wong
2017-02-03  2:18 ` Carlos E. R.
2017-02-03 22:31 ` Dave Chinner
2017-03-27 20:46   ` Darrick J. Wong
2017-03-27 22:30     ` Rafael J. Wysocki
2017-03-27 23:14       ` Luis R. Rodriguez
2017-03-28 16:33         ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.