* XFS crash?
@ 2014-03-05 23:08 Austin Schuh
  2014-03-05 23:35 ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Austin Schuh @ 2014-03-05 23:08 UTC (permalink / raw)
  To: xfs

Howdy,

I'm running a CONFIG_PREEMPT_RT patched version of the 3.10.11 kernel,
and I'm seeing a couple of lockups and crashes which I think are related
to XFS.
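
(A quick way to confirm that the running kernel really is the RT build;
the config file path below is the usual Debian location and is an
assumption on my part:)

  uname -v                                    # prints "... SMP PREEMPT RT Debian ..." for the RT build
  grep PREEMPT_RT /boot/config-$(uname -r)    # e.g. CONFIG_PREEMPT_RT_FULL=y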

After the first lockup, I booted into single-user mode, mounted the
filesystem read-only, and then ran xfs_repair -d.  When I rebooted
again, I initiated a large copy to a USB drive (formatted EXT3) and
continued editing source files and compiling.  After a couple of
minutes my laptop started locking up, and after 2 minutes the following
showed up in the kernel log.
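
(Roughly the sequence, for reference.  The device name is an assumption
on my part, taken from the xfs-data/sda7 workqueue in the trace below:)

  mount -o remount,ro /       # or mount -o ro <dev> <mnt> for a non-root filesystem
  xfs_repair -d /dev/sda7     # -d permits repairing a filesystem that is mounted read-only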

Austin

$ uname -a
Linux vpc5 3.10-3-rt-amd64 #5 SMP PREEMPT RT Debian 3.10.11-2
(2013-09-10) x86_64 GNU/Linux

[ 1186.363397] usb 2-1: USB disconnect, device number 2
[ 1200.895262] INFO: task kworker/u16:5:250 blocked for more than 120 seconds.
[ 1200.895267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.895270] kworker/u16:5   D ffff88040dc62cc0     0   250      2 0x00000000
[ 1200.895281] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[ 1200.895287]  ffff880406fc5c20 0000000000000046 0000000000008000
ffff880407f06ce0
[ 1200.895288]  0000000000062cc0 ffff880407dfdfd8 0000000000062cc0
ffff880407dfdfd8
[ 1200.895290]  ffff880407dfd5e8 ffff880406fc5c20 ffff880407dfd710
ffff880407dfd708
[ 1200.895294] Call Trace:
[ 1200.895301]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
[ 1200.895304]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.895307]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
[ 1200.895310]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[ 1200.895313]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
[ 1200.895317]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
[ 1200.895319]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[ 1200.895322]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
[ 1200.895356]  [<ffffffffa01a5032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
[ 1200.895371]  [<ffffffffa01a535d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
[ 1200.895375]  [<ffffffff81109680>] ? kmem_cache_alloc+0x7c/0x17d
[ 1200.895390]  [<ffffffffa01a2e6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
[ 1200.895402]  [<ffffffffa018b899>] ? xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
[ 1200.895414]  [<ffffffffa017edc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
[ 1200.895424]  [<ffffffffa017fc87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
[ 1200.895428]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
[ 1200.895430]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
[ 1200.895432]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
[ 1200.895435]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
[ 1200.895438]  [<ffffffff81134698>] ? __writeback_single_inode+0x72/0x225
[ 1200.895441]  [<ffffffff8113550b>] ? writeback_sb_inodes+0x215/0x36d
[ 1200.895444]  [<ffffffff811356cc>] ? __writeback_inodes_wb+0x69/0xab
[ 1200.895446]  [<ffffffff81135844>] ? wb_writeback+0x136/0x2a7
[ 1200.895448]  [<ffffffff810d3fc8>] ? global_dirtyable_memory+0xd/0x46
[ 1200.895450]  [<ffffffff810d4945>] ? global_dirty_limits+0x27/0x120
[ 1200.895453]  [<ffffffff81135cc8>] ? wb_do_writeback+0x1a1/0x1dc
[ 1200.895456]  [<ffffffff81135d66>] ? bdi_writeback_workfn+0x63/0xf4
[ 1200.895460]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
[ 1200.895462]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
[ 1200.895465]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
[ 1200.895467]  [<ffffffff8105a909>] ? kthread+0x81/0x89
[ 1200.895469]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[ 1200.895472]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
[ 1200.895474]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[ 1200.895477] INFO: task kworker/0:2:272 blocked for more than 120 seconds.
[ 1200.895501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.895503] kworker/0:2     D ffff880407f06ce0     0   272      2 0x00000000
[ 1200.895517] Workqueue: xfs-data/sda7 xfs_end_io [xfs]
[ 1200.895520]  ffff880407f06ce0 0000000000000046 ffffffffffffffff
ffff88032dc7baa0
[ 1200.895522]  0000000000062cc0 ffff8804053f3fd8 0000000000062cc0
ffff8804053f3fd8
[ 1200.895524]  0000000000000001 ffff880407f06ce0 ffff88030fb5df80
0000000000000002
[ 1200.895526] Call Trace:
[ 1200.895531]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.895534]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
[ 1200.895537]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
[ 1200.895542]  [<ffffffff8100c02f>] ? load_TLS+0x7/0xa
[ 1200.895552]  [<ffffffffa017eadb>] ? xfs_setfilesize+0x48/0x120 [xfs]
[ 1200.895555]  [<ffffffff81063d25>] ? finish_task_switch+0x80/0xc6
[ 1200.895565]  [<ffffffffa017f62f>] ? xfs_end_io+0x7a/0x8e [xfs]
[ 1200.895568]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
[ 1200.895570]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
[ 1200.895572]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
[ 1200.895574]  [<ffffffff8105a909>] ? kthread+0x81/0x89
[ 1200.895576]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[ 1200.895579]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
[ 1200.895581]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[ 1200.895617] INFO: task BrowserBlocking:5547 blocked for more than
120 seconds.
[ 1200.895628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.895630] BrowserBlocking D ffff88040dc62cc0     0  5547      1 0x00000000
[ 1200.895635]  ffff88040497baa0 0000000000000086 0000000000000000
ffffffff81616400
[ 1200.895636]  0000000000062cc0 ffff8803f07bbfd8 0000000000062cc0
ffff8803f07bbfd8
[ 1200.895638]  ffff8803f07bb868 ffff88040497baa0 ffff8803f07bb990
ffff8803f07bb988
[ 1200.895641] Call Trace:
[ 1200.895645]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
[ 1200.895648]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.895650]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
[ 1200.895653]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[ 1200.895654]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
[ 1200.895657]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
[ 1200.895659]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[ 1200.895662]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
[ 1200.895676]  [<ffffffffa01a5032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
[ 1200.895689]  [<ffffffffa01a535d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
[ 1200.895691]  [<ffffffff81109680>] ? kmem_cache_alloc+0x7c/0x17d
[ 1200.895705]  [<ffffffffa01a2e6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
[ 1200.895717]  [<ffffffffa018b899>] ? xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
[ 1200.895728]  [<ffffffffa017edc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
[ 1200.895739]  [<ffffffffa017fc87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
[ 1200.895742]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
[ 1200.895744]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
[ 1200.895745]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
[ 1200.895749]  [<ffffffff8104031a>] ? pin_current_cpu+0x3e/0x191
[ 1200.895752]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
[ 1200.895755]  [<ffffffff810cd303>] ? __filemap_fdatawrite_range+0x50/0x55
[ 1200.895758]  [<ffffffff810cd32d>] ? filemap_write_and_wait_range+0x25/0x59
[ 1200.895769]  [<ffffffffa01844d4>] ? xfs_file_fsync+0x78/0x1ad [xfs]
[ 1200.895773]  [<ffffffff8113876b>] ? do_fsync+0x2b/0x50
[ 1200.895776]  [<ffffffff8113897d>] ? SyS_fdatasync+0xb/0xf
[ 1200.895778]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b
[ 1200.895780] INFO: task BrowserBlocking:5558 blocked for more than
120 seconds.
[ 1200.895800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.895801] BrowserBlocking D ffff88040dde2cc0     0  5558      1 0x00000000
[ 1200.895806]  ffff8804093ac300 0000000000000086 ffff8803efd94000
ffff88040ad28000
[ 1200.895808]  0000000000062cc0 ffff8803efd95fd8 0000000000062cc0
ffff8803efd95fd8
[ 1200.895809]  ffff88041e52efc0 ffff8804093ac300 ffff8804093ac300
ffff8803efd95da0
[ 1200.895812] Call Trace:
[ 1200.895816]  [<ffffffff810cc034>] ? __lock_page+0x66/0x66
[ 1200.895819]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.895821]  [<ffffffff813a1ff9>] ? io_schedule+0x55/0x6b
[ 1200.895823]  [<ffffffff810cc03a>] ? sleep_on_page+0x6/0xa
[ 1200.895826]  [<ffffffff813a128f>] ? __wait_on_bit+0x3e/0x75
[ 1200.895828]  [<ffffffff810cc4a4>] ? find_get_pages_tag+0xfa/0x125
[ 1200.895830]  [<ffffffff810cc0f3>] ? wait_on_page_bit+0x6e/0x73
[ 1200.895833]  [<ffffffff8105b333>] ? autoremove_wake_function+0x2a/0x2a
[ 1200.895835]  [<ffffffff810cc7bf>] ? filemap_fdatawait_range+0x66/0xfe
[ 1200.895838]  [<ffffffff810cd303>] ? __filemap_fdatawrite_range+0x50/0x55
[ 1200.895841]  [<ffffffff810cd342>] ? filemap_write_and_wait_range+0x3a/0x59
[ 1200.895852]  [<ffffffffa01844d4>] ? xfs_file_fsync+0x78/0x1ad [xfs]
[ 1200.895855]  [<ffffffff8113876b>] ? do_fsync+0x2b/0x50
[ 1200.895858]  [<ffffffff8113897d>] ? SyS_fdatasync+0xb/0xf
[ 1200.895860]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b
[ 1200.895863] INFO: task BrowserBlocking:5575 blocked for more than
120 seconds.
[ 1200.895874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.895876] BrowserBlocking D ffff88040dd62cc0     0  5575      1 0x00000000
[ 1200.895881]  ffff880408c46480 0000000000000086 ffff8803f064e000
ffff88040accf540
[ 1200.895882]  0000000000062cc0 ffff8803f064ffd8 0000000000062cc0
ffff8803f064ffd8
[ 1200.895884]  ffff88041e54dfc0 ffff880408c46480 ffff880408c46480
ffff8803f064fda0
[ 1200.895887] Call Trace:
[ 1200.895891]  [<ffffffff810cc034>] ? __lock_page+0x66/0x66
[ 1200.895893]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.895896]  [<ffffffff813a1ff9>] ? io_schedule+0x55/0x6b
[ 1200.895898]  [<ffffffff810cc03a>] ? sleep_on_page+0x6/0xa
[ 1200.895900]  [<ffffffff813a128f>] ? __wait_on_bit+0x3e/0x75
[ 1200.895902]  [<ffffffff810cc4a4>] ? find_get_pages_tag+0xfa/0x125
[ 1200.895905]  [<ffffffff810cc0f3>] ? wait_on_page_bit+0x6e/0x73
[ 1200.895907]  [<ffffffff8105b333>] ? autoremove_wake_function+0x2a/0x2a
[ 1200.895909]  [<ffffffff810cc7bf>] ? filemap_fdatawait_range+0x66/0xfe
[ 1200.895912]  [<ffffffff810cd303>] ? __filemap_fdatawrite_range+0x50/0x55
[ 1200.895915]  [<ffffffff810cd342>] ? filemap_write_and_wait_range+0x3a/0x59
[ 1200.895926]  [<ffffffffa01844d4>] ? xfs_file_fsync+0x78/0x1ad [xfs]
[ 1200.895929]  [<ffffffff8113876b>] ? do_fsync+0x2b/0x50
[ 1200.895932]  [<ffffffff8113896d>] ? SyS_fsync+0x9/0xe
[ 1200.895934]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b
[ 1200.895965] INFO: task xterm:6032 blocked for more than 120 seconds.
[ 1200.895977] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.895979] xterm           D ffff88040dc62cc0     0  6032      1 0x00000000
[ 1200.895983]  ffff880407ce9920 0000000000000086 ffff8803b8473700
ffffffff81616400
[ 1200.895985]  0000000000062cc0 ffff8803b86affd8 0000000000062cc0
ffff8803b86affd8
[ 1200.895986]  ffff8803b86afc48 ffff880407ce9920 ffff8803b86afd50
ffff8803b86afd48
[ 1200.895989] Call Trace:
[ 1200.895994]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
[ 1200.895996]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.895999]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
[ 1200.896002]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
[ 1200.896004]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[ 1200.896006]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
[ 1200.896009]  [<ffffffff810549bb>] ? flush_work+0xf6/0x119
[ 1200.896011]  [<ffffffff8105344b>] ? create_and_start_worker+0x5e/0x5e
[ 1200.896016]  [<ffffffff812606fa>] ? input_available_p+0x15/0x53
[ 1200.896019]  [<ffffffff812629fd>] ? n_tty_read+0x374/0x78f
[ 1200.896022]  [<ffffffff810671e8>] ? try_to_wake_up+0x1ca/0x1ca
[ 1200.896024]  [<ffffffff8125e28c>] ? tty_read+0x79/0xbd
[ 1200.896029]  [<ffffffff8111587c>] ? vfs_read+0x93/0xf5
[ 1200.896032]  [<ffffffff811159f5>] ? SyS_read+0x51/0x80
[ 1200.896035]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b
[ 1200.896046] INFO: task tup:12846 blocked for more than 120 seconds.
[ 1200.896058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.896060] tup             D ffff88040dc62cc0     0 12846   7061 0x00000000
[ 1200.896064]  ffff88032dc7baa0 0000000000000086 ffff8802cc83dde8
ffffffff81616400
[ 1200.896066]  0000000000062cc0 ffff8802cc83dfd8 0000000000062cc0
ffff8802cc83dfd8
[ 1200.896068]  ffff8802dd120818 ffff88032dc7baa0 ffff88030fb5df80
0000000000000002
[ 1200.896071] Call Trace:
[ 1200.896075]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.896078]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
[ 1200.896081]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
[ 1200.896094]  [<ffffffffa018c3ec>] ? xfs_vn_update_time+0xa0/0x16f [xfs]
[ 1200.896097]  [<ffffffff81128b78>] ? file_update_time+0x95/0xb5
[ 1200.896098]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
[ 1200.896109]  [<ffffffffa018524b>] ? xfs_file_aio_write_checks+0xa3/0xc0 [xfs]
[ 1200.896120]  [<ffffffffa01852cf>] ? xfs_file_buffered_aio_write+0x67/0x175 [xfs]
[ 1200.896122]  [<ffffffff810671e8>] ? try_to_wake_up+0x1ca/0x1ca
[ 1200.896133]  [<ffffffffa018549a>] ? xfs_file_aio_write+0xbd/0x10f [xfs]
[ 1200.896136]  [<ffffffff81114d6f>] ? do_sync_write+0x55/0x7c
[ 1200.896140]  [<ffffffff81115783>] ? vfs_write+0x9e/0x104
[ 1200.896143]  [<ffffffff81115b8c>] ? SyS_pwrite64+0x61/0x87
[ 1200.896146]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b
[ 1200.896149] INFO: task ld:12961 blocked for more than 120 seconds.
[ 1200.896160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.896162] ld              D ffff8802d3c4a180     0 12961      1 0x00000004
[ 1200.896166]  ffff8802d3c4a180 0000000000000086 ffff8802dd1b7ec0
ffff8802d41fbaa0
[ 1200.896168]  0000000000062cc0 ffff8802d3c9ffd8 0000000000062cc0
ffff8802d3c9ffd8
[ 1200.896169]  ffff8802dd1b7ec0 ffff8802d3c4a180 ffff880406cd1000
ffff8802d3c9fb90
[ 1200.896172] Call Trace:
[ 1200.896176]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.896180]  [<ffffffffa07bbd11>] ? __fuse_request_send+0x1fa/0x2d0 [fuse]
[ 1200.896183]  [<ffffffff8105b309>] ? abort_exclusive_wait+0x7f/0x7f
[ 1200.896187]  [<ffffffffa07c045e>] ? fuse_send_write+0x118/0x12b [fuse]
[ 1200.896190]  [<ffffffffa07c04e1>] ? fuse_send_write_pages+0x70/0x10f [fuse]
[ 1200.896193]  [<ffffffff8104047a>] ? unpin_current_cpu+0xd/0x51
[ 1200.896198]  [<ffffffffa07c1ae9>] ? fuse_perform_write+0x27a/0x2fd [fuse]
[ 1200.896202]  [<ffffffffa07c1dba>] ? fuse_file_aio_write+0x24e/0x297 [fuse]
[ 1200.896205]  [<ffffffff81114d6f>] ? do_sync_write+0x55/0x7c
[ 1200.896209]  [<ffffffff81115783>] ? vfs_write+0x9e/0x104
[ 1200.896212]  [<ffffffff81115a75>] ? SyS_write+0x51/0x80
[ 1200.896215]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b
[ 1200.896217] INFO: task ld:12970 blocked for more than 120 seconds.
[ 1200.896226] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1200.896228] ld              D ffff88040dce2cc0     0 12970      1 0x00000006
[ 1200.896232]  ffff8802d3d090c0 0000000000000046 ffff8802dd1b64f8
ffff88040accece0
[ 1200.896234]  0000000000062cc0 ffff880373e47fd8 0000000000062cc0
ffff880373e47fd8
[ 1200.896236]  ffff8802dd1b64f8 ffff8802d3d090c0 ffff880406cd1000
ffff880373e47be0
[ 1200.896239] Call Trace:
[ 1200.896243]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1200.896246]  [<ffffffffa07bbd11>] ? __fuse_request_send+0x1fa/0x2d0 [fuse]
[ 1200.896248]  [<ffffffff8105b309>] ? abort_exclusive_wait+0x7f/0x7f
[ 1200.896252]  [<ffffffffa07c030a>] ? fuse_flush+0xba/0xf6 [fuse]
[ 1200.896254]  [<ffffffff81113a4a>] ? filp_close+0x37/0x68
[ 1200.896257]  [<ffffffff8112aabf>] ? put_files_struct+0x6c/0xc5
[ 1200.896260]  [<ffffffff8104204d>] ? do_exit+0x3d5/0x946
[ 1200.896263]  [<ffffffff8104031a>] ? pin_current_cpu+0x3e/0x191
[ 1200.896265]  [<ffffffff8104265a>] ? do_group_exit+0x77/0xac
[ 1200.896268]  [<ffffffff8104e14a>] ? get_signal_to_deliver+0x4ff/0x543
[ 1200.896272]  [<ffffffff8100d1f7>] ? do_signal+0x3c/0x434
[ 1200.896275]  [<ffffffff8111587c>] ? vfs_read+0x93/0xf5
[ 1200.896278]  [<ffffffff8100d641>] ? do_notify_resume+0x52/0x97
[ 1200.896281]  [<ffffffff813a7966>] ? int_signal+0x12/0x17
[ 1320.927310] INFO: task khubd:210 blocked for more than 120 seconds.
[ 1320.927335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1320.927338] khubd           D ffff88040dd62cc0     0   210      2 0x00000000
[ 1320.927346]  ffff880407a0dc20 0000000000000046 0000000000000000
ffff88040accf540
[ 1320.927348]  0000000000062cc0 ffff880404843fd8 0000000000062cc0
ffff880404843fd8
[ 1320.927350]  0000000000000000 ffff880407a0dc20 ffff880404843a70
ffff880404843a68
[ 1320.927354] Call Trace:
[ 1320.927364]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
[ 1320.927367]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[ 1320.927370]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
[ 1320.927374]  [<ffffffff8102c27b>] ? _flat_send_IPI_mask+0x68/0x78
[ 1320.927378]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
[ 1320.927381]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[ 1320.927382]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
[ 1320.927394]  [<ffffffff810549bb>] ? flush_work+0xf6/0x119
[ 1320.927396]  [<ffffffff8105344b>] ? create_and_start_worker+0x5e/0x5e
[ 1320.927400]  [<ffffffff810d7309>] ? __pagevec_release+0x20/0x20
[ 1320.927402]  [<ffffffff810564cc>] ? schedule_on_each_cpu+0x9e/0xd5
[ 1320.927422]  [<ffffffff8113f5a0>] ? invalidate_bdev+0x1d/0x2e
[ 1320.927425]  [<ffffffff8113f5f4>] ? __invalidate_device+0x43/0x4b
[ 1320.927427]  [<ffffffff811bf94d>] ? invalidate_partition+0x27/0x41
[ 1320.927429]  [<ffffffff811c09f9>] ? del_gendisk+0x86/0x1bb
[ 1320.927435]  [<ffffffffa00de643>] ? sd_remove+0x5f/0x98 [sd_mod]
[ 1320.927443]  [<ffffffff8128d2a0>] ? __device_release_driver+0x7f/0xd5
[ 1320.927444]  [<ffffffff8128d53a>] ? device_release_driver+0x1a/0x25
[ 1320.927446]  [<ffffffff8128cee7>] ? bus_remove_device+0xe7/0xfc
[ 1320.927448]  [<ffffffff8128ada0>] ? device_del+0x120/0x176
[ 1320.927455]  [<ffffffffa0029a0f>] ? __scsi_remove_device+0x4d/0xaf [scsi_mod]
[ 1320.927461]  [<ffffffffa002898c>] ? scsi_forget_host+0x48/0x68 [scsi_mod]
[ 1320.927467]  [<ffffffffa002025c>] ? scsi_remove_host+0x85/0x101 [scsi_mod]
[ 1320.927472]  [<ffffffffa00d1f7c>] ? usb_stor_disconnect+0x63/0xae [usb_storage]
[ 1320.927479]  [<ffffffffa0071d31>] ? usb_unbind_interface+0x5e/0x135 [usbcore]
[ 1320.927488]  [<ffffffff8128d2a0>] ? __device_release_driver+0x7f/0xd5
[ 1320.927490]  [<ffffffff8128d53a>] ? device_release_driver+0x1a/0x25
[ 1320.927491]  [<ffffffff8128cee7>] ? bus_remove_device+0xe7/0xfc
[ 1320.927493]  [<ffffffff8128ada0>] ? device_del+0x120/0x176
[ 1320.927501]  [<ffffffffa0070025>] ? usb_disable_device+0x6a/0x180 [usbcore]
[ 1320.927509]  [<ffffffffa0069507>] ? usb_disconnect+0x79/0x151 [usbcore]
[ 1320.927515]  [<ffffffffa006abee>] ? hub_thread+0x60d/0xea2 [usbcore]
[ 1320.927518]  [<ffffffff8105b309>] ? abort_exclusive_wait+0x7f/0x7f
[ 1320.927524]  [<ffffffffa006a5e1>] ? hub_port_debounce+0xcf/0xcf [usbcore]
[ 1320.927526]  [<ffffffff8105a909>] ? kthread+0x81/0x89
[ 1320.927528]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[ 1320.927530]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
[ 1320.927532]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c


* Re: XFS crash?
  2014-03-05 23:08 XFS crash? Austin Schuh
@ 2014-03-05 23:35 ` Dave Chinner
  2014-03-06  0:53   ` Austin Schuh
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2014-03-05 23:35 UTC (permalink / raw)
  To: Austin Schuh; +Cc: xfs

On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> Howdy,
> 
> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
> and I'm seeing a couple lockups and crashes which I think are related
> to XFS.

I think they are more likely related to RT issues....

> After the first lockup, I booted into single user mode, mounted the
> filesystem readonly, and then ran xfs_repair -d.  When I rebooted
> again, I initiated a large copy to a USB drive (formated EXT3) and
> continued editing source files and compiling.  After a couple minutes,
> my laptop started locking up and after 2 minutes, the following showed
> up in the kernel.
> 
> Austin
> 
> $ uname -a
> Linux vpc5 3.10-3-rt-amd64 #5 SMP PREEMPT RT Debian 3.10.11-2
> (2013-09-10) x86_64 GNU/Linux
> 
> [ 1186.363397] usb 2-1: USB disconnect, device number 2
> [ 1200.895262] INFO: task kworker/u16:5:250 blocked for more than 120 seconds.
> [ 1200.895267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1200.895270] kworker/u16:5   D ffff88040dc62cc0     0   250      2 0x00000000
> [ 1200.895281] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
....
> [ 1200.895294] Call Trace:
> [ 1200.895301]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
> [ 1200.895304]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [ 1200.895307]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
> [ 1200.895310]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [ 1200.895313]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
> [ 1200.895317]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
> [ 1200.895319]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [ 1200.895322]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
> [ 1200.895356]  [<ffffffffa01a5032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
> [ 1200.895371]  [<ffffffffa01a535d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
> [ 1200.895375]  [<ffffffff81109680>] ? kmem_cache_alloc+0x7c/0x17d
> [ 1200.895390]  [<ffffffffa01a2e6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
> [ 1200.895402]  [<ffffffffa018b899>] ? xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
> [ 1200.895414]  [<ffffffffa017edc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
> [ 1200.895424]  [<ffffffffa017fc87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
> [ 1200.895428]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
> [ 1200.895430]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
> [ 1200.895432]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
> [ 1200.895435]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
> [ 1200.895438]  [<ffffffff81134698>] ? __writeback_single_inode+0x72/0x225
> [ 1200.895441]  [<ffffffff8113550b>] ? writeback_sb_inodes+0x215/0x36d
> [ 1200.895444]  [<ffffffff811356cc>] ? __writeback_inodes_wb+0x69/0xab
> [ 1200.895446]  [<ffffffff81135844>] ? wb_writeback+0x136/0x2a7

That's waiting for allocation of blocks to complete during writeback
of dirty data via background writeback.

> [ 1200.895517] Workqueue: xfs-data/sda7 xfs_end_io [xfs]
....
> [ 1200.895526] Call Trace:
> [ 1200.895531]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [ 1200.895534]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
> [ 1200.895537]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
> [ 1200.895542]  [<ffffffff8100c02f>] ? load_TLS+0x7/0xa
> [ 1200.895552]  [<ffffffffa017eadb>] ? xfs_setfilesize+0x48/0x120 [xfs]
> [ 1200.895555]  [<ffffffff81063d25>] ? finish_task_switch+0x80/0xc6
> [ 1200.895565]  [<ffffffffa017f62f>] ? xfs_end_io+0x7a/0x8e [xfs]
> [ 1200.895568]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
> [ 1200.895570]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
> [ 1200.895572]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
> [ 1200.895574]  [<ffffffff8105a909>] ? kthread+0x81/0x89
> [ 1200.895576]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [ 1200.895579]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
> [ 1200.895581]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c

IO completion waiting on a lock. No idea which lock, because the RT
code replaces various types of locks with rt_mutexes.

> [ 1200.895630] BrowserBlocking D ffff88040dc62cc0     0  5547      1 0x00000000

fsync waiting for allocation completion during data writeback.

> [ 1200.895801] BrowserBlocking D ffff88040dde2cc0     0  5558      1 0x00000000

fsync waiting for IO completion of a data page during data
writeback.

> [ 1200.895876] BrowserBlocking D ffff88040dd62cc0     0  5575      1 0x00000000

ditto.

> [ 1200.895979] xterm           D ffff88040dc62cc0     0  6032      1 0x00000000

Blocked waiting on a workqueue flush. Completely unrelated to
filesystems and IO.

> [ 1200.896060] tup             D ffff88040dc62cc0     0 12846   7061 0x00000000

Blocked on an inode lock (rt_mutex, again) during timestamp updates
on a write(2) syscall.

> [ 1200.896162] ld              D ffff8802d3c4a180     0 12961      1 0x00000004

FUSE filesystem write blocked on ?something? in its IO path.

> [ 1200.896228] ld              D ffff88040dce2cc0     0 12970      1 0x00000006

FUSE filesystem write blocked on ?something? in its IO path after
receiving a signal in a read(2) call.

> [ 1320.927338] khubd           D ffff88040dd62cc0     0   210      2 0x00000000

This is the smoking gun:

> [ 1320.927364]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
> [ 1320.927367]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [ 1320.927370]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
> [ 1320.927374]  [<ffffffff8102c27b>] ? _flat_send_IPI_mask+0x68/0x78
> [ 1320.927378]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
> [ 1320.927381]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [ 1320.927382]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
> [ 1320.927394]  [<ffffffff810549bb>] ? flush_work+0xf6/0x119
> [ 1320.927396]  [<ffffffff8105344b>] ? create_and_start_worker+0x5e/0x5e
> [ 1320.927400]  [<ffffffff810d7309>] ? __pagevec_release+0x20/0x20
> [ 1320.927402]  [<ffffffff810564cc>] ? schedule_on_each_cpu+0x9e/0xd5
> [ 1320.927422]  [<ffffffff8113f5a0>] ? invalidate_bdev+0x1d/0x2e
> [ 1320.927425]  [<ffffffff8113f5f4>] ? __invalidate_device+0x43/0x4b
> [ 1320.927427]  [<ffffffff811bf94d>] ? invalidate_partition+0x27/0x41
> [ 1320.927429]  [<ffffffff811c09f9>] ? del_gendisk+0x86/0x1bb
> [ 1320.927435]  [<ffffffffa00de643>] ? sd_remove+0x5f/0x98 [sd_mod]
> [ 1320.927443]  [<ffffffff8128d2a0>] ? __device_release_driver+0x7f/0xd5
> [ 1320.927444]  [<ffffffff8128d53a>] ? device_release_driver+0x1a/0x25
> [ 1320.927446]  [<ffffffff8128cee7>] ? bus_remove_device+0xe7/0xfc
> [ 1320.927448]  [<ffffffff8128ada0>] ? device_del+0x120/0x176
> [ 1320.927455]  [<ffffffffa0029a0f>] ? __scsi_remove_device+0x4d/0xaf [scsi_mod]
> [ 1320.927461]  [<ffffffffa002898c>] ? scsi_forget_host+0x48/0x68 [scsi_mod]
> [ 1320.927467]  [<ffffffffa002025c>] ? scsi_remove_host+0x85/0x101 [scsi_mod]
> [ 1320.927472]  [<ffffffffa00d1f7c>] ? usb_stor_disconnect+0x63/0xae
> [usb_storage]
> [ 1320.927479]  [<ffffffffa0071d31>] ? usb_unbind_interface+0x5e/0x135 [usbcore]
> [ 1320.927488]  [<ffffffff8128d2a0>] ? __device_release_driver+0x7f/0xd5
> [ 1320.927490]  [<ffffffff8128d53a>] ? device_release_driver+0x1a/0x25
> [ 1320.927491]  [<ffffffff8128cee7>] ? bus_remove_device+0xe7/0xfc
> [ 1320.927493]  [<ffffffff8128ada0>] ? device_del+0x120/0x176
> [ 1320.927501]  [<ffffffffa0070025>] ? usb_disable_device+0x6a/0x180 [usbcore]
> [ 1320.927509]  [<ffffffffa0069507>] ? usb_disconnect+0x79/0x151 [usbcore]
> [ 1320.927515]  [<ffffffffa006abee>] ? hub_thread+0x60d/0xea2 [usbcore]
> [ 1320.927518]  [<ffffffff8105b309>] ? abort_exclusive_wait+0x7f/0x7f
> [ 1320.927524]  [<ffffffffa006a5e1>] ? hub_port_debounce+0xcf/0xcf [usbcore]
> [ 1320.927526]  [<ffffffff8105a909>] ? kthread+0x81/0x89
> [ 1320.927528]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [ 1320.927530]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
> [ 1320.927532]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c

Your USB device has disconnected and gone down the device
removal/invalidate-partition route, and it's trying to flush the
device. That flush is stuck on IO completion, which is in turn stuck
waiting for the device error handling to error those IOs out.

So this is a block device error handling problem caused by device
unplug getting stuck: it has decided to ask the filesystem to complete
operations that can't be completed until the device error handling
progresses far enough to error out the IOs that the filesystem is
waiting on.
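
(If you want to confirm that, and the sysfs node for the unplugged disk
still exists, the in-flight counters should show requests that never
drain; the device name here is just an example:)

  cat /sys/block/sdb/inflight    # reads/writes still outstanding against the device
  cat /sys/block/sdb/stat        # field 9 is the same in-flight count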

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS crash?
  2014-03-05 23:35 ` Dave Chinner
@ 2014-03-06  0:53   ` Austin Schuh
  2014-05-13  1:29     ` Austin Schuh
  0 siblings, 1 reply; 30+ messages in thread
From: Austin Schuh @ 2014-03-06  0:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>> Howdy,
>>
>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>> and I'm seeing a couple lockups and crashes which I think are related
>> to XFS.
>
> I think they are more likely related to RT issues....
>

That very well may be true.

> Your usb device has disconnected and gone down the device
> removal/invalidate partition route. and it's trying to flush the
> device, which is stuck on IO completion which is stuck waiting for
> the device error handling to error them out.
>
> So, this is a block device problem error handling problem caused by
> device unplug getting stuck because it's decided to ask the
> filesystem to complete operations that can't be completed until the
> device error handling progress far enough to error out the IOs that
> the filesystem is waiting for completion on.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

I unplugged the USB device _after_ the rest of the system started
locking up to see if that might unstick things.

Austin


* Re: XFS crash?
  2014-03-06  0:53   ` Austin Schuh
@ 2014-05-13  1:29     ` Austin Schuh
  2014-05-13  3:10       ` Austin Schuh
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Austin Schuh @ 2014-05-13  1:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> Hi Dave,
>
> On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
>> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>>> Howdy,
>>>
>>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>>> and I'm seeing a couple lockups and crashes which I think are related
>>> to XFS.
>>
>> I think they are more likely related to RT issues....
>>
>
> That very well may be true.
>
>> Your usb device has disconnected and gone down the device
>> removal/invalidate partition route. and it's trying to flush the
>> device, which is stuck on IO completion which is stuck waiting for
>> the device error handling to error them out.
>>
>> So, this is a block device problem error handling problem caused by
>> device unplug getting stuck because it's decided to ask the
>> filesystem to complete operations that can't be completed until the
>> device error handling progress far enough to error out the IOs that
>> the filesystem is waiting for completion on.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com

I had the issue reproduce itself today with just the main SSD
installed.  This was on a new machine that was built this morning.
There is a lot less going on in this trace than the previous one.

[  360.448156] INFO: task kworker/1:1:42 blocked for more than 120 seconds.
[  360.450266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  360.452404] kworker/1:1     D ffff88042e0e2cc0     0    42      2 0x00000000
[  360.452429] Workqueue: xfs-data/sda5 xfs_end_io [xfs]
[  360.452432]  ffff88042af38000 0000000000000046 0000000000000000
ffff88042b0cece0
[  360.452433]  0000000000062cc0 ffff88042af1ffd8 0000000000062cc0
ffff88042af1ffd8
[  360.452435]  0000000000062cc0 ffff88042af38000 ffff8803eac3be40
0000000000000002
[  360.452437] Call Trace:
[  360.452443]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[  360.452445]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
[  360.452447]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
[  360.452455]  [<ffffffffa0099adb>] ? xfs_setfilesize+0x48/0x120 [xfs]
[  360.452462]  [<ffffffffa009a62f>] ? xfs_end_io+0x7a/0x8e [xfs]
[  360.452465]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
[  360.452468]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
[  360.452469]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
[  360.452471]  [<ffffffff8105a909>] ? kthread+0x81/0x89
[  360.452473]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  360.452475]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
[  360.452477]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  360.452483] INFO: task kworker/u16:4:222 blocked for more than 120 seconds.
[  360.454614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  360.456801] kworker/u16:4   D ffff88042e062cc0     0   222      2 0x00000000
[  360.456807] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[  360.456810]  ffff88042af3cb60 0000000000000046 ffff8804292a0000
ffffffff81616400
[  360.456812]  0000000000062cc0 ffff8804292a1fd8 0000000000062cc0
ffff8804292a1fd8
[  360.456813]  ffff8804292a1978 ffff88042af3cb60 ffff88042af3cb60
ffff8804292a1a50
[  360.456815] Call Trace:
[  360.456819]  [<ffffffff810cc034>] ? __lock_page+0x66/0x66
[  360.456822]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[  360.456824]  [<ffffffff813a1ff9>] ? io_schedule+0x55/0x6b
[  360.456825]  [<ffffffff810cc03a>] ? sleep_on_page+0x6/0xa
[  360.456827]  [<ffffffff813a137a>] ? __wait_on_bit_lock+0x3c/0x85
[  360.456829]  [<ffffffff810cc4a4>] ? find_get_pages_tag+0xfa/0x125
[  360.456831]  [<ffffffff810cc02f>] ? __lock_page+0x61/0x66
[  360.456833]  [<ffffffff8105b333>] ? autoremove_wake_function+0x2a/0x2a
[  360.456835]  [<ffffffff810d4700>] ? write_cache_pages+0x177/0x302
[  360.456836]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
[  360.456838]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
[  360.456840]  [<ffffffff81134698>] ? __writeback_single_inode+0x72/0x225
[  360.456842]  [<ffffffff8113550b>] ? writeback_sb_inodes+0x215/0x36d
[  360.456844]  [<ffffffff811356cc>] ? __writeback_inodes_wb+0x69/0xab
[  360.456846]  [<ffffffff81135844>] ? wb_writeback+0x136/0x2a7
[  360.456848]  [<ffffffff81135c88>] ? wb_do_writeback+0x161/0x1dc
[  360.456851]  [<ffffffff81135d66>] ? bdi_writeback_workfn+0x63/0xf4
[  360.456852]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
[  360.456854]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
[  360.456856]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
[  360.456857]  [<ffffffff8105a909>] ? kthread+0x81/0x89
[  360.456859]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  360.456860]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
[  360.456862]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  360.456881] INFO: task dpkg:5140 blocked for more than 120 seconds.
[  360.459062] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  360.461283] dpkg            D ffff88042e0e2cc0     0  5140   5112 0x00000000
[  360.461287]  ffff8804290d2180 0000000000000086 000000000000020c
ffff88042af38000
[  360.461289]  0000000000062cc0 ffff880429bf7fd8 0000000000062cc0
ffff880429bf7fd8
[  360.461290]  ffff880429bf78e8 ffff8804290d2180 ffff880429bf7a10
ffff880429bf7a08
[  360.461292] Call Trace:
[  360.461296]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
[  360.461298]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[  360.461299]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
[  360.461301]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[  360.461303]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
[  360.461306]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
[  360.461308]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[  360.461310]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
[  360.461323]  [<ffffffffa00c0032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
[  360.461333]  [<ffffffffa00c035d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
[  360.461336]  [<ffffffff81109680>] ? kmem_cache_alloc+0x7c/0x17d
[  360.461346]  [<ffffffffa00bde6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
[  360.461354]  [<ffffffffa00a6899>] ? xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
[  360.461362]  [<ffffffffa0099dc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
[  360.461369]  [<ffffffffa009ac87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
[  360.461371]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
[  360.461372]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
[  360.461374]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
[  360.461376]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
[  360.461378]  [<ffffffff810cd303>] ? __filemap_fdatawrite_range+0x50/0x55
[  360.461380]  [<ffffffff81138a63>] ? SyS_sync_file_range+0xe2/0x127
[  360.461382]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b

Austin


* Re: XFS crash?
  2014-05-13  1:29     ` Austin Schuh
@ 2014-05-13  3:10       ` Austin Schuh
  2014-05-13  3:33       ` Austin Schuh
  2014-05-13  3:46       ` Dave Chinner
  2 siblings, 0 replies; 30+ messages in thread
From: Austin Schuh @ 2014-05-13  3:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 12, 2014 at 6:29 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> Hi Dave,
>>
>> On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>>>> Howdy,
>>>>
>>>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>>>> and I'm seeing a couple lockups and crashes which I think are related
>>>> to XFS.
>>>
>>> I think they are more likely related to RT issues....
>>>
>>
>> That very well may be true.
>>
>>> Your usb device has disconnected and gone down the device
>>> removal/invalidate partition route. and it's trying to flush the
>>> device, which is stuck on IO completion which is stuck waiting for
>>> the device error handling to error them out.
>>>
>>> So, this is a block device problem error handling problem caused by
>>> device unplug getting stuck because it's decided to ask the
>>> filesystem to complete operations that can't be completed until the
>>> device error handling progress far enough to error out the IOs that
>>> the filesystem is waiting for completion on.
>>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> david@fromorbit.com
>
> I had the issue reproduce itself today with just the main SSD
> installed.  This was on a new machine that was built this morning.
> There is a lot less going on in this trace than the previous one.
>
> [  360.448156] INFO: task kworker/1:1:42 blocked for more than 120 seconds.
> [  360.450266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  360.452404] kworker/1:1     D ffff88042e0e2cc0     0    42      2 0x00000000
> [  360.452429] Workqueue: xfs-data/sda5 xfs_end_io [xfs]
> [  360.452432]  ffff88042af38000 0000000000000046 0000000000000000
> ffff88042b0cece0
> [  360.452433]  0000000000062cc0 ffff88042af1ffd8 0000000000062cc0
> ffff88042af1ffd8
> [  360.452435]  0000000000062cc0 ffff88042af38000 ffff8803eac3be40
> 0000000000000002
> [  360.452437] Call Trace:
> [  360.452443]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [  360.452445]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
> [  360.452447]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
> [  360.452455]  [<ffffffffa0099adb>] ? xfs_setfilesize+0x48/0x120 [xfs]
> [  360.452462]  [<ffffffffa009a62f>] ? xfs_end_io+0x7a/0x8e [xfs]
> [  360.452465]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
> [  360.452468]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
> [  360.452469]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
> [  360.452471]  [<ffffffff8105a909>] ? kthread+0x81/0x89
> [  360.452473]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [  360.452475]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
> [  360.452477]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [  360.452483] INFO: task kworker/u16:4:222 blocked for more than 120 seconds.
> [  360.454614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  360.456801] kworker/u16:4   D ffff88042e062cc0     0   222      2 0x00000000
> [  360.456807] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
> [  360.456810]  ffff88042af3cb60 0000000000000046 ffff8804292a0000
> ffffffff81616400
> [  360.456812]  0000000000062cc0 ffff8804292a1fd8 0000000000062cc0
> ffff8804292a1fd8
> [  360.456813]  ffff8804292a1978 ffff88042af3cb60 ffff88042af3cb60
> ffff8804292a1a50
> [  360.456815] Call Trace:
> [  360.456819]  [<ffffffff810cc034>] ? __lock_page+0x66/0x66
> [  360.456822]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [  360.456824]  [<ffffffff813a1ff9>] ? io_schedule+0x55/0x6b
> [  360.456825]  [<ffffffff810cc03a>] ? sleep_on_page+0x6/0xa
> [  360.456827]  [<ffffffff813a137a>] ? __wait_on_bit_lock+0x3c/0x85
> [  360.456829]  [<ffffffff810cc4a4>] ? find_get_pages_tag+0xfa/0x125
> [  360.456831]  [<ffffffff810cc02f>] ? __lock_page+0x61/0x66
> [  360.456833]  [<ffffffff8105b333>] ? autoremove_wake_function+0x2a/0x2a
> [  360.456835]  [<ffffffff810d4700>] ? write_cache_pages+0x177/0x302
> [  360.456836]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
> [  360.456838]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
> [  360.456840]  [<ffffffff81134698>] ? __writeback_single_inode+0x72/0x225
> [  360.456842]  [<ffffffff8113550b>] ? writeback_sb_inodes+0x215/0x36d
> [  360.456844]  [<ffffffff811356cc>] ? __writeback_inodes_wb+0x69/0xab
> [  360.456846]  [<ffffffff81135844>] ? wb_writeback+0x136/0x2a7
> [  360.456848]  [<ffffffff81135c88>] ? wb_do_writeback+0x161/0x1dc
> [  360.456851]  [<ffffffff81135d66>] ? bdi_writeback_workfn+0x63/0xf4
> [  360.456852]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
> [  360.456854]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
> [  360.456856]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
> [  360.456857]  [<ffffffff8105a909>] ? kthread+0x81/0x89
> [  360.456859]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [  360.456860]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
> [  360.456862]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
> [  360.456881] INFO: task dpkg:5140 blocked for more than 120 seconds.
> [  360.459062] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  360.461283] dpkg            D ffff88042e0e2cc0     0  5140   5112 0x00000000
> [  360.461287]  ffff8804290d2180 0000000000000086 000000000000020c
> ffff88042af38000
> [  360.461289]  0000000000062cc0 ffff880429bf7fd8 0000000000062cc0
> ffff880429bf7fd8
> [  360.461290]  ffff880429bf78e8 ffff8804290d2180 ffff880429bf7a10
> ffff880429bf7a08
> [  360.461292] Call Trace:
> [  360.461296]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
> [  360.461298]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
> [  360.461299]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
> [  360.461301]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [  360.461303]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
> [  360.461306]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
> [  360.461308]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
> [  360.461310]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
> [  360.461323]  [<ffffffffa00c0032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
> [  360.461333]  [<ffffffffa00c035d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
> [  360.461336]  [<ffffffff81109680>] ? kmem_cache_alloc+0x7c/0x17d
> [  360.461346]  [<ffffffffa00bde6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
> [  360.461354]  [<ffffffffa00a6899>] ?
> xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
> [  360.461362]  [<ffffffffa0099dc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
> [  360.461369]  [<ffffffffa009ac87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
> [  360.461371]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
> [  360.461372]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
> [  360.461374]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
> [  360.461376]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
> [  360.461378]  [<ffffffff810cd303>] ? __filemap_fdatawrite_range+0x50/0x55
> [  360.461380]  [<ffffffff81138a63>] ? SyS_sync_file_range+0xe2/0x127
> [  360.461382]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b
>
> Austin

Fun times...  I rebooted the machine (had to power cycle it to get it
to go down), repeated the same set of commands and it locked up again.

I ran apt-get update; dpkg --configure -a; apt-get update; apt-get
upgrade, and then it locked up during the upgrade.  It was in the
middle of unpacking a 348 MB package.
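
(For reference, the exact sequence was:)

  apt-get update
  dpkg --configure -a
  apt-get update
  apt-get upgrade     # locked up here, while unpacking a ~348 MB package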

[  241.634377] INFO: task kworker/1:2:60 blocked for more than 120 seconds.
[  241.641284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  241.648252] kworker/1:2     D ffff880429ed10c0     0    60      2 0x00000000
[  241.648310] Workqueue: xfs-data/sda5 xfs_end_io [xfs]
[  241.648320]  ffff880429ed10c0 0000000000000046 ffffffffffffffff
ffff8804240053c0
[  241.648327]  0000000000062cc0 ffff880429f4dfd8 0000000000062cc0
ffff880429f4dfd8
[  241.648331]  0000000000000001 ffff880429ed10c0 ffff8803eb87dac0
0000000000000002
[  241.648339] Call Trace:
[  241.648358]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[  241.648365]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
[  241.648371]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
[  241.648380]  [<ffffffff8100c02f>] ? load_TLS+0x7/0xa
[  241.648415]  [<ffffffffa00a9adb>] ? xfs_setfilesize+0x48/0x120 [xfs]
[  241.648423]  [<ffffffff81063d25>] ? finish_task_switch+0x80/0xc6
[  241.648447]  [<ffffffffa00aa62f>] ? xfs_end_io+0x7a/0x8e [xfs]
[  241.648455]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
[  241.648462]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
[  241.648468]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
[  241.648473]  [<ffffffff8105a909>] ? kthread+0x81/0x89
[  241.648481]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  241.648487]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
[  241.648492]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  241.648531] INFO: task dpkg:5181 blocked for more than 120 seconds.
[  241.655649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  241.662711] dpkg            D ffff88042e0e2cc0     0  5181   5153 0x00000000
[  241.662727]  ffff8804240053c0 0000000000000086 0000000000000018
ffff88042b0cece0
[  241.662731]  0000000000062cc0 ffff88042989dfd8 0000000000062cc0
ffff88042989dfd8
[  241.662735]  ffff88042989d8e8 ffff8804240053c0 ffff88042989da10
ffff88042989da08
[  241.662742] Call Trace:
[  241.662754]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
[  241.662760]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[  241.662767]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
[  241.662772]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[  241.662777]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
[  241.662786]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
[  241.662791]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[  241.662797]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
[  241.662845]  [<ffffffffa00d0032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
[  241.662878]  [<ffffffffa00d035d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
[  241.662884]  [<ffffffff81063d25>] ? finish_task_switch+0x80/0xc6
[  241.662924]  [<ffffffffa00cde6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
[  241.662950]  [<ffffffffa00b6899>] ? xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
[  241.662977]  [<ffffffffa00a9dc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
[  241.663001]  [<ffffffffa00aac87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
[  241.663010]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
[  241.663014]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
[  241.663018]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
[  241.663025]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
[  241.663034]  [<ffffffff810cd303>] ? __filemap_fdatawrite_range+0x50/0x55
[  241.663039]  [<ffffffff81138a63>] ? SyS_sync_file_range+0xe2/0x127
[  241.663047]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b


* Re: XFS crash?
  2014-05-13  1:29     ` Austin Schuh
  2014-05-13  3:10       ` Austin Schuh
@ 2014-05-13  3:33       ` Austin Schuh
  2014-05-13  3:46       ` Dave Chinner
  2 siblings, 0 replies; 30+ messages in thread
From: Austin Schuh @ 2014-05-13  3:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, May 12, 2014 at 6:29 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> Hi Dave,
>>
>> On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>>>> Howdy,
>>>>
>>>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>>>> and I'm seeing a couple lockups and crashes which I think are related
>>>> to XFS.
>>>
>>> I think they are more likely related to RT issues....
>>>
>>
>> That very well may be true.
>>
>>> Your usb device has disconnected and gone down the device
>>> removal/invalidate partition route. and it's trying to flush the
>>> device, which is stuck on IO completion which is stuck waiting for
>>> the device error handling to error them out.
>>>
>>> So, this is a block device problem error handling problem caused by
>>> device unplug getting stuck because it's decided to ask the
>>> filesystem to complete operations that can't be completed until the
>>> device error handling progress far enough to error out the IOs that
>>> the filesystem is waiting for completion on.
>>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> david@fromorbit.com
>
> I had the issue reproduce itself today with just the main SSD
> installed.  This was on a new machine that was built this morning.
> There is a lot less going on in this trace than the previous one.


Fun times...  I rebooted the machine (had to power cycle it to get it
to go down), repeated the same set of commands and it locked up again.

I ran apt-get update; dpkg --configure -a; apt-get update; apt-get
upgrade, and then it locked up during the upgrade.  It was in the
middle of unpacking a 348 MB package.

[  241.634377] INFO: task kworker/1:2:60 blocked for more than 120 seconds.
[  241.641284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  241.648252] kworker/1:2     D ffff880429ed10c0     0    60      2 0x00000000
[  241.648310] Workqueue: xfs-data/sda5 xfs_end_io [xfs]
[  241.648320]  ffff880429ed10c0 0000000000000046 ffffffffffffffff
ffff8804240053c0
[  241.648327]  0000000000062cc0 ffff880429f4dfd8 0000000000062cc0
ffff880429f4dfd8
[  241.648331]  0000000000000001 ffff880429ed10c0 ffff8803eb87dac0
0000000000000002
[  241.648339] Call Trace:
[  241.648358]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[  241.648365]  [<ffffffff813a2438>] ? __rt_mutex_slowlock+0x7b/0xb4
[  241.648371]  [<ffffffff813a2577>] ? rt_mutex_slowlock+0xe5/0x150
[  241.648380]  [<ffffffff8100c02f>] ? load_TLS+0x7/0xa
[  241.648415]  [<ffffffffa00a9adb>] ? xfs_setfilesize+0x48/0x120 [xfs]
[  241.648423]  [<ffffffff81063d25>] ? finish_task_switch+0x80/0xc6
[  241.648447]  [<ffffffffa00aa62f>] ? xfs_end_io+0x7a/0x8e [xfs]
[  241.648455]  [<ffffffff81055a49>] ? process_one_work+0x19b/0x2b2
[  241.648462]  [<ffffffff81055f41>] ? worker_thread+0x12b/0x1f6
[  241.648468]  [<ffffffff81055e16>] ? rescuer_thread+0x28f/0x28f
[  241.648473]  [<ffffffff8105a909>] ? kthread+0x81/0x89
[  241.648481]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  241.648487]  [<ffffffff813a75fc>] ? ret_from_fork+0x7c/0xb0
[  241.648492]  [<ffffffff8105a888>] ? __kthread_parkme+0x5c/0x5c
[  241.648531] INFO: task dpkg:5181 blocked for more than 120 seconds.
[  241.655649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  241.662711] dpkg            D ffff88042e0e2cc0     0  5181   5153 0x00000000
[  241.662727]  ffff8804240053c0 0000000000000086 0000000000000018
ffff88042b0cece0
[  241.662731]  0000000000062cc0 ffff88042989dfd8 0000000000062cc0
ffff88042989dfd8
[  241.662735]  ffff88042989d8e8 ffff8804240053c0 ffff88042989da10
ffff88042989da08
[  241.662742] Call Trace:
[  241.662754]  [<ffffffff813a10ef>] ? console_conditional_schedule+0xf/0xf
[  241.662760]  [<ffffffff813a1f93>] ? schedule+0x6b/0x7c
[  241.662767]  [<ffffffff813a111b>] ? schedule_timeout+0x2c/0x123
[  241.662772]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[  241.662777]  [<ffffffff81065cfd>] ? migrate_enable+0x1cd/0x1dd
[  241.662786]  [<ffffffff810651ab>] ? get_parent_ip+0x9/0x1b
[  241.662791]  [<ffffffff813a5c20>] ? add_preempt_count+0xb7/0xe0
[  241.662797]  [<ffffffff813a188b>] ? __wait_for_common+0x78/0xd6
[  241.662845]  [<ffffffffa00d0032>] ? xfs_bmapi_allocate+0x92/0x9e [xfs]
[  241.662878]  [<ffffffffa00d035d>] ? xfs_bmapi_write+0x31f/0x558 [xfs]
[  241.662884]  [<ffffffff81063d25>] ? finish_task_switch+0x80/0xc6
[  241.662924]  [<ffffffffa00cde6e>] ? __xfs_bmapi_allocate+0x22b/0x22b [xfs]
[  241.662950]  [<ffffffffa00b6899>] ? xfs_iomap_write_allocate+0x1bc/0x2c8 [xfs]
[  241.662977]  [<ffffffffa00a9dc5>] ? xfs_map_blocks+0x125/0x1f5 [xfs]
[  241.663001]  [<ffffffffa00aac87>] ? xfs_vm_writepage+0x266/0x48f [xfs]
[  241.663010]  [<ffffffff810d3d14>] ? __writepage+0xd/0x2a
[  241.663014]  [<ffffffff810d4790>] ? write_cache_pages+0x207/0x302
[  241.663018]  [<ffffffff810d3d07>] ? page_index+0x14/0x14
[  241.663025]  [<ffffffff810d48c6>] ? generic_writepages+0x3b/0x57
[  241.663034]  [<ffffffff810cd303>] ? __filemap_fdatawrite_range+0x50/0x55
[  241.663039]  [<ffffffff81138a63>] ? SyS_sync_file_range+0xe2/0x127
[  241.663047]  [<ffffffff813a76a9>] ? system_call_fastpath+0x16/0x1b


* Re: XFS crash?
  2014-05-13  1:29     ` Austin Schuh
  2014-05-13  3:10       ` Austin Schuh
  2014-05-13  3:33       ` Austin Schuh
@ 2014-05-13  3:46       ` Dave Chinner
  2014-05-13  4:03         ` Austin Schuh
  2 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2014-05-13  3:46 UTC (permalink / raw)
  To: Austin Schuh; +Cc: xfs

On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> > Hi Dave,
> >
> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> >>> Howdy,
> >>>
> >>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
> >>> and I'm seeing a couple lockups and crashes which I think are related
> >>> to XFS.
> >>
> >> I think they are more likely related to RT issues....
> >>
> >
> > That very well may be true.
> >
> >> Your usb device has disconnected and gone down the device
> >> removal/invalidate partition route. and it's trying to flush the
> >> device, which is stuck on IO completion which is stuck waiting for
> >> the device error handling to error them out.
> >>
> >> So, this is a block device problem error handling problem caused by
> >> device unplug getting stuck because it's decided to ask the
> >> filesystem to complete operations that can't be completed until the
> >> device error handling progress far enough to error out the IOs that
> >> the filesystem is waiting for completion on.
> >>
> >> Cheers,
> >>
> >> Dave.
> >> --
> >> Dave Chinner
> >> david@fromorbit.com
> 
> I had the issue reproduce itself today with just the main SSD
> installed.  This was on a new machine that was built this morning.
> There is a lot less going on in this trace than the previous one.

The three blocked threads:

	1. kworker running IO completion waiting on an inode lock,
	   holding locked pages.
	2. kworker running writeback flusher work waiting for a page lock
	3. direct flush work waiting for allocation, holding page
	   locks and the inode lock.

What's the kworker thread running the allocation work doing?

You might need to run `echo w > /proc/sysrq-trigger` to get this
information...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: XFS crash?
  2014-05-13  3:46       ` Dave Chinner
@ 2014-05-13  4:03         ` Austin Schuh
  2014-05-13  6:39           ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Austin Schuh @ 2014-05-13  4:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 2284 bytes --]

On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
>> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> > Hi Dave,
>> >
>> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>> >>> Howdy,
>> >>>
>> >>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>> >>> and I'm seeing a couple lockups and crashes which I think are related
>> >>> to XFS.
>> >>
>> >> I think they are more likely related to RT issues....
>> >>
>> >
>> > That very well may be true.
>> >
>> >> Your usb device has disconnected and gone down the device
>> >> removal/invalidate partition route, and it's trying to flush the
>> >> device, which is stuck on IO completion, which is in turn stuck
>> >> waiting for the device error handling to error those IOs out.
>> >>
>> >> So, this is a block device error handling problem caused by device
>> >> unplug getting stuck: it's decided to ask the filesystem to complete
>> >> operations that can't be completed until the device error handling
>> >> progresses far enough to error out the IOs that the filesystem is
>> >> waiting on.
>> >>
>> >> Cheers,
>> >>
>> >> Dave.
>> >> --
>> >> Dave Chinner
>> >> david@fromorbit.com
>>
>> I had the issue reproduce itself today with just the main SSD
>> installed.  This was on a new machine that was built this morning.
>> There is a lot less going on in this trace than the previous one.
>
> The three blocked threads:
>
>         1. kworker running IO completion waiting on an inode lock,
>            holding locked pages.
>         2. kworker running writeback flusher work waiting for a page lock
>         3. direct flush work waiting for allocation, holding page
>            locks and the inode lock.
>
> What's the kworker thread running the allocation work doing?
>
> You might need to run `echo w > /proc/sysrq-trigger` to get this
> information...

I was able to reproduce the lockup.  I ran `echo w >
/proc/sysrq-trigger` per your suggestion.  I don't know how to figure
out what the kworker thread is doing, but I'll happily do it if you
can give me some guidance.

[-- Attachment #2: dmesg --]
[-- Type: application/octet-stream, Size: 33300 bytes --]

[ 1250.199560] SysRq : Show Blocked State
[ 1250.199772]   task                        PC stack   pid father
[ 1250.199798] kworker/3:1     D ffff88042dbe1e40     0    91      2 0x00000000
[ 1250.199820] Workqueue: xfs-data/sda5 xfs_end_io
[ 1250.199824]  ffff88041a8d70f0 0000000000000002 ffff88041af96840 0000000000000000
[ 1250.199827]  0000000000061e40 ffff8804195ddfd8 0000000000061e40 ffff8804195ddfd8
[ 1250.199836]  0000000000000400 ffff88041a8d70f0 ffff880357bb2b40 0000000000000002
[ 1250.199839] Call Trace:
[ 1250.199847]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1250.199848]  [<ffffffff8147a738>] ? __rt_mutex_slowlock+0x80/0xb9
[ 1250.199850]  [<ffffffff8147a886>] ? rt_mutex_slowlock+0xea/0x155
[ 1250.199852]  [<ffffffff811996dd>] ? xfs_setfilesize+0x4d/0x125
[ 1250.199856]  [<ffffffff8105ab1e>] ? finish_task_switch+0x60/0xac
[ 1250.199857]  [<ffffffff8119a338>] ? xfs_end_io+0x7f/0x93
[ 1250.199860]  [<ffffffff8104bb96>] ? process_one_work+0x1a0/0x2b7
[ 1250.199862]  [<ffffffff8104c09b>] ? worker_thread+0x130/0x1fb
[ 1250.199864]  [<ffffffff8104bf6b>] ? rescuer_thread+0x294/0x294
[ 1250.199865]  [<ffffffff81050e74>] ? kthread+0x86/0x8e
[ 1250.199867]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.199870]  [<ffffffff8147ffdc>] ? ret_from_fork+0x7c/0xb0
[ 1250.199872]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.199881] kworker/2:1     D ffff88042db61e40     0   196      2 0x00000000
[ 1250.199894] Workqueue: xfs-data/sda5 xfs_end_io
[ 1250.199898]  ffff8804182c0000 0000000000000002 ffff88041a964580 ffff880357bb2af8
[ 1250.199899]  0000000000061e40 ffff8804182c9fd8 0000000000061e40 ffff8804182c9fd8
[ 1250.199905]  0000000000000000 ffff8804182c0000 ffff880357bb2b40 0000000000000002
[ 1250.199907] Call Trace:
[ 1250.199911]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1250.199913]  [<ffffffff8147a738>] ? __rt_mutex_slowlock+0x80/0xb9
[ 1250.199914]  [<ffffffff8147a886>] ? rt_mutex_slowlock+0xea/0x155
[ 1250.199916]  [<ffffffff811996dd>] ? xfs_setfilesize+0x4d/0x125
[ 1250.199918]  [<ffffffff8105ab1e>] ? finish_task_switch+0x60/0xac
[ 1250.199920]  [<ffffffff8119a338>] ? xfs_end_io+0x7f/0x93
[ 1250.199921]  [<ffffffff8104bb96>] ? process_one_work+0x1a0/0x2b7
[ 1250.199923]  [<ffffffff8104c09b>] ? worker_thread+0x130/0x1fb
[ 1250.199925]  [<ffffffff8104bf6b>] ? rescuer_thread+0x294/0x294
[ 1250.199927]  [<ffffffff81050e74>] ? kthread+0x86/0x8e
[ 1250.199928]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.199930]  [<ffffffff8147ffdc>] ? ret_from_fork+0x7c/0xb0
[ 1250.199932]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.199939] kworker/u16:5   D ffff88042db61e40     0   247      2 0x00000000
[ 1250.199970] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[ 1250.199979]  ffff88041a964580 0000000000000002 ffff88041af95f90 ffff88041803f628
[ 1250.199980]  0000000000061e40 ffff88041803ffd8 0000000000061e40 ffff88041803ffd8
[ 1250.199982]  ffff88041803f638 ffff88041a964580 ffff88041803f760 ffff88041803f758
[ 1250.199990] Call Trace:
[ 1250.199993]  [<ffffffff81479302>] ? console_conditional_schedule+0x1a/0x1a
[ 1250.199996]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1250.199998]  [<ffffffff81479333>] ? schedule_timeout+0x31/0x128
[ 1250.200000]  [<ffffffff8147e115>] ? add_preempt_count+0xbc/0xe5
[ 1250.200001]  [<ffffffff8105cb3f>] ? migrate_enable+0x1d3/0x1e3
[ 1250.200003]  [<ffffffff8105c244>] ? get_parent_ip+0x9/0x1b
[ 1250.200004]  [<ffffffff8147e115>] ? add_preempt_count+0xbc/0xe5
[ 1250.200006]  [<ffffffff81479b11>] ? __wait_for_common+0x78/0xd6
[ 1250.200009]  [<ffffffff811c06ce>] ? xfs_bmapi_allocate+0x97/0xa3
[ 1250.200011]  [<ffffffff811c09fe>] ? xfs_bmapi_write+0x324/0x55d
[ 1250.200013]  [<ffffffff81102a7e>] ? kmem_cache_alloc+0x81/0x182
[ 1250.200021]  [<ffffffff811be490>] ? __xfs_bmapi_allocate+0x230/0x230
[ 1250.200023]  [<ffffffff811a6956>] ? xfs_iomap_write_allocate+0x1c1/0x2cd
[ 1250.200026]  [<ffffffff811999d8>] ? xfs_map_blocks+0x12a/0x1fa
[ 1250.200027]  [<ffffffff8119a9af>] ? xfs_vm_writepage+0x26b/0x494
[ 1250.200031]  [<ffffffff810ce259>] ? __writepage+0x12/0x2f
[ 1250.200034]  [<ffffffff810cecd4>] ? write_cache_pages+0x20c/0x307
[ 1250.200036]  [<ffffffff810ce247>] ? page_index+0x14/0x14
[ 1250.200038]  [<ffffffff810cee0f>] ? generic_writepages+0x40/0x5c
[ 1250.200040]  [<ffffffff8112feb9>] ? __writeback_single_inode+0x77/0x22b
[ 1250.200042]  [<ffffffff81131039>] ? writeback_sb_inodes+0x21a/0x372
[ 1250.200045]  [<ffffffff8113136f>] ? wb_writeback+0x12e/0x2ac
[ 1250.200047]  [<ffffffff811316f8>] ? wb_do_writeback+0x7e/0x1e1
[ 1250.200049]  [<ffffffff8104d699>] ? set_worker_desc+0x66/0x71
[ 1250.200051]  [<ffffffff811318c4>] ? bdi_writeback_workfn+0x69/0xfa
[ 1250.200053]  [<ffffffff8104bb96>] ? process_one_work+0x1a0/0x2b7
[ 1250.200055]  [<ffffffff8104c09b>] ? worker_thread+0x130/0x1fb
[ 1250.200057]  [<ffffffff8104bf6b>] ? rescuer_thread+0x294/0x294
[ 1250.200058]  [<ffffffff81050e74>] ? kthread+0x86/0x8e
[ 1250.200060]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.200062]  [<ffffffff8147ffdc>] ? ret_from_fork+0x7c/0xb0
[ 1250.200063]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.200070] xfsaild/sda5    D ffff88042da61e40     0   301      2 0x00000000
[ 1250.200117]  ffff8804194fdf90 0000000000000002 ffffffff81813400 ffff8804197bfdb0
[ 1250.200118]  0000000000061e40 ffff8804197bffd8 0000000000061e40 ffff8804197bffd8
[ 1250.200119]  ffff8804197bfe00 ffff8804194fdf90 0000000000000001 ffff8804182ed140
[ 1250.200127] Call Trace:
[ 1250.200130]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1250.200132]  [<ffffffff814793f2>] ? schedule_timeout+0xf0/0x128
[ 1250.200135]  [<ffffffff8103f6d3>] ? ftrace_raw_event_timer_class+0x9d/0x9d
[ 1250.200138]  [<ffffffff811ec032>] ? xfsaild+0xcf/0x51e
[ 1250.200140]  [<ffffffff811ebf63>] ? xfs_trans_ail_cursor_first+0x7d/0x7d
[ 1250.200142]  [<ffffffff81050e74>] ? kthread+0x86/0x8e
[ 1250.200143]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.200145]  [<ffffffff8147ffdc>] ? ret_from_fork+0x7c/0xb0
[ 1250.200147]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1250.200179] shutdown        D ffff88042dbe1e40     0  8686   6726 0x00000000
[ 1250.200196]  ffff8804195bc580 0000000000000002 ffff88041af96840 0000000000000000
[ 1250.200197]  0000000000061e40 ffff8803ef239fd8 0000000000061e40 ffff8803ef239fd8
[ 1250.200203]  ffff88041a964bc8 ffff8804195bc580 ffff8803ef239eb0 ffff8803ef239ea8
[ 1250.200205] Call Trace:
[ 1250.200208]  [<ffffffff81479302>] ? console_conditional_schedule+0x1a/0x1a
[ 1250.200211]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1250.200213]  [<ffffffff81479333>] ? schedule_timeout+0x31/0x128
[ 1250.200214]  [<ffffffff8105c244>] ? get_parent_ip+0x9/0x1b
[ 1250.200215]  [<ffffffff8105c244>] ? get_parent_ip+0x9/0x1b
[ 1250.200217]  [<ffffffff8147e115>] ? add_preempt_count+0xbc/0xe5
[ 1250.200219]  [<ffffffff81479b11>] ? __wait_for_common+0x78/0xd6
[ 1250.200221]  [<ffffffff81130d09>] ? sync_inodes_sb+0xb2/0x1c8
[ 1250.200224]  [<ffffffff8113438c>] ? fdatawrite_one_bdev+0x14/0x14
[ 1250.200226]  [<ffffffff81112ea0>] ? iterate_supers+0x72/0xcd
[ 1250.200228]  [<ffffffff811344f0>] ? sys_sync+0x33/0x86
[ 1250.200230]  [<ffffffff8148027d>] ? tracesys+0xdd/0xe2
[ 1250.200244] Sched Debug Version: v0.10, 3.10.24-rt22abs #1
[ 1250.200257] ktime                                   : 1248567.801938
[ 1250.200259] sched_clk                               : 1250200.243781
[ 1250.200260] cpu_clk                                 : 1250200.243828
[ 1250.200267] jiffies                                 : 4295915864
[ 1250.200269] sched_clock_stable                      : 1
[ 1250.200270] 
[ 1250.200271] sysctl_sched
[ 1250.200278]   .sysctl_sched_latency                    : 18.000000
[ 1250.200279]   .sysctl_sched_min_granularity            : 2.250000
[ 1250.200280]   .sysctl_sched_wakeup_granularity         : 3.000000
[ 1250.200281]   .sysctl_sched_child_runs_first           : 0
[ 1250.200288]   .sysctl_sched_features                   : 22139
[ 1250.200290]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[ 1250.200292] 
[ 1250.200309] cpu#0, 2194.795 MHz
[ 1250.200310]   .nr_running                    : 0
[ 1250.200312]   .load                          : 0
[ 1250.200313]   .nr_switches                   : 16832676
[ 1250.200315]   .nr_load_updates               : 1248568
[ 1250.200317]   .nr_uninterruptible            : 37
[ 1250.200318]   .next_balance                  : 4295.915872
[ 1250.200326]   .curr->pid                     : 0
[ 1250.200328]   .clock                         : 1250199.449420
[ 1250.200329]   .cpu_load[0]                   : 0
[ 1250.200337]   .cpu_load[1]                   : 0
[ 1250.200338]   .cpu_load[2]                   : 0
[ 1250.200339]   .cpu_load[3]                   : 0
[ 1250.200340]   .cpu_load[4]                   : 0
[ 1250.200349] 
[ 1250.200349] cfs_rq[0]:/autogroup-187
[ 1250.200350]   .exec_clock                    : 0.000000
[ 1250.200352]   .MIN_vruntime                  : 0.000001
[ 1250.200359]   .min_vruntime                  : 17.683521
[ 1250.200360]   .max_vruntime                  : 0.000001
[ 1250.200362]   .spread                        : 0.000000
[ 1250.200380]   .spread0                       : -45523.266692
[ 1250.200381]   .nr_spread_over                : 0
[ 1250.200383]   .nr_running                    : 0
[ 1250.200384]   .load                          : 0
[ 1250.200385]   .runnable_load_avg             : 0
[ 1250.200387]   .blocked_load_avg              : 2
[ 1250.200399]   .tg_load_avg                   : 2
[ 1250.200400]   .tg_load_contrib               : 2
[ 1250.200401]   .tg_runnable_contrib           : 2
[ 1250.200402]   .tg->runnable_avg              : 2
[ 1250.200403]   .se->exec_start                : 1250199.360813
[ 1250.200405]   .se->vruntime                  : 45532.111840
[ 1250.200412]   .se->sum_exec_runtime          : 18.732097
[ 1250.200413]   .se->load.weight               : 2
[ 1250.200414]   .se->avg.runnable_avg_sum      : 107
[ 1250.200415]   .se->avg.runnable_avg_period   : 47150
[ 1250.200422]   .se->avg.load_avg_contrib      : 1
[ 1250.200423]   .se->avg.decay_count           : 1192284
[ 1250.200425] 
[ 1250.200425] cfs_rq[0]:/autogroup-23
[ 1250.200432]   .exec_clock                    : 0.000000
[ 1250.200434]   .MIN_vruntime                  : 0.000001
[ 1250.200435]   .min_vruntime                  : 7909.675864
[ 1250.200437]   .max_vruntime                  : 0.000001
[ 1250.200464]   .spread                        : 0.000000
[ 1250.200466]   .spread0                       : -37631.274349
[ 1250.200468]   .nr_spread_over                : 0
[ 1250.200469]   .nr_running                    : 0
[ 1250.200479]   .load                          : 0
[ 1250.200491]   .runnable_load_avg             : 0
[ 1250.200501]   .blocked_load_avg              : 0
[ 1250.200514]   .tg_load_avg                   : 0
[ 1250.200516]   .tg_load_contrib               : 0
[ 1250.200517]   .tg_runnable_contrib           : 0
[ 1250.200518]   .tg->runnable_avg              : 0
[ 1250.200520]   .se->exec_start                : 1250193.745232
[ 1250.200521]   .se->vruntime                  : 45534.164663
[ 1250.200523]   .se->sum_exec_runtime          : 5352.648881
[ 1250.200524]   .se->load.weight               : 2
[ 1250.200525]   .se->avg.runnable_avg_sum      : 8
[ 1250.200527]   .se->avg.runnable_avg_period   : 47624
[ 1250.200541]   .se->avg.load_avg_contrib      : 0
[ 1250.200543]   .se->avg.decay_count           : 1192278
[ 1250.200551] 
[ 1250.200551] cfs_rq[0]:/
[ 1250.200552]   .exec_clock                    : 0.000000
[ 1250.200553]   .MIN_vruntime                  : 0.000001
[ 1250.200555]   .min_vruntime                  : 45540.950213
[ 1250.200561]   .max_vruntime                  : 0.000001
[ 1250.200562]   .spread                        : 0.000000
[ 1250.200564]   .spread0                       : 0.000000
[ 1250.200565]   .nr_spread_over                : 0
[ 1250.200566]   .nr_running                    : 1
[ 1250.200575]   .load                          : 1024
[ 1250.200577]   .runnable_load_avg             : 0
[ 1250.200578]   .blocked_load_avg              : 0
[ 1250.200599]   .tg_load_avg                   : 0
[ 1250.200601]   .tg_load_contrib               : 0
[ 1250.200602]   .tg_runnable_contrib           : 30
[ 1250.200603]   .tg->runnable_avg              : 1230
[ 1250.200604]   .avg->runnable_avg_sum         : 1367
[ 1250.200605]   .avg->runnable_avg_period      : 46273
[ 1250.200607] 
[ 1250.200607] rt_rq[0]:
[ 1250.200615]   .rt_nr_running                 : 0
[ 1250.200616]   .rt_throttled                  : 0
[ 1250.200618]   .rt_time                       : 96.917503
[ 1250.200626]   .rt_runtime                    : 950.000000
[ 1250.200627]   .rt_nr_migratory               : 0
[ 1250.200629] 
[ 1250.200629] runnable tasks:
[ 1250.200629]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[ 1250.200629] ----------------------------------------------------------------------------------------------------------
[ 1250.200675] R       rsyslogd  6368        18.253322        19   120               0               0               0.000000               0.000000               0.000000 /autogroup-167
[ 1250.200698] 
[ 1250.200699] cpu#1, 2194.795 MHz
[ 1250.200707]   .nr_running                    : 2
[ 1250.200709]   .load                          : 2048
[ 1250.200710]   .nr_switches                   : 40655522
[ 1250.200711]   .nr_load_updates               : 1231168
[ 1250.200721]   .nr_uninterruptible            : 46
[ 1250.200722]   .next_balance                  : 4295.915866
[ 1250.200724]   .curr->pid                     : 2015
[ 1250.200725]   .clock                         : 1250200.715602
[ 1250.200726]   .cpu_load[0]                   : 0
[ 1250.200727]   .cpu_load[1]                   : 0
[ 1250.200729]   .cpu_load[2]                   : 0
[ 1250.200730]   .cpu_load[3]                   : 0
[ 1250.200732]   .cpu_load[4]                   : 0
[ 1250.200743] 
[ 1250.200743] cfs_rq[1]:/autogroup-183
[ 1250.200745]   .exec_clock                    : 0.000000
[ 1250.200755]   .MIN_vruntime                  : 0.000001
[ 1250.200756]   .min_vruntime                  : 0.616862
[ 1250.200758]   .max_vruntime                  : 0.000001
[ 1250.200765]   .spread                        : 0.000000
[ 1250.200767]   .spread0                       : -45540.333351
[ 1250.200768]   .nr_spread_over                : 0
[ 1250.200770]   .nr_running                    : 1
[ 1250.200778]   .load                          : 1024
[ 1250.200779]   .runnable_load_avg             : 0
[ 1250.200780]   .blocked_load_avg              : 0
[ 1250.200789]   .tg_load_avg                   : 0
[ 1250.200790]   .tg_load_contrib               : 0
[ 1250.200791]   .tg_runnable_contrib           : 0
[ 1250.200793]   .tg->runnable_avg              : 0
[ 1250.200794]   .se->exec_start                : 1250200.777722
[ 1250.200796]   .se->vruntime                  : 36158.952186
[ 1250.200797]   .se->sum_exec_runtime          : 1.696888
[ 1250.200798]   .se->load.weight               : 2
[ 1250.200799]   .se->avg.runnable_avg_sum      : 73
[ 1250.200801]   .se->avg.runnable_avg_period   : 48632
[ 1250.200802]   .se->avg.load_avg_contrib      : 0
[ 1250.200804]   .se->avg.decay_count           : 1192285
[ 1250.200814] 
[ 1250.200814] cfs_rq[1]:/autogroup-167
[ 1250.200816]   .exec_clock                    : 0.000000
[ 1250.200818]   .MIN_vruntime                  : 58.228650
[ 1250.200819]   .min_vruntime                  : 58.228650
[ 1250.200827]   .max_vruntime                  : 58.228650
[ 1250.200828]   .spread                        : 0.000000
[ 1250.200830]   .spread0                       : -45482.721563
[ 1250.200831]   .nr_spread_over                : 0
[ 1250.200850]   .nr_running                    : 1
[ 1250.200852]   .load                          : 1024
[ 1250.200854]   .runnable_load_avg             : 0
[ 1250.200856]   .blocked_load_avg              : 0
[ 1250.200857]   .tg_load_avg                   : 0
[ 1250.200877]   .tg_load_contrib               : 0
[ 1250.200878]   .tg_runnable_contrib           : 0
[ 1250.200879]   .tg->runnable_avg              : 0
[ 1250.200880]   .se->exec_start                : 1250200.853999
[ 1250.200882]   .se->vruntime                  : 36159.054504
[ 1250.200883]   .se->sum_exec_runtime          : 17.862348
[ 1250.200901]   .se->load.weight               : 1024
[ 1250.200912]   .se->avg.runnable_avg_sum      : 274
[ 1250.200913]   .se->avg.runnable_avg_period   : 48740
[ 1250.200914]   .se->avg.load_avg_contrib      : 0
[ 1250.200916]   .se->avg.decay_count           : 1192285
[ 1250.200925] 
[ 1250.200925] cfs_rq[1]:/
[ 1250.200927]   .exec_clock                    : 0.000000
[ 1250.200946]   .MIN_vruntime                  : 36159.087569
[ 1250.200947]   .min_vruntime                  : 36167.920736
[ 1250.200949]   .max_vruntime                  : 36159.087569
[ 1250.200950]   .spread                        : 0.000000
[ 1250.200951]   .spread0                       : -9373.029477
[ 1250.200959]   .nr_spread_over                : 0
[ 1250.200961]   .nr_running                    : 1
[ 1250.200962]   .load                          : 1024
[ 1250.200963]   .runnable_load_avg             : 0
[ 1250.200971]   .blocked_load_avg              : 0
[ 1250.200973]   .tg_load_avg                   : 0
[ 1250.200974]   .tg_load_contrib               : 0
[ 1250.200981]   .tg_runnable_contrib           : 445
[ 1250.200983]   .tg->runnable_avg              : 1247
[ 1250.200986]   .avg->runnable_avg_sum         : 20193
[ 1250.200993]   .avg->runnable_avg_period      : 46452
[ 1250.200995] 
[ 1250.200995] rt_rq[1]:
[ 1250.200997]   .rt_nr_running                 : 1
[ 1250.200999]   .rt_throttled                  : 0
[ 1250.201000]   .rt_time                       : 201.849919
[ 1250.201031]   .rt_runtime                    : 950.000000
[ 1250.201032]   .rt_nr_migratory               : 1
[ 1250.201034] 
[ 1250.201034] runnable tasks:
[ 1250.201034]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[ 1250.201034] ----------------------------------------------------------------------------------------------------------
[ 1250.201071] R    irq/18-can1  2015         0.000000  17472905    49               0               0               0.000000               0.000000               0.000000 /
[ 1250.201082]    rs:main Q:Reg  6366        58.343361       757   120               0               0               0.000000               0.000000               0.000000 /autogroup-167
[ 1250.201100] 
[ 1250.201108] cpu#2, 2194.795 MHz
[ 1250.201109]   .nr_running                    : 0
[ 1250.201111]   .load                          : 0
[ 1250.201119]   .nr_switches                   : 74339322
[ 1250.201120]   .nr_load_updates               : 1289248
[ 1250.201122]   .nr_uninterruptible            : -139
[ 1250.201130]   .next_balance                  : 4295.915866
[ 1250.201131]   .curr->pid                     : 0
[ 1250.201133]   .clock                         : 1250201.079957
[ 1250.201148]   .cpu_load[0]                   : 0
[ 1250.201149]   .cpu_load[1]                   : 0
[ 1250.201150]   .cpu_load[2]                   : 0
[ 1250.201151]   .cpu_load[3]                   : 0
[ 1250.201162]   .cpu_load[4]                   : 0
[ 1250.201164] 
[ 1250.201164] cfs_rq[2]:/autogroup-23
[ 1250.201166]   .exec_clock                    : 0.000000
[ 1250.201174]   .MIN_vruntime                  : 0.000001
[ 1250.201176]   .min_vruntime                  : 3636.083046
[ 1250.201177]   .max_vruntime                  : 0.000001
[ 1250.201186]   .spread                        : 0.000000
[ 1250.201191]   .spread0                       : -41904.867167
[ 1250.201200]   .nr_spread_over                : 0
[ 1250.201202]   .nr_running                    : 0
[ 1250.201204]   .load                          : 0
[ 1250.201237]   .runnable_load_avg             : 0
[ 1250.201239]   .blocked_load_avg              : 0
[ 1250.201241]   .tg_load_avg                   : 0
[ 1250.201242]   .tg_load_contrib               : 0
[ 1250.201244]   .tg_runnable_contrib           : 0
[ 1250.201245]   .tg->runnable_avg              : 0
[ 1250.201247]   .se->exec_start                : 1250123.293600
[ 1250.201248]   .se->vruntime                  : 44269.300196
[ 1250.201250]   .se->sum_exec_runtime          : 2291.524471
[ 1250.201252]   .se->load.weight               : 2
[ 1250.201254]   .se->avg.runnable_avg_sum      : 15
[ 1250.201256]   .se->avg.runnable_avg_period   : 47514
[ 1250.201258]   .se->avg.load_avg_contrib      : 0
[ 1250.201259]   .se->avg.decay_count           : 1192211
[ 1250.201260] 
[ 1250.201260] cfs_rq[2]:/
[ 1250.201262]   .exec_clock                    : 0.000000
[ 1250.201264]   .MIN_vruntime                  : 0.000001
[ 1250.201266]   .min_vruntime                  : 44278.083100
[ 1250.201268]   .max_vruntime                  : 0.000001
[ 1250.201269]   .spread                        : 0.000000
[ 1250.201271]   .spread0                       : -1262.867113
[ 1250.201272]   .nr_spread_over                : 0
[ 1250.201274]   .nr_running                    : 0
[ 1250.201295]   .load                          : 0
[ 1250.201296]   .runnable_load_avg             : 0
[ 1250.201297]   .blocked_load_avg              : 0
[ 1250.201299]   .tg_load_avg                   : 0
[ 1250.201301]   .tg_load_contrib               : 0
[ 1250.201302]   .tg_runnable_contrib           : 302
[ 1250.201303]   .tg->runnable_avg              : 1252
[ 1250.201304]   .avg->runnable_avg_sum         : 13644
[ 1250.201306]   .avg->runnable_avg_period      : 46555
[ 1250.201307] 
[ 1250.201307] rt_rq[2]:
[ 1250.201308]   .rt_nr_running                 : 0
[ 1250.201310]   .rt_throttled                  : 0
[ 1250.201311]   .rt_time                       : 173.402316
[ 1250.201313]   .rt_runtime                    : 950.000000
[ 1250.201315]   .rt_nr_migratory               : 0
[ 1250.201316] 
[ 1250.201316] runnable tasks:
[ 1250.201316]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[ 1250.201316] ----------------------------------------------------------------------------------------------------------
[ 1250.201337] 
[ 1250.201339] cpu#3, 2194.795 MHz
[ 1250.201341]   .nr_running                    : 1
[ 1250.201342]   .load                          : 1024
[ 1250.201344]   .nr_switches                   : 102749194
[ 1250.201345]   .nr_load_updates               : 1287421
[ 1250.201363]   .nr_uninterruptible            : 61
[ 1250.201365]   .next_balance                  : 4295.915885
[ 1250.201366]   .curr->pid                     : 8718
[ 1250.201367]   .clock                         : 1250201.361690
[ 1250.201368]   .cpu_load[0]                   : 1024
[ 1250.201369]   .cpu_load[1]                   : 768
[ 1250.201371]   .cpu_load[2]                   : 448
[ 1250.201372]   .cpu_load[3]                   : 240
[ 1250.201374]   .cpu_load[4]                   : 124
[ 1250.201376] 
[ 1250.201376] cfs_rq[3]:/autogroup-188
[ 1250.201378]   .exec_clock                    : 0.000000
[ 1250.201379]   .MIN_vruntime                  : 0.000001
[ 1250.201381]   .min_vruntime                  : 398.623639
[ 1250.201382]   .max_vruntime                  : 0.000001
[ 1250.201384]   .spread                        : 0.000000
[ 1250.201385]   .spread0                       : -45142.326574
[ 1250.201386]   .nr_spread_over                : 0
[ 1250.201388]   .nr_running                    : 1
[ 1250.201390]   .load                          : 1024
[ 1250.201391]   .runnable_load_avg             : 22
[ 1250.201392]   .blocked_load_avg              : 0
[ 1250.201393]   .tg_load_avg                   : 0
[ 1250.201395]   .tg_load_contrib               : 0
[ 1250.201396]   .tg_runnable_contrib           : 23
[ 1250.201398]   .tg->runnable_avg              : 23
[ 1250.201400]   .se->exec_start                : 1250201.361690
[ 1250.201401]   .se->vruntime                  : 53370.693519
[ 1250.201402]   .se->sum_exec_runtime          : 83.378026
[ 1250.201404]   .se->load.weight               : 1024
[ 1250.201405]   .se->avg.runnable_avg_sum      : 1940
[ 1250.201406]   .se->avg.runnable_avg_period   : 47013
[ 1250.201408]   .se->avg.load_avg_contrib      : 0
[ 1250.201409]   .se->avg.decay_count           : 0
[ 1250.201411] 
[ 1250.201411] cfs_rq[3]:/
[ 1250.201413]   .exec_clock                    : 0.000000
[ 1250.201414]   .MIN_vruntime                  : 0.000001
[ 1250.201415]   .min_vruntime                  : 53370.693519
[ 1250.201433]   .max_vruntime                  : 0.000001
[ 1250.201435]   .spread                        : 0.000000
[ 1250.201437]   .spread0                       : 7829.743306
[ 1250.201438]   .nr_spread_over                : 0
[ 1250.201439]   .nr_running                    : 1
[ 1250.201440]   .load                          : 1024
[ 1250.201441]   .runnable_load_avg             : 0
[ 1250.201442]   .blocked_load_avg              : 0
[ 1250.201455]   .tg_load_avg                   : 42
[ 1250.201457]   .tg_load_contrib               : 42
[ 1250.201459]   .tg_runnable_contrib           : 488
[ 1250.201461]   .tg->runnable_avg              : 1285
[ 1250.201462]   .avg->runnable_avg_sum         : 22203
[ 1250.201463]   .avg->runnable_avg_period      : 46581
[ 1250.201464] 
[ 1250.201464] rt_rq[3]:
[ 1250.201466]   .rt_nr_running                 : 0
[ 1250.201467]   .rt_throttled                  : 0
[ 1250.201468]   .rt_time                       : 221.302805
[ 1250.201470]   .rt_runtime                    : 950.000000
[ 1250.201471]   .rt_nr_migratory               : 0
[ 1250.201473] 
[ 1250.201473] runnable tasks:
[ 1250.201473]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[ 1250.201473] ----------------------------------------------------------------------------------------------------------
[ 1250.201510] R           bash  8718       398.728001      6342   120               0               0               0.000000               0.000000               0.000000 /autogroup-188
[ 1250.201516] 
[ 1322.381902] INFO: task kworker/3:1:91 blocked for more than 120 seconds.
[ 1322.382075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1322.382220] kworker/3:1     D ffff88042dbe1e40     0    91      2 0x00000000
[ 1322.382235] Workqueue: xfs-data/sda5 xfs_end_io
[ 1322.382245]  ffff88041a8d70f0 0000000000000002 ffff88041af96840 0000000000000000
[ 1322.382249]  0000000000061e40 ffff8804195ddfd8 0000000000061e40 ffff8804195ddfd8
[ 1322.382262]  0000000000000400 ffff88041a8d70f0 ffff880357bb2b40 0000000000000002
[ 1322.382265] Call Trace:
[ 1322.382273]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1322.382276]  [<ffffffff8147a738>] ? __rt_mutex_slowlock+0x80/0xb9
[ 1322.382277]  [<ffffffff8147a886>] ? rt_mutex_slowlock+0xea/0x155
[ 1322.382280]  [<ffffffff811996dd>] ? xfs_setfilesize+0x4d/0x125
[ 1322.382283]  [<ffffffff8105ab1e>] ? finish_task_switch+0x60/0xac
[ 1322.382285]  [<ffffffff8119a338>] ? xfs_end_io+0x7f/0x93
[ 1322.382288]  [<ffffffff8104bb96>] ? process_one_work+0x1a0/0x2b7
[ 1322.382290]  [<ffffffff8104c09b>] ? worker_thread+0x130/0x1fb
[ 1322.382291]  [<ffffffff8104bf6b>] ? rescuer_thread+0x294/0x294
[ 1322.382293]  [<ffffffff81050e74>] ? kthread+0x86/0x8e
[ 1322.382295]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1322.382298]  [<ffffffff8147ffdc>] ? ret_from_fork+0x7c/0xb0
[ 1322.382299]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1322.382306] INFO: task kworker/2:1:196 blocked for more than 120 seconds.
[ 1322.382503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1322.382649] kworker/2:1     D ffff88042db61e40     0   196      2 0x00000000
[ 1322.382653] Workqueue: xfs-data/sda5 xfs_end_io
[ 1322.382661]  ffff8804182c0000 0000000000000002 ffff88041a964580 ffff880357bb2af8
[ 1322.382663]  0000000000061e40 ffff8804182c9fd8 0000000000061e40 ffff8804182c9fd8
[ 1322.382672]  0000000000000000 ffff8804182c0000 ffff880357bb2b40 0000000000000002
[ 1322.382678] Call Trace:
[ 1322.382682]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1322.382683]  [<ffffffff8147a738>] ? __rt_mutex_slowlock+0x80/0xb9
[ 1322.382689]  [<ffffffff8147a886>] ? rt_mutex_slowlock+0xea/0x155
[ 1322.382691]  [<ffffffff811996dd>] ? xfs_setfilesize+0x4d/0x125
[ 1322.382693]  [<ffffffff8105ab1e>] ? finish_task_switch+0x60/0xac
[ 1322.382694]  [<ffffffff8119a338>] ? xfs_end_io+0x7f/0x93
[ 1322.382696]  [<ffffffff8104bb96>] ? process_one_work+0x1a0/0x2b7
[ 1322.382698]  [<ffffffff8104c09b>] ? worker_thread+0x130/0x1fb
[ 1322.382700]  [<ffffffff8104bf6b>] ? rescuer_thread+0x294/0x294
[ 1322.382701]  [<ffffffff81050e74>] ? kthread+0x86/0x8e
[ 1322.382703]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1322.382705]  [<ffffffff8147ffdc>] ? ret_from_fork+0x7c/0xb0
[ 1322.382707]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1322.382714] INFO: task kworker/u16:5:247 blocked for more than 120 seconds.
[ 1322.382864] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1322.383034] kworker/u16:5   D ffff88042db61e40     0   247      2 0x00000000
[ 1322.383052] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[ 1322.383067]  ffff88041a964580 0000000000000002 ffff88041af95f90 ffff88041803f628
[ 1322.383072]  0000000000061e40 ffff88041803ffd8 0000000000061e40 ffff88041803ffd8
[ 1322.383702]  ffff88041803f638 ffff88041a964580 ffff88041803f760 ffff88041803f758
[ 1322.383707] Call Trace:
[ 1322.383715]  [<ffffffff81479302>] ? console_conditional_schedule+0x1a/0x1a
[ 1322.383728]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1322.383743]  [<ffffffff81479333>] ? schedule_timeout+0x31/0x128
[ 1322.383746]  [<ffffffff8147e115>] ? add_preempt_count+0xbc/0xe5
[ 1322.383748]  [<ffffffff8105cb3f>] ? migrate_enable+0x1d3/0x1e3
[ 1322.383750]  [<ffffffff8105c244>] ? get_parent_ip+0x9/0x1b
[ 1322.383752]  [<ffffffff8147e115>] ? add_preempt_count+0xbc/0xe5
[ 1322.383755]  [<ffffffff81479b11>] ? __wait_for_common+0x78/0xd6
[ 1322.383758]  [<ffffffff811c06ce>] ? xfs_bmapi_allocate+0x97/0xa3
[ 1322.383760]  [<ffffffff811c09fe>] ? xfs_bmapi_write+0x324/0x55d
[ 1322.383763]  [<ffffffff81102a7e>] ? kmem_cache_alloc+0x81/0x182
[ 1322.383767]  [<ffffffff811be490>] ? __xfs_bmapi_allocate+0x230/0x230
[ 1322.383769]  [<ffffffff811a6956>] ? xfs_iomap_write_allocate+0x1c1/0x2cd
[ 1322.383772]  [<ffffffff811999d8>] ? xfs_map_blocks+0x12a/0x1fa
[ 1322.383774]  [<ffffffff8119a9af>] ? xfs_vm_writepage+0x26b/0x494
[ 1322.383778]  [<ffffffff810ce259>] ? __writepage+0x12/0x2f
[ 1322.383780]  [<ffffffff810cecd4>] ? write_cache_pages+0x20c/0x307
[ 1322.383781]  [<ffffffff810ce247>] ? page_index+0x14/0x14
[ 1322.383784]  [<ffffffff810cee0f>] ? generic_writepages+0x40/0x5c
[ 1322.383788]  [<ffffffff8112feb9>] ? __writeback_single_inode+0x77/0x22b
[ 1322.383790]  [<ffffffff81131039>] ? writeback_sb_inodes+0x21a/0x372
[ 1322.383792]  [<ffffffff8113136f>] ? wb_writeback+0x12e/0x2ac
[ 1322.383795]  [<ffffffff811316f8>] ? wb_do_writeback+0x7e/0x1e1
[ 1322.383798]  [<ffffffff8104d699>] ? set_worker_desc+0x66/0x71
[ 1322.383801]  [<ffffffff811318c4>] ? bdi_writeback_workfn+0x69/0xfa
[ 1322.383803]  [<ffffffff8104bb96>] ? process_one_work+0x1a0/0x2b7
[ 1322.383805]  [<ffffffff8104c09b>] ? worker_thread+0x130/0x1fb
[ 1322.383807]  [<ffffffff8104bf6b>] ? rescuer_thread+0x294/0x294
[ 1322.383809]  [<ffffffff81050e74>] ? kthread+0x86/0x8e
[ 1322.383811]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1322.383814]  [<ffffffff8147ffdc>] ? ret_from_fork+0x7c/0xb0
[ 1322.383816]  [<ffffffff81050dee>] ? __kthread_parkme+0x60/0x60
[ 1322.383859] INFO: task shutdown:8686 blocked for more than 120 seconds.
[ 1322.384186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1322.384346] shutdown        D ffff88042dbe1e40     0  8686   6726 0x00000000
[ 1322.384394]  ffff8804195bc580 0000000000000002 ffff88041af96840 0000000000000000
[ 1322.384396]  0000000000061e40 ffff8803ef239fd8 0000000000061e40 ffff8803ef239fd8
[ 1322.384399]  ffff88041a964bc8 ffff8804195bc580 ffff8803ef239eb0 ffff8803ef239ea8
[ 1322.384403] Call Trace:
[ 1322.384409]  [<ffffffff81479302>] ? console_conditional_schedule+0x1a/0x1a
[ 1322.384411]  [<ffffffff8147a22f>] ? schedule+0x70/0x81
[ 1322.384414]  [<ffffffff81479333>] ? schedule_timeout+0x31/0x128
[ 1322.384417]  [<ffffffff8105c244>] ? get_parent_ip+0x9/0x1b
[ 1322.384425]  [<ffffffff8105c244>] ? get_parent_ip+0x9/0x1b
[ 1322.384435]  [<ffffffff8147e115>] ? add_preempt_count+0xbc/0xe5
[ 1322.384438]  [<ffffffff81479b11>] ? __wait_for_common+0x78/0xd6
[ 1322.384441]  [<ffffffff81130d09>] ? sync_inodes_sb+0xb2/0x1c8
[ 1322.384444]  [<ffffffff8113438c>] ? fdatawrite_one_bdev+0x14/0x14
[ 1322.384447]  [<ffffffff81112ea0>] ? iterate_supers+0x72/0xcd
[ 1322.384449]  [<ffffffff811344f0>] ? sys_sync+0x33/0x86
[ 1322.384452]  [<ffffffff8148027d>] ? tracesys+0xdd/0xe2


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: XFS crash?
  2014-05-13  4:03         ` Austin Schuh
@ 2014-05-13  6:39           ` Dave Chinner
  2014-05-13  7:02             ` Austin Schuh
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2014-05-13  6:39 UTC (permalink / raw)
  To: Austin Schuh; +Cc: xfs

On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> >> > Hi Dave,
> >> >
> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> >> >>> Howdy,
> >> >>>
> >> >>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
> >> >>> and I'm seeing a couple lockups and crashes which I think are related
> >> >>> to XFS.
> >> >>
> >> >> I think they are more likely related to RT issues....
> >> >>
> >> >
> >> > That very well may be true.
> >> >
> >> >> >> Your usb device has disconnected and gone down the device
> >> >> >> removal/invalidate partition route, and it's trying to flush the
> >> >> >> device, which is stuck on IO completion, which is in turn stuck
> >> >> >> waiting for the device error handling to error those IOs out.
> >> >> >>
> >> >> >> So, this is a block device error handling problem caused by device
> >> >> >> unplug getting stuck: it's decided to ask the filesystem to complete
> >> >> >> operations that can't be completed until the device error handling
> >> >> >> progresses far enough to error out the IOs that the filesystem is
> >> >> >> waiting on.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Dave.
> >> >> --
> >> >> Dave Chinner
> >> >> david@fromorbit.com
> >>
> >> I had the issue reproduce itself today with just the main SSD
> >> installed.  This was on a new machine that was built this morning.
> >> There is a lot less going on in this trace than the previous one.
> >
> > The three blocked threads:
> >
> >         1. kworker running IO completion waiting on an inode lock,
> >            holding locked pages.
> >         2. kworker running writeback flusher work waiting for a page lock
> >         3. direct flush work waiting for allocation, holding page
> >            locks and the inode lock.
> >
> > What's the kworker thread running the allocation work doing?
> >
> >> > You might need to run `echo w > /proc/sysrq-trigger` to get this
> > information...
> 
> I was able to reproduce the lockup.  I ran `echo w >
> /proc/sysrq-trigger` per your suggestion.  I don't know how to figure
> out what the kworker thread is doing, but I'll happily do it if you
> can give me some guidance.

There isn't a worker thread blocked doing an allocation in that
dump, so it doesn't shed any light on the problem at all. try
`echo l > /proc/sysrq-trigger`, followed by `echo t >
/proc/sysrq-trigger` so we can see all the processes running on CPUs
and all the processes in the system...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: XFS crash?
  2014-05-13  6:39           ` Dave Chinner
@ 2014-05-13  7:02             ` Austin Schuh
  2014-05-13  9:03               ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Austin Schuh @ 2014-05-13  7:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 2320 bytes --]

On Mon, May 12, 2014 at 11:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
>> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
>> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> >> > Hi Dave,
>> >> >
>> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>> >> >>> Howdy,
>> >> >>>
>> >> >>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>> >> >>> and I'm seeing a couple lockups and crashes which I think are related
>> >> >>> to XFS.
>> >> >>
>> >> >> I think they are more likely related to RT issues....
>> >> >>
>> >> >
>> >> > That very well may be true.
>> >> >
>> >> >> Cheers,
>> >> >>
>> >> >> Dave.
>> >> >> --
>> >> >> Dave Chinner
>> >>
>> >> I had the issue reproduce itself today with just the main SSD
>> >> installed.  This was on a new machine that was built this morning.
>> >> There is a lot less going on in this trace than the previous one.
>> >
>> > The three blocked threads:
>> >
>> >         1. kworker running IO completion waiting on an inode lock,
>> >            holding locked pages.
>> >         2. kworker running writeback flusher work waiting for a page lock
>> >         3. direct flush work waiting for allocation, holding page
>> >            locks and the inode lock.
>> >
>> > What's the kworker thread running the allocation work doing?
>> >
>> > You might need to run `echo w > /proc/sysrq-trigger` to get this
>> > information...
>>
>> I was able to reproduce the lockup.  I ran `echo w >
>> /proc/sysrq-trigger` per your suggestion.  I don't know how to figure
>> out what the kworker thread is doing, but I'll happily do it if you
>> can give me some guidance.
>
> There isn't a worker thread blocked doing an allocation in that
> dump, so it doesn't shed any light on the problem at all. try
> `echo l > /proc/sysrq-trigger`, followed by `echo t >
> /proc/sysrq-trigger` so we can see all the processes running on CPUs
> and all the processes in the system...
>
> Cheers,
>
> Dave.

Attached is the output of the two commands you asked for.

Thanks,

Austin

[-- Attachment #2: sysrq_dump.bz2 --]
[-- Type: application/x-bzip2, Size: 30640 bytes --]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: XFS crash?
  2014-05-13  7:02             ` Austin Schuh
@ 2014-05-13  9:03               ` Dave Chinner
  2014-05-13 17:11                 ` Austin Schuh
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2014-05-13  9:03 UTC (permalink / raw)
  To: Austin Schuh; +Cc: xfs

On Tue, May 13, 2014 at 12:02:18AM -0700, Austin Schuh wrote:
> On Mon, May 12, 2014 at 11:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
> >> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
> >> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> >> >> > Hi Dave,
> >> >> >
> >> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> >> >> >>> Howdy,
> >> >> >>>
> >> >> >>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
> >> >> >>> and I'm seeing a couple lockups and crashes which I think are related
> >> >> >>> to XFS.
> >> >> >>
> >> >> >> I think they are more likely related to RT issues....
> >> >> >>
> >> >> >
> >> >> > That very well may be true.
> >> >> >
> >> >> >> Cheers,
> >> >> >>
> >> >> >> Dave.
> >> >> >> --
> >> >> >> Dave Chinner
> >> >>
> >> >> I had the issue reproduce itself today with just the main SSD
> >> >> installed.  This was on a new machine that was built this morning.
> >> >> There is a lot less going on in this trace than the previous one.
> >> >
> >> > The three blocked threads:
> >> >
> >> >         1. kworker running IO completion waiting on an inode lock,
> >> >            holding locked pages.
> >> >         2. kworker running writeback flusher work waiting for a page lock
> >> >         3. direct flush work waiting for allocation, holding page
> >> >            locks and the inode lock.
> >> >
> >> > What's the kworker thread running the allocation work doing?
> >> >
> >> > You might need to run `echo w > /proc/sysrq-trigger` to get this
> >> > information...
> >>
> >> I was able to reproduce the lockup.  I ran `echo w >
> >> /proc/sysrq-trigger` per your suggestion.  I don't know how to figure
> >> out what the kworker thread is doing, but I'll happily do it if you
> >> can give me some guidance.
> >
> > There isn't a worker thread blocked doing an allocation in that
> > dump, so it doesn't shed any light on the problem at all. try
> > `echo l > /proc/sysrq-trigger`, followed by `echo t >
> > /proc/sysrq-trigger` so we can see all the processes running on CPUs
> > and all the processes in the system...
> >
> > Cheers,
> >
> > Dave.
> 
> Attached is the output of the two commands you asked for.

Nothing there. There are lots of processes waiting for allocation to
run, and no kworkers running allocation work. This looks more like an
rt-kernel workqueue issue, not an XFS problem.

FWIW, it would be really helpful if you compiled your kernels with
frame pointers enabled - the stack traces are much more precise and
readable (i.e. they get rid of all the false/stale entries), and that
helps immensely in understanding where things are stuck.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: XFS crash?
  2014-05-13  9:03               ` Dave Chinner
@ 2014-05-13 17:11                 ` Austin Schuh
  2014-06-23 20:05                   ` Austin Schuh
  0 siblings, 1 reply; 30+ messages in thread
From: Austin Schuh @ 2014-05-13 17:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, May 13, 2014 at 2:03 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, May 13, 2014 at 12:02:18AM -0700, Austin Schuh wrote:
>> On Mon, May 12, 2014 at 11:39 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
>> >> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
>> >> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> >> >> > Hi Dave,
>> >> >> >
>> >> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>> >> >> >>> Howdy,
>> >> >> >>>
>> >> >> >>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>> >> >> >>> and I'm seeing a couple lockups and crashes which I think are related
>> >> >> >>> to XFS.
>> >> >> >>
>> >> >> >> I think they are more likely related to RT issues....
>> >> >> >>
>> >> >> >
>> >> >> > That very well may be true.
>> >> >> >
>> >> >> >> Cheers,
>> >> >> >>
>> >> >> >> Dave.
>> >> >> >> --
>> >> >> >> Dave Chinner
>> >> >>
>> >> >> I had the issue reproduce itself today with just the main SSD
>> >> >> installed.  This was on a new machine that was built this morning.
>> >> >> There is a lot less going on in this trace than the previous one.
>> >> >
>> >> > The three blocked threads:
>> >> >
>> >> >         1. kworker running IO completion waiting on an inode lock,
>> >> >            holding locked pages.
>> >> >         2. kworker running writeback flusher work waiting for a page lock
>> >> >         3. direct flush work waiting for allocation, holding page
>> >> >            locks and the inode lock.
>> >> >
>> >> > What's the kworker thread running the allocation work doing?
>> >> >
>> >> > You might need to run `echo w > /proc/sysrq-trigger` to get this
>> >> > information...
>> >>
>> >> I was able to reproduce the lockup.  I ran `echo w >
>> >> /proc/sysrq-trigger` per your suggestion.  I don't know how to figure
>> >> out what the kworker thread is doing, but I'll happily do it if you
>> >> can give me some guidance.
>> >
>> > There isn't a worker thread blocked doing an allocation in that
>> > dump, so it doesn't shed any light on the problem at all. try
>> > `echo l > /proc/sysrq-trigger`, followed by `echo t >
>> > /proc/sysrq-trigger` so we can see all the processes running on CPUs
>> > and all the processes in the system...
>> >
>> > Cheers,
>> >
>> > Dave.
>>
>> Attached is the output of the two commands you asked for.
>
> Nothing there. There are lots of processes waiting for allocation to
> run, and no kworkers running allocation work. This looks more like an
> rt-kernel workqueue issue, not an XFS problem.
>
> FWIW, it would be really helpful if you compiled your kernels with
> frame pointers enabled - the stack traces are much more precise and
> readable (i.e. they get rid of all the false/stale entries), and that
> helps immensely in understanding where things are stuck.
>
> Cheers,
>
> Dave.

Thanks Dave.

I'll go check with the rt-kernel guys and take it from there.  Thanks
for the frame pointers suggestion.  I'll make that change the next
time I build a kernel.

Austin


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: XFS crash?
  2014-05-13 17:11                 ` Austin Schuh
@ 2014-06-23 20:05                   ` Austin Schuh
  2014-06-24  3:02                       ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Austin Schuh @ 2014-06-23 20:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 9223 bytes --]

Hi Dave,

On Tue, May 13, 2014 at 10:11 AM, Austin Schuh <austin@peloton-tech.com>
wrote:

> On Tue, May 13, 2014 at 2:03 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Tue, May 13, 2014 at 12:02:18AM -0700, Austin Schuh wrote:
> >> On Mon, May 12, 2014 at 11:39 PM, Dave Chinner <david@fromorbit.com>
> wrote:
> >> > On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
> >> >> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@fromorbit.com>
> wrote:
> >> >> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
> >> >> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <
> austin@peloton-tech.com> wrote:
> >> >> >> > Hi Dave,
> >> >> >> >
> >> >> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <
> david@fromorbit.com> wrote:
> >> >> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> >> >> >> >>> Howdy,
> >> >> >> >>>
> >> >> >> >>> I'm running a config_preempt_rt patched version of the
> 3.10.11 kernel,
> >> >> >> >>> and I'm seeing a couple lockups and crashes which I think are
> related
> >> >> >> >>> to XFS.
> >> >> >> >>
> >> >> >> >> I think they are more likely related to RT issues....
> >> >> >> >>
> >> >> >> >
> >> >> >> > That very well may be true.
> >> >> >> >
> >> >> >> >> Cheers,
> >> >> >> >>
> >> >> >> >> Dave.
> >> >> >> >> --
> >> >> >> >> Dave Chinner
> >> >> >>
> >> >> >> I had the issue reproduce itself today with just the main SSD
> >> >> >> installed.  This was on a new machine that was built this morning.
> >> >> >> There is a lot less going on in this trace than the previous one.
> >> >> >
> >> >> > The three blocked threads:
> >> >> >
> >> >> >         1. kworker running IO completion waiting on an inode lock,
> >> >> >            holding locked pages.
> >> >> >         2. kworker running writeback flusher work waiting for a
> page lock
> >> >> >         3. direct flush work waiting for allocation, holding page
> >> >> >            locks and the inode lock.
> >> >> >
> >> >> > What's the kworker thread running the allocation work doing?
> >> >> >
> >> >> > You might need to run `echo w > /proc/sysrq-trigger` to get this
> >> >> > information...
> >> >>
> >> >> I was able to reproduce the lockup.  I ran `echo w >
> >> >> /proc/sysrq-trigger` per your suggestion.  I don't know how to figure
> >> >> out what the kworker thread is doing, but I'll happily do it if you
> >> >> can give me some guidance.
> >> >
> >> > There isn't a worker thread blocked doing an allocation in that
> >> > dump, so it doesn't shed any light on the problem at all. try
> >> > `echo l > /proc/sysrq-trigger`, followed by `echo t >
> >> > /proc/sysrq-trigger` so we can see all the processes running on CPUs
> >> > and all the processes in the system...
> >> >
> >> > Cheers,
> >> >
> >> > Dave.
> >>
> >> Attached is the output of the two commands you asked for.
> >
> > Nothing there. There are lots of processes waiting for allocation to
> > run, and no kworkers running allocation work. This looks more like an
> > rt-kernel workqueue issue, not an XFS problem.
> >
> > FWIW, it would be really helpful if you compiled your kernels with
> > frame pointers enabled - the stack traces are much more precise and
> > readable (i.e. they get rid of all the false/stale entries), and that
> > helps immensely in understanding where things are stuck.
> >
> > Cheers,
> >
> > Dave.
>
> Thanks Dave.
>
> I'll go check with the rt-kernel guys and take it from there.  Thanks
> for the frame pointers suggestion.  I'll make that change the next
> time I build a kernel.
>
> Austin
>

I found 1 bug in XFS which I fixed, and I've uncovered something else that
I'm not completely sure how to fix.

In xfs_bmapi_allocate, you create a completion, and use that to wait until
the work has finished.  Occasionally, I'm seeing a case where I get a
context switch after the completion has been signalled, but before the
workqueue has finished doing its internal bookkeeping.  This results in
the work being deleted before the workqueue is done using it, corrupting
the internal data structures.  I fixed it by waiting with flush_work and
removing the completion entirely.

--- a/fs/xfs/xfs_bmap_util.c        2014-06-23 12:59:10.008678410 -0700
+++ b/fs/xfs/xfs_bmap_util.c      2014-06-23 12:59:14.116678239 -0700
@@ -263,7 +263,6 @@
        current_set_flags_nested(&pflags, PF_FSTRANS);

        args->result = __xfs_bmapi_allocate(args);
-       complete(args->done);

        current_restore_flags_nested(&pflags, PF_FSTRANS);
 }
@@ -277,16 +276,13 @@
 xfs_bmapi_allocate(
        struct xfs_bmalloca     *args)
 {
-       DECLARE_COMPLETION_ONSTACK(done);
-
        if (!args->stack_switch)
                return __xfs_bmapi_allocate(args);


-       args->done = &done;
        INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
        queue_work(xfs_alloc_wq, &args->work);
-       wait_for_completion(&done);
+       flush_work(&args->work);
        destroy_work_on_stack(&args->work);
        return args->result;
 }
--- a/fs/xfs/xfs_bmap_util.h        2014-06-23 12:59:10.008678410 -0700
+++ b/fs/xfs/xfs_bmap_util.h      2014-06-23 12:59:11.708678340 -0700
@@ -57,7 +57,6 @@
        char                    conv;   /* overwriting unwritten extents */
        char                    stack_switch;
        int                     flags;
-       struct completion       *done;
        struct work_struct      work;
        int                     result;
 };
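
To make the race easier to see outside of XFS, here is a minimal, generic
sketch of the two patterns the patch switches between.  The names
(alloc_args, run_alloc_on_worker) and the trivial payload are invented for
illustration; this is not the real XFS code, just the on-stack work_struct
idiom:

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

struct alloc_args {
        struct work_struct      work;
        struct completion       *done;  /* only used by the racy variant */
        int                     result;
};

/*
 * Racy variant (shown only for comparison, not wired up below):
 * complete() wakes the waiter as soon as the payload is done, but the
 * workqueue core still touches the work item afterwards for its own
 * bookkeeping.  The waiter can therefore return, call
 * destroy_work_on_stack() and reuse the stack while that bookkeeping
 * is still in flight.
 */
static void alloc_worker_racy(struct work_struct *work)
{
        struct alloc_args *args = container_of(work, struct alloc_args, work);

        args->result = 0;               /* stand-in for the real allocation */
        complete(args->done);           /* waiter may now free 'work' under us */
}

/*
 * Safe variant: no completion at all.  flush_work() does not return
 * until the workqueue is completely finished with the item, so the
 * on-stack structure stays valid for as long as anything can touch it.
 */
static void alloc_worker(struct work_struct *work)
{
        struct alloc_args *args = container_of(work, struct alloc_args, work);

        args->result = 0;               /* stand-in for the real allocation */
}

static int run_alloc_on_worker(void)
{
        struct alloc_args args = { .result = -1 };

        INIT_WORK_ONSTACK(&args.work, alloc_worker);
        queue_work(system_wq, &args.work);
        flush_work(&args.work);         /* no early wakeup, no race window */
        destroy_work_on_stack(&args.work);
        return args.result;
}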

I enabled event tracing (and added a new event which reports the number of
workers running in a queue whenever that count changes).

To me, it looks like work is scheduled from irq/44-ahci-273 that will
acquire an inode lock.  scp-3986 then acquires the lock, and then goes and
schedules work.  That work is then scheduled behind the work from
irq/44-ahci-273 in the same pool.  The first work blocks waiting on the
lock, and scp-3986 won't finish and release that lock until the second work
gets run.

Any ideas on how to deal with this?  I think we need to create a new pool
to make sure that xfs_bmapi_allocate_worker gets run in a separate thread
to avoid this.
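
Roughly, what I have in mind is something like the following (completely
untested sketch; the queue name is made up, and I haven't verified that
WQ_HIGHPRI is the right way to end up in a different pool on my kernel):

/*
 * Sketch only: a dedicated queue for the allocation worker so it can
 * never be serialized behind an xfs_end_io item in the same pool while
 * that item is waiting on the inode lock we hold.
 */
static struct workqueue_struct *xfs_bmapi_wq;

static int xfs_bmapi_wq_init(void)
{
	/*
	 * WQ_MEM_RECLAIM because writeback depends on this queue.
	 * WQ_HIGHPRI so the work lands in the per-CPU highpri pool
	 * instead of the normal pool that runs the xfs-data items.
	 */
	xfs_bmapi_wq = alloc_workqueue("xfs-bmapi",
				       WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	return xfs_bmapi_wq ? 0 : -ENOMEM;
}

/* ... and in xfs_bmapi_allocate(), queue there instead of xfs_alloc_wq: */
	INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
	queue_work(xfs_bmapi_wq, &args->work);
	flush_work(&args->work);
	destroy_work_on_stack(&args->work);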

     irq/44-ahci-273   [000] ....1.5    76.340300: workqueue_queue_work:
work struct=ffff880406a6b998 function=xfs_end_io workqueue=ffff88040af82000
pool=ffff88042da63fc0 req_cpu=512 cpu=0
     irq/44-ahci-273   [000] ....1.5    76.340301: workqueue_activate_work:
work struct ffff880406a6b998

             scp-3986  [000] ....1..    76.342711: xfs_ilock_nowait: dev
8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_SHARED caller
xfs_map_blocks
             scp-3986  [000] ....1..    76.342714: xfs_iunlock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_SHARED caller xfs_map_blocks
             scp-3986  [000] ....1..    76.342722: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller
xfs_iomap_write_allocate
             scp-3986  [000] ....1.3    76.342729: workqueue_queue_work:
work struct=ffff8800a5b8d900 function=xfs_bmapi_allocate_worker
workqueue=ffff88040bb36000 pool=ffff88042da63fc0 req_cpu=512 cpu=0
             scp-3986  [000] ....1.3    76.342730: workqueue_activate_work:
work struct ffff8800a5b8d900
             scp-3986  [000] ....1.4    76.342754: workqueue_queue_work:
work struct=ffff88040a6a01c8 function=blk_delay_work
workqueue=ffff88040c9f4a00 pool=ffff88042da644c0 req_cpu=512 cpu=0
             scp-3986  [000] ....1.4    76.342755: workqueue_activate_work:
work struct ffff88040a6a01c8

     kworker/0:2-757   [000] ....1.1    76.342868: workqueue_nr_running:
pool=ffff88042da63fc0 nr_running=1
     kworker/0:2-757   [000] ....1..    76.342869: workqueue_execute_start:
work struct ffff880406a6b998: function xfs_end_io
     kworker/0:2-757   [000] ....1..    76.342870: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize

     irq/44-ahci-273   [003] ....1.5    76.419037: workqueue_queue_work:
work struct=ffff8800aebc3588 function=xfs_end_io workqueue=ffff88040af82000
pool=ffff88042dbe3fc0 req_cpu=512 cpu=3
     irq/44-ahci-273   [003] ....1.5    76.419038: workqueue_activate_work:
work struct ffff8800aebc3588

     kworker/3:1-119   [003] ....1.1    76.419105: workqueue_nr_running:
pool=ffff88042dbe3fc0 nr_running=1
     kworker/3:1-119   [003] ....1..    76.419106: workqueue_execute_start:
work struct ffff8800aebc3588: function xfs_end_io
     kworker/3:1-119   [003] ....1..    76.419107: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize

     irq/44-ahci-273   [002] ....1.5    76.426120: workqueue_queue_work:
work struct=ffff880406a6b930 function=xfs_end_io workqueue=ffff88040af82000
pool=ffff88042db63fc0 req_cpu=512 cpu=2
     irq/44-ahci-273   [002] .N..1.5    76.426127: workqueue_activate_work:
work struct ffff880406a6b930
     kworker/2:1-72    [002] ....1.1    76.426242: workqueue_nr_running:
pool=ffff88042db63fc0 nr_running=1
     kworker/2:1-72    [002] ....1..    76.426243: workqueue_execute_start:
work struct ffff880406a6b930: function xfs_end_io
     kworker/2:1-72    [002] ....1..    76.426244: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize

Thanks,
  Austin


* On-stack work item completion race? (was Re: XFS crash?)
  2014-06-23 20:05                   ` Austin Schuh
@ 2014-06-24  3:02                       ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2014-06-24  3:02 UTC (permalink / raw)
  To: Austin Schuh; +Cc: xfs, linux-kernel, Tejun Heo

[Adding Tejun and lkml to the cc list]

On Mon, Jun 23, 2014 at 01:05:54PM -0700, Austin Schuh wrote:
> I found 1 bug in XFS which I fixed, and I've uncovered something else that
> I'm not completely sure how to fix.
> 
> In xfs_bmapi_allocate, you create a completion, and use that to wait until
> the work has finished.  Occasionally, I'm seeing a case where I get a
> context switch after the completion has been completed, but before the
> workqueue has finished doing it's internal book-keeping.  This results in
> the work being deleted before the workqueue is done using it, corrupting
> the internal data structures.  I fixed it by waiting using flush_work and
> removing the completion entirely.
> 
> --- a/fs/xfs/xfs_bmap_util.c        2014-06-23 12:59:10.008678410 -0700
> +++ b/fs/xfs/xfs_bmap_util.c      2014-06-23 12:59:14.116678239 -0700
> @@ -263,7 +263,6 @@
>         current_set_flags_nested(&pflags, PF_FSTRANS);
> 
>         args->result = __xfs_bmapi_allocate(args);
> -       complete(args->done);
> 
>         current_restore_flags_nested(&pflags, PF_FSTRANS);
>  }
> @@ -277,16 +276,13 @@
>  xfs_bmapi_allocate(
>         struct xfs_bmalloca     *args)
>  {
> -       DECLARE_COMPLETION_ONSTACK(done);
> -
>         if (!args->stack_switch)
>                 return __xfs_bmapi_allocate(args);
> 
> 
> -       args->done = &done;
>         INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
>         queue_work(xfs_alloc_wq, &args->work);
> -       wait_for_completion(&done);
> +       flush_work(&args->work);
>         destroy_work_on_stack(&args->work);
>         return args->result;
>  }
> --- a/fs/xfs/xfs_bmap_util.h        2014-06-23 12:59:10.008678410 -0700
> +++ b/fs/xfs/xfs_bmap_util.h      2014-06-23 12:59:11.708678340 -0700
> @@ -57,7 +57,6 @@
>         char                    conv;   /* overwriting unwritten extents */
>         char                    stack_switch;
>         int                     flags;
> -       struct completion       *done;
>         struct work_struct      work;
>         int                     result;
>  };

Ok, that's a surprise. However, I can't see how using flush_work()
solves that underlying context switch problem, because it's
implemented the same way:

bool flush_work(struct work_struct *work)
{
        struct wq_barrier barr;

        lock_map_acquire(&work->lockdep_map);
        lock_map_release(&work->lockdep_map);

        if (start_flush_work(work, &barr)) {
                wait_for_completion(&barr.done);
                destroy_work_on_stack(&barr.work);
                return true;
        } else {
                return false;
        }
}

start_flush_work() is effectively a special queue_work()
implementation, so if it's not safe to call complete() from the
workqueue as the above patch implies then this code has the same
problem.

Tejun - is this "do it yourself completion" a known issue w.r.t.
workqueues? I can't find any documentation that says "don't do
that" so...?

A quick grep also shows up the same queue_work/wait_for_completion
pattern in arch/x86/kernel/hpet.c, drivers/md/dm-thin.c,
fs/fs-writeback.c, drivers/block/drbd/drbd_main.c....
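
The pattern in all of those boils down to something like the following
generic sketch (not code lifted from any of the above files; names are
made up):

struct my_on_stack_args {
	struct work_struct	work;
	struct completion	done;
	int			result;
};

static void my_worker(struct work_struct *work)
{
	struct my_on_stack_args *args =
			container_of(work, struct my_on_stack_args, work);

	args->result = 0;		/* the real work happens here */

	/*
	 * The submitter may return (and destroy *args) the moment this
	 * completes, so args must not be touched after this call.
	 */
	complete(&args->done);
}

static int run_on_wq(struct workqueue_struct *wq)
{
	struct my_on_stack_args args;

	init_completion(&args.done);
	INIT_WORK_ONSTACK(&args.work, my_worker);
	queue_work(wq, &args.work);
	wait_for_completion(&args.done);
	destroy_work_on_stack(&args.work);
	return args.result;
}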

> I enabled event tracing (and added a new event which lists the number of
> workers running in a queue whenever that is changed).
> 
> To me, it looks like work is scheduled from irq/44-ahci-273 that will
> acquire an inode lock.  scp-3986 then acquires the lock, and then goes and
> schedules work.  That work is then scheduled behind the work from
> irq/44-ahci-273 in the same pool.  The first work blocks waiting on the
> lock, and scp-3986 won't finish and release that lock until the second work
> gets run.

IOWs, scp takes an inode lock and queues work to the xfs_alloc_wq,
then schedules. Then a kworker runs an xfs-data work item, which
tries to take the inode lock and blocks.

As I understand it, what then happens is that the workqueue code
grabs another kworker thread and runs the next work item in its
queue. IOWs, work items can block, but doing that does not prevent
execution of other work items queued on other work queues or even on
the same work queue. Tejun, did I get that correct?

Hence the work on the xfs-data queue will block until another
kworker processes the item on the xfs-alloc-wq which means progress
is made and the inode gets unlocked. Then the kworker for the work
on the xfs-data queue will get the lock, complete its work and
everything has resolved itself.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-24  3:02                       ` Dave Chinner
@ 2014-06-24  3:25                         ` Tejun Heo
  -1 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2014-06-24  3:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Austin Schuh, xfs, linux-kernel

Hello,

On Tue, Jun 24, 2014 at 01:02:40PM +1000, Dave Chinner wrote:
> start_flush_work() is effectively a special queue_work()
> implementation, so if if it's not safe to call complete() from the
> workqueue as the above patch implies then this code has the same
> problem.
> 
> Tejun - is this "do it yourself completion" a known issue w.r.t.
> workqueues? I can't find any documentation that says "don't do
> that" so...?

It's more complex than using flush_work() but there's nothing
fundamentally wrong with it.  A work item is completely unlinked
before its execution starts.  It's safe to free the work item once its
work function has started, whether through kfree() or by returning.
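
IOW, something like this is perfectly fine (illustrative sketch,
made-up names):

struct async_req {
	struct work_struct	work;
	/* ... request payload ... */
};

static void async_req_fn(struct work_struct *work)
{
	struct async_req *req = container_of(work, struct async_req, work);

	/* process the request ... */

	/*
	 * The work item was unlinked from the workqueue before this
	 * function was invoked, so freeing it here is safe.
	 */
	kfree(req);
}

static int async_req_submit(struct workqueue_struct *wq)
{
	struct async_req *req = kmalloc(sizeof(*req), GFP_KERNEL);

	if (!req)
		return -ENOMEM;
	INIT_WORK(&req->work, async_req_fn);
	queue_work(wq, &req->work);
	return 0;	/* req is now owned by async_req_fn() */
}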

One difference between flush_work() and manual completion would be
that if the work item gets requeued, flush_work() would wait for the
queued one to finish but given the work item is one-shot this doesn't
make any difference.

I can see no reason why manual completion would behave differently
from flush_work() in this case.

> As I understand it, what then happens is that the workqueue code
> grabs another kworker thread and runs the next work item in it's
> queue. IOWs, work items can block, but doing that does not prevent
> execution of other work items queued on other work queues or even on
> the same work queue. Tejun, did I get that correct?

Yes, as long as the workqueue is under its @max_active limit and has
access to an existing kworker or can create a new one, it'll start
executing the next work item immediately; however, the guaranteed
level of concurrency is 1 even for WQ_MEM_RECLAIM workqueues.  IOW, the
work items queued on a workqueue must be able to make forward progress
with a single work item if the work items are being depended upon for
memory reclaim.
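
For example (made-up name, not taken from any real caller):

	struct workqueue_struct *wq;

	/*
	 * @max_active == 8: up to eight work items from this workqueue
	 * may run concurrently (per CPU for a bound workqueue) while the
	 * system can still supply kworkers.  WQ_MEM_RECLAIM additionally
	 * guarantees a rescuer thread - one execution context - under
	 * memory pressure, but only one, regardless of @max_active.
	 */
	wq = alloc_workqueue("demo-reclaim", WQ_MEM_RECLAIM, 8);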

> Hence the work on the xfs-data queue will block until another
> kworker processes the item on the xfs-alloc-wq which means progress
> is made and the inode gets unlocked. Then the kworker for the work
> on the xfs-data queue will get the lock, complete it's work and
> everything has resolved itself.

As long as a WQ_MEM_RECLAIM workqueue doesn't depend upon itself,
forward-progress is guaranteed.

Thanks.

-- 
tejun


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-24  3:25                         ` Tejun Heo
  (?)
@ 2014-06-25  3:05                         ` Austin Schuh
  2014-06-25 14:00                             ` Tejun Heo
  -1 siblings, 1 reply; 30+ messages in thread
From: Austin Schuh @ 2014-06-25  3:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Thomas Gleixner, linux-kernel, xfs

[Adding tglx to the cc]

On Mon, Jun 23, 2014 at 8:25 PM, Tejun Heo <tj@kernel.org> wrote:

> Hello,
>
> On Tue, Jun 24, 2014 at 01:02:40PM +1000, Dave Chinner wrote:
> > start_flush_work() is effectively a special queue_work()
> > implementation, so if if it's not safe to call complete() from the
> > workqueue as the above patch implies then this code has the same
> > problem.
> >
> > Tejun - is this "do it yourself completion" a known issue w.r.t.
> > workqueues? I can't find any documentation that says "don't do
> > that" so...?
>
> It's more complex than using flush_work() but there's nothing
> fundamentally wrong with it.  A work item is completely unlinked
> before its execution starts.  It's safe to free the work item once its
> work function started, whether through kfree() or returning.
>
> One difference between flush_work() and manual completion would be
> that if the work item gets requeued, flush_work() would wait for the
> queued one to finish but given the work item is one-shot this doesn't
> make any difference.
>
> I can see no reason why manual completion would behave differently
> from flush_work() in this case.
>

I went looking for a short trace in my original log to show the problem,
and instead found evidence of the second problem.  I still like the shorter
flush_work call, but that's not my call.

I did find this comment in the process_one_work function.  Sounds like this
could be better documented.

  /*
   * It is permissible to free the struct work_struct from
   * inside the function that is called from it, this we need to
   * take into account for lockdep too.  To avoid bogus "held
   * lock freed" warnings as well as problems when looking into
   * work->lockdep_map, make a copy and use that here.
   */

> > As I understand it, what then happens is that the workqueue code
> > grabs another kworker thread and runs the next work item in it's
> > queue. IOWs, work items can block, but doing that does not prevent
> > execution of other work items queued on other work queues or even on
> > the same work queue. Tejun, did I get that correct?
>
> Yes, as long as the workqueue is under its @max_active limit and has
> access to an existing kworker or can create a new one, it'll start
> executing the next work item immediately; however, the guaranteed
> level of concurrency is 1 even for WQ_RECLAIM workqueues.  IOW, the
> work items queued on a workqueue must be able to make forward progress
> with single work item if the work items are being depended upon for
> memory reclaim.


I mentioned this to Dave when I initially started this thread, but I'm
running an RT-patched kernel.  I don't see forward progress.  The workqueue
is only used in one spot (xfs_alloc_wq), and has WQ_MEM_RECLAIM set.  I
started with a 10,000,000 line trace and pulled out the entries which
referenced the workqueue and pool leading up to the lockup.

             scp-4110  [001] ....1.3   101.184640: workqueue_queue_work:
work struct=ffff8803c782d900 function=xfs_bmapi_allocate_worker
workqueue=ffff88040c9f5a00 pool=ffff88042dae3fc0 req_cpu=512 cpu=1
             scp-4110  [001] ....1.3   101.184641: workqueue_activate_work:
work struct ffff8803c782d900
     kworker/1:1-89    [001] ....1.1   101.184883: workqueue_nr_running:
pool=ffff88042dae3fc0 nr_running=1
     kworker/1:1-89    [001] ....1..   101.184885: workqueue_execute_start:
work struct ffff8803c782d900: function xfs_bmapi_allocate_worker
     irq/44-ahci-275   [001] ....1.5   101.185086: workqueue_queue_work:
work struct=ffff8800ae3f01e0 function=xfs_end_io workqueue=ffff88040b459a00
pool=ffff88042dae3fc0 req_cpu=512 cpu=1
     irq/44-ahci-275   [001] ....1.5   101.185088: workqueue_activate_work:
work struct ffff8800ae3f01e0
             scp-4110  [001] ....1..   101.187911: xfs_ilock: dev 8:5 ino
0xf9e flags ILOCK_EXCL caller xfs_iomap_write_allocate
             scp-4110  [001] ....1.3   101.187914: workqueue_queue_work:
work struct=ffff8803c782d900 function=xfs_bmapi_allocate_worker
workqueue=ffff88040c9f5a00 pool=ffff88042dae3fc0 req_cpu=512 cpu=1
             scp-4110  [001] ....1.3   101.187915: workqueue_activate_work:
work struct ffff8803c782d900
             scp-4110  [001] ....1.4   101.187926: workqueue_queue_work:
work struct=ffff88040a6a01c8 function=blk_delay_work
workqueue=ffff88040c9f4a00 pool=ffff88042dae44c0 req_cpu=512 cpu=1
             scp-4110  [001] ....1.4   101.187926: workqueue_activate_work:
work struct ffff88040a6a01c8
     kworker/1:1-89    [001] ....1..   101.187998: workqueue_execute_end:
work struct ffff8803c782d900
     kworker/1:1-89    [001] ....1..   101.188000: workqueue_execute_start:
work struct ffff8800ae3f01e0: function xfs_end_io
     kworker/1:1-89    [001] ....1..   101.188001: xfs_ilock: dev 8:5 ino
0xf9e flags ILOCK_EXCL caller xfs_setfilesize

The last work never runs.  Hangcheck triggers shortly after.

I spent some more time debugging, and I am seeing that tsk_is_pi_blocked is
returning 1 in sched_submit_work (kernel/sched/core.c).  It looks
like sched_submit_work is not detecting that the worker task is blocked on
a mutex.

This looks very RT related right now.  I see 2 problems from my reading
(and experimentation).  The first is that the second worker isn't getting
started because tsk_is_pi_blocked is reporting that the task isn't blocked
on a mutex.  The second is that even if another worker needs to be
scheduled because the original worker is blocked on a mutex, we need the
pool lock to schedule another worker.  The pool lock can be acquired by any
CPU, and is a spin_lock.  If we end up on the slow path for the pool lock,
we hit BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on))
in task_blocks_on_rt_mutex in rtmutex.c.  I'm not sure how to deal with
either problem.

Hopefully I've got all my facts right...  Debugging kernel code is a whole
new world from userspace code.

Thanks!
Austin


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-24  3:25                         ` Tejun Heo
@ 2014-06-25  3:16                           ` Austin Schuh
  -1 siblings, 0 replies; 30+ messages in thread
From: Austin Schuh @ 2014-06-25  3:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Dave Chinner, xfs, linux-kernel, Thomas Gleixner

[Adding tglx to the cc.  Sorry for any double sends]

On Mon, Jun 23, 2014 at 8:25 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Tue, Jun 24, 2014 at 01:02:40PM +1000, Dave Chinner wrote:
>> start_flush_work() is effectively a special queue_work()
>> implementation, so if if it's not safe to call complete() from the
>> workqueue as the above patch implies then this code has the same
>> problem.
>>
>> Tejun - is this "do it yourself completion" a known issue w.r.t.
>> workqueues? I can't find any documentation that says "don't do
>> that" so...?
>
> It's more complex than using flush_work() but there's nothing
> fundamentally wrong with it.  A work item is completely unlinked
> before its execution starts.  It's safe to free the work item once its
> work function started, whether through kfree() or returning.
>
> One difference between flush_work() and manual completion would be
> that if the work item gets requeued, flush_work() would wait for the
> queued one to finish but given the work item is one-shot this doesn't
> make any difference.
>
> I can see no reason why manual completion would behave differently
> from flush_work() in this case.

I went looking for a short trace in my original log to show the
problem, and instead found evidence of the second problem.  I still
like the shorter flush_work call, but that's not my call.

I did find this comment in the process_one_work function.  Sounds like
this could be better documented.

  /*
   * It is permissible to free the struct work_struct from
   * inside the function that is called from it, this we need to
   * take into account for lockdep too.  To avoid bogus "held
   * lock freed" warnings as well as problems when looking into
   * work->lockdep_map, make a copy and use that here.
   */

>> As I understand it, what then happens is that the workqueue code
>> grabs another kworker thread and runs the next work item in it's
>> queue. IOWs, work items can block, but doing that does not prevent
>> execution of other work items queued on other work queues or even on
>> the same work queue. Tejun, did I get that correct?
>
> Yes, as long as the workqueue is under its @max_active limit and has
> access to an existing kworker or can create a new one, it'll start
> executing the next work item immediately; however, the guaranteed
> level of concurrency is 1 even for WQ_RECLAIM workqueues.  IOW, the
> work items queued on a workqueue must be able to make forward progress
> with single work item if the work items are being depended upon for
> memory reclaim.

I mentioned this to Dave when I initially started this thread, but I'm
running an RT-patched kernel.  I don't see forward progress.  The
workqueue is only used in one spot (xfs_alloc_wq), and has
WQ_MEM_RECLAIM set.  I started with a 10,000,000 line trace and pulled
out the entries which referenced the workqueue and pool leading up to
the lockup.

             scp-4110  [001] ....1.3   101.184640:
workqueue_queue_work: work struct=ffff8803c782d900
function=xfs_bmapi_allocate_worker workqueue=ffff88040c9f5a00
pool=ffff88042dae3fc0 req_cpu=512 cpu=1
             scp-4110  [001] ....1.3   101.184641:
workqueue_activate_work: work struct ffff8803c782d900
     kworker/1:1-89    [001] ....1.1   101.184883:
workqueue_nr_running: pool=ffff88042dae3fc0 nr_running=1
     kworker/1:1-89    [001] ....1..   101.184885:
workqueue_execute_start: work struct ffff8803c782d900: function
xfs_bmapi_allocate_worker
     irq/44-ahci-275   [001] ....1.5   101.185086:
workqueue_queue_work: work struct=ffff8800ae3f01e0 function=xfs_end_io
workqueue=ffff88040b459a00 pool=ffff88042dae3fc0 req_cpu=512 cpu=1
     irq/44-ahci-275   [001] ....1.5   101.185088:
workqueue_activate_work: work struct ffff8800ae3f01e0
             scp-4110  [001] ....1..   101.187911: xfs_ilock: dev 8:5
ino 0xf9e flags ILOCK_EXCL caller xfs_iomap_write_allocate
             scp-4110  [001] ....1.3   101.187914:
workqueue_queue_work: work struct=ffff8803c782d900
function=xfs_bmapi_allocate_worker workqueue=ffff88040c9f5a00
pool=ffff88042dae3fc0 req_cpu=512 cpu=1
             scp-4110  [001] ....1.3   101.187915:
workqueue_activate_work: work struct ffff8803c782d900
             scp-4110  [001] ....1.4   101.187926:
workqueue_queue_work: work struct=ffff88040a6a01c8
function=blk_delay_work workqueue=ffff88040c9f4a00
pool=ffff88042dae44c0 req_cpu=512 cpu=1
             scp-4110  [001] ....1.4   101.187926:
workqueue_activate_work: work struct ffff88040a6a01c8
     kworker/1:1-89    [001] ....1..   101.187998:
workqueue_execute_end: work struct ffff8803c782d900
     kworker/1:1-89    [001] ....1..   101.188000:
workqueue_execute_start: work struct ffff8800ae3f01e0: function
xfs_end_io
     kworker/1:1-89    [001] ....1..   101.188001: xfs_ilock: dev 8:5
ino 0xf9e flags ILOCK_EXCL caller xfs_setfilesize

The last work never runs.  Hangcheck triggers shortly after.

I spent some more time debugging, and I am seeing that
tsk_is_pi_blocked is returning 1 in sched_submit_work
(kernel/sched/core.c).  It looks like sched_submit_work is not
detecting that the worker task is blocked on a mutex.

This looks very RT related right now.  I see 2 problems from my
reading (and experimentation).  The first is that the second worker
isn't getting started because tsk_is_pi_blocked is reporting that the
task isn't blocked on a mutex.  The second is that even if another
worker needs to be scheduled because the original worker is blocked on
a mutex, we need the pool lock to schedule another worker.  The pool
lock can be acquired by any CPU, and is a spin_lock.  If we end up on
the slow path for the pool lock, we hit
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex in rtmutex.c.  I'm not sure how to deal with
either problem.  Thomas, any ideas?

Hopefully I've got all my facts right...  Debugging kernel code is a
whole new world from userspace code.

Thanks!
Austin


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-24  3:25                         ` Tejun Heo
@ 2014-06-25  5:56                           ` Dave Chinner
  -1 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2014-06-25  5:56 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Austin Schuh, xfs, linux-kernel

On Mon, Jun 23, 2014 at 11:25:21PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jun 24, 2014 at 01:02:40PM +1000, Dave Chinner wrote:
> > As I understand it, what then happens is that the workqueue code
> > grabs another kworker thread and runs the next work item in it's
> > queue. IOWs, work items can block, but doing that does not prevent
> > execution of other work items queued on other work queues or even on
> > the same work queue. Tejun, did I get that correct?
> 
> Yes, as long as the workqueue is under its @max_active limit and has
> access to an existing kworker or can create a new one, it'll start
> executing the next work item immediately; however, the guaranteed
> level of concurrency is 1 even for WQ_RECLAIM workqueues.  IOW, the
> work items queued on a workqueue must be able to make forward progress
> with single work item if the work items are being depended upon for
> memory reclaim.

Hmmm - that's different from my understanding of the behaviour
WQ_MEM_RECLAIM originally gave us, i.e. that WQ_MEM_RECLAIM
workqueues had a rescuer thread created to guarantee that the
*workqueue* could make forward progress executing work in a
reclaim context.

The concept that the *work being executed* needs to guarantee
forwards progress is something I've never heard stated before.
That worries me a lot, especially with all the memory reclaim
problems that have surfaced in the past couple of months....

> As long as a WQ_RECLAIM workqueue dosen't depend upon itself,
> forward-progress is guaranteed.

I can't find any documentation that actually defines what
WQ_MEM_RECLAIM means, so I can't tell when or how this requirement
came about. If it's true, then I suspect most of the WQ_MEM_RECLAIM
workqueues in filesystems violate it. Can you point me at
documentation/commits/code describing the constraints of
WQ_MEM_RECLAIM and the reasons for it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-25  3:05                         ` Austin Schuh
@ 2014-06-25 14:00                             ` Tejun Heo
  0 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2014-06-25 14:00 UTC (permalink / raw)
  To: Austin Schuh; +Cc: Dave Chinner, xfs, Thomas Gleixner, linux-kernel

Hello,

On Tue, Jun 24, 2014 at 08:05:07PM -0700, Austin Schuh wrote:
> > I can see no reason why manual completion would behave differently
> > from flush_work() in this case.
> 
> I went looking for a short trace in my original log to show the problem,
> and instead found evidence of the second problem.  I still like the shorter
> flush_work call, but that's not my call.

So, are you saying that the original issue you reported isn't actually
a problem?  But didn't you imply that changing the waiting mechanism
fixed a deadlock or was that a false positive?

> I did find this comment in the process_one_work function.  Sounds like this
> could be better documented.

Yeah, we prolly should beef up Documentation/workqueue.txt with
information on general usage.

> I spent some more time debugging, and I am seeing that tsk_is_pi_blocked is
> returning 1 in sched_submit_work (kernel/sched/core.c).  It looks
> like sched_submit_work is not detecting that the worker task is blocked on
> a mutex.

The function unplugs the block layer and doesn't have much to do with
workqueue although it has "_work" in its name.

> This looks very RT related right now.  I see 2 problems from my reading
> (and experimentation).  The first is that the second worker isn't getting
> started because tsk_is_pi_blocked is reporting that the task isn't blocked
> on a mutex.  The second is that even if another worker needs to be
> scheduled because the original worker is blocked on a mutex, we need the
> pool lock to schedule another worker.  The pool lock can be acquired by any
> CPU, and is a spin_lock.  If we end up on the slow path for the pool lock,
> we hit BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on))
> in task_blocks_on_rt_mutex in rtmutex.c.  I'm not sure how to deal with
> either problem.
> 
> Hopefully I've got all my facts right...  Debugging kernel code is a whole
> new world from userspace code.

I don't have much idea how RT kernel works either.  Can you reproduce
the issues that you see on mainline?

Thanks.

-- 
tejun


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-25  5:56                           ` Dave Chinner
@ 2014-06-25 14:18                             ` Tejun Heo
  -1 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2014-06-25 14:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Austin Schuh, xfs, linux-kernel

Hello, Dave.

On Wed, Jun 25, 2014 at 03:56:41PM +1000, Dave Chinner wrote:
> Hmmm - that's different from my understanding of what the original
> behaviour WQ_MEM_RECLAIM gave us. i.e. that WQ_MEM_RECLAIM
> workqueues had a rescuer thread created to guarantee that the
> *workqueue* could make forward progress executing work in a
> reclaim context.

From Documentation/workqueue.txt

  WQ_MEM_RECLAIM

	All wq which might be used in the memory reclaim paths _MUST_
	have this flag set.  The wq is guaranteed to have at least one
	execution context regardless of memory pressure.

So, all that's guaranteed is that the workqueue has at least one
worker executing its work items.  If that one worker is serving a work
item which can't make forward progress, the workqueue is not
guaranteed to make forward progress.
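
The shape that can stall looks something like this (contrived sketch,
made-up names):

static struct workqueue_struct *reclaim_wq;
static struct work_struct work_a, work_b;

static void work_b_fn(struct work_struct *work)
{
	/* does whatever work_a_fn() is waiting for */
}

static void work_a_fn(struct work_struct *work)
{
	queue_work(reclaim_wq, &work_b);
	/*
	 * Normally another kworker picks up work_b and this flush
	 * finishes.  But if the rescuer is the only execution context
	 * left, it is busy running work_a_fn() right now, so work_b may
	 * never run and this wait never returns.
	 */
	flush_work(&work_b);
}

static int demo_init(void)
{
	reclaim_wq = alloc_workqueue("demo-reclaim", WQ_MEM_RECLAIM, 0);
	if (!reclaim_wq)
		return -ENOMEM;
	INIT_WORK(&work_a, work_a_fn);
	INIT_WORK(&work_b, work_b_fn);
	queue_work(reclaim_wq, &work_a);
	return 0;
}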

> The concept that the *work being executed* needs to guarantee
> forwards progress is something I've never heard stated before.
> That worries me a lot, especially with all the memory reclaim
> problems that have surfaced in the past couple of months....

I'd love to provide that but guaranteeing that at least one work is
always being executed requires unlimited task allocation (the ones
which get blocked gotta store their context somewhere).

> > As long as a WQ_RECLAIM workqueue dosen't depend upon itself,
> > forward-progress is guaranteed.
> 
> I can't find any documentation that actually defines what
> WQ_MEM_RECLAIM means, so I can't tell when or how this requirement
> came about. If it's true, then I suspect most of the WQ_MEM_RECLAIM
> workqueues in filesystems violate it. Can you point me at
> documentation/commits/code describing the constraints of
> WQ_MEM_RECLAIM and the reasons for it?

Documentation/workqueue.txt should be it but maybe we should be more
explicit.  The behavior is maintaining what the
pre-concurrency-management workqueue provided with static
per-workqueue workers.  Each workqueue reserved its workers (either
one per cpu or one globally) and it only supported single level of
concurrency on each CPU.  WQ_MEM_RECLAIM provides an equivalent
amount of forward-progress guarantee, so all the existing users
shouldn't have issues on this front.  If we have grown incorrect
usages since then, we need to fix them.

Thanks.

-- 
tejun


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-25 14:00                             ` Tejun Heo
@ 2014-06-25 17:04                               ` Austin Schuh
  -1 siblings, 0 replies; 30+ messages in thread
From: Austin Schuh @ 2014-06-25 17:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Dave Chinner, xfs, Thomas Gleixner, linux-kernel

On Wed, Jun 25, 2014 at 7:00 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, Jun 24, 2014 at 08:05:07PM -0700, Austin Schuh wrote:
> > > I can see no reason why manual completion would behave differently
> > > from flush_work() in this case.
> >
> > I went looking for a short trace in my original log to show the problem,
> > and instead found evidence of the second problem.  I still like the shorter
> > flush_work call, but that's not my call.
>
> So, are you saying that the original issue you reported isn't actually
> a problem?  But didn't you imply that changing the waiting mechanism
> fixed a deadlock or was that a false positive?

Correct, that was a false positive.  Sorry for the noise.

> > I spent some more time debugging, and I am seeing that tsk_is_pi_blocked is
> > returning 1 in sched_submit_work (kernel/sched/core.c).  It looks
> > like sched_submit_work is not detecting that the worker task is blocked on
> > a mutex.
>
> The function unplugs the block layer and doesn't have much to do with
> workqueue although it has "_work" in its name.

Thomas moved
+ if (tsk->flags & PF_WQ_WORKER)
+   wq_worker_sleeping(tsk);
into sched_submit_work as part of the RT patchset.
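
For reference, the function on my kernel ends up looking roughly like
this (reconstructed for illustration - not a verbatim copy of either
mainline or the RT patch):

static inline void sched_submit_work(struct task_struct *tsk)
{
	if (!tsk->state)
		return;

	/*
	 * This is the check I keep hitting: when it is true we return
	 * before wq_worker_sleeping() is reached, so the workqueue code
	 * is never told that this kworker is about to sleep.
	 */
	if (tsk_is_pi_blocked(tsk))
		return;

	/* Moved here from __schedule() by the RT patchset. */
	if (tsk->flags & PF_WQ_WORKER)
		wq_worker_sleeping(tsk);

	/*
	 * If we are going to sleep and we have plugged IO queued,
	 * make sure to submit it to avoid deadlocks.
	 */
	if (blk_needs_flush_plug(tsk))
		blk_schedule_flush_plug(tsk);
}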

> > This looks very RT related right now.  I see 2 problems from my reading
> > (and experimentation).  The first is that the second worker isn't getting
> > started because tsk_is_pi_blocked is reporting that the task isn't blocked
> > on a mutex.  The second is that even if another worker needs to be
> > scheduled because the original worker is blocked on a mutex, we need the
> > pool lock to schedule another worker.  The pool lock can be acquired by any
> > CPU, and is a spin_lock.  If we end up on the slow path for the pool lock,
> > we hit BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on))
> > in task_blocks_on_rt_mutex in rtmutex.c.  I'm not sure how to deal with
> > either problem.
> >
> > Hopefully I've got all my facts right...  Debugging kernel code is a whole
> > new world from userspace code.
>
> I don't have much idea how RT kernel works either.  Can you reproduce
> the issues that you see on mainline?
>
> Thanks.
>
> --
> tejun

I'll see what I can do.

Thanks!
  Austin


* Re: On-stack work item completion race? (was Re: XFS crash?)
  2014-06-25 14:18                             ` Tejun Heo
@ 2014-06-25 22:08                               ` Dave Chinner
  -1 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2014-06-25 22:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Austin Schuh, xfs, linux-kernel

On Wed, Jun 25, 2014 at 10:18:36AM -0400, Tejun Heo wrote:
> Hello, Dave.
> 
> On Wed, Jun 25, 2014 at 03:56:41PM +1000, Dave Chinner wrote:
> > Hmmm - that's different from my understanding of what the original
> > behaviour WQ_MEM_RECLAIM gave us. i.e. that WQ_MEM_RECLAIM
> > workqueues had a rescuer thread created to guarantee that the
> > *workqueue* could make forward progress executing work in a
> > reclaim context.
> 
> From Documentation/workqueue.txt
> 
>   WQ_MEM_RECLAIM
> 
> 	All wq which might be used in the memory reclaim paths _MUST_
> 	have this flag set.  The wq is guaranteed to have at least one
> 	execution context regardless of memory pressure.
> 
> So, all that's guaranteed is that the workqueue has at least one
> worker executing its work items.  If that one worker is serving a work
> item which can't make forward progress, the workqueue is not
> guaranteed to make forward progress.

Adding that to the docco might be useful ;)

> > > As long as a WQ_RECLAIM workqueue doesn't depend upon itself,
> > > forward-progress is guaranteed.
> > 
> > I can't find any documentation that actually defines what
> > WQ_MEM_RECLAIM means, so I can't tell when or how this requirement
> > came about. If it's true, then I suspect most of the WQ_MEM_RECLAIM
> > workqueues in filesystems violate it. Can you point me at
> > documentation/commits/code describing the constraints of
> > WQ_MEM_RECLAIM and the reasons for it?
> 
> Documentation/workqueue.txt should be it but maybe we should be more
> explicit.  The behavior is maintaining what the
> pre-concurrency-management workqueue provided with static
> per-workqueue workers.  Each workqueue reserved its workers (either
> one per cpu or one globally) and it only supported single level of
> concurrency on each CPU.  WQ_MEM_RECLAIM is providing equivalent
> amount of forward progress guarantee and all the existing users
> shouldn't have issues on this front.  If we have grown incorrect
> usages from then on, we need to fix them.

Ok, so it hasn't changed. We're still using them like we used the
original workqueues, and we never, ever provided a guarantee of
forward progress for them, either. So if the workqueues haven't
changed, and we haven't changed how we use workqueues, then
something else is causing all our recent problems....
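
To make that constraint concrete for the archives, the self-dependency
Tejun is describing looks roughly like the sketch below (all names are
made up; none of this is XFS code):

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;

static void inner_fn(struct work_struct *work)
{
	/* does the actual work */
}

static void outer_fn(struct work_struct *work)
{
	struct work_struct inner;

	/* Queue a second item on the *same* workqueue and wait for it. */
	INIT_WORK_ONSTACK(&inner, inner_fn);
	queue_work(demo_wq, &inner);
	flush_work(&inner);
	destroy_work_on_stack(&inner);
}

static DECLARE_WORK(outer_work, outer_fn);

static int __init demo_init(void)
{
	/* WQ_MEM_RECLAIM: one execution context (the rescuer) is guaranteed. */
	demo_wq = alloc_workqueue("demo", WQ_MEM_RECLAIM, 0);
	if (!demo_wq)
		return -ENOMEM;
	queue_work(demo_wq, &outer_work);
	return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");

Under memory pressure the rescuer can end up being the only context
executing demo_wq's items; if it is busy inside outer_fn(), inner never
runs and flush_work() waits forever: the one guaranteed context is
consumed by a work item that depends on another item on the same queue.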

Thanks for the clarification, Tejun!

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-06-25 22:09 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-05 23:08 XFS crash? Austin Schuh
2014-03-05 23:35 ` Dave Chinner
2014-03-06  0:53   ` Austin Schuh
2014-05-13  1:29     ` Austin Schuh
2014-05-13  3:10       ` Austin Schuh
2014-05-13  3:33       ` Austin Schuh
2014-05-13  3:46       ` Dave Chinner
2014-05-13  4:03         ` Austin Schuh
2014-05-13  6:39           ` Dave Chinner
2014-05-13  7:02             ` Austin Schuh
2014-05-13  9:03               ` Dave Chinner
2014-05-13 17:11                 ` Austin Schuh
2014-06-23 20:05                   ` Austin Schuh
2014-06-24  3:02                     ` On-stack work item completion race? (was Re: XFS crash?) Dave Chinner
2014-06-24  3:02                       ` Dave Chinner
2014-06-24  3:25                       ` Tejun Heo
2014-06-24  3:25                         ` Tejun Heo
2014-06-25  3:05                         ` Austin Schuh
2014-06-25 14:00                           ` Tejun Heo
2014-06-25 14:00                             ` Tejun Heo
2014-06-25 17:04                             ` Austin Schuh
2014-06-25 17:04                               ` Austin Schuh
2014-06-25  3:16                         ` Austin Schuh
2014-06-25  3:16                           ` Austin Schuh
2014-06-25  5:56                         ` Dave Chinner
2014-06-25  5:56                           ` Dave Chinner
2014-06-25 14:18                           ` Tejun Heo
2014-06-25 14:18                             ` Tejun Heo
2014-06-25 22:08                             ` Dave Chinner
2014-06-25 22:08                               ` Dave Chinner
