linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* I/O hangs after resuming from suspend-to-ram
@ 2017-08-22 11:45 Oleksandr Natalenko
  2017-08-26 10:37 ` Oleksandr Natalenko
  0 siblings, 1 reply; 29+ messages in thread
From: Oleksandr Natalenko @ 2017-08-22 11:45 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi.

v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I have 
blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same 
result. Soft lockup happens showing stacktraces I'm pasting below.

Stacktrace shows that I/O hangs in md_super_wait(), and it means it waits for 
"all superblock writes that were scheduled to complete". Since there is 
"scheduled" word, should I also try "none" scheduler with blk-mq enabled?

While I'm trying to reproduce it on a VM without much luck (it happens on my 
laptop rarely, like 1 out of 10 suspend-resume cycles), and also re-checking 
it with blk-mq disabled, by any chance is this something already known?

Ideally, I'd like to reprduce it in a VM and capture vmcore.

Any suggestions are welcome. Thanks.

===
[ 9460.165958] INFO: task md0_raid10:225 blocked for more than 120 seconds.
[ 9460.165983]       Not tainted 4.12.0-pf7 #1
[ 9460.165989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.165996] md0_raid10      D    0   225      2 0x00000000
[ 9460.166005] Call Trace:
[ 9460.166027]  __schedule+0x6e7/0xcd0
[ 9460.166043]  schedule+0x3d/0xd0
[ 9460.166067]  md_super_wait+0x6e/0xa0 [md_mod]
[ 9460.166130]  ? wake_bit_function+0x60/0x60
[ 9460.166143]  write_page+0x185/0x310 [md_mod]
[ 9460.166156]  bitmap_update_sb+0x12f/0x140 [md_mod]
[ 9460.166167]  md_update_sb.part.60+0x354/0x840 [md_mod]
[ 9460.166179]  ? percpu_ref_switch_to_percpu+0x36/0x40
[ 9460.166190]  md_check_recovery+0x214/0x4b0 [md_mod]
[ 9460.166201]  raid10d+0x62/0x13c0 [raid10]
[ 9460.166213]  ? lock_timer_base+0x81/0xa0
[ 9460.166222]  ? try_to_del_timer_sync+0x53/0x80
[ 9460.166232]  ? del_timer_sync+0x39/0x40
[ 9460.166246]  ? schedule_timeout+0x19b/0x330
[ 9460.166261]  ? call_timer_fn+0x160/0x160
[ 9460.166279]  md_thread+0x120/0x160 [md_mod]
[ 9460.166290]  ? md_thread+0x120/0x160 [md_mod]
[ 9460.166298]  ? wake_bit_function+0x60/0x60
[ 9460.166309]  kthread+0x124/0x140
[ 9460.166319]  ? find_pers+0x70/0x70 [md_mod]
[ 9460.166353]  ? kthread_create_on_node+0x70/0x70
[ 9460.166360]  ret_from_fork+0x25/0x30
[ 9460.166371] INFO: task dmcrypt_write:254 blocked for more than 120 seconds.
[ 9460.166376]       Not tainted 4.12.0-pf7 #1
[ 9460.166381] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.166386] dmcrypt_write   D    0   254      2 0x00000000
[ 9460.166492] Call Trace:
[ 9460.166501]  __schedule+0x6e7/0xcd0
[ 9460.166511]  schedule+0x3d/0xd0
[ 9460.166522]  md_write_start+0xe3/0x270 [md_mod]
[ 9460.166529]  ? wake_bit_function+0x60/0x60
[ 9460.166538]  raid10_make_request+0x3f/0x140 [raid10]
[ 9460.166549]  md_make_request+0xa9/0x2a0 [md_mod]
[ 9460.166557]  ? __schedule+0x6ef/0xcd0
[ 9460.166567]  generic_make_request+0x11e/0x2f0
[ 9460.166578]  dmcrypt_write+0x22d/0x250 [dm_crypt]
[ 9460.166586]  ? dmcrypt_write+0x22d/0x250 [dm_crypt]
[ 9460.166596]  ? wake_up_q+0x80/0x80
[ 9460.166605]  kthread+0x124/0x140
[ 9460.166613]  ? kthread+0x124/0x140
[ 9460.166621]  ? crypt_iv_essiv_dtr+0x70/0x70 [dm_crypt]
[ 9460.166628]  ? kthread_create_on_node+0x70/0x70
[ 9460.166635]  ret_from_fork+0x25/0x30
[ 9460.166677] INFO: task konversation:7110 blocked for more than 120 seconds.
[ 9460.166683]       Not tainted 4.12.0-pf7 #1
[ 9460.166687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.166693] konversation    D    0  7110   6972 0x00000000
[ 9460.166700] Call Trace:
[ 9460.166709]  __schedule+0x6e7/0xcd0
[ 9460.166719]  schedule+0x3d/0xd0
[ 9460.166824]  io_schedule+0x16/0x40
[ 9460.166833]  wait_on_page_bit+0xeb/0x130
[ 9460.166841]  ? page_cache_tree_insert+0xc0/0xc0
[ 9460.166882]  prepare_pages+0x1b5/0x1d0 [btrfs]
[ 9460.166915]  __btrfs_buffered_write+0x267/0x6d0 [btrfs]
[ 9460.166948]  btrfs_file_write_iter+0x1c1/0x500 [btrfs]
[ 9460.166956]  ? cp_new_stat+0x152/0x180
[ 9460.166966]  __vfs_write+0xe4/0x140
[ 9460.166975]  vfs_write+0xb1/0x1a0
[ 9460.166984]  SyS_write+0x55/0xc0
[ 9460.166994]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.167001] RIP: 0033:0x7fd020f09c0d
[ 9460.167006] RSP: 002b:00007ffe27e70360 EFLAGS: 00000293 ORIG_RAX: 
0000000000000001
[ 9460.167013] RAX: ffffffffffffffda RBX: 0000000003178840 RCX: 
00007fd020f09c0d
[ 9460.167019] RDX: 000000000000007d RSI: 0000000003241598 RDI: 
000000000000002b
[ 9460.167024] RBP: 00007fd0211c72a0 R08: 00007fd021a55010 R09: 
0000000000000030
[ 9460.167029] R10: 00007fd0211caad8 R11: 0000000000000293 R12: 
0000000000000000
[ 9460.167034] R13: 00000000013b4cef R14: 0000000000000400 R15: 
0000000000000001
[ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than 120 
seconds.
[ 9460.167080]       Not tainted 4.12.0-pf7 #1
[ 9460.167084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.167089] akonadi_imap_re D    0  7363   7107 0x00000000
[ 9460.167096] Call Trace:
[ 9460.167105]  __schedule+0x6e7/0xcd0
[ 9460.167114]  schedule+0x3d/0xd0
[ 9460.167149]  wait_for_writer+0xb1/0xe0 [btrfs]
[ 9460.167157]  ? wake_bit_function+0x60/0x60
[ 9460.167190]  btrfs_sync_log+0xe2/0xad0 [btrfs]
[ 9460.167201]  ? dput.part.24+0x51/0x1d0
[ 9460.167210]  ? dput+0x13/0x20
[ 9460.167242]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
[ 9460.167275]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167302]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167313]  vfs_fsync_range+0x4b/0xb0
[ 9460.167320]  do_fsync+0x3d/0x70
[ 9460.167327]  SyS_fdatasync+0x13/0x20
[ 9460.167336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.167342] RIP: 0033:0x7fcdf0d7522d
[ 9460.167348] RSP: 002b:00007ffdcb64e9a0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004b
[ 9460.167355] RAX: ffffffffffffffda RBX: 0000007b8c7c9f80 RCX: 
00007fcdf0d7522d
[ 9460.167361] RDX: 0000000000000023 RSI: 0000007b8c7ca228 RDI: 
0000000000000010
[ 9460.167366] RBP: 0000000000000001 R08: 0000007b8c7c9270 R09: 
000000000000002f
[ 9460.167371] R10: 0000007b8c7c9830 R11: 0000000000000293 R12: 
0000007b8c7b88f0
[ 9460.167376] R13: 0000000000000000 R14: 0000000000000010 R15: 
0000007b8c7b66b0
[ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than 120 
seconds.
[ 9460.167492]       Not tainted 4.12.0-pf7 #1
[ 9460.167496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.167501] akonadi_maildis D    0  7366   7107 0x00000000
[ 9460.167507] Call Trace:
[ 9460.167516]  __schedule+0x6e7/0xcd0
[ 9460.167525]  schedule+0x3d/0xd0
[ 9460.167557]  wait_log_commit+0xd6/0x100 [btrfs]
[ 9460.167564]  ? wake_bit_function+0x60/0x60
[ 9460.167593]  btrfs_sync_log+0x13e/0xad0 [btrfs]
[ 9460.167603]  ? dput.part.24+0x51/0x1d0
[ 9460.167611]  ? dput+0x13/0x20
[ 9460.167640]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
[ 9460.167670]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167696]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
[ 9460.167705]  vfs_fsync_range+0x4b/0xb0
[ 9460.167712]  do_fsync+0x3d/0x70
[ 9460.167718]  SyS_fdatasync+0x13/0x20
[ 9460.167727]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.167733] RIP: 0033:0x7ff91da0822d
[ 9460.167737] RSP: 002b:00007ffdbded6fb0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004b
[ 9460.167744] RAX: ffffffffffffffda RBX: 000000841e8cc480 RCX: 
00007ff91da0822d
[ 9460.167749] RDX: 0000000000000028 RSI: 000000841e902608 RDI: 
000000000000000f
[ 9460.167754] RBP: 0000000000000001 R08: 000000841e8aff00 R09: 
000000000000002f
[ 9460.167760] R10: 000000841e91a1d0 R11: 0000000000000293 R12: 
000000841e910cc0
[ 9460.167764] R13: 0000000000000000 R14: 000000000000000f R15: 
000000841e9165c0
[ 9460.167786] INFO: task chromium:7600 blocked for more than 120 seconds.
[ 9460.167791]       Not tainted 4.12.0-pf7 #1
[ 9460.167796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.167800] chromium        D    0  7600   7020 0x00000000
[ 9460.167807] Call Trace:
[ 9460.167815]  __schedule+0x6e7/0xcd0
[ 9460.167824]  schedule+0x3d/0xd0
[ 9460.167831]  io_schedule+0x16/0x40
[ 9460.167838]  wait_on_page_bit+0xeb/0x130
[ 9460.167847]  ? page_cache_tree_insert+0xc0/0xc0
[ 9460.167879]  btrfs_page_mkwrite+0x1bf/0x400 [btrfs]
[ 9460.167889]  do_page_mkwrite+0x31/0x90
[ 9460.167896]  ? current_time+0x38/0x70
[ 9460.167905]  do_wp_page+0x200/0x4b0
[ 9460.167911]  ? file_update_time+0x60/0x110
[ 9460.167920]  __handle_mm_fault+0x832/0xd50
[ 9460.167930]  handle_mm_fault+0xde/0x220
[ 9460.167938]  __do_page_fault+0x24d/0x510
[ 9460.167946]  do_page_fault+0x22/0x30
[ 9460.167954]  page_fault+0x28/0x30
[ 9460.167959] RIP: 0033:0x64de23cd9c
[ 9460.167964] RSP: 002b:00007ffdfd4ad4c0 EFLAGS: 00010202
[ 9460.167970] RAX: 00007f80a51b9d58 RBX: 0000398cef048700 RCX: 
0000398ce8da6510
[ 9460.167982] RDX: 0000000000000006 RSI: 0000000000000000 RDI: 
0000398cef048700
[ 9460.167988] RBP: 0000000000000005 R08: 0000000000000000 R09: 
0000000000000000
[ 9460.167993] R10: 00007ffdfd4acdc0 R11: 0000000000000000 R12: 
0000000000000001
[ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15: 
0000000000000000
[ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120 
seconds.
[ 9460.168019]       Not tainted 4.12.0-pf7 #1
[ 9460.168024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.168028] BrowserBlocking D    0  7639   7020 0x00000000
[ 9460.168035] Call Trace:
[ 9460.168044]  __schedule+0x6e7/0xcd0
[ 9460.168054]  schedule+0x3d/0xd0
[ 9460.168061]  io_schedule+0x16/0x40
[ 9460.168068]  wait_on_page_bit_common+0xe3/0x180
[ 9460.168076]  ? page_cache_tree_insert+0xc0/0xc0
[ 9460.168083]  __filemap_fdatawait_range+0x12a/0x1a0
[ 9460.168093]  filemap_fdatawait_range+0x14/0x30
[ 9460.168128]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
[ 9460.168157]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[ 9460.168186]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
[ 9460.168195]  vfs_fsync_range+0x4b/0xb0
[ 9460.168202]  do_fsync+0x3d/0x70
[ 9460.168209]  SyS_fdatasync+0x13/0x20
[ 9460.168218]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.168223] RIP: 0033:0x7f80b8fcd22d
[ 9460.168228] RSP: 002b:00007f808c8b7680 EFLAGS: 00000293 ORIG_RAX: 
000000000000004b
[ 9460.168235] RAX: ffffffffffffffda RBX: 00007f808c8b7670 RCX: 
00007f80b8fcd22d
[ 9460.168240] RDX: 00000000000019bc RSI: 0000398cef877800 RDI: 
0000000000000119
[ 9460.168246] RBP: 0000000000000119 R08: 0000000000000037 R09: 
0000398ce88693c0
[ 9460.168251] R10: 0000000000000000 R11: 0000000000000293 R12: 
00007f808c8b77d8
[ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15: 
0000398cef877800
[ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120 
seconds.
[ 9460.168272]       Not tainted 4.12.0-pf7 #1
[ 9460.168277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.168282] Chrome_SyncThre D    0  7867   7020 0x00000000
[ 9460.168289] Call Trace:
[ 9460.168297]  __schedule+0x6e7/0xcd0
[ 9460.168306]  schedule+0x3d/0xd0
[ 9460.168338]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0 [btrfs]
[ 9460.168348]  ? wake_bit_function+0x60/0x60
[ 9460.168377]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
[ 9460.168412]  ? release_extent_buffer+0x98/0xb0 [btrfs]
[ 9460.168444]  btrfs_log_inode+0x97e/0x1140 [btrfs]
[ 9460.168668]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
[ 9460.168676]  ? touch_atime+0xc1/0xd0
[ 9460.168686]  ? refcount_inc+0x9/0x30
[ 9460.168719]  ? join_transaction+0x122/0x450 [btrfs]
[ 9460.168750]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[ 9460.168779]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
[ 9460.168787]  vfs_fsync_range+0x4b/0xb0
[ 9460.168794]  do_fsync+0x3d/0x70
[ 9460.168801]  SyS_fdatasync+0x13/0x20
[ 9460.168810]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.168815] RIP: 0033:0x7f80b8fcd22d
[ 9460.168819] RSP: 002b:00007f80804e1720 EFLAGS: 00000293 ORIG_RAX: 
000000000000004b
[ 9460.168826] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 
00007f80b8fcd22d
[ 9460.168830] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 
00000000000000fb
[ 9460.168835] RBP: 0000398ce9e33000 R08: 0000398ced562800 R09: 
0000000000000001
[ 9460.168840] R10: 0000000000018400 R11: 0000000000000293 R12: 
00000000004b0000
[ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15: 
0000398ceb9a1048
[ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120 
seconds.
[ 9460.168858]       Not tainted 4.12.0-pf7 #1
[ 9460.168863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.168867] BrowserBlocking D    0  8011   7020 0x00000000
[ 9460.168873] Call Trace:
[ 9460.168882]  __schedule+0x6e7/0xcd0
[ 9460.168892]  schedule+0x3d/0xd0
[ 9460.168898]  io_schedule+0x16/0x40
[ 9460.168905]  wait_on_page_bit_common+0xe3/0x180
[ 9460.168912]  ? page_cache_tree_insert+0xc0/0xc0
[ 9460.168919]  __filemap_fdatawait_range+0x12a/0x1a0
[ 9460.168929]  filemap_fdatawait_range+0x14/0x30
[ 9460.168961]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
[ 9460.168989]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[ 9460.169015]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
[ 9460.169025]  vfs_fsync_range+0x4b/0xb0
[ 9460.169033]  do_fsync+0x3d/0x70
[ 9460.169040]  SyS_fdatasync+0x13/0x20
[ 9460.169050]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.169055] RIP: 0033:0x7f80b8fcd22d
[ 9460.169060] RSP: 002b:00007f807d5f1680 EFLAGS: 00000293 ORIG_RAX: 
000000000000004b
[ 9460.169066] RAX: ffffffffffffffda RBX: 00007f807d5f1670 RCX: 
00007f80b8fcd22d
[ 9460.169073] RDX: 000000000001f95c RSI: 0000398cece75000 RDI: 
000000000000011d
[ 9460.169078] RBP: 000000000000011d R08: 000000000000003f R09: 
0000398ce8870200
[ 9460.169083] R10: 0000000000000000 R11: 0000000000000293 R12: 
00007f807d5f17d8
[ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15: 
0000398cece75000
[ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120 
seconds.
[ 9460.169103]       Not tainted 4.12.0-pf7 #1
[ 9460.169108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[ 9460.169113] TaskSchedulerBa D    0  9603   7020 0x00000000
[ 9460.169120] Call Trace:
[ 9460.169128]  __schedule+0x6e7/0xcd0
[ 9460.169138]  schedule+0x3d/0xd0
[ 9460.169166]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0 [btrfs]
[ 9460.169175]  ? wake_bit_function+0x60/0x60
[ 9460.169204]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
[ 9460.169237]  ? release_extent_buffer+0x98/0xb0 [btrfs]
[ 9460.169265]  btrfs_log_inode+0x97e/0x1140 [btrfs]
[ 9460.169278]  ? __set_page_dirty_nobuffers+0x117/0x150
[ 9460.169310]  ? btree_set_page_dirty+0xe/0x10 [btrfs]
[ 9460.169346]  ? set_extent_buffer_dirty+0x7b/0xb0 [btrfs]
[ 9460.169378]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
[ 9460.169386]  ? touch_atime+0xc1/0xd0
[ 9460.169395]  ? refcount_inc+0x9/0x30
[ 9460.169427]  ? join_transaction+0x122/0x450 [btrfs]
[ 9460.169457]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[ 9460.169490]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
[ 9460.169498]  vfs_fsync_range+0x4b/0xb0
[ 9460.169505]  do_fsync+0x3d/0x70
[ 9460.169512]  SyS_fdatasync+0x13/0x20
[ 9460.169522]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 9460.169527] RIP: 0033:0x7f80b8fcd22d
[ 9460.169532] RSP: 002b:00007f80777ea2c0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004b
[ 9460.169541] RAX: ffffffffffffffda RBX: 0000398ce9546d90 RCX: 
00007f80b8fcd22d
[ 9460.169546] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 
0000000000000117
[ 9460.169552] RBP: 0000000000000200 R08: 0000398ced6459c0 R09: 
0000000000000001
[ 9460.169557] R10: 000000000000e400 R11: 0000000000000293 R12: 
0000000000000001
[ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15: 
0000398ce9b50d80
===

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-22 11:45 I/O hangs after resuming from suspend-to-ram Oleksandr Natalenko
@ 2017-08-26 10:37 ` Oleksandr Natalenko
  2017-08-26 10:48   ` Oleksandr Natalenko
  0 siblings, 1 reply; 29+ messages in thread
From: Oleksandr Natalenko @ 2017-08-26 10:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi.

I've re-checked this issue with 4.12.9, and it is still there.

Also, I've managed to reproduce it in a VM with non-virtio disks (just -hda/-
hdb pair in QEMU).

I'm not able to reproduce it with blk_mq disabled. Also, if blk_mq is enabled, 
scheduler does not make any difference, even setting "none" triggers hang on 
resume.

Next, will check for possible change between v4.12.7 and v4.12.8 (I don't 
remember this issue to be triggered on v4.12.7, or maybe I just haven't 
noticed it), and also will recompile kernel with debug info to capture proper 
vmcore.

On úterý 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote:
> Hi.
> 
> v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I have
> blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same
> result. Soft lockup happens showing stacktraces I'm pasting below.
> 
> Stacktrace shows that I/O hangs in md_super_wait(), and it means it waits
> for "all superblock writes that were scheduled to complete". Since there is
> "scheduled" word, should I also try "none" scheduler with blk-mq enabled?
> 
> While I'm trying to reproduce it on a VM without much luck (it happens on my
> laptop rarely, like 1 out of 10 suspend-resume cycles), and also
> re-checking it with blk-mq disabled, by any chance is this something
> already known?
> 
> Ideally, I'd like to reprduce it in a VM and capture vmcore.
> 
> Any suggestions are welcome. Thanks.
> 
> ===
> [ 9460.165958] INFO: task md0_raid10:225 blocked for more than 120 seconds.
> [ 9460.165983]       Not tainted 4.12.0-pf7 #1
> [ 9460.165989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.165996] md0_raid10      D    0   225      2 0x00000000
> [ 9460.166005] Call Trace:
> [ 9460.166027]  __schedule+0x6e7/0xcd0
> [ 9460.166043]  schedule+0x3d/0xd0
> [ 9460.166067]  md_super_wait+0x6e/0xa0 [md_mod]
> [ 9460.166130]  ? wake_bit_function+0x60/0x60
> [ 9460.166143]  write_page+0x185/0x310 [md_mod]
> [ 9460.166156]  bitmap_update_sb+0x12f/0x140 [md_mod]
> [ 9460.166167]  md_update_sb.part.60+0x354/0x840 [md_mod]
> [ 9460.166179]  ? percpu_ref_switch_to_percpu+0x36/0x40
> [ 9460.166190]  md_check_recovery+0x214/0x4b0 [md_mod]
> [ 9460.166201]  raid10d+0x62/0x13c0 [raid10]
> [ 9460.166213]  ? lock_timer_base+0x81/0xa0
> [ 9460.166222]  ? try_to_del_timer_sync+0x53/0x80
> [ 9460.166232]  ? del_timer_sync+0x39/0x40
> [ 9460.166246]  ? schedule_timeout+0x19b/0x330
> [ 9460.166261]  ? call_timer_fn+0x160/0x160
> [ 9460.166279]  md_thread+0x120/0x160 [md_mod]
> [ 9460.166290]  ? md_thread+0x120/0x160 [md_mod]
> [ 9460.166298]  ? wake_bit_function+0x60/0x60
> [ 9460.166309]  kthread+0x124/0x140
> [ 9460.166319]  ? find_pers+0x70/0x70 [md_mod]
> [ 9460.166353]  ? kthread_create_on_node+0x70/0x70
> [ 9460.166360]  ret_from_fork+0x25/0x30
> [ 9460.166371] INFO: task dmcrypt_write:254 blocked for more than 120
> seconds. [ 9460.166376]       Not tainted 4.12.0-pf7 #1
> [ 9460.166381] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.166386] dmcrypt_write   D    0   254      2 0x00000000
> [ 9460.166492] Call Trace:
> [ 9460.166501]  __schedule+0x6e7/0xcd0
> [ 9460.166511]  schedule+0x3d/0xd0
> [ 9460.166522]  md_write_start+0xe3/0x270 [md_mod]
> [ 9460.166529]  ? wake_bit_function+0x60/0x60
> [ 9460.166538]  raid10_make_request+0x3f/0x140 [raid10]
> [ 9460.166549]  md_make_request+0xa9/0x2a0 [md_mod]
> [ 9460.166557]  ? __schedule+0x6ef/0xcd0
> [ 9460.166567]  generic_make_request+0x11e/0x2f0
> [ 9460.166578]  dmcrypt_write+0x22d/0x250 [dm_crypt]
> [ 9460.166586]  ? dmcrypt_write+0x22d/0x250 [dm_crypt]
> [ 9460.166596]  ? wake_up_q+0x80/0x80
> [ 9460.166605]  kthread+0x124/0x140
> [ 9460.166613]  ? kthread+0x124/0x140
> [ 9460.166621]  ? crypt_iv_essiv_dtr+0x70/0x70 [dm_crypt]
> [ 9460.166628]  ? kthread_create_on_node+0x70/0x70
> [ 9460.166635]  ret_from_fork+0x25/0x30
> [ 9460.166677] INFO: task konversation:7110 blocked for more than 120
> seconds. [ 9460.166683]       Not tainted 4.12.0-pf7 #1
> [ 9460.166687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.166693] konversation    D    0  7110   6972 0x00000000
> [ 9460.166700] Call Trace:
> [ 9460.166709]  __schedule+0x6e7/0xcd0
> [ 9460.166719]  schedule+0x3d/0xd0
> [ 9460.166824]  io_schedule+0x16/0x40
> [ 9460.166833]  wait_on_page_bit+0xeb/0x130
> [ 9460.166841]  ? page_cache_tree_insert+0xc0/0xc0
> [ 9460.166882]  prepare_pages+0x1b5/0x1d0 [btrfs]
> [ 9460.166915]  __btrfs_buffered_write+0x267/0x6d0 [btrfs]
> [ 9460.166948]  btrfs_file_write_iter+0x1c1/0x500 [btrfs]
> [ 9460.166956]  ? cp_new_stat+0x152/0x180
> [ 9460.166966]  __vfs_write+0xe4/0x140
> [ 9460.166975]  vfs_write+0xb1/0x1a0
> [ 9460.166984]  SyS_write+0x55/0xc0
> [ 9460.166994]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 9460.167001] RIP: 0033:0x7fd020f09c0d
> [ 9460.167006] RSP: 002b:00007ffe27e70360 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000001
> [ 9460.167013] RAX: ffffffffffffffda RBX: 0000000003178840 RCX:
> 00007fd020f09c0d
> [ 9460.167019] RDX: 000000000000007d RSI: 0000000003241598 RDI:
> 000000000000002b
> [ 9460.167024] RBP: 00007fd0211c72a0 R08: 00007fd021a55010 R09:
> 0000000000000030
> [ 9460.167029] R10: 00007fd0211caad8 R11: 0000000000000293 R12:
> 0000000000000000
> [ 9460.167034] R13: 00000000013b4cef R14: 0000000000000400 R15:
> 0000000000000001
> [ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than 120
> seconds.
> [ 9460.167080]       Not tainted 4.12.0-pf7 #1
> [ 9460.167084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.167089] akonadi_imap_re D    0  7363   7107 0x00000000
> [ 9460.167096] Call Trace:
> [ 9460.167105]  __schedule+0x6e7/0xcd0
> [ 9460.167114]  schedule+0x3d/0xd0
> [ 9460.167149]  wait_for_writer+0xb1/0xe0 [btrfs]
> [ 9460.167157]  ? wake_bit_function+0x60/0x60
> [ 9460.167190]  btrfs_sync_log+0xe2/0xad0 [btrfs]
> [ 9460.167201]  ? dput.part.24+0x51/0x1d0
> [ 9460.167210]  ? dput+0x13/0x20
> [ 9460.167242]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
> [ 9460.167275]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> [ 9460.167302]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> [ 9460.167313]  vfs_fsync_range+0x4b/0xb0
> [ 9460.167320]  do_fsync+0x3d/0x70
> [ 9460.167327]  SyS_fdatasync+0x13/0x20
> [ 9460.167336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 9460.167342] RIP: 0033:0x7fcdf0d7522d
> [ 9460.167348] RSP: 002b:00007ffdcb64e9a0 EFLAGS: 00000293 ORIG_RAX:
> 000000000000004b
> [ 9460.167355] RAX: ffffffffffffffda RBX: 0000007b8c7c9f80 RCX:
> 00007fcdf0d7522d
> [ 9460.167361] RDX: 0000000000000023 RSI: 0000007b8c7ca228 RDI:
> 0000000000000010
> [ 9460.167366] RBP: 0000000000000001 R08: 0000007b8c7c9270 R09:
> 000000000000002f
> [ 9460.167371] R10: 0000007b8c7c9830 R11: 0000000000000293 R12:
> 0000007b8c7b88f0
> [ 9460.167376] R13: 0000000000000000 R14: 0000000000000010 R15:
> 0000007b8c7b66b0
> [ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than 120
> seconds.
> [ 9460.167492]       Not tainted 4.12.0-pf7 #1
> [ 9460.167496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.167501] akonadi_maildis D    0  7366   7107 0x00000000
> [ 9460.167507] Call Trace:
> [ 9460.167516]  __schedule+0x6e7/0xcd0
> [ 9460.167525]  schedule+0x3d/0xd0
> [ 9460.167557]  wait_log_commit+0xd6/0x100 [btrfs]
> [ 9460.167564]  ? wake_bit_function+0x60/0x60
> [ 9460.167593]  btrfs_sync_log+0x13e/0xad0 [btrfs]
> [ 9460.167603]  ? dput.part.24+0x51/0x1d0
> [ 9460.167611]  ? dput+0x13/0x20
> [ 9460.167640]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
> [ 9460.167670]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> [ 9460.167696]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> [ 9460.167705]  vfs_fsync_range+0x4b/0xb0
> [ 9460.167712]  do_fsync+0x3d/0x70
> [ 9460.167718]  SyS_fdatasync+0x13/0x20
> [ 9460.167727]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 9460.167733] RIP: 0033:0x7ff91da0822d
> [ 9460.167737] RSP: 002b:00007ffdbded6fb0 EFLAGS: 00000293 ORIG_RAX:
> 000000000000004b
> [ 9460.167744] RAX: ffffffffffffffda RBX: 000000841e8cc480 RCX:
> 00007ff91da0822d
> [ 9460.167749] RDX: 0000000000000028 RSI: 000000841e902608 RDI:
> 000000000000000f
> [ 9460.167754] RBP: 0000000000000001 R08: 000000841e8aff00 R09:
> 000000000000002f
> [ 9460.167760] R10: 000000841e91a1d0 R11: 0000000000000293 R12:
> 000000841e910cc0
> [ 9460.167764] R13: 0000000000000000 R14: 000000000000000f R15:
> 000000841e9165c0
> [ 9460.167786] INFO: task chromium:7600 blocked for more than 120 seconds.
> [ 9460.167791]       Not tainted 4.12.0-pf7 #1
> [ 9460.167796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.167800] chromium        D    0  7600   7020 0x00000000
> [ 9460.167807] Call Trace:
> [ 9460.167815]  __schedule+0x6e7/0xcd0
> [ 9460.167824]  schedule+0x3d/0xd0
> [ 9460.167831]  io_schedule+0x16/0x40
> [ 9460.167838]  wait_on_page_bit+0xeb/0x130
> [ 9460.167847]  ? page_cache_tree_insert+0xc0/0xc0
> [ 9460.167879]  btrfs_page_mkwrite+0x1bf/0x400 [btrfs]
> [ 9460.167889]  do_page_mkwrite+0x31/0x90
> [ 9460.167896]  ? current_time+0x38/0x70
> [ 9460.167905]  do_wp_page+0x200/0x4b0
> [ 9460.167911]  ? file_update_time+0x60/0x110
> [ 9460.167920]  __handle_mm_fault+0x832/0xd50
> [ 9460.167930]  handle_mm_fault+0xde/0x220
> [ 9460.167938]  __do_page_fault+0x24d/0x510
> [ 9460.167946]  do_page_fault+0x22/0x30
> [ 9460.167954]  page_fault+0x28/0x30
> [ 9460.167959] RIP: 0033:0x64de23cd9c
> [ 9460.167964] RSP: 002b:00007ffdfd4ad4c0 EFLAGS: 00010202
> [ 9460.167970] RAX: 00007f80a51b9d58 RBX: 0000398cef048700 RCX:
> 0000398ce8da6510
> [ 9460.167982] RDX: 0000000000000006 RSI: 0000000000000000 RDI:
> 0000398cef048700
> [ 9460.167988] RBP: 0000000000000005 R08: 0000000000000000 R09:
> 0000000000000000
> [ 9460.167993] R10: 00007ffdfd4acdc0 R11: 0000000000000000 R12:
> 0000000000000001
> [ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15:
> 0000000000000000
> [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120
> seconds.
> [ 9460.168019]       Not tainted 4.12.0-pf7 #1
> [ 9460.168024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.168028] BrowserBlocking D    0  7639   7020 0x00000000
> [ 9460.168035] Call Trace:
> [ 9460.168044]  __schedule+0x6e7/0xcd0
> [ 9460.168054]  schedule+0x3d/0xd0
> [ 9460.168061]  io_schedule+0x16/0x40
> [ 9460.168068]  wait_on_page_bit_common+0xe3/0x180
> [ 9460.168076]  ? page_cache_tree_insert+0xc0/0xc0
> [ 9460.168083]  __filemap_fdatawait_range+0x12a/0x1a0
> [ 9460.168093]  filemap_fdatawait_range+0x14/0x30
> [ 9460.168128]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> [ 9460.168157]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> [ 9460.168186]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
> [ 9460.168195]  vfs_fsync_range+0x4b/0xb0
> [ 9460.168202]  do_fsync+0x3d/0x70
> [ 9460.168209]  SyS_fdatasync+0x13/0x20
> [ 9460.168218]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 9460.168223] RIP: 0033:0x7f80b8fcd22d
> [ 9460.168228] RSP: 002b:00007f808c8b7680 EFLAGS: 00000293 ORIG_RAX:
> 000000000000004b
> [ 9460.168235] RAX: ffffffffffffffda RBX: 00007f808c8b7670 RCX:
> 00007f80b8fcd22d
> [ 9460.168240] RDX: 00000000000019bc RSI: 0000398cef877800 RDI:
> 0000000000000119
> [ 9460.168246] RBP: 0000000000000119 R08: 0000000000000037 R09:
> 0000398ce88693c0
> [ 9460.168251] R10: 0000000000000000 R11: 0000000000000293 R12:
> 00007f808c8b77d8
> [ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15:
> 0000398cef877800
> [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120
> seconds.
> [ 9460.168272]       Not tainted 4.12.0-pf7 #1
> [ 9460.168277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.168282] Chrome_SyncThre D    0  7867   7020 0x00000000
> [ 9460.168289] Call Trace:
> [ 9460.168297]  __schedule+0x6e7/0xcd0
> [ 9460.168306]  schedule+0x3d/0xd0
> [ 9460.168338]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0
> [btrfs] [ 9460.168348]  ? wake_bit_function+0x60/0x60
> [ 9460.168377]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> [ 9460.168412]  ? release_extent_buffer+0x98/0xb0 [btrfs]
> [ 9460.168444]  btrfs_log_inode+0x97e/0x1140 [btrfs]
> [ 9460.168668]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> [ 9460.168676]  ? touch_atime+0xc1/0xd0
> [ 9460.168686]  ? refcount_inc+0x9/0x30
> [ 9460.168719]  ? join_transaction+0x122/0x450 [btrfs]
> [ 9460.168750]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> [ 9460.168779]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> [ 9460.168787]  vfs_fsync_range+0x4b/0xb0
> [ 9460.168794]  do_fsync+0x3d/0x70
> [ 9460.168801]  SyS_fdatasync+0x13/0x20
> [ 9460.168810]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 9460.168815] RIP: 0033:0x7f80b8fcd22d
> [ 9460.168819] RSP: 002b:00007f80804e1720 EFLAGS: 00000293 ORIG_RAX:
> 000000000000004b
> [ 9460.168826] RAX: ffffffffffffffda RBX: 0000000000008000 RCX:
> 00007f80b8fcd22d
> [ 9460.168830] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> 00000000000000fb
> [ 9460.168835] RBP: 0000398ce9e33000 R08: 0000398ced562800 R09:
> 0000000000000001
> [ 9460.168840] R10: 0000000000018400 R11: 0000000000000293 R12:
> 00000000004b0000
> [ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15:
> 0000398ceb9a1048
> [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120
> seconds.
> [ 9460.168858]       Not tainted 4.12.0-pf7 #1
> [ 9460.168863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.168867] BrowserBlocking D    0  8011   7020 0x00000000
> [ 9460.168873] Call Trace:
> [ 9460.168882]  __schedule+0x6e7/0xcd0
> [ 9460.168892]  schedule+0x3d/0xd0
> [ 9460.168898]  io_schedule+0x16/0x40
> [ 9460.168905]  wait_on_page_bit_common+0xe3/0x180
> [ 9460.168912]  ? page_cache_tree_insert+0xc0/0xc0
> [ 9460.168919]  __filemap_fdatawait_range+0x12a/0x1a0
> [ 9460.168929]  filemap_fdatawait_range+0x14/0x30
> [ 9460.168961]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> [ 9460.168989]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> [ 9460.169015]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
> [ 9460.169025]  vfs_fsync_range+0x4b/0xb0
> [ 9460.169033]  do_fsync+0x3d/0x70
> [ 9460.169040]  SyS_fdatasync+0x13/0x20
> [ 9460.169050]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 9460.169055] RIP: 0033:0x7f80b8fcd22d
> [ 9460.169060] RSP: 002b:00007f807d5f1680 EFLAGS: 00000293 ORIG_RAX:
> 000000000000004b
> [ 9460.169066] RAX: ffffffffffffffda RBX: 00007f807d5f1670 RCX:
> 00007f80b8fcd22d
> [ 9460.169073] RDX: 000000000001f95c RSI: 0000398cece75000 RDI:
> 000000000000011d
> [ 9460.169078] RBP: 000000000000011d R08: 000000000000003f R09:
> 0000398ce8870200
> [ 9460.169083] R10: 0000000000000000 R11: 0000000000000293 R12:
> 00007f807d5f17d8
> [ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15:
> 0000398cece75000
> [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120
> seconds.
> [ 9460.169103]       Not tainted 4.12.0-pf7 #1
> [ 9460.169108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 9460.169113] TaskSchedulerBa D    0  9603   7020 0x00000000
> [ 9460.169120] Call Trace:
> [ 9460.169128]  __schedule+0x6e7/0xcd0
> [ 9460.169138]  schedule+0x3d/0xd0
> [ 9460.169166]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0
> [btrfs] [ 9460.169175]  ? wake_bit_function+0x60/0x60
> [ 9460.169204]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> [ 9460.169237]  ? release_extent_buffer+0x98/0xb0 [btrfs]
> [ 9460.169265]  btrfs_log_inode+0x97e/0x1140 [btrfs]
> [ 9460.169278]  ? __set_page_dirty_nobuffers+0x117/0x150
> [ 9460.169310]  ? btree_set_page_dirty+0xe/0x10 [btrfs]
> [ 9460.169346]  ? set_extent_buffer_dirty+0x7b/0xb0 [btrfs]
> [ 9460.169378]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> [ 9460.169386]  ? touch_atime+0xc1/0xd0
> [ 9460.169395]  ? refcount_inc+0x9/0x30
> [ 9460.169427]  ? join_transaction+0x122/0x450 [btrfs]
> [ 9460.169457]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> [ 9460.169490]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> [ 9460.169498]  vfs_fsync_range+0x4b/0xb0
> [ 9460.169505]  do_fsync+0x3d/0x70
> [ 9460.169512]  SyS_fdatasync+0x13/0x20
> [ 9460.169522]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 9460.169527] RIP: 0033:0x7f80b8fcd22d
> [ 9460.169532] RSP: 002b:00007f80777ea2c0 EFLAGS: 00000293 ORIG_RAX:
> 000000000000004b
> [ 9460.169541] RAX: ffffffffffffffda RBX: 0000398ce9546d90 RCX:
> 00007f80b8fcd22d
> [ 9460.169546] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> 0000000000000117
> [ 9460.169552] RBP: 0000000000000200 R08: 0000398ced6459c0 R09:
> 0000000000000001
> [ 9460.169557] R10: 000000000000e400 R11: 0000000000000293 R12:
> 0000000000000001
> [ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15:
> 0000398ce9b50d80
> ===

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-26 10:37 ` Oleksandr Natalenko
@ 2017-08-26 10:48   ` Oleksandr Natalenko
  2017-08-26 11:19     ` Martin Steigerwald
                       ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Oleksandr Natalenko @ 2017-08-26 10:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.

On sobota 26. srpna 2017 12:37:29 CEST Oleksandr Natalenko wrote:
> Hi.
> 
> I've re-checked this issue with 4.12.9, and it is still there.
> 
> Also, I've managed to reproduce it in a VM with non-virtio disks (just
> -hda/- hdb pair in QEMU).
> 
> I'm not able to reproduce it with blk_mq disabled. Also, if blk_mq is
> enabled, scheduler does not make any difference, even setting "none"
> triggers hang on resume.
> 
> Next, will check for possible change between v4.12.7 and v4.12.8 (I don't
> remember this issue to be triggered on v4.12.7, or maybe I just haven't
> noticed it), and also will recompile kernel with debug info to capture
> proper vmcore.
> 
> On úterý 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote:
> > Hi.
> > 
> > v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I
> > have
> > blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same
> > result. Soft lockup happens showing stacktraces I'm pasting below.
> > 
> > Stacktrace shows that I/O hangs in md_super_wait(), and it means it waits
> > for "all superblock writes that were scheduled to complete". Since there
> > is
> > "scheduled" word, should I also try "none" scheduler with blk-mq enabled?
> > 
> > While I'm trying to reproduce it on a VM without much luck (it happens on
> > my laptop rarely, like 1 out of 10 suspend-resume cycles), and also
> > re-checking it with blk-mq disabled, by any chance is this something
> > already known?
> > 
> > Ideally, I'd like to reprduce it in a VM and capture vmcore.
> > 
> > Any suggestions are welcome. Thanks.
> > 
> > ===
> > [ 9460.165958] INFO: task md0_raid10:225 blocked for more than 120
> > seconds.
> > [ 9460.165983]       Not tainted 4.12.0-pf7 #1
> > [ 9460.165989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.165996] md0_raid10      D    0   225      2 0x00000000
> > [ 9460.166005] Call Trace:
> > [ 9460.166027]  __schedule+0x6e7/0xcd0
> > [ 9460.166043]  schedule+0x3d/0xd0
> > [ 9460.166067]  md_super_wait+0x6e/0xa0 [md_mod]
> > [ 9460.166130]  ? wake_bit_function+0x60/0x60
> > [ 9460.166143]  write_page+0x185/0x310 [md_mod]
> > [ 9460.166156]  bitmap_update_sb+0x12f/0x140 [md_mod]
> > [ 9460.166167]  md_update_sb.part.60+0x354/0x840 [md_mod]
> > [ 9460.166179]  ? percpu_ref_switch_to_percpu+0x36/0x40
> > [ 9460.166190]  md_check_recovery+0x214/0x4b0 [md_mod]
> > [ 9460.166201]  raid10d+0x62/0x13c0 [raid10]
> > [ 9460.166213]  ? lock_timer_base+0x81/0xa0
> > [ 9460.166222]  ? try_to_del_timer_sync+0x53/0x80
> > [ 9460.166232]  ? del_timer_sync+0x39/0x40
> > [ 9460.166246]  ? schedule_timeout+0x19b/0x330
> > [ 9460.166261]  ? call_timer_fn+0x160/0x160
> > [ 9460.166279]  md_thread+0x120/0x160 [md_mod]
> > [ 9460.166290]  ? md_thread+0x120/0x160 [md_mod]
> > [ 9460.166298]  ? wake_bit_function+0x60/0x60
> > [ 9460.166309]  kthread+0x124/0x140
> > [ 9460.166319]  ? find_pers+0x70/0x70 [md_mod]
> > [ 9460.166353]  ? kthread_create_on_node+0x70/0x70
> > [ 9460.166360]  ret_from_fork+0x25/0x30
> > [ 9460.166371] INFO: task dmcrypt_write:254 blocked for more than 120
> > seconds. [ 9460.166376]       Not tainted 4.12.0-pf7 #1
> > [ 9460.166381] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.166386] dmcrypt_write   D    0   254      2 0x00000000
> > [ 9460.166492] Call Trace:
> > [ 9460.166501]  __schedule+0x6e7/0xcd0
> > [ 9460.166511]  schedule+0x3d/0xd0
> > [ 9460.166522]  md_write_start+0xe3/0x270 [md_mod]
> > [ 9460.166529]  ? wake_bit_function+0x60/0x60
> > [ 9460.166538]  raid10_make_request+0x3f/0x140 [raid10]
> > [ 9460.166549]  md_make_request+0xa9/0x2a0 [md_mod]
> > [ 9460.166557]  ? __schedule+0x6ef/0xcd0
> > [ 9460.166567]  generic_make_request+0x11e/0x2f0
> > [ 9460.166578]  dmcrypt_write+0x22d/0x250 [dm_crypt]
> > [ 9460.166586]  ? dmcrypt_write+0x22d/0x250 [dm_crypt]
> > [ 9460.166596]  ? wake_up_q+0x80/0x80
> > [ 9460.166605]  kthread+0x124/0x140
> > [ 9460.166613]  ? kthread+0x124/0x140
> > [ 9460.166621]  ? crypt_iv_essiv_dtr+0x70/0x70 [dm_crypt]
> > [ 9460.166628]  ? kthread_create_on_node+0x70/0x70
> > [ 9460.166635]  ret_from_fork+0x25/0x30
> > [ 9460.166677] INFO: task konversation:7110 blocked for more than 120
> > seconds. [ 9460.166683]       Not tainted 4.12.0-pf7 #1
> > [ 9460.166687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.166693] konversation    D    0  7110   6972 0x00000000
> > [ 9460.166700] Call Trace:
> > [ 9460.166709]  __schedule+0x6e7/0xcd0
> > [ 9460.166719]  schedule+0x3d/0xd0
> > [ 9460.166824]  io_schedule+0x16/0x40
> > [ 9460.166833]  wait_on_page_bit+0xeb/0x130
> > [ 9460.166841]  ? page_cache_tree_insert+0xc0/0xc0
> > [ 9460.166882]  prepare_pages+0x1b5/0x1d0 [btrfs]
> > [ 9460.166915]  __btrfs_buffered_write+0x267/0x6d0 [btrfs]
> > [ 9460.166948]  btrfs_file_write_iter+0x1c1/0x500 [btrfs]
> > [ 9460.166956]  ? cp_new_stat+0x152/0x180
> > [ 9460.166966]  __vfs_write+0xe4/0x140
> > [ 9460.166975]  vfs_write+0xb1/0x1a0
> > [ 9460.166984]  SyS_write+0x55/0xc0
> > [ 9460.166994]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [ 9460.167001] RIP: 0033:0x7fd020f09c0d
> > [ 9460.167006] RSP: 002b:00007ffe27e70360 EFLAGS: 00000293 ORIG_RAX:
> > 0000000000000001
> > [ 9460.167013] RAX: ffffffffffffffda RBX: 0000000003178840 RCX:
> > 00007fd020f09c0d
> > [ 9460.167019] RDX: 000000000000007d RSI: 0000000003241598 RDI:
> > 000000000000002b
> > [ 9460.167024] RBP: 00007fd0211c72a0 R08: 00007fd021a55010 R09:
> > 0000000000000030
> > [ 9460.167029] R10: 00007fd0211caad8 R11: 0000000000000293 R12:
> > 0000000000000000
> > [ 9460.167034] R13: 00000000013b4cef R14: 0000000000000400 R15:
> > 0000000000000001
> > [ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than 120
> > seconds.
> > [ 9460.167080]       Not tainted 4.12.0-pf7 #1
> > [ 9460.167084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.167089] akonadi_imap_re D    0  7363   7107 0x00000000
> > [ 9460.167096] Call Trace:
> > [ 9460.167105]  __schedule+0x6e7/0xcd0
> > [ 9460.167114]  schedule+0x3d/0xd0
> > [ 9460.167149]  wait_for_writer+0xb1/0xe0 [btrfs]
> > [ 9460.167157]  ? wake_bit_function+0x60/0x60
> > [ 9460.167190]  btrfs_sync_log+0xe2/0xad0 [btrfs]
> > [ 9460.167201]  ? dput.part.24+0x51/0x1d0
> > [ 9460.167210]  ? dput+0x13/0x20
> > [ 9460.167242]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
> > [ 9460.167275]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > [ 9460.167302]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > [ 9460.167313]  vfs_fsync_range+0x4b/0xb0
> > [ 9460.167320]  do_fsync+0x3d/0x70
> > [ 9460.167327]  SyS_fdatasync+0x13/0x20
> > [ 9460.167336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [ 9460.167342] RIP: 0033:0x7fcdf0d7522d
> > [ 9460.167348] RSP: 002b:00007ffdcb64e9a0 EFLAGS: 00000293 ORIG_RAX:
> > 000000000000004b
> > [ 9460.167355] RAX: ffffffffffffffda RBX: 0000007b8c7c9f80 RCX:
> > 00007fcdf0d7522d
> > [ 9460.167361] RDX: 0000000000000023 RSI: 0000007b8c7ca228 RDI:
> > 0000000000000010
> > [ 9460.167366] RBP: 0000000000000001 R08: 0000007b8c7c9270 R09:
> > 000000000000002f
> > [ 9460.167371] R10: 0000007b8c7c9830 R11: 0000000000000293 R12:
> > 0000007b8c7b88f0
> > [ 9460.167376] R13: 0000000000000000 R14: 0000000000000010 R15:
> > 0000007b8c7b66b0
> > [ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than 120
> > seconds.
> > [ 9460.167492]       Not tainted 4.12.0-pf7 #1
> > [ 9460.167496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.167501] akonadi_maildis D    0  7366   7107 0x00000000
> > [ 9460.167507] Call Trace:
> > [ 9460.167516]  __schedule+0x6e7/0xcd0
> > [ 9460.167525]  schedule+0x3d/0xd0
> > [ 9460.167557]  wait_log_commit+0xd6/0x100 [btrfs]
> > [ 9460.167564]  ? wake_bit_function+0x60/0x60
> > [ 9460.167593]  btrfs_sync_log+0x13e/0xad0 [btrfs]
> > [ 9460.167603]  ? dput.part.24+0x51/0x1d0
> > [ 9460.167611]  ? dput+0x13/0x20
> > [ 9460.167640]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
> > [ 9460.167670]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > [ 9460.167696]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > [ 9460.167705]  vfs_fsync_range+0x4b/0xb0
> > [ 9460.167712]  do_fsync+0x3d/0x70
> > [ 9460.167718]  SyS_fdatasync+0x13/0x20
> > [ 9460.167727]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [ 9460.167733] RIP: 0033:0x7ff91da0822d
> > [ 9460.167737] RSP: 002b:00007ffdbded6fb0 EFLAGS: 00000293 ORIG_RAX:
> > 000000000000004b
> > [ 9460.167744] RAX: ffffffffffffffda RBX: 000000841e8cc480 RCX:
> > 00007ff91da0822d
> > [ 9460.167749] RDX: 0000000000000028 RSI: 000000841e902608 RDI:
> > 000000000000000f
> > [ 9460.167754] RBP: 0000000000000001 R08: 000000841e8aff00 R09:
> > 000000000000002f
> > [ 9460.167760] R10: 000000841e91a1d0 R11: 0000000000000293 R12:
> > 000000841e910cc0
> > [ 9460.167764] R13: 0000000000000000 R14: 000000000000000f R15:
> > 000000841e9165c0
> > [ 9460.167786] INFO: task chromium:7600 blocked for more than 120 seconds.
> > [ 9460.167791]       Not tainted 4.12.0-pf7 #1
> > [ 9460.167796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.167800] chromium        D    0  7600   7020 0x00000000
> > [ 9460.167807] Call Trace:
> > [ 9460.167815]  __schedule+0x6e7/0xcd0
> > [ 9460.167824]  schedule+0x3d/0xd0
> > [ 9460.167831]  io_schedule+0x16/0x40
> > [ 9460.167838]  wait_on_page_bit+0xeb/0x130
> > [ 9460.167847]  ? page_cache_tree_insert+0xc0/0xc0
> > [ 9460.167879]  btrfs_page_mkwrite+0x1bf/0x400 [btrfs]
> > [ 9460.167889]  do_page_mkwrite+0x31/0x90
> > [ 9460.167896]  ? current_time+0x38/0x70
> > [ 9460.167905]  do_wp_page+0x200/0x4b0
> > [ 9460.167911]  ? file_update_time+0x60/0x110
> > [ 9460.167920]  __handle_mm_fault+0x832/0xd50
> > [ 9460.167930]  handle_mm_fault+0xde/0x220
> > [ 9460.167938]  __do_page_fault+0x24d/0x510
> > [ 9460.167946]  do_page_fault+0x22/0x30
> > [ 9460.167954]  page_fault+0x28/0x30
> > [ 9460.167959] RIP: 0033:0x64de23cd9c
> > [ 9460.167964] RSP: 002b:00007ffdfd4ad4c0 EFLAGS: 00010202
> > [ 9460.167970] RAX: 00007f80a51b9d58 RBX: 0000398cef048700 RCX:
> > 0000398ce8da6510
> > [ 9460.167982] RDX: 0000000000000006 RSI: 0000000000000000 RDI:
> > 0000398cef048700
> > [ 9460.167988] RBP: 0000000000000005 R08: 0000000000000000 R09:
> > 0000000000000000
> > [ 9460.167993] R10: 00007ffdfd4acdc0 R11: 0000000000000000 R12:
> > 0000000000000001
> > [ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15:
> > 0000000000000000
> > [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120
> > seconds.
> > [ 9460.168019]       Not tainted 4.12.0-pf7 #1
> > [ 9460.168024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.168028] BrowserBlocking D    0  7639   7020 0x00000000
> > [ 9460.168035] Call Trace:
> > [ 9460.168044]  __schedule+0x6e7/0xcd0
> > [ 9460.168054]  schedule+0x3d/0xd0
> > [ 9460.168061]  io_schedule+0x16/0x40
> > [ 9460.168068]  wait_on_page_bit_common+0xe3/0x180
> > [ 9460.168076]  ? page_cache_tree_insert+0xc0/0xc0
> > [ 9460.168083]  __filemap_fdatawait_range+0x12a/0x1a0
> > [ 9460.168093]  filemap_fdatawait_range+0x14/0x30
> > [ 9460.168128]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> > [ 9460.168157]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> > [ 9460.168186]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
> > [ 9460.168195]  vfs_fsync_range+0x4b/0xb0
> > [ 9460.168202]  do_fsync+0x3d/0x70
> > [ 9460.168209]  SyS_fdatasync+0x13/0x20
> > [ 9460.168218]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [ 9460.168223] RIP: 0033:0x7f80b8fcd22d
> > [ 9460.168228] RSP: 002b:00007f808c8b7680 EFLAGS: 00000293 ORIG_RAX:
> > 000000000000004b
> > [ 9460.168235] RAX: ffffffffffffffda RBX: 00007f808c8b7670 RCX:
> > 00007f80b8fcd22d
> > [ 9460.168240] RDX: 00000000000019bc RSI: 0000398cef877800 RDI:
> > 0000000000000119
> > [ 9460.168246] RBP: 0000000000000119 R08: 0000000000000037 R09:
> > 0000398ce88693c0
> > [ 9460.168251] R10: 0000000000000000 R11: 0000000000000293 R12:
> > 00007f808c8b77d8
> > [ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15:
> > 0000398cef877800
> > [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120
> > seconds.
> > [ 9460.168272]       Not tainted 4.12.0-pf7 #1
> > [ 9460.168277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.168282] Chrome_SyncThre D    0  7867   7020 0x00000000
> > [ 9460.168289] Call Trace:
> > [ 9460.168297]  __schedule+0x6e7/0xcd0
> > [ 9460.168306]  schedule+0x3d/0xd0
> > [ 9460.168338]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0
> > [btrfs] [ 9460.168348]  ? wake_bit_function+0x60/0x60
> > [ 9460.168377]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> > [ 9460.168412]  ? release_extent_buffer+0x98/0xb0 [btrfs]
> > [ 9460.168444]  btrfs_log_inode+0x97e/0x1140 [btrfs]
> > [ 9460.168668]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> > [ 9460.168676]  ? touch_atime+0xc1/0xd0
> > [ 9460.168686]  ? refcount_inc+0x9/0x30
> > [ 9460.168719]  ? join_transaction+0x122/0x450 [btrfs]
> > [ 9460.168750]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> > [ 9460.168779]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> > [ 9460.168787]  vfs_fsync_range+0x4b/0xb0
> > [ 9460.168794]  do_fsync+0x3d/0x70
> > [ 9460.168801]  SyS_fdatasync+0x13/0x20
> > [ 9460.168810]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [ 9460.168815] RIP: 0033:0x7f80b8fcd22d
> > [ 9460.168819] RSP: 002b:00007f80804e1720 EFLAGS: 00000293 ORIG_RAX:
> > 000000000000004b
> > [ 9460.168826] RAX: ffffffffffffffda RBX: 0000000000008000 RCX:
> > 00007f80b8fcd22d
> > [ 9460.168830] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> > 00000000000000fb
> > [ 9460.168835] RBP: 0000398ce9e33000 R08: 0000398ced562800 R09:
> > 0000000000000001
> > [ 9460.168840] R10: 0000000000018400 R11: 0000000000000293 R12:
> > 00000000004b0000
> > [ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15:
> > 0000398ceb9a1048
> > [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120
> > seconds.
> > [ 9460.168858]       Not tainted 4.12.0-pf7 #1
> > [ 9460.168863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.168867] BrowserBlocking D    0  8011   7020 0x00000000
> > [ 9460.168873] Call Trace:
> > [ 9460.168882]  __schedule+0x6e7/0xcd0
> > [ 9460.168892]  schedule+0x3d/0xd0
> > [ 9460.168898]  io_schedule+0x16/0x40
> > [ 9460.168905]  wait_on_page_bit_common+0xe3/0x180
> > [ 9460.168912]  ? page_cache_tree_insert+0xc0/0xc0
> > [ 9460.168919]  __filemap_fdatawait_range+0x12a/0x1a0
> > [ 9460.168929]  filemap_fdatawait_range+0x14/0x30
> > [ 9460.168961]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> > [ 9460.168989]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> > [ 9460.169015]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
> > [ 9460.169025]  vfs_fsync_range+0x4b/0xb0
> > [ 9460.169033]  do_fsync+0x3d/0x70
> > [ 9460.169040]  SyS_fdatasync+0x13/0x20
> > [ 9460.169050]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [ 9460.169055] RIP: 0033:0x7f80b8fcd22d
> > [ 9460.169060] RSP: 002b:00007f807d5f1680 EFLAGS: 00000293 ORIG_RAX:
> > 000000000000004b
> > [ 9460.169066] RAX: ffffffffffffffda RBX: 00007f807d5f1670 RCX:
> > 00007f80b8fcd22d
> > [ 9460.169073] RDX: 000000000001f95c RSI: 0000398cece75000 RDI:
> > 000000000000011d
> > [ 9460.169078] RBP: 000000000000011d R08: 000000000000003f R09:
> > 0000398ce8870200
> > [ 9460.169083] R10: 0000000000000000 R11: 0000000000000293 R12:
> > 00007f807d5f17d8
> > [ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15:
> > 0000398cece75000
> > [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120
> > seconds.
> > [ 9460.169103]       Not tainted 4.12.0-pf7 #1
> > [ 9460.169108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 9460.169113] TaskSchedulerBa D    0  9603   7020 0x00000000
> > [ 9460.169120] Call Trace:
> > [ 9460.169128]  __schedule+0x6e7/0xcd0
> > [ 9460.169138]  schedule+0x3d/0xd0
> > [ 9460.169166]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0
> > [btrfs] [ 9460.169175]  ? wake_bit_function+0x60/0x60
> > [ 9460.169204]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> > [ 9460.169237]  ? release_extent_buffer+0x98/0xb0 [btrfs]
> > [ 9460.169265]  btrfs_log_inode+0x97e/0x1140 [btrfs]
> > [ 9460.169278]  ? __set_page_dirty_nobuffers+0x117/0x150
> > [ 9460.169310]  ? btree_set_page_dirty+0xe/0x10 [btrfs]
> > [ 9460.169346]  ? set_extent_buffer_dirty+0x7b/0xb0 [btrfs]
> > [ 9460.169378]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> > [ 9460.169386]  ? touch_atime+0xc1/0xd0
> > [ 9460.169395]  ? refcount_inc+0x9/0x30
> > [ 9460.169427]  ? join_transaction+0x122/0x450 [btrfs]
> > [ 9460.169457]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> > [ 9460.169490]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> > [ 9460.169498]  vfs_fsync_range+0x4b/0xb0
> > [ 9460.169505]  do_fsync+0x3d/0x70
> > [ 9460.169512]  SyS_fdatasync+0x13/0x20
> > [ 9460.169522]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > [ 9460.169527] RIP: 0033:0x7f80b8fcd22d
> > [ 9460.169532] RSP: 002b:00007f80777ea2c0 EFLAGS: 00000293 ORIG_RAX:
> > 000000000000004b
> > [ 9460.169541] RAX: ffffffffffffffda RBX: 0000398ce9546d90 RCX:
> > 00007f80b8fcd22d
> > [ 9460.169546] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> > 0000000000000117
> > [ 9460.169552] RBP: 0000000000000200 R08: 0000398ced6459c0 R09:
> > 0000000000000001
> > [ 9460.169557] R10: 000000000000e400 R11: 0000000000000293 R12:
> > 0000000000000001
> > [ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15:
> > 0000398ce9b50d80
> > ===

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-26 10:48   ` Oleksandr Natalenko
@ 2017-08-26 11:19     ` Martin Steigerwald
  2017-08-26 17:17       ` Wols Lists
  2017-08-26 13:32     ` Oleksandr Natalenko
  2017-08-27  6:02     ` Ming Lei
  2 siblings, 1 reply; 29+ messages in thread
From: Martin Steigerwald @ 2017-08-26 11:19 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hello Oleksandr,

Oleksandr Natalenko - 26.08.17, 12:48:
> Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
> 
> On sobota 26. srpna 2017 12:37:29 CEST Oleksandr Natalenko wrote:
[…]
> > I've re-checked this issue with 4.12.9, and it is still there.
[…]
> > On úterý 22. srpna 2017 13:45:43 CEST Oleksandr Natalenko wrote:
> > > Hi.
> > > 
> > > v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I
> > > have
> > > blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same
> > > result. Soft lockup happens showing stacktraces I'm pasting below.

I did have occassional hangs on resuming from suspend-to-ram with 4.12, but… I 
am not certain that its related to I/O issues.

The hangs were gone as I tried with a kernel config from a friend that still 
uses CFQ, I then adapted it to use 1000HZ, low-latency desktop and blk-mq as I 
used before and also enabled optimization for the processor type in this 
ThinkPad T520, and got hangs on resuming from suspend-to-ram again. 

As backing out the change from 250 HZ to 1000 HZ, and from low-latency desktop 
to the lesser low latency option did not help, I am now currently re-using the 
config from my friend minus quite some drivers and unneeded kernel features, 
but otherwise almost unchanged. I.e. back with CFQ as well. So far so good, 
but it needs at least 4-5 days of additional testing to be sure.

Also… when a hang happened the mouse pointer was frozen, Ctrl-Alt-F1 didn´t 
work and so on… so it may easily be a completely different issue.

I did not see much point in reporting it so far… as I have no idea on how to 
reliably pin-point the issue. It happens once every few days, so a bisect 
again is out of questions – (it is anyway for a production machine for me) –, 
it appears to be a hard freeze, so no debug data… its one of these "you don´t 
get to debug me" hangs again. I really have no idea how to a get hold on such 
complexity. I am hoping to at least pin-point the exact kernel option that 
triggers this issue, but it may take weeks to do so. I´d really love a way for 
the kernel to at least to write out debug data before doing hanging 
completely.

Thanks,
Martin

> > > 
> > > Stacktrace shows that I/O hangs in md_super_wait(), and it means it
> > > waits
> > > for "all superblock writes that were scheduled to complete". Since there
> > > is
> > > "scheduled" word, should I also try "none" scheduler with blk-mq
> > > enabled?
> > > 
> > > While I'm trying to reproduce it on a VM without much luck (it happens
> > > on
> > > my laptop rarely, like 1 out of 10 suspend-resume cycles), and also
> > > re-checking it with blk-mq disabled, by any chance is this something
> > > already known?
> > > 
> > > Ideally, I'd like to reprduce it in a VM and capture vmcore.
> > > 
> > > Any suggestions are welcome. Thanks.
> > > 
> > > ===
> > > [ 9460.165958] INFO: task md0_raid10:225 blocked for more than 120
> > > seconds.
> > > [ 9460.165983]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.165989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.165996] md0_raid10      D    0   225      2 0x00000000
> > > [ 9460.166005] Call Trace:
> > > [ 9460.166027]  __schedule+0x6e7/0xcd0
> > > [ 9460.166043]  schedule+0x3d/0xd0
> > > [ 9460.166067]  md_super_wait+0x6e/0xa0 [md_mod]
> > > [ 9460.166130]  ? wake_bit_function+0x60/0x60
> > > [ 9460.166143]  write_page+0x185/0x310 [md_mod]
> > > [ 9460.166156]  bitmap_update_sb+0x12f/0x140 [md_mod]
> > > [ 9460.166167]  md_update_sb.part.60+0x354/0x840 [md_mod]
> > > [ 9460.166179]  ? percpu_ref_switch_to_percpu+0x36/0x40
> > > [ 9460.166190]  md_check_recovery+0x214/0x4b0 [md_mod]
> > > [ 9460.166201]  raid10d+0x62/0x13c0 [raid10]
> > > [ 9460.166213]  ? lock_timer_base+0x81/0xa0
> > > [ 9460.166222]  ? try_to_del_timer_sync+0x53/0x80
> > > [ 9460.166232]  ? del_timer_sync+0x39/0x40
> > > [ 9460.166246]  ? schedule_timeout+0x19b/0x330
> > > [ 9460.166261]  ? call_timer_fn+0x160/0x160
> > > [ 9460.166279]  md_thread+0x120/0x160 [md_mod]
> > > [ 9460.166290]  ? md_thread+0x120/0x160 [md_mod]
> > > [ 9460.166298]  ? wake_bit_function+0x60/0x60
> > > [ 9460.166309]  kthread+0x124/0x140
> > > [ 9460.166319]  ? find_pers+0x70/0x70 [md_mod]
> > > [ 9460.166353]  ? kthread_create_on_node+0x70/0x70
> > > [ 9460.166360]  ret_from_fork+0x25/0x30
> > > [ 9460.166371] INFO: task dmcrypt_write:254 blocked for more than 120
> > > seconds. [ 9460.166376]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.166381] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.166386] dmcrypt_write   D    0   254      2 0x00000000
> > > [ 9460.166492] Call Trace:
> > > [ 9460.166501]  __schedule+0x6e7/0xcd0
> > > [ 9460.166511]  schedule+0x3d/0xd0
> > > [ 9460.166522]  md_write_start+0xe3/0x270 [md_mod]
> > > [ 9460.166529]  ? wake_bit_function+0x60/0x60
> > > [ 9460.166538]  raid10_make_request+0x3f/0x140 [raid10]
> > > [ 9460.166549]  md_make_request+0xa9/0x2a0 [md_mod]
> > > [ 9460.166557]  ? __schedule+0x6ef/0xcd0
> > > [ 9460.166567]  generic_make_request+0x11e/0x2f0
> > > [ 9460.166578]  dmcrypt_write+0x22d/0x250 [dm_crypt]
> > > [ 9460.166586]  ? dmcrypt_write+0x22d/0x250 [dm_crypt]
> > > [ 9460.166596]  ? wake_up_q+0x80/0x80
> > > [ 9460.166605]  kthread+0x124/0x140
> > > [ 9460.166613]  ? kthread+0x124/0x140
> > > [ 9460.166621]  ? crypt_iv_essiv_dtr+0x70/0x70 [dm_crypt]
> > > [ 9460.166628]  ? kthread_create_on_node+0x70/0x70
> > > [ 9460.166635]  ret_from_fork+0x25/0x30
> > > [ 9460.166677] INFO: task konversation:7110 blocked for more than 120
> > > seconds. [ 9460.166683]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.166687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.166693] konversation    D    0  7110   6972 0x00000000
> > > [ 9460.166700] Call Trace:
> > > [ 9460.166709]  __schedule+0x6e7/0xcd0
> > > [ 9460.166719]  schedule+0x3d/0xd0
> > > [ 9460.166824]  io_schedule+0x16/0x40
> > > [ 9460.166833]  wait_on_page_bit+0xeb/0x130
> > > [ 9460.166841]  ? page_cache_tree_insert+0xc0/0xc0
> > > [ 9460.166882]  prepare_pages+0x1b5/0x1d0 [btrfs]
> > > [ 9460.166915]  __btrfs_buffered_write+0x267/0x6d0 [btrfs]
> > > [ 9460.166948]  btrfs_file_write_iter+0x1c1/0x500 [btrfs]
> > > [ 9460.166956]  ? cp_new_stat+0x152/0x180
> > > [ 9460.166966]  __vfs_write+0xe4/0x140
> > > [ 9460.166975]  vfs_write+0xb1/0x1a0
> > > [ 9460.166984]  SyS_write+0x55/0xc0
> > > [ 9460.166994]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.167001] RIP: 0033:0x7fd020f09c0d
> > > [ 9460.167006] RSP: 002b:00007ffe27e70360 EFLAGS: 00000293 ORIG_RAX:
> > > 0000000000000001
> > > [ 9460.167013] RAX: ffffffffffffffda RBX: 0000000003178840 RCX:
> > > 00007fd020f09c0d
> > > [ 9460.167019] RDX: 000000000000007d RSI: 0000000003241598 RDI:
> > > 000000000000002b
> > > [ 9460.167024] RBP: 00007fd0211c72a0 R08: 00007fd021a55010 R09:
> > > 0000000000000030
> > > [ 9460.167029] R10: 00007fd0211caad8 R11: 0000000000000293 R12:
> > > 0000000000000000
> > > [ 9460.167034] R13: 00000000013b4cef R14: 0000000000000400 R15:
> > > 0000000000000001
> > > [ 9460.167075] INFO: task akonadi_imap_re:7363 blocked for more than 120
> > > seconds.
> > > [ 9460.167080]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.167084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.167089] akonadi_imap_re D    0  7363   7107 0x00000000
> > > [ 9460.167096] Call Trace:
> > > [ 9460.167105]  __schedule+0x6e7/0xcd0
> > > [ 9460.167114]  schedule+0x3d/0xd0
> > > [ 9460.167149]  wait_for_writer+0xb1/0xe0 [btrfs]
> > > [ 9460.167157]  ? wake_bit_function+0x60/0x60
> > > [ 9460.167190]  btrfs_sync_log+0xe2/0xad0 [btrfs]
> > > [ 9460.167201]  ? dput.part.24+0x51/0x1d0
> > > [ 9460.167210]  ? dput+0x13/0x20
> > > [ 9460.167242]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
> > > [ 9460.167275]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > > [ 9460.167302]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > > [ 9460.167313]  vfs_fsync_range+0x4b/0xb0
> > > [ 9460.167320]  do_fsync+0x3d/0x70
> > > [ 9460.167327]  SyS_fdatasync+0x13/0x20
> > > [ 9460.167336]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.167342] RIP: 0033:0x7fcdf0d7522d
> > > [ 9460.167348] RSP: 002b:00007ffdcb64e9a0 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.167355] RAX: ffffffffffffffda RBX: 0000007b8c7c9f80 RCX:
> > > 00007fcdf0d7522d
> > > [ 9460.167361] RDX: 0000000000000023 RSI: 0000007b8c7ca228 RDI:
> > > 0000000000000010
> > > [ 9460.167366] RBP: 0000000000000001 R08: 0000007b8c7c9270 R09:
> > > 000000000000002f
> > > [ 9460.167371] R10: 0000007b8c7c9830 R11: 0000000000000293 R12:
> > > 0000007b8c7b88f0
> > > [ 9460.167376] R13: 0000000000000000 R14: 0000000000000010 R15:
> > > 0000007b8c7b66b0
> > > [ 9460.167387] INFO: task akonadi_maildis:7366 blocked for more than 120
> > > seconds.
> > > [ 9460.167492]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.167496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.167501] akonadi_maildis D    0  7366   7107 0x00000000
> > > [ 9460.167507] Call Trace:
> > > [ 9460.167516]  __schedule+0x6e7/0xcd0
> > > [ 9460.167525]  schedule+0x3d/0xd0
> > > [ 9460.167557]  wait_log_commit+0xd6/0x100 [btrfs]
> > > [ 9460.167564]  ? wake_bit_function+0x60/0x60
> > > [ 9460.167593]  btrfs_sync_log+0x13e/0xad0 [btrfs]
> > > [ 9460.167603]  ? dput.part.24+0x51/0x1d0
> > > [ 9460.167611]  ? dput+0x13/0x20
> > > [ 9460.167640]  ? btrfs_log_dentry_safe+0x6a/0x80 [btrfs]
> > > [ 9460.167670]  btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > > [ 9460.167696]  ? btrfs_sync_file+0x2f4/0x3f0 [btrfs]
> > > [ 9460.167705]  vfs_fsync_range+0x4b/0xb0
> > > [ 9460.167712]  do_fsync+0x3d/0x70
> > > [ 9460.167718]  SyS_fdatasync+0x13/0x20
> > > [ 9460.167727]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.167733] RIP: 0033:0x7ff91da0822d
> > > [ 9460.167737] RSP: 002b:00007ffdbded6fb0 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.167744] RAX: ffffffffffffffda RBX: 000000841e8cc480 RCX:
> > > 00007ff91da0822d
> > > [ 9460.167749] RDX: 0000000000000028 RSI: 000000841e902608 RDI:
> > > 000000000000000f
> > > [ 9460.167754] RBP: 0000000000000001 R08: 000000841e8aff00 R09:
> > > 000000000000002f
> > > [ 9460.167760] R10: 000000841e91a1d0 R11: 0000000000000293 R12:
> > > 000000841e910cc0
> > > [ 9460.167764] R13: 0000000000000000 R14: 000000000000000f R15:
> > > 000000841e9165c0
> > > [ 9460.167786] INFO: task chromium:7600 blocked for more than 120
> > > seconds.
> > > [ 9460.167791]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.167796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.167800] chromium        D    0  7600   7020 0x00000000
> > > [ 9460.167807] Call Trace:
> > > [ 9460.167815]  __schedule+0x6e7/0xcd0
> > > [ 9460.167824]  schedule+0x3d/0xd0
> > > [ 9460.167831]  io_schedule+0x16/0x40
> > > [ 9460.167838]  wait_on_page_bit+0xeb/0x130
> > > [ 9460.167847]  ? page_cache_tree_insert+0xc0/0xc0
> > > [ 9460.167879]  btrfs_page_mkwrite+0x1bf/0x400 [btrfs]
> > > [ 9460.167889]  do_page_mkwrite+0x31/0x90
> > > [ 9460.167896]  ? current_time+0x38/0x70
> > > [ 9460.167905]  do_wp_page+0x200/0x4b0
> > > [ 9460.167911]  ? file_update_time+0x60/0x110
> > > [ 9460.167920]  __handle_mm_fault+0x832/0xd50
> > > [ 9460.167930]  handle_mm_fault+0xde/0x220
> > > [ 9460.167938]  __do_page_fault+0x24d/0x510
> > > [ 9460.167946]  do_page_fault+0x22/0x30
> > > [ 9460.167954]  page_fault+0x28/0x30
> > > [ 9460.167959] RIP: 0033:0x64de23cd9c
> > > [ 9460.167964] RSP: 002b:00007ffdfd4ad4c0 EFLAGS: 00010202
> > > [ 9460.167970] RAX: 00007f80a51b9d58 RBX: 0000398cef048700 RCX:
> > > 0000398ce8da6510
> > > [ 9460.167982] RDX: 0000000000000006 RSI: 0000000000000000 RDI:
> > > 0000398cef048700
> > > [ 9460.167988] RBP: 0000000000000005 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [ 9460.167993] R10: 00007ffdfd4acdc0 R11: 0000000000000000 R12:
> > > 0000000000000001
> > > [ 9460.168000] R13: 0000000000000005 R14: 00007ffdfd4ad908 R15:
> > > 0000000000000000
> > > [ 9460.168014] INFO: task BrowserBlocking:7639 blocked for more than 120
> > > seconds.
> > > [ 9460.168019]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.168024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.168028] BrowserBlocking D    0  7639   7020 0x00000000
> > > [ 9460.168035] Call Trace:
> > > [ 9460.168044]  __schedule+0x6e7/0xcd0
> > > [ 9460.168054]  schedule+0x3d/0xd0
> > > [ 9460.168061]  io_schedule+0x16/0x40
> > > [ 9460.168068]  wait_on_page_bit_common+0xe3/0x180
> > > [ 9460.168076]  ? page_cache_tree_insert+0xc0/0xc0
> > > [ 9460.168083]  __filemap_fdatawait_range+0x12a/0x1a0
> > > [ 9460.168093]  filemap_fdatawait_range+0x14/0x30
> > > [ 9460.168128]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> > > [ 9460.168157]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> > > [ 9460.168186]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
> > > [ 9460.168195]  vfs_fsync_range+0x4b/0xb0
> > > [ 9460.168202]  do_fsync+0x3d/0x70
> > > [ 9460.168209]  SyS_fdatasync+0x13/0x20
> > > [ 9460.168218]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.168223] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.168228] RSP: 002b:00007f808c8b7680 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.168235] RAX: ffffffffffffffda RBX: 00007f808c8b7670 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.168240] RDX: 00000000000019bc RSI: 0000398cef877800 RDI:
> > > 0000000000000119
> > > [ 9460.168246] RBP: 0000000000000119 R08: 0000000000000037 R09:
> > > 0000398ce88693c0
> > > [ 9460.168251] R10: 0000000000000000 R11: 0000000000000293 R12:
> > > 00007f808c8b77d8
> > > [ 9460.168257] R13: 0000398ceb6cf820 R14: 00000000000019bc R15:
> > > 0000398cef877800
> > > [ 9460.168267] INFO: task Chrome_SyncThre:7867 blocked for more than 120
> > > seconds.
> > > [ 9460.168272]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.168277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.168282] Chrome_SyncThre D    0  7867   7020 0x00000000
> > > [ 9460.168289] Call Trace:
> > > [ 9460.168297]  __schedule+0x6e7/0xcd0
> > > [ 9460.168306]  schedule+0x3d/0xd0
> > > [ 9460.168338]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0
> > > [btrfs] [ 9460.168348]  ? wake_bit_function+0x60/0x60
> > > [ 9460.168377]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> > > [ 9460.168412]  ? release_extent_buffer+0x98/0xb0 [btrfs]
> > > [ 9460.168444]  btrfs_log_inode+0x97e/0x1140 [btrfs]
> > > [ 9460.168668]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> > > [ 9460.168676]  ? touch_atime+0xc1/0xd0
> > > [ 9460.168686]  ? refcount_inc+0x9/0x30
> > > [ 9460.168719]  ? join_transaction+0x122/0x450 [btrfs]
> > > [ 9460.168750]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> > > [ 9460.168779]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> > > [ 9460.168787]  vfs_fsync_range+0x4b/0xb0
> > > [ 9460.168794]  do_fsync+0x3d/0x70
> > > [ 9460.168801]  SyS_fdatasync+0x13/0x20
> > > [ 9460.168810]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.168815] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.168819] RSP: 002b:00007f80804e1720 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.168826] RAX: ffffffffffffffda RBX: 0000000000008000 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.168830] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> > > 00000000000000fb
> > > [ 9460.168835] RBP: 0000398ce9e33000 R08: 0000398ced562800 R09:
> > > 0000000000000001
> > > [ 9460.168840] R10: 0000000000018400 R11: 0000000000000293 R12:
> > > 00000000004b0000
> > > [ 9460.168845] R13: 0000398ceac2ac78 R14: 0000398ce9e3b038 R15:
> > > 0000398ceb9a1048
> > > [ 9460.168854] INFO: task BrowserBlocking:8011 blocked for more than 120
> > > seconds.
> > > [ 9460.168858]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.168863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.168867] BrowserBlocking D    0  8011   7020 0x00000000
> > > [ 9460.168873] Call Trace:
> > > [ 9460.168882]  __schedule+0x6e7/0xcd0
> > > [ 9460.168892]  schedule+0x3d/0xd0
> > > [ 9460.168898]  io_schedule+0x16/0x40
> > > [ 9460.168905]  wait_on_page_bit_common+0xe3/0x180
> > > [ 9460.168912]  ? page_cache_tree_insert+0xc0/0xc0
> > > [ 9460.168919]  __filemap_fdatawait_range+0x12a/0x1a0
> > > [ 9460.168929]  filemap_fdatawait_range+0x14/0x30
> > > [ 9460.168961]  btrfs_wait_ordered_range+0x6b/0x110 [btrfs]
> > > [ 9460.168989]  ? btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> > > [ 9460.169015]  btrfs_sync_file+0x17c/0x3f0 [btrfs]
> > > [ 9460.169025]  vfs_fsync_range+0x4b/0xb0
> > > [ 9460.169033]  do_fsync+0x3d/0x70
> > > [ 9460.169040]  SyS_fdatasync+0x13/0x20
> > > [ 9460.169050]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.169055] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.169060] RSP: 002b:00007f807d5f1680 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.169066] RAX: ffffffffffffffda RBX: 00007f807d5f1670 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.169073] RDX: 000000000001f95c RSI: 0000398cece75000 RDI:
> > > 000000000000011d
> > > [ 9460.169078] RBP: 000000000000011d R08: 000000000000003f R09:
> > > 0000398ce8870200
> > > [ 9460.169083] R10: 0000000000000000 R11: 0000000000000293 R12:
> > > 00007f807d5f17d8
> > > [ 9460.169088] R13: 0000398ceafb9160 R14: 000000000001f95c R15:
> > > 0000398cece75000
> > > [ 9460.169098] INFO: task TaskSchedulerBa:9603 blocked for more than 120
> > > seconds.
> > > [ 9460.169103]       Not tainted 4.12.0-pf7 #1
> > > [ 9460.169108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables
> > > this message.
> > > [ 9460.169113] TaskSchedulerBa D    0  9603   7020 0x00000000
> > > [ 9460.169120] Call Trace:
> > > [ 9460.169128]  __schedule+0x6e7/0xcd0
> > > [ 9460.169138]  schedule+0x3d/0xd0
> > > [ 9460.169166]  wait_ordered_extents.isra.22.constprop.26+0x354/0x3d0
> > > [btrfs] [ 9460.169175]  ? wake_bit_function+0x60/0x60
> > > [ 9460.169204]  btrfs_log_changed_extents+0x28a/0x620 [btrfs]
> > > [ 9460.169237]  ? release_extent_buffer+0x98/0xb0 [btrfs]
> > > [ 9460.169265]  btrfs_log_inode+0x97e/0x1140 [btrfs]
> > > [ 9460.169278]  ? __set_page_dirty_nobuffers+0x117/0x150
> > > [ 9460.169310]  ? btree_set_page_dirty+0xe/0x10 [btrfs]
> > > [ 9460.169346]  ? set_extent_buffer_dirty+0x7b/0xb0 [btrfs]
> > > [ 9460.169378]  btrfs_log_inode_parent+0x277/0xa30 [btrfs]
> > > [ 9460.169386]  ? touch_atime+0xc1/0xd0
> > > [ 9460.169395]  ? refcount_inc+0x9/0x30
> > > [ 9460.169427]  ? join_transaction+0x122/0x450 [btrfs]
> > > [ 9460.169457]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
> > > [ 9460.169490]  btrfs_sync_file+0x2b3/0x3f0 [btrfs]
> > > [ 9460.169498]  vfs_fsync_range+0x4b/0xb0
> > > [ 9460.169505]  do_fsync+0x3d/0x70
> > > [ 9460.169512]  SyS_fdatasync+0x13/0x20
> > > [ 9460.169522]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > > [ 9460.169527] RIP: 0033:0x7f80b8fcd22d
> > > [ 9460.169532] RSP: 002b:00007f80777ea2c0 EFLAGS: 00000293 ORIG_RAX:
> > > 000000000000004b
> > > [ 9460.169541] RAX: ffffffffffffffda RBX: 0000398ce9546d90 RCX:
> > > 00007f80b8fcd22d
> > > [ 9460.169546] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> > > 0000000000000117
> > > [ 9460.169552] RBP: 0000000000000200 R08: 0000398ced6459c0 R09:
> > > 0000000000000001
> > > [ 9460.169557] R10: 000000000000e400 R11: 0000000000000293 R12:
> > > 0000000000000001
> > > [ 9460.169562] R13: 0000000000000015 R14: 0000000000000200 R15:
> > > 0000398ce9b50d80
> > > ===


-- 
Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-26 10:48   ` Oleksandr Natalenko
  2017-08-26 11:19     ` Martin Steigerwald
@ 2017-08-26 13:32     ` Oleksandr Natalenko
  2017-08-27  6:02     ` Ming Lei
  2 siblings, 0 replies; 29+ messages in thread
From: Oleksandr Natalenko @ 2017-08-26 13:32 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Recompiled v4.13-rc6 with debug info and lockdep.

Unfortunately, still cannot use crash utility because 7.1.9 does not support 
v4.13 (and git HEAD with required fix for 5-level pagetable does not compile 
for me).

But I've got much nicer stacktrace + lockdep output (check paste below).

Looks like kernel is blocked on device mutex here:

1510 void scsi_rescan_device(struct device *dev)
1511 {
1512     struct scsi_device *sdev = to_scsi_device(dev);
1513 
1514     device_lock(dev); // here

and md0_raid10 kthread is blocked on I/O scheduling point in blk_mq_get_tag().

Again, with blk-mq this issue is not reproducible.

===
[   62.789665] INFO: task kworker/2:1:36 blocked for more than 20 seconds.
[   62.793858]       Not tainted 4.13.0-rc6 #3
[   62.796537] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   62.800467] kworker/2:1     D13856    36      2 0x00000000
[   62.802934] Workqueue: events ata_scsi_dev_rescan
[   62.805207] Call Trace:
[   62.806957]  __schedule+0x8d7/0x930
[   62.809956]  schedule+0x73/0x90
[   62.812677]  schedule_preempt_disabled+0x13/0x20
[   62.816012]  __mutex_lock+0x54f/0x820
[   62.819050]  ? scsi_rescan_device+0x21/0x90
[   62.821930]  ? mark_held_locks+0x58/0x80
[   62.823994]  mutex_lock_nested+0x16/0x20
[   62.825984]  ? mutex_lock_nested+0x16/0x20
[   62.828032]  scsi_rescan_device+0x21/0x90
[   62.830174]  ata_scsi_dev_rescan+0xa1/0x110
[   62.832732]  process_one_work+0x32a/0x620
[   62.835328]  worker_thread+0x221/0x3f0
[   62.837553]  kthread+0x124/0x130
[   62.839215]  ? process_one_work+0x620/0x620
[   62.841293]  ? kthread_stop+0x260/0x260
[   62.843887]  ret_from_fork+0x2a/0x40
[   62.846664] INFO: task kworker/0:1:37 blocked for more than 20 seconds.
[   62.849751]       Not tainted 4.13.0-rc6 #3
[   62.851724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   62.856832] kworker/0:1     D13664    37      2 0x00000000
[   62.859750] Workqueue: events ata_scsi_dev_rescan
[   62.862611] Call Trace:
[   62.864213]  __schedule+0x8d7/0x930
[   62.866859]  schedule+0x73/0x90
[   62.869507]  schedule_preempt_disabled+0x13/0x20
[   62.871662]  __mutex_lock+0x54f/0x820
[   62.873541]  ? scsi_rescan_device+0x21/0x90
[   62.875580]  ? mark_held_locks+0x58/0x80
[   62.877871]  mutex_lock_nested+0x16/0x20
[   62.880760]  ? mutex_lock_nested+0x16/0x20
[   62.883544]  scsi_rescan_device+0x21/0x90
[   62.886288]  ata_scsi_dev_rescan+0xa1/0x110
[   62.888766]  process_one_work+0x32a/0x620
[   62.891593]  worker_thread+0x221/0x3f0
[   62.894223]  kthread+0x124/0x130
[   62.895974]  ? process_one_work+0x620/0x620
[   62.898643]  ? kthread_stop+0x260/0x260
[   62.902062]  ret_from_fork+0x2a/0x40
[   62.905109] INFO: task kworker/1:1:40 blocked for more than 20 seconds.
[   62.908068]       Not tainted 4.13.0-rc6 #3
[   62.910115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   62.913326] kworker/1:1     D13888    40      2 0x00000000
[   62.915904] Workqueue: events ata_scsi_dev_rescan
[   62.918133] Call Trace:
[   62.919958]  __schedule+0x8d7/0x930
[   62.922615]  schedule+0x73/0x90
[   62.925042]  schedule_preempt_disabled+0x13/0x20
[   62.928377]  __mutex_lock+0x54f/0x820
[   62.930915]  ? scsi_rescan_device+0x21/0x90
[   62.933510]  ? mark_held_locks+0x58/0x80
[   62.936079]  mutex_lock_nested+0x16/0x20
[   62.938122]  ? mutex_lock_nested+0x16/0x20
[   62.940394]  scsi_rescan_device+0x21/0x90
[   62.942405]  ata_scsi_dev_rescan+0xa1/0x110
[   62.944491]  process_one_work+0x32a/0x620
[   62.946580]  worker_thread+0x221/0x3f0
[   62.948719]  kthread+0x124/0x130
[   62.950708]  ? process_one_work+0x620/0x620
[   62.953185]  ? kthread_stop+0x260/0x260
[   62.955792]  ret_from_fork+0x2a/0x40
[   62.958235] INFO: task md0_raid10:1716 blocked for more than 20 seconds.
[   62.962058]       Not tainted 4.13.0-rc6 #3
[   62.965328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   62.970630] md0_raid10      D13696  1716      2 0x00000000
[   62.972990] Call Trace:
[   62.975155]  __schedule+0x8d7/0x930
[   62.977618]  ? sbitmap_get+0x4e/0xa0
[   62.979601]  schedule+0x73/0x90
[   62.981206]  io_schedule+0x11/0x40
[   62.982862]  blk_mq_get_tag+0x13e/0x250
[   62.984641]  ? blk_mq_get_request+0xdb/0x400
[   62.986555]  ? do_wait_intr_irq+0x80/0x80
[   62.988503]  blk_mq_get_request+0x1a8/0x400
[   62.990636]  blk_mq_make_request+0x22f/0x7c0
[   62.992783]  ? generic_make_request+0xcb/0x360
[   62.995084]  generic_make_request+0xf8/0x360
[   62.997735]  submit_bio+0x10f/0x120
[   62.999500]  ? submit_bio+0x10f/0x120
[   63.002395]  md_super_write+0xc8/0xe0
[   63.005273]  md_update_sb+0x533/0x6e0
[   63.007286]  md_check_recovery+0x1ee/0x4b0
[   63.010229]  raid10d+0x4b/0x15e0
[   63.012121]  ? finish_wait+0x57/0x60
[   63.015065]  ? trace_hardirqs_on_caller+0x178/0x1a0
[   63.018147]  ? trace_hardirqs_on+0xd/0x10
[   63.020969]  md_thread+0xd9/0x120
[   63.023033]  ? md_thread+0xd9/0x120
[   63.025723]  ? do_wait_intr_irq+0x80/0x80
[   63.028288]  kthread+0x124/0x130
[   63.029947]  ? find_pers+0x60/0x60
[   63.031410]  ? kthread_stop+0x260/0x260
[   63.032922]  ? umh_complete+0x40/0x40
[   63.034415]  ? call_usermodehelper_exec_async+0x12f/0x160
[   63.037659]  ret_from_fork+0x2a/0x40
[   63.039245] INFO: task md0_resync:1720 blocked for more than 20 seconds.
[   63.042346]       Not tainted 4.13.0-rc6 #3
[   63.044393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   63.048591] md0_resync      D13104  1720      2 0x00000000
[   63.051279] Call Trace:
[   63.053335]  __schedule+0x8d7/0x930
[   63.056151]  ? blk_flush_plug_list+0xd5/0x250
[   63.058725]  schedule+0x73/0x90
[   63.060382]  io_schedule+0x11/0x40
[   63.062212]  blk_mq_get_tag+0x13e/0x250
[   63.064222]  ? blk_mq_get_request+0xdb/0x400
[   63.066339]  ? do_wait_intr_irq+0x80/0x80
[   63.068875]  blk_mq_get_request+0x1a8/0x400
[   63.071329]  blk_mq_make_request+0x22f/0x7c0
[   63.073599]  ? generic_make_request+0xcb/0x360
[   63.076392]  generic_make_request+0xf8/0x360
[   63.078033]  raid10_sync_request+0x214a/0x2290
[   63.080008]  ? raid10_sync_request+0x214a/0x2290
[   63.081820]  ? find_held_lock+0x35/0xa0
[   63.083345]  ? rcu_read_unlock+0x3e/0x5d
[   63.086222]  ? is_mddev_idle+0x101/0x113
[   63.088684]  md_do_sync+0x960/0xe90
[   63.090960]  ? do_wait_intr_irq+0x80/0x80
[   63.093329]  md_thread+0xd9/0x120
[   63.095277]  ? md_thread+0xd9/0x120
[   63.097438]  ? trace_hardirqs_on+0xd/0x10
[   63.100310]  kthread+0x124/0x130
[   63.102471]  ? find_pers+0x60/0x60
[   63.104895]  ? kthread_stop+0x260/0x260
[   63.107208]  ? umh_complete+0x40/0x40
[   63.110149]  ? call_usermodehelper_exec_async+0x12f/0x160
[   63.113570]  ret_from_fork+0x2a/0x40
[   63.138339] INFO: task systemd-sleep:2599 blocked for more than 20 seconds.
[   63.141320]       Not tainted 4.13.0-rc6 #3
[   63.143560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   63.147738] systemd-sleep   D12640  2599      1 0x00000080
[   63.150645] Call Trace:
[   63.152842]  __schedule+0x8d7/0x930
[   63.156430]  schedule+0x73/0x90
[   63.158521]  async_synchronize_cookie_domain+0x6c/0xa0
[   63.161252]  ? do_wait_intr_irq+0x80/0x80
[   63.163461]  async_synchronize_full+0x12/0x20
[   63.166359]  dpm_resume+0x303/0x480
[   63.168656]  ? rcu_read_lock_sched_held+0x78/0xa0
[   63.171555]  dpm_resume_end+0xc/0x20
[   63.174369]  suspend_devices_and_enter+0x62f/0x990
[   63.177177]  ? printk+0x3e/0x46
[   63.179845]  ? rcu_read_lock_sched_held+0x78/0xa0
[   63.182931]  pm_suspend+0x2a8/0x490
[   63.185463]  state_store+0xaf/0xe0
[   63.187429]  kobj_attr_store+0xf/0x20
[   63.190402]  sysfs_kf_write+0x3d/0x50
[   63.192772]  kernfs_fop_write+0x114/0x180
[   63.195435]  __vfs_write+0x23/0x110
[   63.197935]  ? rcu_read_lock_sched_held+0x78/0xa0
[   63.201616]  ? rcu_sync_lockdep_assert+0x27/0x50
[   63.204820]  ? __sb_start_write+0x95/0x1f0
[   63.207522]  vfs_write+0xc0/0x180
[   63.209411]  SyS_write+0x44/0xa0
[   63.211155]  do_syscall_64+0x64/0x120
[   63.212996]  entry_SYSCALL64_slow_path+0x25/0x25
[   63.215264] RIP: 0033:0x7f160e214bf0
[   63.217244] RSP: 002b:00007fffa8f08e08 EFLAGS: 00000246 ORIG_RAX: 
0000000000000001
[   63.222682] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 
00007f160e214bf0
[   63.226774] RDX: 0000000000000004 RSI: 0000565398088390 RDI: 
0000000000000004
[   63.231227] RBP: 0000565398088390 R08: 0000565398088240 R09: 
00007f160e6e28c0
[   63.234773] R10: 00007f160e4d5ad8 R11: 0000000000000246 R12: 
0000000000000004
[   63.237877] R13: 0000000000000001 R14: 0000565398088160 R15: 
0000000000000004
[   63.240832] INFO: task kworker/u8:20:2615 blocked for more than 20 seconds.
[   63.243874]       Not tainted 4.13.0-rc6 #3
[   63.246036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   63.249754] kworker/u8:20   D14128  2615      2 0x00000080
[   63.252260] Workqueue: events_unbound async_run_entry_fn
[   63.254765] Call Trace:
[   63.256338]  __schedule+0x8d7/0x930
[   63.258270]  schedule+0x73/0x90
[   63.260107]  async_synchronize_cookie_domain+0x6c/0xa0
[   63.262473]  ? do_wait_intr_irq+0x80/0x80
[   63.264494]  ? async_sdev_thaw+0x20/0x20
[   63.266467]  ? scsi_bus_thaw+0x20/0x20
[   63.268372]  async_synchronize_full_domain+0x13/0x20
[   63.271074]  scsi_bus_resume_common+0x95/0x140
[   63.274168]  scsi_bus_resume+0x10/0x20
[   63.277058]  dpm_run_callback+0x142/0x290
[   63.279259]  device_resume+0x192/0x1e0
[   63.280883]  async_resume+0x18/0x40
[   63.282321]  async_run_entry_fn+0x32/0xd0
[   63.283856]  process_one_work+0x32a/0x620
[   63.285390]  worker_thread+0x221/0x3f0
[   63.286885]  kthread+0x124/0x130
[   63.288515]  ? process_one_work+0x620/0x620
[   63.290482]  ? kthread_stop+0x260/0x260
[   63.292203]  ret_from_fork+0x2a/0x40
[   63.294060] INFO: task kworker/u8:21:2616 blocked for more than 20 seconds.
[   63.296818]       Not tainted 4.13.0-rc6 #3
[   63.299821] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   63.305221] kworker/u8:21   D14392  2616      2 0x00000080
[   63.308313] Workqueue: events_unbound async_run_entry_fn
[   63.312513] Call Trace:
[   63.314752]  __schedule+0x8d7/0x930
[   63.317683]  schedule+0x73/0x90
[   63.319485]  async_synchronize_cookie_domain+0x6c/0xa0
[   63.321883]  ? do_wait_intr_irq+0x80/0x80
[   63.323875]  ? async_sdev_thaw+0x20/0x20
[   63.325848]  ? scsi_bus_thaw+0x20/0x20
[   63.328287]  async_synchronize_full_domain+0x13/0x20
[   63.330451]  scsi_bus_resume_common+0x95/0x140
[   63.332381]  scsi_bus_resume+0x10/0x20
[   63.334129]  dpm_run_callback+0x142/0x290
[   63.336353]  device_resume+0x192/0x1e0
[   63.338854]  async_resume+0x18/0x40
[   63.341101]  async_run_entry_fn+0x32/0xd0
[   63.343197]  process_one_work+0x32a/0x620
[   63.345300]  worker_thread+0x221/0x3f0
[   63.347356]  kthread+0x124/0x130
[   63.349463]  ? process_one_work+0x620/0x620
[   63.352192]  ? kthread_stop+0x260/0x260
[   63.354697]  ret_from_fork+0x2a/0x40
[   63.357611] INFO: task kworker/u8:22:2617 blocked for more than 20 seconds.
[   63.361106]       Not tainted 4.13.0-rc6 #3
[   63.363990] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   63.367847] kworker/u8:22   D14392  2617      2 0x00000080
[   63.370561] Workqueue: events_unbound async_run_entry_fn
[   63.374235] Call Trace:
[   63.376351]  __schedule+0x8d7/0x930
[   63.379545]  schedule+0x73/0x90
[   63.382199]  async_synchronize_cookie_domain+0x6c/0xa0
[   63.385132]  ? do_wait_intr_irq+0x80/0x80
[   63.387533]  ? async_sdev_thaw+0x20/0x20
[   63.389684]  ? scsi_bus_thaw+0x20/0x20
[   63.391849]  async_synchronize_full_domain+0x13/0x20
[   63.395454]  scsi_bus_resume_common+0x95/0x140
[   63.397842]  scsi_bus_resume+0x10/0x20
[   63.399883]  dpm_run_callback+0x142/0x290
[   63.402873]  device_resume+0x192/0x1e0
[   63.405915]  async_resume+0x18/0x40
[   63.408723]  async_run_entry_fn+0x32/0xd0
[   63.411627]  process_one_work+0x32a/0x620
[   63.413672]  worker_thread+0x221/0x3f0
[   63.415661]  kthread+0x124/0x130
[   63.417452]  ? process_one_work+0x620/0x620
[   63.419750]  ? kthread_stop+0x260/0x260
[   63.421781]  ret_from_fork+0x2a/0x40
[   63.423653] INFO: task kworker/u8:24:2619 blocked for more than 20 seconds.
[   63.426647]       Not tainted 4.13.0-rc6 #3
[   63.428599] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   63.431762] kworker/u8:24   D14416  2619      2 0x00000080
[   63.433650] Workqueue: events_unbound async_run_entry_fn
[   63.435956] Call Trace:
[   63.437711]  __schedule+0x8d7/0x930
[   63.440567]  ? sbitmap_get+0x4e/0xa0
[   63.442571]  schedule+0x73/0x90
[   63.444331]  io_schedule+0x11/0x40
[   63.446731]  blk_mq_get_tag+0x13e/0x250
[   63.448974]  ? blk_mq_get_request+0xdb/0x400
[   63.451208]  ? do_wait_intr_irq+0x80/0x80
[   63.453716]  blk_mq_get_request+0x1a8/0x400
[   63.455700]  blk_mq_alloc_request+0x56/0xc0
[   63.457883]  blk_get_request+0x29/0x100
[   63.459658]  scsi_execute+0x3b/0x230
[   63.461536]  sd_start_stop_device+0x93/0x120
[   63.463616]  sd_resume+0x67/0x70
[   63.465310]  do_scsi_resume+0x16/0x20
[   63.467217]  scsi_dev_type_resume+0x26/0x70
[   63.469287]  async_sdev_resume+0x10/0x20
[   63.471178]  async_run_entry_fn+0x32/0xd0
[   63.473197]  process_one_work+0x32a/0x620
[   63.474939]  worker_thread+0x221/0x3f0
[   63.476752]  kthread+0x124/0x130
[   63.478406]  ? process_one_work+0x620/0x620
[   63.480031]  ? kthread_stop+0x260/0x260
[   63.481579]  ret_from_fork+0x2a/0x40
[   63.483548] 
               Showing all locks held in the system:
[   63.487087] 4 locks held by kworker/2:1/36:
[   63.489578]  #0:  ("events"){.+.+.+}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.494571]  #1:  ((&ap->scsi_rescan_task)){+.+...}, at: 
[<ffffffff8109f083>] process_one_work+0x1b3/0x620
[   63.499045]  #2:  (&ap->scsi_scan_mutex){+.+...}, at: [<ffffffff816f9074>] 
ata_scsi_dev_rescan+0x34/0x110
[   63.505049]  #3:  (&dev->mutex){......}, at: [<ffffffff816d4b41>] 
scsi_rescan_device+0x21/0x90
[   63.510159] 4 locks held by kworker/0:1/37:
[   63.512237]  #0:  ("events"){.+.+.+}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.516225]  #1:  ((&ap->scsi_rescan_task)){+.+...}, at: 
[<ffffffff8109f083>] process_one_work+0x1b3/0x620
[   63.520622]  #2:  (&ap->scsi_scan_mutex){+.+...}, at: [<ffffffff816f9074>] 
ata_scsi_dev_rescan+0x34/0x110
[   63.524931]  #3:  (&dev->mutex){......}, at: [<ffffffff816d4b41>] 
scsi_rescan_device+0x21/0x90
[   63.528790] 4 locks held by kworker/1:1/40:
[   63.530879]  #0:  ("events"){.+.+.+}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.536187]  #1:  ((&ap->scsi_rescan_task)){+.+...}, at: 
[<ffffffff8109f083>] process_one_work+0x1b3/0x620
[   63.540720]  #2:  (&ap->scsi_scan_mutex){+.+...}, at: [<ffffffff816f9074>] 
ata_scsi_dev_rescan+0x34/0x110
[   63.544863]  #3:  (&dev->mutex){......}, at: [<ffffffff816d4b41>] 
scsi_rescan_device+0x21/0x90
[   63.549947] 1 lock held by khungtaskd/461:
[   63.552274]  #0:  (tasklist_lock){.+.+..}, at: [<ffffffff810ce5da>] 
debug_show_all_locks+0x4a/0x170
[   63.557027] 1 lock held by md0_raid10/1716:
[   63.559209]  #0:  (&mddev->reconfig_mutex){+.+.+.}, at: 
[<ffffffff81832cc7>] md_check_recovery+0xe7/0x4b0
[   63.587846] 2 locks held by btrfs-transacti/1867:
[   63.590223]  #0:  (&fs_info->transaction_kthread_mutex){+.+...}, at: 
[<ffffffff81372779>] transaction_kthread+0x49/0x190
[   63.595024]  #1:  (&fs_info->tree_log_mutex){+.+...}, at: 
[<ffffffff81377c1a>] btrfs_commit_transaction+0x3aa/0x910
[   63.600911] 5 locks held by systemd-sleep/2599:
[   63.603394]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811eaeaf>] 
vfs_write+0xaf/0x180
[   63.607819]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126bf99>] 
kernfs_fop_write+0xb9/0x180
[   63.612001]  #2:  (s_active#146){.+.+.+}, at: [<ffffffff8126bfa1>] 
kernfs_fop_write+0xc1/0x180
[   63.616079]  #3:  (pm_mutex){+.+.+.}, at: [<ffffffff810d89f2>] pm_suspend
+0x472/0x490
[   63.621514]  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffff814e53e2>] 
acpi_scan_lock_acquire+0x12/0x20
[   63.627617] 3 locks held by kworker/u8:20/2615:
[   63.629583]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.633125]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.636934]  #2:  (&dev->mutex){......}, at: [<ffffffff816ae8f9>] 
device_resume+0x89/0x1e0
[   63.641510] 3 locks held by kworker/u8:21/2616:
[   63.643984]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.649344]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.655669]  #2:  (&dev->mutex){......}, at: [<ffffffff816ae8f9>] 
device_resume+0x89/0x1e0
[   63.660176] 3 locks held by kworker/u8:22/2617:
[   63.662542]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.668704]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.674307]  #2:  (&dev->mutex){......}, at: [<ffffffff816ae8f9>] 
device_resume+0x89/0x1e0
[   63.678492] 2 locks held by kworker/u8:24/2619:
[   63.680737]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.685153]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffffff8109f083>] 
process_one_work+0x1b3/0x620
[   63.689648] 
[   63.691151] =============================================
===

On sobota 26. srpna 2017 12:48:01 CEST Oleksandr Natalenko wrote:
> Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
> …SNIP…

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-26 11:19     ` Martin Steigerwald
@ 2017-08-26 17:17       ` Wols Lists
  2017-08-26 19:33         ` Martin Steigerwald
  0 siblings, 1 reply; 29+ messages in thread
From: Wols Lists @ 2017-08-26 17:17 UTC (permalink / raw)
  To: Martin Steigerwald, Oleksandr Natalenko
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

On 26/08/17 12:19, Martin Steigerwald wrote:
> Also… when a hang happened the mouse pointer was frozen, Ctrl-Alt-F1 didn´t 
> work and so on… so it may easily be a completely different issue.
> 
> I did not see much point in reporting it so far… as I have no idea on how to 
> reliably pin-point the issue. It happens once every few days, so a bisect 
> again is out of questions – (it is anyway for a production machine for me) –, 
> it appears to be a hard freeze, so no debug data… its one of these "you don´t 
> get to debug me" hangs again. I really have no idea how to a get hold on such 
> complexity. I am hoping to at least pin-point the exact kernel option that 
> triggers this issue, but it may take weeks to do so. I´d really love a way for 
> the kernel to at least to write out debug data before doing hanging 
> completely.

This sounds like what I'm getting - SuSE 42.3, no raid, doesn't happen
when the power lead is in (but I think the laptop is configured not to
suspend when powered, precisely to avoid exactly this).

It happens to me quite often, unfortunately, but it seems a KDE issue in
that applications work fine, UNTIL KDE seems to get control of the mouse
at which point I can't do anything.

I've got a feeling it's related to wireless networking actually, my
setup is somewhat borked because KDE, systemd, and wifi don't seem to
work nicely together :-(

Cheers,
Wol

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-26 17:17       ` Wols Lists
@ 2017-08-26 19:33         ` Martin Steigerwald
  0 siblings, 0 replies; 29+ messages in thread
From: Martin Steigerwald @ 2017-08-26 19:33 UTC (permalink / raw)
  To: Wols Lists
  Cc: Oleksandr Natalenko, Ming Lei, Jens Axboe, Christoph Hellwig,
	linux-block, linux-raid, linux-kernel, Shaohua Li

Wols Lists - 26.08.17, 18:17:
> On 26/08/17 12:19, Martin Steigerwald wrote:
> > Also… when a hang happened the mouse pointer was frozen, Ctrl-Alt-F1
> > didn´t
> > work and so on… so it may easily be a completely different issue.
> > 
> > I did not see much point in reporting it so far… as I have no idea on how
> > to reliably pin-point the issue. It happens once every few days, so a
> > bisect again is out of questions – (it is anyway for a production machine
> > for me) –, it appears to be a hard freeze, so no debug data… its one of
> > these "you don´t get to debug me" hangs again. I really have no idea how
> > to a get hold on such complexity. I am hoping to at least pin-point the
> > exact kernel option that triggers this issue, but it may take weeks to do
> > so. I´d really love a way for the kernel to at least to write out debug
> > data before doing hanging completely.
> 
> This sounds like what I'm getting - SuSE 42.3, no raid, doesn't happen
> when the power lead is in (but I think the laptop is configured not to
> suspend when powered, precisely to avoid exactly this).
> 
> It happens to me quite often, unfortunately, but it seems a KDE issue in
> that applications work fine, UNTIL KDE seems to get control of the mouse
> at which point I can't do anything.
> 
> I've got a feeling it's related to wireless networking actually, my
> setup is somewhat borked because KDE, systemd, and wifi don't seem to
> work nicely together :-(

Well my issue isn´t related to wireless networking as my laptop at home is 
using good old cable based ethernet. I do use Plasma tough. The issue of the 
original poster appears to be blk-mq related.

As written, I have no idea whether my issue is related to blk-mq in any way. 
Thats the next step I´d test. If the current kernel runs stable… then I´d 
enable blk-mq by default again and test whether the hangs reappear. Unless I 
can pinpoint the issue to blk-mq… I have nothing further to comment here, I 
think. (And I didn´t intend to use the thread for every hang on resume issue 
that currently may exist. I just don´t know yet whether my issue is related to 
blk-mq so I chimed in.)

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-26 10:48   ` Oleksandr Natalenko
  2017-08-26 11:19     ` Martin Steigerwald
  2017-08-26 13:32     ` Oleksandr Natalenko
@ 2017-08-27  6:02     ` Ming Lei
  2017-08-27  7:43       ` Oleksandr Natalenko
  2 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-08-27  6:02 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

On Sat, Aug 26, 2017 at 12:48:01PM +0200, Oleksandr Natalenko wrote:
> Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.

BTW, given it hangs during resume, it isn't easy to collect debug
info, and there should have been lots useful info there.

You mentioned that you can reproduce it on QEMU, could you
share the exact raid10 setting? such as which disks behind
the raid10.

Or can you reproduce the issue without raid10 involved?

I will try to reproduce it in my VM if you may provide the
above info.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-27  6:02     ` Ming Lei
@ 2017-08-27  7:43       ` Oleksandr Natalenko
  2017-08-28 12:58         ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Oleksandr Natalenko @ 2017-08-27  7:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi.

Here is disk setup for QEMU VM:

===
[root@archmq ~]# smartctl -i /dev/sda
…
Device Model:     QEMU HARDDISK
Serial Number:    QM00001
Firmware Version: 2.5+
User Capacity:    4,294,967,296 bytes [4.29 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
Local Time is:    Sun Aug 27 09:31:54 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[root@archmq ~]# lsblk
NAME                MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
sda                   8:0    0    4G  0 disk   
`-sda1                8:1    0    4G  0 part   
  `-md0               9:0    0    4G  0 raid10 
    `-system        253:0    0    4G  0 crypt  
      |-system-boot 253:1    0  512M  0 lvm    /boot
      |-system-swap 253:2    0  512M  0 lvm    [SWAP]
      `-system-root 253:3    0    3G  0 lvm    /
sdb                   8:16   0    4G  0 disk   
`-sdb1                8:17   0    4G  0 part   
  `-md0               9:0    0    4G  0 raid10 
    `-system        253:0    0    4G  0 crypt  
      |-system-boot 253:1    0  512M  0 lvm    /boot
      |-system-swap 253:2    0  512M  0 lvm    [SWAP]
      `-system-root 253:3    0    3G  0 lvm    /
sr0                  11:0    1 1024M  0 rom

[root@archmq ~]# mdadm --misc --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sat Jul 29 16:37:05 2017
     Raid Level : raid10
     Array Size : 4191232 (4.00 GiB 4.29 GB)
  Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Aug 27 09:30:33 2017
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : far=2
     Chunk Size : 512K

           Name : archiso:0
           UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
         Events : 485

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
===

In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it, then 
LVM, then ext4 for boot, swap and btrfs for /.

I couldn't reproduce the issue with single disk without RAID.

On neděle 27. srpna 2017 8:02:00 CEST Ming Lei wrote:
> On Sat, Aug 26, 2017 at 12:48:01PM +0200, Oleksandr Natalenko wrote:
> > Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
> 
> BTW, given it hangs during resume, it isn't easy to collect debug
> info, and there should have been lots useful info there.
> 
> You mentioned that you can reproduce it on QEMU, could you
> share the exact raid10 setting? such as which disks behind
> the raid10.
> 
> Or can you reproduce the issue without raid10 involved?
> 
> I will try to reproduce it in my VM if you may provide the
> above info.
> 
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-27  7:43       ` Oleksandr Natalenko
@ 2017-08-28 12:58         ` Ming Lei
  2017-08-28 13:10           ` Martin Steigerwald
  2017-08-28 18:22           ` Oleksandr Natalenko
  0 siblings, 2 replies; 29+ messages in thread
From: Ming Lei @ 2017-08-28 12:58 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> Hi.
> 
> Here is disk setup for QEMU VM:
> 
> ===
> [root@archmq ~]# smartctl -i /dev/sda
> …
> Device Model:     QEMU HARDDISK
> Serial Number:    QM00001
> Firmware Version: 2.5+
> User Capacity:    4,294,967,296 bytes [4.29 GB]
> Sector Size:      512 bytes logical/physical
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
> Local Time is:    Sun Aug 27 09:31:54 2017 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> [root@archmq ~]# lsblk
> NAME                MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
> sda                   8:0    0    4G  0 disk   
> `-sda1                8:1    0    4G  0 part   
>   `-md0               9:0    0    4G  0 raid10 
>     `-system        253:0    0    4G  0 crypt  
>       |-system-boot 253:1    0  512M  0 lvm    /boot
>       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
>       `-system-root 253:3    0    3G  0 lvm    /
> sdb                   8:16   0    4G  0 disk   
> `-sdb1                8:17   0    4G  0 part   
>   `-md0               9:0    0    4G  0 raid10 
>     `-system        253:0    0    4G  0 crypt  
>       |-system-boot 253:1    0  512M  0 lvm    /boot
>       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
>       `-system-root 253:3    0    3G  0 lvm    /
> sr0                  11:0    1 1024M  0 rom
> 
> [root@archmq ~]# mdadm --misc --detail /dev/md0
> /dev/md0:
>         Version : 1.2
>   Creation Time : Sat Jul 29 16:37:05 2017
>      Raid Level : raid10
>      Array Size : 4191232 (4.00 GiB 4.29 GB)
>   Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
>    Raid Devices : 2
>   Total Devices : 2
>     Persistence : Superblock is persistent
> 
>     Update Time : Sun Aug 27 09:30:33 2017
>           State : clean 
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : far=2
>      Chunk Size : 512K
> 
>            Name : archiso:0
>            UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
>          Events : 485
> 
>     Number   Major   Minor   RaidDevice State
>        0       8        1        0      active sync   /dev/sda1
>        1       8       17        1      active sync   /dev/sdb1
> ===
> 
> In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it, then 
> LVM, then ext4 for boot, swap and btrfs for /.
> 
> I couldn't reproduce the issue with single disk without RAID.

Could you verify if the following patch fixes your issue?


>From 9fa53d708ebc1d5b87e62e542dc54272529da244 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Mon, 28 Aug 2017 19:59:08 +0800
Subject: [PATCH] blk-mq: align to legacy path for implementing blk_execute_rq

In legacy path, when one request is run via blk_execute_rq(),
it is added to q->queue_head directly, and I/O scheduler's
queue is bypassed because either merging or sorting isn't
needed.

When SCSI device is put into quiece state, such as during
system suspend, we need to add the request of RQF_PM into
head of the queue.

This patch fixes I/O hang after system resume.

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c     |  2 +-
 block/blk-exec.c     |  2 +-
 block/blk-flush.c    |  2 +-
 block/blk-mq-sched.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-mq-sched.h |  2 ++
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index dbecbf4a64e0..fb75bc646ebc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2330,7 +2330,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 	if (q->mq_ops) {
 		if (blk_queue_io_stat(q))
 			blk_account_io_start(rq, true);
-		blk_mq_sched_insert_request(rq, false, true, false, false);
+		blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
 		return BLK_STS_OK;
 	}
 
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 5c0f3dc446dc..4565aa6bb624 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -61,7 +61,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 	 * be reused after dying flag is set
 	 */
 	if (q->mq_ops) {
-		blk_mq_sched_insert_request(rq, at_head, true, false, false);
+		blk_mq_sched_insert_request_bypass(rq, at_head, true, false, false);
 		return;
 	}
 
diff --git a/block/blk-flush.c b/block/blk-flush.c
index ed5fe322abba..51e89e5c525a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, true, false, false);
+			blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4ab69435708c..eeeea026fb47 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -354,6 +354,64 @@ static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
 		blk_mq_add_to_requeue_list(rq, false, true);
 }
 
+static void blk_mq_flush_hctx(struct blk_mq_hw_ctx *hctx,
+			      struct elevator_queue *e,
+			      const bool has_sched_dispatch,
+			      struct list_head *rqs)
+{
+	LIST_HEAD(list);
+
+	if (!has_sched_dispatch)
+		blk_mq_flush_busy_ctxs(hctx, &list);
+	else {
+		while (true) {
+			struct request *rq;
+
+			rq = e->type->ops.mq.dispatch_request(hctx);
+			if (!rq)
+				break;
+			list_add_tail(&rq->queuelist, &list);
+		}
+	}
+
+	list_splice_tail(&list, rqs);
+}
+
+void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
+					bool run_queue, bool async,
+					bool can_block)
+{
+	struct request_queue *q = rq->q;
+	struct elevator_queue *e = q->elevator;
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+	LIST_HEAD(list);
+	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
+
+	if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
+		blk_mq_sched_insert_flush(hctx, rq, can_block);
+		return;
+	}
+
+	if (at_head)
+		list_add_tail(&rq->queuelist, &list);
+	else {
+		blk_mq_flush_hctx(hctx, e, has_sched_dispatch, &list);
+		list_add_tail(&rq->queuelist, &list);
+		run_queue = true;
+	}
+
+	spin_lock(&hctx->lock);
+	if (at_head)
+		list_splice(&list, &hctx->dispatch);
+	else
+		list_splice_tail(&list, &hctx->dispatch);
+	spin_unlock(&hctx->lock);
+
+	if (run_queue)
+		blk_mq_run_hw_queue(hctx, async);
+}
+
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block)
 {
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 9267d0b7c197..4d01697a627f 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -18,6 +18,8 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block);
+void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
+					bool run_queue, bool async, bool can_block);
 void blk_mq_sched_insert_requests(struct request_queue *q,
 				  struct blk_mq_ctx *ctx,
 				  struct list_head *list, bool run_queue_async);
-- 
2.9.5



-- 
Ming

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-28 12:58         ` Ming Lei
@ 2017-08-28 13:10           ` Martin Steigerwald
  2017-08-28 13:32             ` Ming Lei
  2017-09-20 22:20             ` Ming Lei
  2017-08-28 18:22           ` Oleksandr Natalenko
  1 sibling, 2 replies; 29+ messages in thread
From: Martin Steigerwald @ 2017-08-28 13:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

Ming Lei - 28.08.17, 20:58:
> On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > Hi.
> > 
> > Here is disk setup for QEMU VM:
> > 
> > ===
> > [root@archmq ~]# smartctl -i /dev/sda
> > …
> > Device Model:     QEMU HARDDISK
> > Serial Number:    QM00001
> > Firmware Version: 2.5+
> > User Capacity:    4,294,967,296 bytes [4.29 GB]
> > Sector Size:      512 bytes logical/physical
> > Device is:        Not in smartctl database [for details use: -P showall]
> > ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
> > Local Time is:    Sun Aug 27 09:31:54 2017 CEST
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> > 
> > [root@archmq ~]# lsblk
> > NAME                MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
> > sda                   8:0    0    4G  0 disk
> > `-sda1                8:1    0    4G  0 part
> > 
> >   `-md0               9:0    0    4G  0 raid10
> >   
> >     `-system        253:0    0    4G  0 crypt
> >     
> >       |-system-boot 253:1    0  512M  0 lvm    /boot
> >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> >       
> >       `-system-root 253:3    0    3G  0 lvm    /
> > 
> > sdb                   8:16   0    4G  0 disk
> > `-sdb1                8:17   0    4G  0 part
> > 
> >   `-md0               9:0    0    4G  0 raid10
> >   
> >     `-system        253:0    0    4G  0 crypt
> >     
> >       |-system-boot 253:1    0  512M  0 lvm    /boot
> >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> >       
> >       `-system-root 253:3    0    3G  0 lvm    /
> > 
> > sr0                  11:0    1 1024M  0 rom
> > 
> > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > 
> > /dev/md0:
> >         Version : 1.2
> >   
> >   Creation Time : Sat Jul 29 16:37:05 2017
> >   
> >      Raid Level : raid10
> >      Array Size : 4191232 (4.00 GiB 4.29 GB)
> >   
> >   Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> >   
> >    Raid Devices : 2
> >   
> >   Total Devices : 2
> >   
> >     Persistence : Superblock is persistent
> >     
> >     Update Time : Sun Aug 27 09:30:33 2017
> >     
> >           State : clean
> >  
> >  Active Devices : 2
> > 
> > Working Devices : 2
> > 
> >  Failed Devices : 0
> >  
> >   Spare Devices : 0
> >   
> >          Layout : far=2
> >      
> >      Chunk Size : 512K
> >      
> >            Name : archiso:0
> >            UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> >          
> >          Events : 485
> >     
> >     Number   Major   Minor   RaidDevice State
> >     
> >        0       8        1        0      active sync   /dev/sda1
> >        1       8       17        1      active sync   /dev/sdb1
> > 
> > ===
> > 
> > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > then
> > LVM, then ext4 for boot, swap and btrfs for /.
> > 
> > I couldn't reproduce the issue with single disk without RAID.
> 
> Could you verify if the following patch fixes your issue?

Could this also apply to non MD RAID systems? I am using BTRFS RAID
1 with two SSDs. So far with CFQ it runs stable.

Thanks,
Martin

> From 9fa53d708ebc1d5b87e62e542dc54272529da244 Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@redhat.com>
> Date: Mon, 28 Aug 2017 19:59:08 +0800
> Subject: [PATCH] blk-mq: align to legacy path for implementing
> blk_execute_rq
> 
> In legacy path, when one request is run via blk_execute_rq(),
> it is added to q->queue_head directly, and I/O scheduler's
> queue is bypassed because either merging or sorting isn't
> needed.
> 
> When SCSI device is put into quiece state, such as during
> system suspend, we need to add the request of RQF_PM into
> head of the queue.
> 
> This patch fixes I/O hang after system resume.
> 
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c     |  2 +-
>  block/blk-exec.c     |  2 +-
>  block/blk-flush.c    |  2 +-
>  block/blk-mq-sched.c | 58
> ++++++++++++++++++++++++++++++++++++++++++++++++++++ block/blk-mq-sched.h |
>  2 ++
>  5 files changed, 63 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index dbecbf4a64e0..fb75bc646ebc 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2330,7 +2330,7 @@ blk_status_t blk_insert_cloned_request(struct
> request_queue *q, struct request * if (q->mq_ops) {
>  		if (blk_queue_io_stat(q))
>  			blk_account_io_start(rq, true);
> -		blk_mq_sched_insert_request(rq, false, true, false, false);
> +		blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
>  		return BLK_STS_OK;
>  	}
> 
> diff --git a/block/blk-exec.c b/block/blk-exec.c
> index 5c0f3dc446dc..4565aa6bb624 100644
> --- a/block/blk-exec.c
> +++ b/block/blk-exec.c
> @@ -61,7 +61,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct
> gendisk *bd_disk, * be reused after dying flag is set
>  	 */
>  	if (q->mq_ops) {
> -		blk_mq_sched_insert_request(rq, at_head, true, false, false);
> +		blk_mq_sched_insert_request_bypass(rq, at_head, true, false, false);
>  		return;
>  	}
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index ed5fe322abba..51e89e5c525a 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
>  	if ((policy & REQ_FSEQ_DATA) &&
>  	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
>  		if (q->mq_ops)
> -			blk_mq_sched_insert_request(rq, false, true, false, false);
> +			blk_mq_sched_insert_request_bypass(rq, false, true, false, false);
>  		else
>  			list_add_tail(&rq->queuelist, &q->queue_head);
>  		return;
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 4ab69435708c..eeeea026fb47 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -354,6 +354,64 @@ static void blk_mq_sched_insert_flush(struct
> blk_mq_hw_ctx *hctx, blk_mq_add_to_requeue_list(rq, false, true);
>  }
> 
> +static void blk_mq_flush_hctx(struct blk_mq_hw_ctx *hctx,
> +			      struct elevator_queue *e,
> +			      const bool has_sched_dispatch,
> +			      struct list_head *rqs)
> +{
> +	LIST_HEAD(list);
> +
> +	if (!has_sched_dispatch)
> +		blk_mq_flush_busy_ctxs(hctx, &list);
> +	else {
> +		while (true) {
> +			struct request *rq;
> +
> +			rq = e->type->ops.mq.dispatch_request(hctx);
> +			if (!rq)
> +				break;
> +			list_add_tail(&rq->queuelist, &list);
> +		}
> +	}
> +
> +	list_splice_tail(&list, rqs);
> +}
> +
> +void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
> +					bool run_queue, bool async,
> +					bool can_block)
> +{
> +	struct request_queue *q = rq->q;
> +	struct elevator_queue *e = q->elevator;
> +	struct blk_mq_ctx *ctx = rq->mq_ctx;
> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
> +	LIST_HEAD(list);
> +	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> +
> +	if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
> +		blk_mq_sched_insert_flush(hctx, rq, can_block);
> +		return;
> +	}
> +
> +	if (at_head)
> +		list_add_tail(&rq->queuelist, &list);
> +	else {
> +		blk_mq_flush_hctx(hctx, e, has_sched_dispatch, &list);
> +		list_add_tail(&rq->queuelist, &list);
> +		run_queue = true;
> +	}
> +
> +	spin_lock(&hctx->lock);
> +	if (at_head)
> +		list_splice(&list, &hctx->dispatch);
> +	else
> +		list_splice_tail(&list, &hctx->dispatch);
> +	spin_unlock(&hctx->lock);
> +
> +	if (run_queue)
> +		blk_mq_run_hw_queue(hctx, async);
> +}
> +
>  void blk_mq_sched_insert_request(struct request *rq, bool at_head,
>  				 bool run_queue, bool async, bool can_block)
>  {
> diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
> index 9267d0b7c197..4d01697a627f 100644
> --- a/block/blk-mq-sched.h
> +++ b/block/blk-mq-sched.h
> @@ -18,6 +18,8 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
> 
>  void blk_mq_sched_insert_request(struct request *rq, bool at_head,
>  				 bool run_queue, bool async, bool can_block);
> +void blk_mq_sched_insert_request_bypass(struct request *rq, bool at_head,
> +					bool run_queue, bool async, bool can_block);
>  void blk_mq_sched_insert_requests(struct request_queue *q,
>  				  struct blk_mq_ctx *ctx,
>  				  struct list_head *list, bool run_queue_async);


-- 
Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-28 13:10           ` Martin Steigerwald
@ 2017-08-28 13:32             ` Ming Lei
  2017-09-20 17:25               ` Martin Steigerwald
  2017-09-20 22:20             ` Ming Lei
  1 sibling, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-08-28 13:32 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> Ming Lei - 28.08.17, 20:58:
> > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > Hi.
> > > 
> > > Here is disk setup for QEMU VM:
> > > 
> > > ===
> > > [root@archmq ~]# smartctl -i /dev/sda
> > > …
> > > Device Model:     QEMU HARDDISK
> > > Serial Number:    QM00001
> > > Firmware Version: 2.5+
> > > User Capacity:    4,294,967,296 bytes [4.29 GB]
> > > Sector Size:      512 bytes logical/physical
> > > Device is:        Not in smartctl database [for details use: -P showall]
> > > ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
> > > Local Time is:    Sun Aug 27 09:31:54 2017 CEST
> > > SMART support is: Available - device has SMART capability.
> > > SMART support is: Enabled
> > > 
> > > [root@archmq ~]# lsblk
> > > NAME                MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
> > > sda                   8:0    0    4G  0 disk
> > > `-sda1                8:1    0    4G  0 part
> > > 
> > >   `-md0               9:0    0    4G  0 raid10
> > >   
> > >     `-system        253:0    0    4G  0 crypt
> > >     
> > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > >       
> > >       `-system-root 253:3    0    3G  0 lvm    /
> > > 
> > > sdb                   8:16   0    4G  0 disk
> > > `-sdb1                8:17   0    4G  0 part
> > > 
> > >   `-md0               9:0    0    4G  0 raid10
> > >   
> > >     `-system        253:0    0    4G  0 crypt
> > >     
> > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > >       
> > >       `-system-root 253:3    0    3G  0 lvm    /
> > > 
> > > sr0                  11:0    1 1024M  0 rom
> > > 
> > > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > > 
> > > /dev/md0:
> > >         Version : 1.2
> > >   
> > >   Creation Time : Sat Jul 29 16:37:05 2017
> > >   
> > >      Raid Level : raid10
> > >      Array Size : 4191232 (4.00 GiB 4.29 GB)
> > >   
> > >   Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> > >   
> > >    Raid Devices : 2
> > >   
> > >   Total Devices : 2
> > >   
> > >     Persistence : Superblock is persistent
> > >     
> > >     Update Time : Sun Aug 27 09:30:33 2017
> > >     
> > >           State : clean
> > >  
> > >  Active Devices : 2
> > > 
> > > Working Devices : 2
> > > 
> > >  Failed Devices : 0
> > >  
> > >   Spare Devices : 0
> > >   
> > >          Layout : far=2
> > >      
> > >      Chunk Size : 512K
> > >      
> > >            Name : archiso:0
> > >            UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> > >          
> > >          Events : 485
> > >     
> > >     Number   Major   Minor   RaidDevice State
> > >     
> > >        0       8        1        0      active sync   /dev/sda1
> > >        1       8       17        1      active sync   /dev/sdb1
> > > 
> > > ===
> > > 
> > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > then
> > > LVM, then ext4 for boot, swap and btrfs for /.
> > > 
> > > I couldn't reproduce the issue with single disk without RAID.
> > 
> > Could you verify if the following patch fixes your issue?
> 
> Could this also apply to non MD RAID systems? I am using BTRFS RAID
> 1 with two SSDs. So far with CFQ it runs stable.

It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.

-- 
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-28 12:58         ` Ming Lei
  2017-08-28 13:10           ` Martin Steigerwald
@ 2017-08-28 18:22           ` Oleksandr Natalenko
  2017-08-29  0:24             ` Ming Lei
  1 sibling, 1 reply; 29+ messages in thread
From: Oleksandr Natalenko @ 2017-08-28 18:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi.

On pondělí 28. srpna 2017 14:58:28 CEST Ming Lei wrote:
> Could you verify if the following patch fixes your issue?
> …SNIP…

I've applied it to v4.12.9 and rechecked — the issue is still there, 
unfortunately. Stacktrace is the same as before.

Were you able to reproduce it in a VM?

Should I re-check it with v4.13-rc7?

Any other suggestions?

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-28 18:22           ` Oleksandr Natalenko
@ 2017-08-29  0:24             ` Ming Lei
  2017-08-29 15:52               ` Oleksandr Natalenko
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-08-29  0:24 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

On Mon, Aug 28, 2017 at 08:22:26PM +0200, Oleksandr Natalenko wrote:
> Hi.
> 
> On pondělí 28. srpna 2017 14:58:28 CEST Ming Lei wrote:
> > Could you verify if the following patch fixes your issue?
> > …SNIP…
> 
> I've applied it to v4.12.9 and rechecked — the issue is still there, 
> unfortunately. Stacktrace is the same as before.
> 
> Were you able to reproduce it in a VM?

Yes, I can.

> 
> Should I re-check it with v4.13-rc7?
> 
> Any other suggestions?

Please test it with v4.13-rc7 first.

-- 
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-29  0:24             ` Ming Lei
@ 2017-08-29 15:52               ` Oleksandr Natalenko
  2017-08-30  2:15                 ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Oleksandr Natalenko @ 2017-08-29 15:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hello.

Re-tested with v4.13-rc7 + proposed patch and got the same result.

Let me know if any additional testing is needed.

===
[   82.638148] INFO: task md0_raid10:193 blocked for more than 20 seconds.
[   82.642804]       Not tainted 4.13.0-pf1 #1
[   82.646998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   82.649007] md0_raid10      D    0   193      2 0x00000000
[   82.650799] Call Trace:
[   82.652118]  __schedule+0x239/0x890
[   82.653469]  schedule+0x3d/0x90
[   82.654649]  md_super_wait+0x6e/0xa0 [md_mod]
[   82.656186]  ? wait_woken+0x80/0x80
[   82.657333]  md_update_sb.part.59+0x3df/0x840 [md_mod]
[   82.659084]  ? percpu_ref_switch_to_percpu+0x36/0x40
[   82.660477]  md_check_recovery+0x453/0x520 [md_mod]
[   82.662125]  raid10d+0x62/0x1420 [raid10]
[   82.663457]  ? __schedule+0x241/0x890
[   82.664766]  ? schedule+0x3d/0x90
[   82.665816]  ? schedule_timeout+0x208/0x390
[   82.666785]  md_thread+0x120/0x160 [md_mod]
[   82.668117]  ? md_thread+0x120/0x160 [md_mod]
[   82.669487]  ? wait_woken+0x80/0x80
[   82.670531]  kthread+0x125/0x140
[   82.671602]  ? find_pers+0x70/0x70 [md_mod]
[   82.672452]  ? kthread_create_on_node+0x70/0x70
[   82.673844]  ret_from_fork+0x25/0x30
[   82.674991] INFO: task dmcrypt_write:226 blocked for more than 20 seconds.
[   82.678246]       Not tainted 4.13.0-pf1 #1
[   82.679336] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   82.681175] dmcrypt_write   D    0   226      2 0x00000000
[   82.682522] Call Trace:
[   82.683104]  __schedule+0x239/0x890
[   82.684099]  schedule+0x3d/0x90
[   82.684754]  md_write_start+0xe3/0x270 [md_mod]
[   82.685599]  ? wait_woken+0x80/0x80
[   82.686273]  raid10_make_request+0x3f/0x140 [raid10]
[   82.687258]  md_make_request+0xa2/0x290 [md_mod]
[   82.688132]  ? _raw_spin_unlock_irq+0x10/0x30
[   82.689660]  ? finish_task_switch+0x75/0x200
[   82.690624]  generic_make_request+0x125/0x320
[   82.691722]  dmcrypt_write+0x22d/0x250 [dm_crypt]
[   82.693120]  ? dmcrypt_write+0x22d/0x250 [dm_crypt]
[   82.694760]  ? wake_up_q+0x80/0x80
[   82.695752]  kthread+0x125/0x140
[   82.696831]  ? kthread+0x125/0x140
[   82.697964]  ? crypt_iv_essiv_dtr+0x70/0x70 [dm_crypt]
[   82.699340]  ? kthread_create_on_node+0x70/0x70
[   82.700728]  ret_from_fork+0x25/0x30
[   82.702378] INFO: task NetworkManager:432 blocked for more than 20 seconds.
[   82.704022]       Not tainted 4.13.0-pf1 #1
[   82.705264] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   82.709068] NetworkManager  D    0   432      1 0x00000000
[   82.710189] Call Trace:
[   82.711041]  __schedule+0x239/0x890
[   82.712162]  schedule+0x3d/0x90
[   82.713223]  io_schedule+0x16/0x40
[   82.714324]  wait_on_page_bit_common+0xe7/0x170
[   82.715644]  ? page_cache_tree_insert+0xc0/0xc0
[   82.717008]  __filemap_fdatawait_range+0x10d/0x170
[   82.718562]  ? __filemap_fdatawrite_range+0xc1/0x100
[   82.720103]  ? __filemap_fdatawrite_range+0xcd/0x100
[   82.721561]  file_write_and_wait_range+0x78/0xa0
[   82.722839]  xfs_file_fsync+0x5f/0x210 [xfs]
[   82.724085]  vfs_fsync_range+0x4b/0xb0
[   82.725152]  do_fsync+0x3d/0x70
[   82.726535]  SyS_fsync+0x10/0x20
[   82.727953]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[   82.729196] RIP: 0033:0x7fcd816f6c8d
[   82.730336] RSP: 002b:00007ffd4650d4d0 EFLAGS: 00000293 ORIG_RAX: 
000000000000004a
[   82.732192] RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 
00007fcd816f6c8d
[   82.733922] RDX: 00007ffd4650d4f0 RSI: 00007ffd4650d4f0 RDI: 
0000000000000013
[   82.735313] RBP: 000056436710ff00 R08: 0000000000000000 R09: 
00000000313d26e9
[   82.738973] R10: 000000000000003d R11: 0000000000000293 R12: 
00007ffd4650d638
[   82.740531] R13: 0000000000000001 R14: 0000000000000002 R15: 
00005643670eb1c0
[   82.741869] INFO: task sync:676 blocked for more than 20 seconds.
[   82.743116]       Not tainted 4.13.0-pf1 #1
[   82.744286] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[   82.745935] sync            D    0   676    542 0x00000000
[   82.747338] Call Trace:
[   82.748281]  __schedule+0x239/0x890
[   82.749421]  schedule+0x3d/0x90
[   82.750458]  io_schedule+0x16/0x40
[   82.751420]  wait_on_page_bit_common+0xe7/0x170
[   82.752668]  ? page_cache_tree_insert+0xc0/0xc0
[   82.753873]  __filemap_fdatawait_range+0x10d/0x170
[   82.755153]  ? finish_wait+0x56/0x70
[   82.756226]  filemap_fdatawait_keep_errors+0x27/0x50
[   82.757568]  sync_inodes_sb+0x204/0x2a0
[   82.758788]  ? SyS_tee+0x3d0/0x3d0
[   82.759926]  sync_inodes_one_sb+0x16/0x20
[   82.761152]  iterate_supers+0x94/0x100
[   82.762419]  sys_sync+0x44/0xb0
[   82.763499]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[   82.764495] RIP: 0033:0x7fd85d7941d7
[   82.765736] RSP: 002b:00007ffe648d4f48 EFLAGS: 00000206 ORIG_RAX: 
00000000000000a2
[   82.769068] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 
00007fd85d7941d7
[   82.770663] RDX: 00007fd85da50e01 RSI: 0000000000000000 RDI: 
00007fd85d8197d3
[   82.772211] RBP: 0000000000002710 R08: 0000000000000000 R09: 
0000000000000000
[   82.774138] R10: 000000000000082c R11: 0000000000000206 R12: 
00007fd85da4ead8
[   82.776430] R13: 0000000000000030 R14: 0000000000c6cfd0 R15: 
00007fd85da4ea80
===

On úterý 29. srpna 2017 2:24:25 CEST Ming Lei wrote:
> On Mon, Aug 28, 2017 at 08:22:26PM +0200, Oleksandr Natalenko wrote:
> > Hi.
> > 
> > On pondělí 28. srpna 2017 14:58:28 CEST Ming Lei wrote:
> > > Could you verify if the following patch fixes your issue?
> > > …SNIP…
> > 
> > I've applied it to v4.12.9 and rechecked — the issue is still there,
> > unfortunately. Stacktrace is the same as before.
> > 
> > Were you able to reproduce it in a VM?
> 
> Yes, I can.
> 
> > Should I re-check it with v4.13-rc7?
> > 
> > Any other suggestions?
> 
> Please test it with v4.13-rc7 first.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-29 15:52               ` Oleksandr Natalenko
@ 2017-08-30  2:15                 ` Ming Lei
  2017-08-30  5:17                   ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-08-30  2:15 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi,

On Tue, Aug 29, 2017 at 05:52:42PM +0200, Oleksandr Natalenko wrote:
> Hello.
> 
> Re-tested with v4.13-rc7 + proposed patch and got the same result.

Maybe there is another issue, I didn't use dmcrypt on raid10, will
test in your way to see if I can reproduce it.

BTW, could you share us which blk-mq scheduler you are using on sata?
The patch I posted should address one issue on none scheduler.


-- 
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-30  2:15                 ` Ming Lei
@ 2017-08-30  5:17                   ` Ming Lei
  2017-08-30  6:15                     ` oleksandr
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-08-30  5:17 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

On Wed, Aug 30, 2017 at 10:15:37AM +0800, Ming Lei wrote:
> Hi,
> 
> On Tue, Aug 29, 2017 at 05:52:42PM +0200, Oleksandr Natalenko wrote:
> > Hello.
> > 
> > Re-tested with v4.13-rc7 + proposed patch and got the same result.
> 
> Maybe there is another issue, I didn't use dmcrypt on raid10, will
> test in your way to see if I can reproduce it.
> 
> BTW, could you share us which blk-mq scheduler you are using on sata?
> The patch I posted should address one issue on none scheduler.

Can't reproduce even with putting dmcypt on raid10 after applying
my patch.

Could you apply the following debug patch and provide the dmesg log
after running the commands below?

	# echo 9 > /proc/sys/kernel/printk
	# echo devices > /sys/power/pm_test
	# echo mem > /sys/power/state

BTW, it is better to provide the two sata disk(behind raid10) name.

diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index b44c1bb687a2..75b13248ea1c 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -53,17 +53,22 @@ static int scsi_dev_type_suspend(struct device *dev,
 {
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err;
+	struct scsi_device *sdev = to_scsi_device(dev);
 
 	/* flush pending in-flight resume operations, suspend is synchronous */
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
 
-	err = scsi_device_quiesce(to_scsi_device(dev));
+	sdev_printk(KERN_WARNING, sdev, "%s: enter\n", __func__);
+	err = scsi_device_quiesce(sdev);
 	if (err == 0) {
+		sdev_printk(KERN_WARNING, sdev, "%s: before suspend\n", __func__);
 		err = cb(dev, pm);
+		sdev_printk(KERN_WARNING, sdev, "%s: after suspend\n", __func__);
 		if (err)
-			scsi_device_resume(to_scsi_device(dev));
+			scsi_device_resume(sdev);
 	}
 	dev_dbg(dev, "scsi suspend: %d\n", err);
+	sdev_printk(KERN_WARNING, sdev, "%s: exit\n", __func__);
 	return err;
 }
 
@@ -72,9 +77,13 @@ static int scsi_dev_type_resume(struct device *dev,
 {
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int err = 0;
+	struct scsi_device *sdev = to_scsi_device(dev);
 
+	sdev_printk(KERN_WARNING, sdev, "%s: enter\n", __func__);
+	sdev_printk(KERN_WARNING, sdev, "%s: before resume\n", __func__);
 	err = cb(dev, pm);
-	scsi_device_resume(to_scsi_device(dev));
+	sdev_printk(KERN_WARNING, sdev, "%s: after resume\n", __func__);
+	scsi_device_resume(sdev);
 	dev_dbg(dev, "scsi resume: %d\n", err);
 
 	if (err == 0) {
@@ -83,6 +92,7 @@ static int scsi_dev_type_resume(struct device *dev,
 		pm_runtime_enable(dev);
 	}
 
+	sdev_printk(KERN_WARNING, sdev, "%s: exit\n", __func__);
 	return err;
 }


-- 
Ming

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-30  5:17                   ` Ming Lei
@ 2017-08-30  6:15                     ` oleksandr
  2017-08-30  8:06                       ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: oleksandr @ 2017-08-30  6:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hello.

Addressing your questions below.

> Can't reproduce even with putting dmcypt on raid10 after applying my 
> patch.

Just a side note, that dm-crypt is not necessary here — I am able to 
trigger hang with RAID10 and LVM only.

> BTW, could you share us which blk-mq scheduler you are using on sata?

Okay, now it makes sense. Previously, without your patch I was able to 
reproduce the issue with "mq-deadline", "bfq" and "none". Now, I cannot 
trigger it with "none" set, but "mq-deadline" still hangs for me. Does 
this mean each scheduler should be modified separately?

> Could you apply the following debug patch and provide the dmesg log 
> after running the commands below?

Is it still relevant since I confirm issue to be fixed with "none"?

Thanks.

30.08.2017 07:17, Ming Lei написав:
> …SNIP…
> Can't reproduce even with putting dmcypt on raid10 after applying
> my patch.
> …SNIP…

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-30  6:15                     ` oleksandr
@ 2017-08-30  8:06                       ` Ming Lei
  2017-08-30 10:58                         ` oleksandr
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-08-30  8:06 UTC (permalink / raw)
  To: oleksandr
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi,

On Wed, Aug 30, 2017 at 08:15:02AM +0200, oleksandr@natalenko.name wrote:
> Hello.
> 
> Addressing your questions below.
> 
> > Can't reproduce even with putting dmcypt on raid10 after applying my
> > patch.
> 
> Just a side note, that dm-crypt is not necessary here — I am able to trigger
> hang with RAID10 and LVM only.
> 
> > BTW, could you share us which blk-mq scheduler you are using on sata?
> 
> Okay, now it makes sense. Previously, without your patch I was able to
> reproduce the issue with "mq-deadline", "bfq" and "none". Now, I cannot
> trigger it with "none" set, but "mq-deadline" still hangs for me. Does this
> mean each scheduler should be modified separately?

No, it shouldn't.

> 
> > Could you apply the following debug patch and provide the dmesg log
> > after running the commands below?
> 
> Is it still relevant since I confirm issue to be fixed with "none"?

Please try the following patch and previous patch together and see
if they make a difference:


>From bc5626b4b65c7ff26567e75f42584c2c43ffe7c1 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Wed, 30 Aug 2017 15:53:01 +0800
Subject: [PATCH] blk-mq: add requests in the tail of hctx->dispatch

So that we can allow to insert request at the head
of hctx->dispatch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 2 +-
 block/blk-mq.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index eeeea026fb47..b40dd063d61f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	 * the dispatch list.
 	 */
 	spin_lock(&hctx->lock);
-	list_add(&rq->queuelist, &hctx->dispatch);
+	list_add_tail(&rq->queuelist, &hctx->dispatch);
 	spin_unlock(&hctx->lock);
 	return true;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4603b115e234..fed3d0c16266 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 		blk_mq_put_driver_tag(rq);
 
 		spin_lock(&hctx->lock);
-		list_splice_init(list, &hctx->dispatch);
+		list_splice_tail_init(list, &hctx->dispatch);
 		spin_unlock(&hctx->lock);
 
 		/*
-- 
2.9.5


-- 
Ming

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-30  8:06                       ` Ming Lei
@ 2017-08-30 10:58                         ` oleksandr
  2017-08-30 14:37                           ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: oleksandr @ 2017-08-30 10:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi.

So, current summary:

1) first patch + debug patch: I can reproduce the issue in wild, but not 
with pm_test set;
2) first patch + debug patch + second patch: I cannot reproduce issue at 
all, neither with "none", nor with "mq-deadline".

Thus, "blk-mq: align to legacy path for implementing blk_execute_rq" + 
"blk-mq: add requests in the tail of hctx->dispatch" look like a proper 
fix. Thanks for that!

> I setup dm-crypt on /dev/md0 directly, could you show me how to setup 
> lvm on raid10?

If still relevant, here is a layout:

===
sda                 8:0    0    8G  0 disk
|-sda1              8:1    0  128M  0 part   /boot/EFI
`-sda2              8:2    0  7.9G  0 part
   `-md0             9:0    0  7.9G  0 raid10
     `-system-root 253:0    0  7.9G  0 lvm    /
sdb                 8:16   0    8G  0 disk
|-sdb1              8:17   0  128M  0 part
`-sdb2              8:18   0  7.9G  0 part
   `-md0             9:0    0  7.9G  0 raid10
     `-system-root 253:0    0  7.9G  0 lvm    /
===

Anything else you'd like /me to test?

30.08.2017 10:06, Ming Lei написав:
> …SNIP…
> Please try the following patch and previous patch together and see
> if they make a difference:
> …SNIP…

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-30 10:58                         ` oleksandr
@ 2017-08-30 14:37                           ` Ming Lei
  0 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-08-30 14:37 UTC (permalink / raw)
  To: oleksandr
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-raid,
	linux-kernel, Shaohua Li

Hi,

On Wed, Aug 30, 2017 at 12:58:21PM +0200, oleksandr@natalenko.name wrote:
> Hi.
> 
> So, current summary:
> 
> 1) first patch + debug patch: I can reproduce the issue in wild, but not
> with pm_test set;

That is interesting, since I always test via pm_test.

> 2) first patch + debug patch + second patch: I cannot reproduce issue at
> all, neither with "none", nor with "mq-deadline".
> 
> Thus, "blk-mq: align to legacy path for implementing blk_execute_rq" +
> "blk-mq: add requests in the tail of hctx->dispatch" look like a proper fix.
> Thanks for that!

That is great, will prepare a formal one for merge.

> 
> > I setup dm-crypt on /dev/md0 directly, could you show me how to setup
> > lvm on raid10?
> 
> If still relevant, here is a layout:

Looks not relevant, just because I use pm_test.

> 
> ===
> sda                 8:0    0    8G  0 disk
> |-sda1              8:1    0  128M  0 part   /boot/EFI
> `-sda2              8:2    0  7.9G  0 part
>   `-md0             9:0    0  7.9G  0 raid10
>     `-system-root 253:0    0  7.9G  0 lvm    /
> sdb                 8:16   0    8G  0 disk
> |-sdb1              8:17   0  128M  0 part
> `-sdb2              8:18   0  7.9G  0 part
>   `-md0             9:0    0  7.9G  0 raid10
>     `-system-root 253:0    0  7.9G  0 lvm    /
> ===
> 
> Anything else you'd like /me to test?

You have done enough, thanks a lot for your report and test!


-- 
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-28 13:32             ` Ming Lei
@ 2017-09-20 17:25               ` Martin Steigerwald
  2017-09-20 22:17                 ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Martin Steigerwald @ 2017-09-20 17:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

[-- Attachment #1: Type: text/plain, Size: 1720 bytes --]

Ming Lei - 28.08.17, 21:32:
> On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 28.08.17, 20:58:
> > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > Hi.
> > > > 
> > > > Here is disk setup for QEMU VM:
[…]
> > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > > then
> > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > 
> > > > I couldn't reproduce the issue with single disk without RAID.
> > > 
> > > Could you verify if the following patch fixes your issue?
> > 
> > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > 1 with two SSDs. So far with CFQ it runs stable.
> 
> It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.

My findings are different:

On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for Intel 
Core/newer Xeon I see this:

1) Running with CFQ: No hang after resume

2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within first 1-2 
days.

However with 4.12.9 with CONFIG_HZ=250, CONFIG_PREEMPT_VOLUNTARY=y + no CPU 
optimizations: No hang after resume, no matter whether I use CFQ or scsi-mq + 
BFQ.

Both configs attached.

Also 4.13.2 with CFQ is stable so far.

I am looking forward for fixes to appear in 4.13.x with x>2 and retest with BFQ 
then, unless you have a different advice.

I am running Debian Sid/Experimental from BTRFS RAID 1 with two SSDs

merkaba:~> lsscsi | grep ATA
[0:0:0:0]    disk    ATA      INTEL SSDSA2CW30 0362  /dev/sda 
[2:0:0:0]    disk    ATA      Crucial_CT480M50 MU03  /dev/sdb

on ThinkPad T520 (Sandybridge).

Thanks,
-- 
Martin

[-- Attachment #2: config-4.12.10-tp520-btrfstrim.xz --]
[-- Type: application/x-xz, Size: 26728 bytes --]

[-- Attachment #3: config-4.12.9-tp520-btrfstrim+.xz --]
[-- Type: application/x-xz, Size: 26680 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-09-20 17:25               ` Martin Steigerwald
@ 2017-09-20 22:17                 ` Ming Lei
  2017-09-24 17:33                   ` Martin Steigerwald
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-09-20 22:17 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> Ming Lei - 28.08.17, 21:32:
> > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > Ming Lei - 28.08.17, 20:58:
> > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > Hi.
> > > > > 
> > > > > Here is disk setup for QEMU VM:
> […]
> > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > > > then
> > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > 
> > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > 
> > > > Could you verify if the following patch fixes your issue?
> > > 
> > > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > > 1 with two SSDs. So far with CFQ it runs stable.
> > 
> > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.
> 
> My findings are different:
> 
> On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for Intel 
> Core/newer Xeon I see this:
> 
> 1) Running with CFQ: No hang after resume
> 
> 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within first 1-2 
> days.

Hi Martin,

Thanks for your report!

Could you test the following patchset to see if it fixes your issue?

	https://marc.info/?l=linux-block&m=150579298505484&w=2


-- 
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-08-28 13:10           ` Martin Steigerwald
  2017-08-28 13:32             ` Ming Lei
@ 2017-09-20 22:20             ` Ming Lei
  2017-09-21  7:30               ` Martin Steigerwald
  1 sibling, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-09-20 22:20 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> Ming Lei - 28.08.17, 20:58:
> > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > Hi.
> > > 
> > > Here is disk setup for QEMU VM:
> > > 
> > > ===
> > > [root@archmq ~]# smartctl -i /dev/sda
> > > …
> > > Device Model:     QEMU HARDDISK
> > > Serial Number:    QM00001
> > > Firmware Version: 2.5+
> > > User Capacity:    4,294,967,296 bytes [4.29 GB]
> > > Sector Size:      512 bytes logical/physical
> > > Device is:        Not in smartctl database [for details use: -P showall]
> > > ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
> > > Local Time is:    Sun Aug 27 09:31:54 2017 CEST
> > > SMART support is: Available - device has SMART capability.
> > > SMART support is: Enabled
> > > 
> > > [root@archmq ~]# lsblk
> > > NAME                MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
> > > sda                   8:0    0    4G  0 disk
> > > `-sda1                8:1    0    4G  0 part
> > > 
> > >   `-md0               9:0    0    4G  0 raid10
> > >   
> > >     `-system        253:0    0    4G  0 crypt
> > >     
> > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > >       
> > >       `-system-root 253:3    0    3G  0 lvm    /
> > > 
> > > sdb                   8:16   0    4G  0 disk
> > > `-sdb1                8:17   0    4G  0 part
> > > 
> > >   `-md0               9:0    0    4G  0 raid10
> > >   
> > >     `-system        253:0    0    4G  0 crypt
> > >     
> > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > >       
> > >       `-system-root 253:3    0    3G  0 lvm    /
> > > 
> > > sr0                  11:0    1 1024M  0 rom
> > > 
> > > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > > 
> > > /dev/md0:
> > >         Version : 1.2
> > >   
> > >   Creation Time : Sat Jul 29 16:37:05 2017
> > >   
> > >      Raid Level : raid10
> > >      Array Size : 4191232 (4.00 GiB 4.29 GB)
> > >   
> > >   Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> > >   
> > >    Raid Devices : 2
> > >   
> > >   Total Devices : 2
> > >   
> > >     Persistence : Superblock is persistent
> > >     
> > >     Update Time : Sun Aug 27 09:30:33 2017
> > >     
> > >           State : clean
> > >  
> > >  Active Devices : 2
> > > 
> > > Working Devices : 2
> > > 
> > >  Failed Devices : 0
> > >  
> > >   Spare Devices : 0
> > >   
> > >          Layout : far=2
> > >      
> > >      Chunk Size : 512K
> > >      
> > >            Name : archiso:0
> > >            UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> > >          
> > >          Events : 485
> > >     
> > >     Number   Major   Minor   RaidDevice State
> > >     
> > >        0       8        1        0      active sync   /dev/sda1
> > >        1       8       17        1      active sync   /dev/sdb1
> > > 
> > > ===
> > > 
> > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > then
> > > LVM, then ext4 for boot, swap and btrfs for /.
> > > 
> > > I couldn't reproduce the issue with single disk without RAID.
> > 
> > Could you verify if the following patch fixes your issue?

Yes, the patch should address this kind of issue, not related
with RAID specially, and the latest version can be found in the
following link:

	https://marc.info/?l=linux-block&m=150579298505484&w=2

-- 
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-09-20 22:20             ` Ming Lei
@ 2017-09-21  7:30               ` Martin Steigerwald
  2017-09-21  7:33                 ` Martin Steigerwald
  0 siblings, 1 reply; 29+ messages in thread
From: Martin Steigerwald @ 2017-09-21  7:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

Ming Lei - 21.09.17, 06:20:
> On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 28.08.17, 20:58:
> > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > Hi.
> > > > 
> > > > Here is disk setup for QEMU VM:
> > > > 
> > > > ===
> > > > [root@archmq ~]# smartctl -i /dev/sda
> > > > …
> > > > Device Model:     QEMU HARDDISK
> > > > Serial Number:    QM00001
> > > > Firmware Version: 2.5+
> > > > User Capacity:    4,294,967,296 bytes [4.29 GB]
> > > > Sector Size:      512 bytes logical/physical
> > > > Device is:        Not in smartctl database [for details use: -P
> > > > showall]
> > > > ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS
> > > > 340-2000
> > > > Local Time is:    Sun Aug 27 09:31:54 2017 CEST
> > > > SMART support is: Available - device has SMART capability.
> > > > SMART support is: Enabled
> > > > 
> > > > [root@archmq ~]# lsblk
> > > > NAME                MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
> > > > sda                   8:0    0    4G  0 disk
> > > > `-sda1                8:1    0    4G  0 part
> > > > 
> > > >   `-md0               9:0    0    4G  0 raid10
> > > >   
> > > >     `-system        253:0    0    4G  0 crypt
> > > >     
> > > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > > >       
> > > >       `-system-root 253:3    0    3G  0 lvm    /
> > > > 
> > > > sdb                   8:16   0    4G  0 disk
> > > > `-sdb1                8:17   0    4G  0 part
> > > > 
> > > >   `-md0               9:0    0    4G  0 raid10
> > > >   
> > > >     `-system        253:0    0    4G  0 crypt
> > > >     
> > > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > > >       
> > > >       `-system-root 253:3    0    3G  0 lvm    /
> > > > 
> > > > sr0                  11:0    1 1024M  0 rom
> > > > 
> > > > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > > > 
> > > > /dev/md0:
> > > >         Version : 1.2
> > > >   
> > > >   Creation Time : Sat Jul 29 16:37:05 2017
> > > >   
> > > >      Raid Level : raid10
> > > >      Array Size : 4191232 (4.00 GiB 4.29 GB)
> > > >   
> > > >   Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> > > >   
> > > >    Raid Devices : 2
> > > >   
> > > >   Total Devices : 2
> > > >   
> > > >     Persistence : Superblock is persistent
> > > >     
> > > >     Update Time : Sun Aug 27 09:30:33 2017
> > > >     
> > > >           State : clean
> > > >  
> > > >  Active Devices : 2
> > > > 
> > > > Working Devices : 2
> > > > 
> > > >  Failed Devices : 0
> > > >  
> > > >   Spare Devices : 0
> > > >   
> > > >          Layout : far=2
> > > >      
> > > >      Chunk Size : 512K
> > > >      
> > > >            Name : archiso:0
> > > >            UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> > > >          
> > > >          Events : 485
> > > >     
> > > >     Number   Major   Minor   RaidDevice State
> > > >     
> > > >        0       8        1        0      active sync   /dev/sda1
> > > >        1       8       17        1      active sync   /dev/sdb1
> > > > 
> > > > ===
> > > > 
> > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on it,
> > > > then
> > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > 
> > > > I couldn't reproduce the issue with single disk without RAID.
> > > 
> > > Could you verify if the following patch fixes your issue?
> 
> Yes, the patch should address this kind of issue, not related
> with RAID specially, and the latest version can be found in the
> following link:
> 
> 	https://marc.info/?l=linux-block&m=150579298505484&w=2

Thank you.

So if I understand already I can just add

https://github.com/ming1/linux/tree/my_v4.13-safe-scsi-quiesce_V5_for_test

as an remote and go from there.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-09-21  7:30               ` Martin Steigerwald
@ 2017-09-21  7:33                 ` Martin Steigerwald
  0 siblings, 0 replies; 29+ messages in thread
From: Martin Steigerwald @ 2017-09-21  7:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

Martin Steigerwald - 21.09.17, 09:30:
> Ming Lei - 21.09.17, 06:20:
> > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > Ming Lei - 28.08.17, 20:58:
> > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > Hi.
> > > > > 
> > > > > Here is disk setup for QEMU VM:
> > > > > 
> > > > > ===
> > > > > [root@archmq ~]# smartctl -i /dev/sda
> > > > > …
> > > > > Device Model:     QEMU HARDDISK
> > > > > Serial Number:    QM00001
> > > > > Firmware Version: 2.5+
> > > > > User Capacity:    4,294,967,296 bytes [4.29 GB]
> > > > > Sector Size:      512 bytes logical/physical
> > > > > Device is:        Not in smartctl database [for details use: -P
> > > > > showall]
> > > > > ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS
> > > > > 340-2000
> > > > > Local Time is:    Sun Aug 27 09:31:54 2017 CEST
> > > > > SMART support is: Available - device has SMART capability.
> > > > > SMART support is: Enabled
> > > > > 
> > > > > [root@archmq ~]# lsblk
> > > > > NAME                MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
> > > > > sda                   8:0    0    4G  0 disk
> > > > > `-sda1                8:1    0    4G  0 part
> > > > > 
> > > > >   `-md0               9:0    0    4G  0 raid10
> > > > >   
> > > > >     `-system        253:0    0    4G  0 crypt
> > > > >     
> > > > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > > > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > > > >       
> > > > >       `-system-root 253:3    0    3G  0 lvm    /
> > > > > 
> > > > > sdb                   8:16   0    4G  0 disk
> > > > > `-sdb1                8:17   0    4G  0 part
> > > > > 
> > > > >   `-md0               9:0    0    4G  0 raid10
> > > > >   
> > > > >     `-system        253:0    0    4G  0 crypt
> > > > >     
> > > > >       |-system-boot 253:1    0  512M  0 lvm    /boot
> > > > >       |-system-swap 253:2    0  512M  0 lvm    [SWAP]
> > > > >       
> > > > >       `-system-root 253:3    0    3G  0 lvm    /
> > > > > 
> > > > > sr0                  11:0    1 1024M  0 rom
> > > > > 
> > > > > [root@archmq ~]# mdadm --misc --detail /dev/md0
> > > > > 
> > > > > /dev/md0:
> > > > >         Version : 1.2
> > > > >   
> > > > >   Creation Time : Sat Jul 29 16:37:05 2017
> > > > >   
> > > > >      Raid Level : raid10
> > > > >      Array Size : 4191232 (4.00 GiB 4.29 GB)
> > > > >   
> > > > >   Used Dev Size : 4191232 (4.00 GiB 4.29 GB)
> > > > >   
> > > > >    Raid Devices : 2
> > > > >   
> > > > >   Total Devices : 2
> > > > >   
> > > > >     Persistence : Superblock is persistent
> > > > >     
> > > > >     Update Time : Sun Aug 27 09:30:33 2017
> > > > >     
> > > > >           State : clean
> > > > >  
> > > > >  Active Devices : 2
> > > > > 
> > > > > Working Devices : 2
> > > > > 
> > > > >  Failed Devices : 0
> > > > >  
> > > > >   Spare Devices : 0
> > > > >   
> > > > >          Layout : far=2
> > > > >      
> > > > >      Chunk Size : 512K
> > > > >      
> > > > >            Name : archiso:0
> > > > >            UUID : 43f4be59:c8d2fa0a:a94acdff:1c7f2f4e
> > > > >          
> > > > >          Events : 485
> > > > >     
> > > > >     Number   Major   Minor   RaidDevice State
> > > > >     
> > > > >        0       8        1        0      active sync   /dev/sda1
> > > > >        1       8       17        1      active sync   /dev/sdb1
> > > > > 
> > > > > ===
> > > > > 
> > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on
> > > > > it,
> > > > > then
> > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > 
> > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > 
> > > > Could you verify if the following patch fixes your issue?
> > 
> > Yes, the patch should address this kind of issue, not related
> > with RAID specially, and the latest version can be found in the
> > 
> > following link:
> > 	https://marc.info/?l=linux-block&m=150579298505484&w=2
> 
> Thank you.
> 
> So if I understand already I can just add
> 
> https://github.com/ming1/linux/tree/my_v4.13-safe-scsi-quiesce_V5_for_test
> 
> as an remote and go from there.

https://github.com/ming1/linux/tree/my_v4.13-safe-scsi-quiesce_V5_for_test

and checkout my_v4.13-safe-scsi-quiesce_V5_for_test branch of course.

-- 
Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-09-20 22:17                 ` Ming Lei
@ 2017-09-24 17:33                   ` Martin Steigerwald
  2017-09-25  2:59                     ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Martin Steigerwald @ 2017-09-24 17:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

Ming Lei - 21.09.17, 06:17:
> On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 28.08.17, 21:32:
> > > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > > Ming Lei - 28.08.17, 20:58:
> > > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > > Hi.
> > 
> > > > > > Here is disk setup for QEMU VM:
> > […]
> > 
> > > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on
> > > > > > it,
> > > > > > then
> > > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > > 
> > > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > > 
> > > > > Could you verify if the following patch fixes your issue?
> > > > 
> > > > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > > > 1 with two SSDs. So far with CFQ it runs stable.
> > > 
> > > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.
> > 
> > My findings are different:
> > 
> > On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for
> > Intel Core/newer Xeon I see this:
> > 
> > 1) Running with CFQ: No hang after resume
> > 
> > 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within
> > first 1-2 days.
> 
> Hi Martin,
> 
> Thanks for your report!
> 
> Could you test the following patchset to see if it fixes your issue?
> 
> 	https://marc.info/?l=linux-block&m=150579298505484&w=2

Testing with https://github.com/ming1/linux.git, my_v4.13-safe-scsi-
quiesce_V5_for_test branch as of 53954fd58fb9fe6894b1635dad1acec96ca51a2f I 
now have a bit more than three days and 8 hours of uptime without a hang on 
resume from memory or disk, despite using blk-mq + BFQ.

So it looks like this patch set fixes the issue for me. To say for sure I´d say 
some more days of testing are needed. But it looks like the hang on resume is 
fixed by this patch set.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-09-24 17:33                   ` Martin Steigerwald
@ 2017-09-25  2:59                     ` Ming Lei
  2017-09-25 14:13                       ` Martin Steigerwald
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-09-25  2:59 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

On Sun, Sep 24, 2017 at 07:33:00PM +0200, Martin Steigerwald wrote:
> Ming Lei - 21.09.17, 06:17:
> > On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> > > Ming Lei - 28.08.17, 21:32:
> > > > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > > > Ming Lei - 28.08.17, 20:58:
> > > > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko wrote:
> > > > > > > Hi.
> > > 
> > > > > > > Here is disk setup for QEMU VM:
> > > […]
> > > 
> > > > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout, LUKS on
> > > > > > > it,
> > > > > > > then
> > > > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > > > 
> > > > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > > > 
> > > > > > Could you verify if the following patch fixes your issue?
> > > > > 
> > > > > Could this also apply to non MD RAID systems? I am using BTRFS RAID
> > > > > 1 with two SSDs. So far with CFQ it runs stable.
> > > > 
> > > > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for you.
> > > 
> > > My findings are different:
> > > 
> > > On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for
> > > Intel Core/newer Xeon I see this:
> > > 
> > > 1) Running with CFQ: No hang after resume
> > > 
> > > 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within
> > > first 1-2 days.
> > 
> > Hi Martin,
> > 
> > Thanks for your report!
> > 
> > Could you test the following patchset to see if it fixes your issue?
> > 
> > 	https://marc.info/?l=linux-block&m=150579298505484&w=2
> 
> Testing with https://github.com/ming1/linux.git, my_v4.13-safe-scsi-
> quiesce_V5_for_test branch as of 53954fd58fb9fe6894b1635dad1acec96ca51a2f I 
> now have a bit more than three days and 8 hours of uptime without a hang on 
> resume from memory or disk, despite using blk-mq + BFQ.
> 
> So it looks like this patch set fixes the issue for me. To say for sure I´d say 
> some more days of testing are needed. But it looks like the hang on resume is 
> fixed by this patch set.

Martin, thanks for your test, it is great to see your issue is fixed
by this patchset.

Also I remember that your issue wasn't related with MD, and actually
you were using BTRFS(RAID), just want to double check with you,
is that true?

-- 
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: I/O hangs after resuming from suspend-to-ram
  2017-09-25  2:59                     ` Ming Lei
@ 2017-09-25 14:13                       ` Martin Steigerwald
  0 siblings, 0 replies; 29+ messages in thread
From: Martin Steigerwald @ 2017-09-25 14:13 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oleksandr Natalenko, Jens Axboe, Christoph Hellwig, linux-block,
	linux-raid, linux-kernel, Shaohua Li

Ming Lei - 25.09.17, 10:59:
> On Sun, Sep 24, 2017 at 07:33:00PM +0200, Martin Steigerwald wrote:
> > Ming Lei - 21.09.17, 06:17:
> > > On Wed, Sep 20, 2017 at 07:25:02PM +0200, Martin Steigerwald wrote:
> > > > Ming Lei - 28.08.17, 21:32:
> > > > > On Mon, Aug 28, 2017 at 03:10:35PM +0200, Martin Steigerwald wrote:
> > > > > > Ming Lei - 28.08.17, 20:58:
> > > > > > > On Sun, Aug 27, 2017 at 09:43:52AM +0200, Oleksandr Natalenko 
wrote:
> > > > > > > > Hi.
> > > > 
> > > > > > > > Here is disk setup for QEMU VM:
> > > > […]
> > > > 
> > > > > > > > In words: 2 virtual disks, RAID10 setup with far-2 layout,
> > > > > > > > LUKS on
> > > > > > > > it,
> > > > > > > > then
> > > > > > > > LVM, then ext4 for boot, swap and btrfs for /.
> > > > > > > > 
> > > > > > > > I couldn't reproduce the issue with single disk without RAID.
> > > > > > > 
> > > > > > > Could you verify if the following patch fixes your issue?
> > > > > > 
> > > > > > Could this also apply to non MD RAID systems? I am using BTRFS
> > > > > > RAID
> > > > > > 1 with two SSDs. So far with CFQ it runs stable.
> > > > > 
> > > > > It is for fixing Oleksandr's issue wrt. blk-mq, and looks not for
> > > > > you.
> > > > 
> > > > My findings are different:
> > > > 
> > > > On 4.12.10 with CONFIG_HZ=1000, CONFIG_PREEMPT=y and optimizations for
> > > > Intel Core/newer Xeon I see this:
> > > > 
> > > > 1) Running with CFQ: No hang after resume
> > > > 
> > > > 2) Running with scsi_mod.use_blk_mq=1 + BFQ: Hang after resume within
> > > > first 1-2 days.
> > > 
> > > Hi Martin,
> > > 
> > > Thanks for your report!
> > > 
> > > Could you test the following patchset to see if it fixes your issue?
> > > 
> > > 	https://marc.info/?l=linux-block&m=150579298505484&w=2
> > 
> > Testing with https://github.com/ming1/linux.git, my_v4.13-safe-scsi-
> > quiesce_V5_for_test branch as of 53954fd58fb9fe6894b1635dad1acec96ca51a2f
> > I
> > now have a bit more than three days and 8 hours of uptime without a hang
> > on
> > resume from memory or disk, despite using blk-mq + BFQ.
> > 
> > So it looks like this patch set fixes the issue for me. To say for sure
> > I´d say some more days of testing are needed. But it looks like the hang
> > on resume is fixed by this patch set.
> 
> Martin, thanks for your test, it is great to see your issue is fixed
> by this patchset.
> 
> Also I remember that your issue wasn't related with MD, and actually
> you were using BTRFS(RAID), just want to double check with you,
> is that true?

Yes. But *additionally*, as I explained, the hangs only occured with 
CONFIG_HZ=1000, Low latency + CPU optimized kernel, not with CONFIG_HZ=250, 
voluntary preemption + generic x86 kernel. This hints a timing issue for me. 
So blk-mq + BFQ + CONFIG_HZ=1000 + Low Latency + CPU optimization = hang.

I did not investigate whether blk-mq without scheduler would work. And I did 
not investigate which of the three changes to the kernel configurations 
triggers the issue. I suspect its either CONFIG_HZ or the preemption option or 
both of them.

My original mail had both working and non working kernel configurations 
attached.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-09-25 14:13 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-22 11:45 I/O hangs after resuming from suspend-to-ram Oleksandr Natalenko
2017-08-26 10:37 ` Oleksandr Natalenko
2017-08-26 10:48   ` Oleksandr Natalenko
2017-08-26 11:19     ` Martin Steigerwald
2017-08-26 17:17       ` Wols Lists
2017-08-26 19:33         ` Martin Steigerwald
2017-08-26 13:32     ` Oleksandr Natalenko
2017-08-27  6:02     ` Ming Lei
2017-08-27  7:43       ` Oleksandr Natalenko
2017-08-28 12:58         ` Ming Lei
2017-08-28 13:10           ` Martin Steigerwald
2017-08-28 13:32             ` Ming Lei
2017-09-20 17:25               ` Martin Steigerwald
2017-09-20 22:17                 ` Ming Lei
2017-09-24 17:33                   ` Martin Steigerwald
2017-09-25  2:59                     ` Ming Lei
2017-09-25 14:13                       ` Martin Steigerwald
2017-09-20 22:20             ` Ming Lei
2017-09-21  7:30               ` Martin Steigerwald
2017-09-21  7:33                 ` Martin Steigerwald
2017-08-28 18:22           ` Oleksandr Natalenko
2017-08-29  0:24             ` Ming Lei
2017-08-29 15:52               ` Oleksandr Natalenko
2017-08-30  2:15                 ` Ming Lei
2017-08-30  5:17                   ` Ming Lei
2017-08-30  6:15                     ` oleksandr
2017-08-30  8:06                       ` Ming Lei
2017-08-30 10:58                         ` oleksandr
2017-08-30 14:37                           ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).