[Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

From: Dexuan Cui @ 2017-01-12 10:55 UTC
  To: Christoph Hellwig, linux-block, Jens Axboe
  Cc: Vitaly Kuznetsov, linux-kernel, KY Srinivasan,
	Chris Valean (Cloudbase Solutions SRL)

Hi,
Recently, fstrim and mkfs always hang in a Linux VM running on Hyper-V 2012 R2 or 2016.
The VM runs the latest mainline kernel (v4.10-rc3).

git bisect shows that the patch
"block: improve handling of the magic discard payload" (f9d03f96)
causes the issue.
If I revert the patch, the issue goes away.

When the issue happens, any new shell command that causes disk I/O hangs too, and
I can't even reboot the VM because of the pending I/O.

It seems blkdev_issue_discard() never returns, which I think means the SCSI UNMAP
command(s) never complete for some reason.

Any idea why the patch would cause this?

Thanks!
-- Dexuan

PS: this is the call trace:

[ 1450.976205] INFO: task fstrim:1300 blocked for more than 120 seconds.
[ 1450.976264]       Not tainted 4.9.0+ #58
[ 1450.976291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.976342] fstrim          D    0  1300   1280 0x00000000
[ 1450.976382] Call Trace:
[ 1450.976412]  ? __schedule+0x232/0x700
[ 1450.976442]  ? try_to_grab_pending+0xb3/0x160
[ 1450.976476]  schedule+0x36/0x80
[ 1450.976501]  schedule_timeout+0x235/0x3f0
[ 1450.976532]  ? blk_run_queue_async+0x3c/0x40
[ 1450.976565]  io_schedule_timeout+0xa4/0x110
[ 1450.976596]  wait_for_completion_io+0xa5/0x110
[ 1450.976628]  ? wake_up_q+0x70/0x70
[ 1450.976654]  submit_bio_wait+0x59/0x70
[ 1450.976683]  blkdev_issue_discard+0x6a/0xb0
[ 1450.976783]  xfs_trim_extents+0x24c/0x410 [xfs]
[ 1450.976862]  xfs_ioc_trim+0x157/0x1c0 [xfs]
[ 1450.976938]  xfs_file_ioctl+0x8ee/0xb20 [xfs]
[ 1450.976972]  ? path_openat+0x3fb/0x13f0
[ 1450.977002]  ? page_add_file_rmap+0x58/0x140
[ 1450.977035]  ? alloc_set_pte+0x4ee/0x640
[ 1450.977065]  ? do_filp_open+0x92/0xe0
[ 1450.977093]  ? _copy_to_user+0x2e/0x40
[ 1450.977121]  ? cp_new_stat+0x141/0x160
[ 1450.977151]  do_vfs_ioctl+0x92/0x5a0
[ 1450.977178]  ? SYSC_newfstat+0x25/0x30
[ 1450.977206]  SyS_ioctl+0x79/0x90
[ 1450.977232]  entry_SYSCALL_64_fastpath+0x1e/0xad
[ 1450.977264] RIP: 0033:0x7f8cac393687
[ 1450.977290] RSP: 002b:00007ffdce06fa38 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 1450.977340] RAX: ffffffffffffffda RBX: 0000000000609330 RCX: 00007f8cac393687
[ 1450.977386] RDX: 00007ffdce06fa40 RSI: 00000000c0185879 RDI: 0000000000000003
[ 1450.977431] RBP: 00007ffdce06fd18 R08: 0000000000000000 R09: 0000000000000000
[ 1450.977476] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000000
[ 1450.977522] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1450.977570] INFO: task ls:1304 blocked for more than 120 seconds.
[ 1450.977609]       Not tainted 4.9.0+ #58
[ 1450.977636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.977685] ls              D    0  1304   1219 0x00000000
[ 1450.977723] Call Trace:
[ 1450.977745]  ? __schedule+0x232/0x700
[ 1450.977774]  ? __blk_run_queue+0x33/0x40
[ 1450.977803]  ? queue_unplugged+0x2a/0xb0
[ 1450.977833]  schedule+0x36/0x80
[ 1450.977857]  schedule_timeout+0x235/0x3f0
[ 1450.977886]  ? blk_finish_plug+0x2c/0x40
[ 1450.977963]  ? _xfs_buf_ioapply+0x324/0x440 [xfs]
[ 1450.977998]  wait_for_completion+0xa5/0x110
[ 1450.978028]  ? wake_up_q+0x70/0x70
[ 1450.978107]  ? xfs_trans_read_buf_map+0xf5/0x330 [xfs]
[ 1450.979283]  ? _xfs_buf_read+0x23/0x30 [xfs]
[ 1450.980522]  xfs_buf_submit_wait+0x7f/0x210 [xfs]
[ 1450.981706]  ? xfs_trans_read_buf_map+0xf5/0x330 [xfs]
[ 1450.982863]  _xfs_buf_read+0x23/0x30 [xfs]
[ 1450.984420]  xfs_buf_read_map+0x108/0x180 [xfs]
[ 1450.985559]  xfs_trans_read_buf_map+0xf5/0x330 [xfs]
[ 1450.986672]  xfs_imap_to_bp+0x5f/0xc0 [xfs]
[ 1450.987761]  xfs_iread+0x79/0x320 [xfs]
[ 1450.988894]  xfs_iget+0x32a/0x840 [xfs]
[ 1450.990055]  xfs_lookup+0xc6/0xe0 [xfs]
[ 1450.991132]  xfs_vn_lookup+0x4f/0x90 [xfs]
[ 1450.992221]  lookup_slow+0x96/0x140
[ 1450.993254]  walk_component+0x1ca/0x2f0
[ 1450.994283]  ? path_init+0x1d9/0x330
[ 1450.995309]  ? mntput+0x24/0x40
[ 1450.996955]  path_lookupat+0x5d/0x110
[ 1450.997979]  filename_lookup+0x9e/0x150
[ 1450.999001]  ? kmem_cache_alloc+0xd7/0x1b0
[ 1451.000126]  ? getname_flags+0x56/0x1f0
[ 1451.001150]  ? getname_flags+0x72/0x1f0
[ 1451.002164]  user_path_at_empty+0x36/0x40
[ 1451.003173]  vfs_fstatat+0x53/0xa0
[ 1451.004223]  SYSC_newlstat+0x22/0x40
[ 1451.005232]  SyS_newlstat+0xe/0x10
[ 1451.006233]  entry_SYSCALL_64_fastpath+0x1e/0xad
[ 1451.007750] RIP: 0033:0x7ff2730993d5
[ 1451.008820] RSP: 002b:00007ffc7c1650c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[ 1451.009880] RAX: ffffffffffffffda RBX: 00007ff273366b78 RCX: 00007ff2730993d5
[ 1451.010953] RDX: 00000000019dfb20 RSI: 00000000019dfb20 RDI: 00007ffc7c1650d0
[ 1451.012078] RBP: 00007ff273366b20 R08: 0000000000000000 R09: 00000000000000c0
[ 1451.013175] R10: 00000000019e4550 R11: 0000000000000246 R12: 0000000000008041
[ 1451.014260] R13: 00007ff273366b78 R14: 000000000000270f R15: 00007ff273366b78
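
As a cross-check on the fstrim trace above: RSI in its register dump (0x00000000c0185879) holds the second syscall argument, which for ioctl is the request number. The following is a minimal sketch that decodes it, assuming the generic Linux ioctl bit layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction) used on x86-64. The helper name decode_ioctl is hypothetical and not part of any kernel or libc API.

```python
# Sketch: decode a Linux ioctl request number into its fields.
# Assumes the generic encoding (asm-generic/ioctl.h) used on x86-64:
#   bits  0-7   command number (nr)
#   bits  8-15  type ("magic" character)
#   bits 16-29  argument size in bytes
#   bits 30-31  direction (1 = write, 2 = read, 3 = read|write)

IOC_NRBITS = 8
IOC_TYPEBITS = 8
IOC_SIZEBITS = 14
IOC_DIRBITS = 2

IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS

DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioctl(request):
    """Hypothetical helper: split a request number into its bit fields."""
    nr = (request >> IOC_NRSHIFT) & ((1 << IOC_NRBITS) - 1)
    type_code = (request >> IOC_TYPESHIFT) & ((1 << IOC_TYPEBITS) - 1)
    size = (request >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1)
    direction = (request >> IOC_DIRSHIFT) & ((1 << IOC_DIRBITS) - 1)
    return {
        "dir": DIR_NAMES[direction],
        "size": size,
        "type": chr(type_code),
        "nr": nr,
    }


if __name__ == "__main__":
    # RSI value taken from the fstrim register dump in the trace.
    print(decode_ioctl(0xC0185879))
```

The fields come out as direction read|write, size 24, type 'X', number 121, which matches the FITRIM definition in linux/fs.h (_IOWR('X', 121, struct fstrim_range), where the struct holds three 64-bit fields). That is consistent with the xfs_ioc_trim frame in the trace.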
