* [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
@ 2017-01-12 10:55 Dexuan Cui
2017-01-12 13:44 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Dexuan Cui @ 2017-01-12 10:55 UTC (permalink / raw)
To: Christoph Hellwig, linux-block, Jens Axboe
Cc: Vitaly Kuznetsov, linux-kernel, KY Srinivasan,
Chris Valean (Cloudbase Solutions SRL)
Hi,
Recently fstrim and mkfs always hang in Linux VM running on Hyper-V 2012 R2 or 2016.
The VM uses the latest mainline kernel (v4.10-rc3).
git-bisect shows the patch
"block: improve handling of the magic discard payload (f9d03f96)"
causes the issue.
If I revert the patch, the issue will go away.
When the issue happens, any new shell command causing disk I/O will hang too, and
I even can't reboot the VM due to the pending I/O.
It seems blkdev_issue_discard() never returns, meaning the SCSI Unmap command(s)
can't finish somehow, I think.
Any idea why the patch can cause this?
Thanks!
-- Dexuan
PS, this is the calltrace:
[ 1450.976205] INFO: task fstrim:1300 blocked for more than 120 seconds.
[ 1450.976264] Not tainted 4.9.0+ #58
[ 1450.976291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.976342] fstrim D 0 1300 1280 0x00000000
[ 1450.976382] Call Trace:
[ 1450.976412] ? __schedule+0x232/0x700
[ 1450.976442] ? try_to_grab_pending+0xb3/0x160
[ 1450.976476] schedule+0x36/0x80
[ 1450.976501] schedule_timeout+0x235/0x3f0
[ 1450.976532] ? blk_run_queue_async+0x3c/0x40
[ 1450.976565] io_schedule_timeout+0xa4/0x110
[ 1450.976596] wait_for_completion_io+0xa5/0x110
[ 1450.976628] ? wake_up_q+0x70/0x70
[ 1450.976654] submit_bio_wait+0x59/0x70
[ 1450.976683] blkdev_issue_discard+0x6a/0xb0
[ 1450.976783] xfs_trim_extents+0x24c/0x410 [xfs]
[ 1450.976862] xfs_ioc_trim+0x157/0x1c0 [xfs]
[ 1450.976938] xfs_file_ioctl+0x8ee/0xb20 [xfs]
[ 1450.976972] ? path_openat+0x3fb/0x13f0
[ 1450.977002] ? page_add_file_rmap+0x58/0x140
[ 1450.977035] ? alloc_set_pte+0x4ee/0x640
[ 1450.977065] ? do_filp_open+0x92/0xe0
[ 1450.977093] ? _copy_to_user+0x2e/0x40
[ 1450.977121] ? cp_new_stat+0x141/0x160
[ 1450.977151] do_vfs_ioctl+0x92/0x5a0
[ 1450.977178] ? SYSC_newfstat+0x25/0x30
[ 1450.977206] SyS_ioctl+0x79/0x90
[ 1450.977232] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 1450.977264] RIP: 0033:0x7f8cac393687
[ 1450.977290] RSP: 002b:00007ffdce06fa38 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 1450.977340] RAX: ffffffffffffffda RBX: 0000000000609330 RCX: 00007f8cac393687
[ 1450.977386] RDX: 00007ffdce06fa40 RSI: 00000000c0185879 RDI: 0000000000000003
[ 1450.977431] RBP: 00007ffdce06fd18 R08: 0000000000000000 R09: 0000000000000000
[ 1450.977476] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000000
[ 1450.977522] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1450.977570] INFO: task ls:1304 blocked for more than 120 seconds.
[ 1450.977609] Not tainted 4.9.0+ #58
[ 1450.977636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.977685] ls D 0 1304 1219 0x00000000
[ 1450.977723] Call Trace:
[ 1450.977745] ? __schedule+0x232/0x700
[ 1450.977774] ? __blk_run_queue+0x33/0x40
[ 1450.977803] ? queue_unplugged+0x2a/0xb0
[ 1450.977833] schedule+0x36/0x80
[ 1450.977857] schedule_timeout+0x235/0x3f0
[ 1450.977886] ? blk_finish_plug+0x2c/0x40
[ 1450.977963] ? _xfs_buf_ioapply+0x324/0x440 [xfs]
[ 1450.977998] wait_for_completion+0xa5/0x110
[ 1450.978028] ? wake_up_q+0x70/0x70
[ 1450.978107] ? xfs_trans_read_buf_map+0xf5/0x330 [xfs]
[ 1450.979283] ? _xfs_buf_read+0x23/0x30 [xfs]
[ 1450.980522] xfs_buf_submit_wait+0x7f/0x210 [xfs]
[ 1450.981706] ? xfs_trans_read_buf_map+0xf5/0x330 [xfs]
[ 1450.982863] _xfs_buf_read+0x23/0x30 [xfs]
[ 1450.984420] xfs_buf_read_map+0x108/0x180 [xfs]
[ 1450.985559] xfs_trans_read_buf_map+0xf5/0x330 [xfs]
[ 1450.986672] xfs_imap_to_bp+0x5f/0xc0 [xfs]
[ 1450.987761] xfs_iread+0x79/0x320 [xfs]
[ 1450.988894] xfs_iget+0x32a/0x840 [xfs]
[ 1450.990055] xfs_lookup+0xc6/0xe0 [xfs]
[ 1450.991132] xfs_vn_lookup+0x4f/0x90 [xfs]
[ 1450.992221] lookup_slow+0x96/0x140
[ 1450.993254] walk_component+0x1ca/0x2f0
[ 1450.994283] ? path_init+0x1d9/0x330
[ 1450.995309] ? mntput+0x24/0x40
[ 1450.996955] path_lookupat+0x5d/0x110
[ 1450.997979] filename_lookup+0x9e/0x150
[ 1450.999001] ? kmem_cache_alloc+0xd7/0x1b0
[ 1451.000126] ? getname_flags+0x56/0x1f0
[ 1451.001150] ? getname_flags+0x72/0x1f0
[ 1451.002164] user_path_at_empty+0x36/0x40
[ 1451.003173] vfs_fstatat+0x53/0xa0
[ 1451.004223] SYSC_newlstat+0x22/0x40
[ 1451.005232] SyS_newlstat+0xe/0x10
[ 1451.006233] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 1451.007750] RIP: 0033:0x7ff2730993d5
[ 1451.008820] RSP: 002b:00007ffc7c1650c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[ 1451.009880] RAX: ffffffffffffffda RBX: 00007ff273366b78 RCX: 00007ff2730993d5
[ 1451.010953] RDX: 00000000019dfb20 RSI: 00000000019dfb20 RDI: 00007ffc7c1650d0
[ 1451.012078] RBP: 00007ff273366b20 R08: 0000000000000000 R09: 00000000000000c0
[ 1451.013175] R10: 00000000019e4550 R11: 0000000000000246 R12: 0000000000008041
[ 1451.014260] R13: 00007ff273366b78 R14: 000000000000270f R15: 00007ff273366b78
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
2017-01-12 10:55 [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload" Dexuan Cui
@ 2017-01-12 13:44 ` Christoph Hellwig
2017-01-12 15:39 ` Dexuan Cui
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2017-01-12 13:44 UTC (permalink / raw)
To: Dexuan Cui
Cc: Christoph Hellwig, linux-block, Jens Axboe, Vitaly Kuznetsov,
linux-kernel, KY Srinivasan,
Chris Valean (Cloudbase Solutions SRL)
Hi Dexuan,
sorry for dropping the ball on the previous private report, I hoped
I could get my hands on a Hyper-V VM and reproduce it myself, but
that has obviously not happened.
Can you send me the output of the provisioning_mode file for the
scsi disk in question to get started?
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
2017-01-12 13:44 ` Christoph Hellwig
@ 2017-01-12 15:39 ` Dexuan Cui
2017-01-12 15:52 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Dexuan Cui @ 2017-01-12 15:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, Jens Axboe, Vitaly Kuznetsov, linux-kernel,
KY Srinivasan, Chris Valean (Cloudbase Solutions SRL)
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Thursday, January 12, 2017 21:44
> To: Dexuan Cui <decui@microsoft.com>
> Cc: Christoph Hellwig <hch@lst.de>; linux-block@vger.kernel.org; Jens Axboe
> <axboe@fb.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Chris Valean
> (Cloudbase Solutions SRL) <v-chvale@microsoft.com>
> Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve
> handling of the magic discard payload"
>
> Hi Dexuan,
>
> sorry for dropping the ball on the previous private report, I hoped
> I could get my hands on a Hyper-V VM and reproduce it myself, but
> that has obviously not happened.
>
> Can you send me the output of the provisioning_mode file for the
> scsi disk in question to get started?
Hi Christoph,
Thank you very much for the help!
The file just shows "unmap":
root@decui-u1604:~# cd /sys/class/scsi_disk/2\:0\:0\:0
root@decui-u1604:/sys/class/scsi_disk/2:0:0:0# ls
allow_restart cache_type device manage_start_stop max_write_same_blocks protection_mode provisioning_mode thin_provisioning
app_tag_own deferred_probe FUA max_medium_access_timeouts power protection_type subsystem uevent
root@decui-u1604:/sys/class/scsi_disk/2:0:0:0# cat provisioning_mode
unmap
I'm ready to provide any info you need. :-)
Thanks,
-- Dexuan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
2017-01-12 15:39 ` Dexuan Cui
@ 2017-01-12 15:52 ` Christoph Hellwig
2017-01-12 16:39 ` Dexuan Cui
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2017-01-12 15:52 UTC (permalink / raw)
To: Dexuan Cui
Cc: Christoph Hellwig, linux-block, Jens Axboe, Vitaly Kuznetsov,
linux-kernel, KY Srinivasan,
Chris Valean (Cloudbase Solutions SRL)
Can you check if this debug printk triggers for the discard commands?
---
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 888e16e..7ab7d08 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1031,6 +1031,10 @@ static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request,
data_transfer_length = 0;
}
+ if (cmd_request->payload->range.len != data_transfer_length)
+ printk_ratelimited("request len: %u, transfer len: %u\n",
+ cmd_request->payload->range.len,
+ data_transfer_length);
scsi_set_resid(scmnd,
cmd_request->payload->range.len - data_transfer_length);
^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
2017-01-12 15:52 ` Christoph Hellwig
@ 2017-01-12 16:39 ` Dexuan Cui
0 siblings, 0 replies; 5+ messages in thread
From: Dexuan Cui @ 2017-01-12 16:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, Jens Axboe, Vitaly Kuznetsov, linux-kernel,
KY Srinivasan, Chris Valean (Cloudbase Solutions SRL)
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Thursday, January 12, 2017 23:53
> To: Dexuan Cui <decui@microsoft.com>
> Cc: Christoph Hellwig <hch@lst.de>; linux-block@vger.kernel.org; Jens Axboe
> <axboe@fb.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Chris Valean
> (Cloudbase Solutions SRL) <v-chvale@microsoft.com>
> Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve
> handling of the magic discard payload"
>
> Can you check if this debug printk triggers for the discard commands?
>
> ---
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 888e16e..7ab7d08 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1031,6 +1031,10 @@ static void storvsc_command_completion(struct
> storvsc_cmd_request *cmd_request,
> data_transfer_length = 0;
> }
>
> + if (cmd_request->payload->range.len != data_transfer_length)
> + printk_ratelimited("request len: %u, transfer len: %u\n",
> + cmd_request->payload->range.len,
> + data_transfer_length);
> scsi_set_resid(scmnd,
> cmd_request->payload->range.len - data_transfer_length);
>
// I fixed the small building issue (data_transfer_length ==> vm_srb->data_transfer_length).
No, the printk doesn't trigger for fstrim.
It does trigger at the early boot phase, though.
# dmesg |grep len:
[ 0.000000] log_buf_len: 134217728 bytes
[ 7.073423] request len: 255, transfer len: 12
[ 7.084937] request len: 255, transfer len: 52
[ 7.121728] request len: 64, transfer len: 12
[ 7.121915] request len: 64, transfer len: 12
[ 7.123180] request len: 64, transfer len: 12
[ 7.123367] request len: 64, transfer len: 12
[ 7.127193] request len: 64, transfer len: 12
[ 7.127350] request len: 64, transfer len: 12
[ 7.178930] request len: 255, transfer len: 12
[ 7.179045] request len: 255, transfer len: 52
-- Dexuan
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-01-12 16:39 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12 10:55 [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload" Dexuan Cui
2017-01-12 13:44 ` Christoph Hellwig
2017-01-12 15:39 ` Dexuan Cui
2017-01-12 15:52 ` Christoph Hellwig
2017-01-12 16:39 ` Dexuan Cui
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).