* Occasional VM soft lockup when a remote cdrom is attached
@ 2019-10-11 13:22 Guoheyi
2019-10-11 19:06 ` John Snow
0 siblings, 1 reply; 4+ messages in thread
From: Guoheyi @ 2019-10-11 13:22 UTC (permalink / raw)
To: QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807, John Snow
Hi folks,
We have observed that Linux running in a VM occasionally (at a very low
rate) hits a soft lockup when a remote cdrom is attached. The guest
hangs with the call trace below:
[Tue Oct 8 23:02:53 2019] ata_scsi_queuecmd+0xe0/0x2a0 [libata]
[Tue Oct 8 23:02:53 2019] scsi_dispatch_cmd+0xec/0x288
[Tue Oct 8 23:02:53 2019] scsi_queue_rq+0x5f4/0x6b8
[Tue Oct 8 23:02:53 2019] blk_mq_dispatch_rq_list+0xb0/0x690
[Tue Oct 8 23:02:53 2019] blk_mq_do_dispatch_sched+0x8c/0x130
[Tue Oct 8 23:02:53 2019] blk_mq_sched_dispatch_requests+0x128/0x1f0
[Tue Oct 8 23:02:53 2019] __blk_mq_run_hw_queue+0x9c/0x128
[Tue Oct 8 23:02:53 2019] __blk_mq_delay_run_hw_queue+0x198/0x1d8
[Tue Oct 8 23:02:53 2019] blk_mq_run_hw_queue+0x68/0x180
[Tue Oct 8 23:02:53 2019] blk_mq_sched_insert_request+0xbc/0x210
[Tue Oct 8 23:02:53 2019] blk_execute_rq_nowait+0x118/0x168
[Tue Oct 8 23:02:53 2019] blk_execute_rq+0x74/0xd8
[Tue Oct 8 23:02:53 2019] __scsi_execute+0xd8/0x1e0
[Tue Oct 8 23:02:53 2019] sr_check_events+0xd4/0x2c8 [sr_mod]
[Tue Oct 8 23:02:53 2019] cdrom_check_events+0x34/0x50 [cdrom]
[Tue Oct 8 23:02:53 2019] sr_block_check_events+0xdc/0x108 [sr_mod]
[Tue Oct 8 23:02:53 2019] disk_check_events+0x60/0x198
[Tue Oct 8 23:02:53 2019] disk_events_workfn+0x24/0x30
[Tue Oct 8 23:02:53 2019] process_one_work+0x1b4/0x3f8
[Tue Oct 8 23:02:53 2019] worker_thread+0x54/0x470
[Tue Oct 8 23:02:53 2019] kthread+0x134/0x138
[Tue Oct 8 23:02:53 2019] ret_from_fork+0x10/0x18
We are running the whole stack on ARM64 platforms, using rcdrom on the
host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
is fairly old, but I checked mainline and found the workflow has not
changed much. KVM is enabled.
We provide the remote cdrom to the guest as a block device, attached
under the ICH SATA bus.
The workflow should be like this (please correct me if I'm wrong):
1. There is a kworker thread in guest kernel which will check cdrom
status periodically.
2. The call to "ata_scsi_queuecmd" in the guest writes the AHCI port
register "PORT_CMD_ISSUE", so the VCPU thread traps out to qemu.
3. qemu grabs the BQL and then dispatches the access to ahci_port_write().
4. Since this is a "get event status notification" command, qemu
eventually reaches cmd_get_event_status_notification() and then
cdrom_is_inserted().
5. In cdrom_is_inserted(), an ioctl on the cdrom fd is issued.
However, in the last step we found that the ioctl() can have large
latency, because the device is a virtual one backed by a remote cdrom,
and the remote server may be busy or performing poorly. We observed more
than 8 seconds of latency in a half-hour test, and the latency could
exceed 20 seconds when the guest soft lockup occurred.
My question is: is there any way to work around this issue? Does it make
sense for qemu to set up an I/O thread to issue this ioctl() and let the
VCPU thread return to the guest as soon as possible? Or is it the
kernel's responsibility to break up the long-running ioctl() and return
to user space?
Any comments or advice will be appreciated :)
HG
* Re: Occasional VM soft lockup when a remote cdrom is attached
2019-10-11 13:22 Occasional VM soft lockup when a remote cdrom is attached Guoheyi
@ 2019-10-11 19:06 ` John Snow
2019-10-14 2:04 ` Guoheyi
0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2019-10-11 19:06 UTC (permalink / raw)
To: Guoheyi, QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807
On 10/11/19 9:22 AM, Guoheyi wrote:
> Hi folks,
>
> We have observed that Linux running in a VM occasionally (at a very
> low rate) hits a soft lockup when a remote cdrom is attached. The
> guest hangs with the call trace below:
>
That's certainly a new one to me :)
> [call trace snipped]
>
>
> We are running the whole stack on ARM64 platforms, using rcdrom on the
> host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
> Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
> is fairly old, but I checked mainline and found the workflow has not
> changed much. KVM is enabled.
>
> We provide the remote cdrom to guest as a block device, attached under
> ICH SATA bus.
>
>
> The work flow should be like this (please correct me if I was wrong):
>
> 1. There is a kworker thread in guest kernel which will check cdrom
> status periodically.
>
> 2. The call of "ata_scsi_queuecmd" in guest will write AHCI port
> register "PORT_CMD_ISSUE", so this VCPU thread is trapped out to qemu.
>
> 3. qemu will grab the BQL and then dispatch the access to
> ahci_port_write().
>
> 4. Since this is a "get event status notification" command, qemu
> eventually reaches cmd_get_event_status_notification() and then
> cdrom_is_inserted().
>
via
cmd_get_event_status_notification (SCSI 0x4A)
event_status_media
blk_is_inserted
> 5. In cdrom_is_inserted(), an ioctl to cdrom fd is issued.
>
Using the bdrv_host_cdrom BlockDriver, for the .bdrv_is_inserted callback.
>
> However, in the last step we found that the ioctl() can have large
> latency, because the device is a virtual one backed by a remote cdrom,
> and the remote server may be busy or performing poorly. We observed more
> than 8 seconds of latency in a half-hour test, and the latency could
> exceed 20 seconds when the guest soft lockup occurred.
>
I'm not sure what can be done here. The host_cdrom driver has a few
methods that query state (cdrom_is_inserted, cdrom_eject,
cdrom_lock_medium), and in general the code is going to rely on
bdrv_is_inserted returning a truthful answer.
(I'm not sure we have callbacks established to tell when the backing
media we are ourselves relying on has gone away. Maybe it could be
added, but it's not there now. We could maybe cache the answer if we had
something reliable.)
You could always try using the host_device driver instead of the
host_cdrom one, which will just treat it as a "normal" block device
instead of a CDROM one, and doesn't use any cdrom specific ioctls. It
might avoid the costly call.
>
> My question is: is there any way to work around this issue? Does it make
> sense for qemu to set up an I/O thread to issue this ioctl() and let the
> VCPU thread return to the guest as soon as possible? Or is it the
> kernel's responsibility to break up the long-running ioctl() and return
> to user space?
>
Yeah, I think you could probably try to make this change -- the code is
unfortunately very callback-hell-ish with poor abstraction boundaries,
but obviously the data transfer commands already defer to bottom halves.
(Ideally, I think every ATAPI command would just immediately defer to a
bottom half, but I don't know what effect that would have on callers of
ide_atapi_cmd. I'd have to audit it, and it's quite squiggly.)
"Patches welcome" as they say, but it's quite messy down there.
--js
>
> Any comments or advice will be appreciated :)
>
>
> HG
* Re: Occasional VM soft lockup when a remote cdrom is attached
2019-10-11 19:06 ` John Snow
@ 2019-10-14 2:04 ` Guoheyi
2019-10-14 16:18 ` John Snow
0 siblings, 1 reply; 4+ messages in thread
From: Guoheyi @ 2019-10-14 2:04 UTC (permalink / raw)
To: John Snow, QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807
Really appreciate your advice. Some comments below:
On 2019/10/12 3:06, John Snow wrote:
>
> On 10/11/19 9:22 AM, Guoheyi wrote:
>> Hi folks,
>>
>> We have observed that Linux running in a VM occasionally (at a very
>> low rate) hits a soft lockup when a remote cdrom is attached. The
>> guest hangs with the call trace below:
>>
> That's certainly a new one to me :)
>
>> [call trace snipped]
>>
>>
>> We are running the whole stack on ARM64 platforms, using rcdrom on the
>> host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
>> Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
>> is fairly old, but I checked mainline and found the workflow has not
>> changed much. KVM is enabled.
>>
>> We provide the remote cdrom to guest as a block device, attached under
>> ICH SATA bus.
>>
>>
>> The work flow should be like this (please correct me if I was wrong):
>>
>> 1. There is a kworker thread in guest kernel which will check cdrom
>> status periodically.
>>
>> 2. The call of "ata_scsi_queuecmd" in guest will write AHCI port
>> register "PORT_CMD_ISSUE", so this VCPU thread is trapped out to qemu.
>>
>> 3. qemu will grab the BQL and then dispatch the access to
>> ahci_port_write().
>>
>> 4. Since this is a "get event status notification" command, qemu
>> eventually reaches cmd_get_event_status_notification() and then
>> cdrom_is_inserted().
>>
> via
>
> cmd_get_event_status_notification (SCSI 0x4A)
> event_status_media
> blk_is_inserted
>
>> 5. In cdrom_is_inserted(), an ioctl to cdrom fd is issued.
>>
> Using the bdrv_host_cdrom BlockDriver, for the .bdrv_is_inserted callback.
>
>> However, in the last step we found that the ioctl() can have large
>> latency, because the device is a virtual one backed by a remote cdrom,
>> and the remote server may be busy or performing poorly. We observed more
>> than 8 seconds of latency in a half-hour test, and the latency could
>> exceed 20 seconds when the guest soft lockup occurred.
>>
> I'm not sure what can be done here. The host_cdrom driver has a few
> methods that query state (cdrom_is_inserted, cdrom_eject,
> cdrom_lock_medium), and in general the code is going to rely on
> bdrv_is_inserted returning a truthful answer.
>
> (I'm not sure we have callbacks established to tell when the backing
> media we are ourselves relying on has gone away. Maybe it could be
> added, but it's not there now. We could maybe cache the answer if we had
> something reliable.)
>
> You could always try using the host_device driver instead of the
> host_cdrom one, which will just treat it as a "normal" block device
> instead of a CDROM one, and doesn't use any cdrom specific ioctls. It
> might avoid the costly call.
Following this workaround, the cdrom device is presented to the guest
as a normal disk ("/dev/sdb"). We are not sure whether this will cause
further functional differences.
>
>> My question is: is there any way to work around this issue? Does it make
>> sense for qemu to set up an I/O thread to issue this ioctl() and let the
>> VCPU thread return to the guest as soon as possible? Or is it the
>> kernel's responsibility to break up the long-running ioctl() and return
>> to user space?
>>
> Yeah, I think you could probably try to make this change -- the code is
> unfortunately very callback-hell-ish with poor abstraction boundaries,
> but obviously the data transfer commands already defer to bottom halves.
>
> (Ideally, I think every ATAPI command would just immediately defer to a
> bottom half, but I don't know what effect that would have on callers of
> ide_atapi_cmd. I'd have to audit it, and it's quite squiggly.)
>
> "Patches welcome" as they say, but it's quite messy down there.
It sounds like long-term work, especially since we are not familiar with
SATA/AHCI devices...
Thanks,
HG
>
> --js
>
>> Any comments or advice will be appreciated :)
>>
>>
>> HG
>
* Re: Occasional VM soft lockup when a remote cdrom is attached
2019-10-14 2:04 ` Guoheyi
@ 2019-10-14 16:18 ` John Snow
0 siblings, 0 replies; 4+ messages in thread
From: John Snow @ 2019-10-14 16:18 UTC (permalink / raw)
To: Guoheyi, QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807
On 10/13/19 10:04 PM, Guoheyi wrote:
> Really appreciate your advice. Some comments below:
>
>
> On 2019/10/12 3:06, John Snow wrote:
>>
>> On 10/11/19 9:22 AM, Guoheyi wrote:
>>> Hi folks,
>>>
>>> We have observed that Linux running in a VM occasionally (at a very
>>> low rate) hits a soft lockup when a remote cdrom is attached. The
>>> guest hangs with the call trace below:
>>>
>> That's certainly a new one to me :)
>>
>>> [call trace snipped]
>>>
>>>
>>> We are running the whole stack on ARM64 platforms, using rcdrom on the
>>> host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
>>> Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
>>> is fairly old, but I checked mainline and found the workflow has not
>>> changed much. KVM is enabled.
>>>
>>> We provide the remote cdrom to guest as a block device, attached under
>>> ICH SATA bus.
>>>
>>>
>>> The work flow should be like this (please correct me if I was wrong):
>>>
>>> 1. There is a kworker thread in guest kernel which will check cdrom
>>> status periodically.
>>>
>>> 2. The call of "ata_scsi_queuecmd" in guest will write AHCI port
>>> register "PORT_CMD_ISSUE", so this VCPU thread is trapped out to qemu.
>>>
>>> 3. qemu will grab the BQL and then dispatch the access to
>>> ahci_port_write().
>>>
>>> 4. Since this is a "get event status notification" command, qemu
>>> eventually reaches cmd_get_event_status_notification() and then
>>> cdrom_is_inserted().
>>>
>> via
>>
>> cmd_get_event_status_notification (SCSI 0x4A)
>> event_status_media
>> blk_is_inserted
>>
>>> 5. In cdrom_is_inserted(), an ioctl to cdrom fd is issued.
>>>
>> Using the bdrv_host_cdrom BlockDriver, for the .bdrv_is_inserted
>> callback.
>>
>>> However, in the last step we found that the ioctl() can have large
>>> latency, because the device is a virtual one backed by a remote cdrom,
>>> and the remote server may be busy or performing poorly. We observed more
>>> than 8 seconds of latency in a half-hour test, and the latency could
>>> exceed 20 seconds when the guest soft lockup occurred.
>>>
>> I'm not sure what can be done here. The host_cdrom driver has a few
>> methods that query state (cdrom_is_inserted, cdrom_eject,
>> cdrom_lock_medium), and in general the code is going to rely on
>> bdrv_is_inserted returning a truthful answer.
>>
>> (I'm not sure we have callbacks established to tell when the backing
>> media we are ourselves relying on has gone away. Maybe it could be
>> added, but it's not there now. We could maybe cache the answer if we had
>> something reliable.)
>>
>> You could always try using the host_device driver instead of the
>> host_cdrom one, which will just treat it as a "normal" block device
>> instead of a CDROM one, and doesn't use any cdrom specific ioctls. It
>> might avoid the costly call.
> Following this workaround, the cdrom device is presented to the guest
> as a normal disk ("/dev/sdb"). We are not sure whether this will cause
> further functional differences.
>
What command line are you using? You should be able to specify the
emulator (ask for the ide-cd device) separately from the backend
(specify the host_device block driver).
It might require the use of -blockdev command line flags instead of the
sugared variants (-cdrom, -drive).
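A hypothetical invocation along these lines might work (untested; the
exact device wiring depends on the machine type and the controller, and
the node names are made up):

```shell
# Sketch, not a verified command line: attach /dev/sr0 through the
# plain host_device driver so QEMU issues no cdrom-specific ioctls,
# while the guest still sees an emulated CD-ROM (ide-cd).
qemu-system-aarch64 [...] \
    -blockdev driver=host_device,filename=/dev/sr0,node-name=cd0 \
    -device ide-cd,drive=cd0
```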
>>
>>> My question is: is there any way to work around this issue? Does it make
>>> sense for qemu to set up an I/O thread to issue this ioctl() and let the
>>> VCPU thread return to the guest as soon as possible? Or is it the
>>> kernel's responsibility to break up the long-running ioctl() and return
>>> to user space?
>>>
>> Yeah, I think you could probably try to make this change -- the code is
>> unfortunately very callback-hell-ish with poor abstraction boundaries,
>> but obviously the data transfer commands already defer to bottom halves.
>>
>> (Ideally, I think every ATAPI command would just immediately defer to a
>> bottom half, but I don't know what effect that would have on callers of
>> ide_atapi_cmd. I'd have to audit it, and it's quite squiggly.)
>>
>> "Patches welcome" as they say, but it's quite messy down there.
> It sounds like long-term work, especially since we are not familiar
> with SATA/AHCI devices...
>
> Thanks,
>
> HG
>
Yes, probably. It's low priority right now, but if it becomes more
important you can always try formal channels at RH to get it escalated.
--js.