* Occasional VM soft lockup when a remote cdrom is attached
@ 2019-10-11 13:22 Guoheyi
2019-10-11 19:06 ` John Snow
0 siblings, 1 reply; 4+ messages in thread
From: Guoheyi @ 2019-10-11 13:22 UTC (permalink / raw)
To: QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807, John Snow
Hi folks,
We have observed that Linux running in a VM occasionally (at a very low
rate) hits a soft lockup when a remote cdrom is attached. The guest
hangs with the call trace below:
[Tue Oct 8 23:02:53 2019] ata_scsi_queuecmd+0xe0/0x2a0 [libata]
[Tue Oct 8 23:02:53 2019] scsi_dispatch_cmd+0xec/0x288
[Tue Oct 8 23:02:53 2019] scsi_queue_rq+0x5f4/0x6b8
[Tue Oct 8 23:02:53 2019] blk_mq_dispatch_rq_list+0xb0/0x690
[Tue Oct 8 23:02:53 2019] blk_mq_do_dispatch_sched+0x8c/0x130
[Tue Oct 8 23:02:53 2019] blk_mq_sched_dispatch_requests+0x128/0x1f0
[Tue Oct 8 23:02:53 2019] __blk_mq_run_hw_queue+0x9c/0x128
[Tue Oct 8 23:02:53 2019] __blk_mq_delay_run_hw_queue+0x198/0x1d8
[Tue Oct 8 23:02:53 2019] blk_mq_run_hw_queue+0x68/0x180
[Tue Oct 8 23:02:53 2019] blk_mq_sched_insert_request+0xbc/0x210
[Tue Oct 8 23:02:53 2019] blk_execute_rq_nowait+0x118/0x168
[Tue Oct 8 23:02:53 2019] blk_execute_rq+0x74/0xd8
[Tue Oct 8 23:02:53 2019] __scsi_execute+0xd8/0x1e0
[Tue Oct 8 23:02:53 2019] sr_check_events+0xd4/0x2c8 [sr_mod]
[Tue Oct 8 23:02:53 2019] cdrom_check_events+0x34/0x50 [cdrom]
[Tue Oct 8 23:02:53 2019] sr_block_check_events+0xdc/0x108 [sr_mod]
[Tue Oct 8 23:02:53 2019] disk_check_events+0x60/0x198
[Tue Oct 8 23:02:53 2019] disk_events_workfn+0x24/0x30
[Tue Oct 8 23:02:53 2019] process_one_work+0x1b4/0x3f8
[Tue Oct 8 23:02:53 2019] worker_thread+0x54/0x470
[Tue Oct 8 23:02:53 2019] kthread+0x134/0x138
[Tue Oct 8 23:02:53 2019] ret_from_fork+0x10/0x18
We are running the whole stack on ARM64 platforms, using rcdrom on the
host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
is fairly old, but I checked mainline and found the workflow has not
changed much. KVM is enabled.
We provide the remote cdrom to the guest as a block device, attached
under the ICH SATA bus.
The workflow should be like this (please correct me if I'm wrong):
1. There is a kworker thread in guest kernel which will check cdrom
status periodically.
2. The call to "ata_scsi_queuecmd" in the guest writes the AHCI port
register "PORT_CMD_ISSUE", so the VCPU thread traps out to qemu.
3. qemu grabs the BQL and then dispatches the access to ahci_port_write().
4. Since this is a "get event status notification" command, qemu
eventually reaches cmd_get_event_status_notification() and then
cdrom_is_inserted().
5. In cdrom_is_inserted(), an ioctl on the cdrom fd is issued.
However, in the last step we found that the ioctl() can have large
latency, because the device is a virtual one backed by a remote cdrom,
and the remote server may be busy or performing poorly. We observed more
than 8 seconds of latency in a half-hour test, and the latency could
exceed 20 seconds when the guest soft lockup occurred.
My question is: is there any way to work around this issue? Does it make
sense for qemu to set up an I/O thread to issue this ioctl() and let the
VCPU thread return to the guest as soon as possible? Or is it the
kernel's responsibility to break up the long-running ioctl() and return
to user space?
Any comments or advice will be appreciated :)
HG
* Re: Occasional VM soft lockup when a remote cdrom is attached
2019-10-11 13:22 Occasional VM soft lockup when a remote cdrom is attached Guoheyi
@ 2019-10-11 19:06 ` John Snow
2019-10-14 2:04 ` Guoheyi
0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2019-10-11 19:06 UTC (permalink / raw)
To: Guoheyi, QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807
On 10/11/19 9:22 AM, Guoheyi wrote:
> Hi folks,
>
> We have observed that Linux running in a VM occasionally (at a very
> low rate) hits a soft lockup when a remote cdrom is attached. The
> guest hangs with the call trace below:
>
That's certainly a new one to me :)
> [call trace snipped]
>
>
> We are running the whole stack on ARM64 platforms, using rcdrom on the
> host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
> Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
> is fairly old, but I checked mainline and found the workflow has not
> changed much. KVM is enabled.
>
> We provide the remote cdrom to guest as a block device, attached under
> ICH SATA bus.
>
>
> The work flow should be like this (please correct me if I was wrong):
>
> 1. There is a kworker thread in guest kernel which will check cdrom
> status periodically.
>
> 2. The call of "ata_scsi_queuecmd" in guest will write AHCI port
> register "PORT_CMD_ISSUE", so this VCPU thread is trapped out to qemu.
>
> 3. qemu will grab the BQL and then dispatch the access to
> ahci_port_write().
>
> 4. Since this is a "get event status notification" command, qemu
> eventually reaches cmd_get_event_status_notification() and then
> cdrom_is_inserted().
>
via
cmd_get_event_status_notification (SCSI 0x4A)
event_status_media
blk_is_inserted
> 5. In cdrom_is_inserted(), an ioctl to cdrom fd is issued.
>
Using the bdrv_host_cdrom BlockDriver, for the .bdrv_is_inserted callback.
>
> However, in the last step we found that the ioctl() can have large
> latency, because the device is a virtual one backed by a remote cdrom,
> and the remote server may be busy or performing poorly. We observed more
> than 8 seconds of latency in a half-hour test, and the latency could
> exceed 20 seconds when the guest soft lockup occurred.
>
I'm not sure what can be done here. The host_cdrom driver has a few
methods that query state (cdrom_is_inserted, cdrom_eject,
cdrom_lock_medium), and in general the code is going to rely on
bdrv_is_inserted returning a truthful answer.
(I'm not sure we have callbacks established to tell when the backing
media we are ourselves relying on has gone away. Maybe it could be
added, but it's not there now. We could maybe cache the answer if we had
something reliable.)
You could always try using the host_device driver instead of the
host_cdrom one, which will just treat it as a "normal" block device
instead of a CDROM one, and doesn't use any cdrom specific ioctls. It
might avoid the costly call.
>
> My question is: is there any way to work around this issue? Does it make
> sense for qemu to set up an I/O thread to issue this ioctl() and let the
> VCPU thread return to the guest as soon as possible? Or is it the
> kernel's responsibility to break up the long-running ioctl() and return
> to user space?
>
Yeah, I think you could probably try to make this change -- the code is
unfortunately very callback-hell-ish with poor abstraction boundaries,
but obviously the data transfer commands already defer to bottom halves.
(Ideally, I think every ATAPI command would just immediately defer to a
bottom half, but I don't know what effect that would have on callers of
ide_atapi_cmd. I'd have to audit it, and it's quite squiggly.)
"Patches welcome" as they say, but it's quite messy down there.
--js
>
> Any comments or advice will be appreciated :)
>
>
> HG
* Re: Occasional VM soft lockup when a remote cdrom is attached
2019-10-11 19:06 ` John Snow
@ 2019-10-14 2:04 ` Guoheyi
2019-10-14 16:18 ` John Snow
0 siblings, 1 reply; 4+ messages in thread
From: Guoheyi @ 2019-10-14 2:04 UTC (permalink / raw)
To: John Snow, QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807
Really appreciate your advice. Some comments below:
On 2019/10/12 3:06, John Snow wrote:
>
> On 10/11/19 9:22 AM, Guoheyi wrote:
>> Hi folks,
>>
>> We have observed that Linux running in a VM occasionally (at a very
>> low rate) hits a soft lockup when a remote cdrom is attached. The
>> guest hangs with the call trace below:
>>
> That's certainly a new one to me :)
>
>> [call trace snipped]
>>
>>
>> We are running the whole stack on ARM64 platforms, using rcdrom on the
>> host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
>> Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
>> is fairly old, but I checked mainline and found the workflow has not
>> changed much. KVM is enabled.
>>
>> We provide the remote cdrom to guest as a block device, attached under
>> ICH SATA bus.
>>
>>
>> The work flow should be like this (please correct me if I was wrong):
>>
>> 1. There is a kworker thread in guest kernel which will check cdrom
>> status periodically.
>>
>> 2. The call of "ata_scsi_queuecmd" in guest will write AHCI port
>> register "PORT_CMD_ISSUE", so this VCPU thread is trapped out to qemu.
>>
>> 3. qemu will grab the BQL and then dispatch the access to
>> ahci_port_write().
>>
>> 4. Since this is a "get event status notification" command, qemu
>> eventually reaches cmd_get_event_status_notification() and then
>> cdrom_is_inserted().
>>
> via
>
> cmd_get_event_status_notification (SCSI 0x4A)
> event_status_media
> blk_is_inserted
>
>> 5. In cdrom_is_inserted(), an ioctl to cdrom fd is issued.
>>
> Using the bdrv_host_cdrom BlockDriver, for the .bdrv_is_inserted callback.
>
>> However, in the last step we found that the ioctl() can have large
>> latency, because the device is a virtual one backed by a remote cdrom,
>> and the remote server may be busy or performing poorly. We observed more
>> than 8 seconds of latency in a half-hour test, and the latency could
>> exceed 20 seconds when the guest soft lockup occurred.
>>
> I'm not sure what can be done here. The host_cdrom driver has a few
> methods that query state (cdrom_is_inserted, cdrom_eject,
> cdrom_lock_medium), and in general the code is going to rely on
> bdrv_is_inserted returning a truthful answer.
>
> (I'm not sure we have callbacks established to tell when the backing
> media we are ourselves relying on has gone away. Maybe it could be
> added, but it's not there now. We could maybe cache the answer if we had
> something reliable.)
>
> You could always try using the host_device driver instead of the
> host_cdrom one, which will just treat it as a "normal" block device
> instead of a CDROM one, and doesn't use any cdrom specific ioctls. It
> might avoid the costly call.
Following this workaround, the cdrom device is presented to the guest
as a normal disk ("/dev/sdb"). We are not sure whether this will cause
further functional differences.
>
>> My question is: is there any way to work around this issue? Does it make
>> sense for qemu to set up an I/O thread to issue this ioctl() and let the
>> VCPU thread return to the guest as soon as possible? Or is it the
>> kernel's responsibility to break up the long-running ioctl() and return
>> to user space?
>>
> Yeah, I think you could probably try to make this change -- the code is
> unfortunately very callback-hell-ish with poor abstraction boundaries,
> but obviously the data transfer commands already defer to bottom halves.
>
> (Ideally, I think every ATAPI command would just immediately defer to a
> bottom half, but I don't know what effect that would have on callers of
> ide_atapi_cmd. I'd have to audit it, and it's quite squiggly.)
>
> "Patches welcome" as they say, but it's quite messy down there.
It sounds like long-term work, especially since we are not familiar with
SATA/AHCI devices...
Thanks,
HG
>
> --js
>
>> Any comments or advice will be appreciated :)
>>
>>
>> HG
>
* Re: Occasional VM soft lockup when a remote cdrom is attached
2019-10-14 2:04 ` Guoheyi
@ 2019-10-14 16:18 ` John Snow
0 siblings, 0 replies; 4+ messages in thread
From: John Snow @ 2019-10-14 16:18 UTC (permalink / raw)
To: Guoheyi, QEMU Developers, qemu-block
Cc: Kevin Wolf, wanghaibin 00208455, zhengxiang 00421807
On 10/13/19 10:04 PM, Guoheyi wrote:
> Really appreciate your advice. Some comments below:
>
>
> On 2019/10/12 3:06, John Snow wrote:
>>
>> On 10/11/19 9:22 AM, Guoheyi wrote:
>>> Hi folks,
>>>
>>> We have observed that Linux running in a VM occasionally (at a very
>>> low rate) hits a soft lockup when a remote cdrom is attached. The
>>> guest hangs with the call trace below:
>>>
>> That's certainly a new one to me :)
>>
>>> [call trace snipped]
>>>
>>>
>>> We are running the whole stack on ARM64 platforms, using rcdrom on the
>>> host to connect a remote cdrom, which appears as "/dev/sr0" on the host.
>>> Our Linux kernel version is 4.19.36 and our qemu version is 2.8.1; that
>>> is fairly old, but I checked mainline and found the workflow has not
>>> changed much. KVM is enabled.
>>>
>>> We provide the remote cdrom to guest as a block device, attached under
>>> ICH SATA bus.
>>>
>>>
>>> The work flow should be like this (please correct me if I was wrong):
>>>
>>> 1. There is a kworker thread in guest kernel which will check cdrom
>>> status periodically.
>>>
>>> 2. The call of "ata_scsi_queuecmd" in guest will write AHCI port
>>> register "PORT_CMD_ISSUE", so this VCPU thread is trapped out to qemu.
>>>
>>> 3. qemu will grab the BQL and then dispatch the access to
>>> ahci_port_write().
>>>
>>> 4. Since this is a "get event status notification" command, qemu
>>> eventually reaches cmd_get_event_status_notification() and then
>>> cdrom_is_inserted().
>>>
>> via
>>
>> cmd_get_event_status_notification (SCSI 0x4A)
>> event_status_media
>> blk_is_inserted
>>
>>> 5. In cdrom_is_inserted(), an ioctl to cdrom fd is issued.
>>>
>> Using the bdrv_host_cdrom BlockDriver, for the .bdrv_is_inserted
>> callback.
>>
>>> However, in the last step we found that the ioctl() can have large
>>> latency, because the device is a virtual one backed by a remote cdrom,
>>> and the remote server may be busy or performing poorly. We observed more
>>> than 8 seconds of latency in a half-hour test, and the latency could
>>> exceed 20 seconds when the guest soft lockup occurred.
>>>
>> I'm not sure what can be done here. The host_cdrom driver has a few
>> methods that query state (cdrom_is_inserted, cdrom_eject,
>> cdrom_lock_medium), and in general the code is going to rely on
>> bdrv_is_inserted returning a truthful answer.
>>
>> (I'm not sure we have callbacks established to tell when the backing
>> media we are ourselves relying on has gone away. Maybe it could be
>> added, but it's not there now. We could maybe cache the answer if we had
>> something reliable.)
>>
>> You could always try using the host_device driver instead of the
>> host_cdrom one, which will just treat it as a "normal" block device
>> instead of a CDROM one, and doesn't use any cdrom specific ioctls. It
>> might avoid the costly call.
> Following this workaround, the cdrom device is presented to the guest
> as a normal disk ("/dev/sdb"). We are not sure whether this will cause
> further functional differences.
>
What command line are you using? You should be able to specify the
emulator (ask for the ide-cd device) separately from the backend
(specify the host_device block driver).
It might require the use of -blockdev command line flags instead of the
sugared variants (-cdrom, -drive).
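A hypothetical invocation along these lines might work (untested; the
exact device wiring depends on the machine type and the controller, and
the node names are made up):

```shell
# Sketch, not a verified command line: attach /dev/sr0 through the
# plain host_device driver so QEMU issues no cdrom-specific ioctls,
# while the guest still sees an emulated CD-ROM (ide-cd).
qemu-system-aarch64 [...] \
    -blockdev driver=host_device,filename=/dev/sr0,node-name=cd0 \
    -device ide-cd,drive=cd0
```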
>>
>>> My question is: is there any way to work around this issue? Does it make
>>> sense for qemu to set up an I/O thread to issue this ioctl() and let the
>>> VCPU thread return to the guest as soon as possible? Or is it the
>>> kernel's responsibility to break up the long-running ioctl() and return
>>> to user space?
>>>
>> Yeah, I think you could probably try to make this change -- the code is
>> unfortunately very callback-hell-ish with poor abstraction boundaries,
>> but obviously the data transfer commands already defer to bottom halves.
>>
>> (Ideally, I think every ATAPI command would just immediately defer to a
>> bottom half, but I don't know what effect that would have on callers of
>> ide_atapi_cmd. I'd have to audit it, and it's quite squiggly.)
>>
>> "Patches welcome" as they say, but it's quite messy down there.
> It sounds like long-term work, especially since we are not familiar
> with SATA/AHCI devices...
>
> Thanks,
>
> HG
>
Yes, probably. It's low priority right now, but if it becomes more
important you can always try formal channels at RH to get it escalated.
--js.