Long term approaches to mitigate device reset issue in vhost-user-scsi

* Long term approaches to mitigate device reset issue in vhost-user-scsi
@ 2019-10-25 13:40 Raphael Norwitz
  2019-10-26 16:20 ` Michael S. Tsirkin
  0 siblings, 1 reply; 2+ messages in thread
From: Raphael Norwitz @ 2019-10-25 13:40 UTC
  To: mst; +Cc: qemu-devel, felipe

Hi MST,

We are trying to develop a long term fix to the following issue with
vhost-user-scsi:

When a live migration starts, Qemu sends a SET_VRING_ADDR message to
update the VQ's flags (turning log on). We can't distinguish that
message from the first SET_VRING_ADDR message sent after a device
reset (given that vhost-user backends are not notified about resets).
That distinction is important because we need to know whether to
refetch the used ring from guest memory.
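To make the ambiguity concrete, here is a minimal sketch of the decision a backend faces. Everything prefixed demo_ is hypothetical and simplified (it is not the real vhost-user payload or any backend's actual code); only VHOST_VRING_F_LOG mirrors the bit position defined in linux/vhost.h. The point is that the flags do not say whether a reset preceded the message, so the refetch decision has to come from somewhere else, modelled here as an explicit argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of the log flag, as defined in linux/vhost.h. */
#define VHOST_VRING_F_LOG 0

/* Simplified stand-in for the SET_VRING_ADDR payload (hypothetical). */
struct demo_vring_addr {
    uint32_t index;
    uint32_t flags;
    uint64_t used_user_addr;
};

/* Hypothetical per-queue backend state. */
struct demo_vq {
    uint16_t last_used_idx;  /* backend's private copy of the used index */
    uint16_t guest_used_idx; /* stand-in for used->idx in guest memory */
    bool log_enabled;
};

/*
 * Apply a SET_VRING_ADDR message. Nothing in msg tells us whether a
 * device reset came first, so the caller has to pass that knowledge
 * in as refetch_used. Without a reset notification, the backend has
 * no reliable source for this argument.
 */
void demo_set_vring_addr(struct demo_vq *vq,
                         const struct demo_vring_addr *msg,
                         bool refetch_used)
{
    vq->log_enabled = (msg->flags & (1u << VHOST_VRING_F_LOG)) != 0;
    if (refetch_used) {
        vq->last_used_idx = vq->guest_used_idx;
    }
}
```

The same message (log bit set) leaves last_used_idx untouched or overwrites it depending only on refetch_used, which is exactly the piece of information the backend currently lacks.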

A while back we sent a patch [1] (which we still use internally) to introduce a
message which tells vhost-user backends about device resets. No one
ever responded to that patch. It is getting clunky to maintain
and we would prefer to converge on a solution which is in line with
upstream.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg05077.html

Vhost seems to support the concept of a reset through the reset_device
callback in the VhostOps struct. Currently, the vhost-user VhostOps
reset callback sends a RESET_OWNER message.

The docs currently state, though, that this message is obsolete. Looking
at the history, I see change d1f8b30ec8dde0318fd1b98d24a64926feae9625
actually changed the message name to RESET_DEVICE, although it was
subsequently changed back to RESET_OWNER.

With this in mind, we think the code should be improved by:

1) Stopping Qemu from sending the RESET_OWNER message on the
vhost-user reset_device callback.
2) Amending the docs to better align with the code.
3) If you agree with 1), adding a separate DEVICE_RESET message.
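
As a rough sketch of how a backend could use the message proposed in 3): the request names and values below are hypothetical placeholders, not the real vhost-user message numbers, and the single reset_pending flag assumes a one-queue device for brevity. A DEVICE_RESET marks the state as stale, and the next SET_VRING_ADDR refetches the used ring only in that case.

```c
#include <stdbool.h>

/* Hypothetical request codes; NOT the real vhost-user numbering. */
enum demo_request {
    DEMO_SET_VRING_ADDR,
    DEMO_RESET_OWNER,
    DEMO_DEVICE_RESET, /* the proposed new message */
};

/* Hypothetical backend state for a single-queue device. */
struct demo_backend {
    bool reset_pending;      /* set by DEVICE_RESET, consumed below */
    unsigned used_refetches; /* how many times the used ring was re-read */
};

void demo_handle(struct demo_backend *b, enum demo_request req)
{
    switch (req) {
    case DEMO_DEVICE_RESET:
        /* The guest reset the device; cached ring state is stale. */
        b->reset_pending = true;
        break;
    case DEMO_SET_VRING_ADDR:
        /* Refetch only if a reset was announced since the last update. */
        if (b->reset_pending) {
            b->used_refetches++;
            b->reset_pending = false;
        }
        break;
    case DEMO_RESET_OWNER:
        /* Documented as obsolete; ignored in this sketch. */
        break;
    }
}
```

With this, a SET_VRING_ADDR that only turns logging on at the start of a migration no longer triggers a refetch, while the first one after a reset does.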

If you agree with 1) and 3) would you reconsider patch [1]? If so, I will
have to update the patch because the message/features numbers
are now taken. Should I update the patch and resend?

If you don't plan on stopping Qemu from sending RESET_OWNER,
I'd like to post a patch allowing vhost-user-scsi to benefit from
the RESET_OWNER message (as it currently doesn't offer a device
reset callback).

Thanks,
Raphael
