All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Eugenio Perez Martin <eperezma@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	virtio-comment@lists.oasis-open.org,
	Virtio-Dev <virtio-dev@lists.oasis-open.org>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>, Oren Duer <oren@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>,
	Parav Pandit <parav@nvidia.com>, Bodong Wang <bodong@nvidia.com>,
	Alexander Mikheev <amikheev@nvidia.com>,
	Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [virtio-comment] [PATCH V2 2/2] virtio: introduce STOP status bit
Date: Wed, 14 Jul 2021 10:53:34 +0100	[thread overview]
Message-ID: <YO60HpxfZnmfMCMj@stefanha-x1.localdomain> (raw)
In-Reply-To: <8a2037df-e527-dcb4-c4c8-a568885180e4@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 10713 bytes --]

On Tue, Jul 13, 2021 at 08:16:35PM +0800, Jason Wang wrote:
> 
> 在 2021/7/13 下午6:00, Stefan Hajnoczi 写道:
> > On Tue, Jul 13, 2021 at 11:27:03AM +0800, Jason Wang wrote:
> > > 在 2021/7/12 下午5:57, Stefan Hajnoczi 写道:
> > > > On Mon, Jul 12, 2021 at 12:00:39PM +0800, Jason Wang wrote:
> > > > > 在 2021/7/11 上午4:36, Michael S. Tsirkin 写道:
> > > > > > On Fri, Jul 09, 2021 at 07:23:33PM +0200, Eugenio Perez Martin wrote:
> > > > > > > > >      If I understand correctly, this is all
> > > > > > > > > driven from the driver inside the guest, so for this to work
> > > > > > > > > the guest must be running and already have initialised the driver.
> > > > > > > > Yes.
> > > > > > > > 
> > > > > > > As I see it, the feature can be driven entirely by the VMM as long as
> > > > > > > it intercept the relevant configuration space (PCI, MMIO, etc) from
> > > > > > > guest's reads and writes, and present it as coherent and transparent
> > > > > > > for the guest. Some use cases I can imagine with a physical device (or
> > > > > > > vp_vpda device) with VIRTIO_F_STOP:
> > > > > > > 
> > > > > > > 1) The VMM chooses not to pass the feature flag. The guest cannot stop
> > > > > > > the device, so any write to this flag is an error/undefined.
> > > > > > > 2) The VMM passes the flag to the guest. The guest can stop the device.
> > > > > > > 2.1) The VMM stops the device to perform a live migration, and the
> > > > > > > guest does not write to STOP in any moment of the LM. It resets the
> > > > > > > destination device with the state, and then initializes the device.
> > > > > > > 2.2) The guest stops the device and, when STOP(32) is set, the source
> > > > > > > VMM migrates the device status. The destination VMM realizes the bit,
> > > > > > > so it sets the bit in the destination too after device initialization.
> > > > > > > 2.3) The device is not initialized by the guest so it doesn't matter
> > > > > > > what bit has the HW, but the VM can be migrated.
> > > > > > > 
> > > > > > > Am I missing something?
> > > > > > > 
> > > > > > > Thanks!
> > > > > > It's doable like this. It's all a lot of hoops to jump through though.
> > > > > > It's also not easy for devices to implement.
> > > > > It just requires a new status bit. Anything that makes you think it's hard
> > > > > to implement?
> > > > > 
> > > > > E.g for networking device, it should be sufficient to use this bit + the
> > > > > virtqueue state.
> > > > > 
> > > > > 
> > > > > > Why don't we design the feature in a way that is useable by VMMs
> > > > > > and implementable by devices in a simple way?
> > > > > It use the common technology like register shadowing without any further
> > > > > stuffs.
> > > > > 
> > > > > Or do you have any other ideas?
> > > > > 
> > > > > (I think we all know migration will be very hard if we simply pass through
> > > > > those state registers).
> > > > If an admin virtqueue is used instead of the STOP Device Status field
> > > > bit then there's no need to re-read the Device Status field in a loop
> > > > until the device has stopped.
> > > 
> > > Probably not. Let me to clarify several points:
> > > 
> > > - This proposal has nothing to do with admin virtqueue. Actually, admin
> > > virtqueue could be used for carrying any basic device facility like status
> > > bit. E.g I'm going to post patches that use admin virtqueue as a "transport"
> > > for device slicing at virtio level.
> > > - Even if we had introduced admin virtqueue, we still need a per function
> > > interface for this. This is a must for nested virtualization, we can't
> > > always expect things like PF can be assigned to L1 guest.
> > > - According to the proposal, there's no need for the device to complete all
> > > the consumed buffers, device can choose to expose those inflight descriptors
> > > in a device specific way and set the STOP bit. This means, if we have the
> > > device specific in-flight descriptor reporting facility, the device can
> > > almost set the STOP bit immediately.
> > > - If we don't go with the basic device facility but using the admin
> > > virtqueue specific method, we still need to clarify how it works with the
> > > device status state machine, it will be some kind of sub-states which looks
> > > much more complicated than the current proposal.
> > > 
> > > 
> > > > When migrating a guest with many VIRTIO devices a busy waiting approach
> > > > extends downtime if implemented sequentially (stopping one device at a
> > > > time).
> > > 
> > > Well. You need some kinds of waiting for sure, the device/DMA needs sometime
> > > to be stopped. The downtime is determined by a specific virtio
> > > implementation which is hard to be restricted at the spec level. We can
> > > clarify that the device must set the STOP bit in e.g 100ms.
> > > 
> > > 
> > > >    It can be implemented concurrently (setting the STOP bit on all
> > > > devices and then looping until all their Device Status fields have the
> > > > bit set), but this becomes more complex to implement.
> > > 
> > > I still don't get what kind of complexity did you worry here.
> > > 
> > > 
> > > > I'm a little worried about adding a new bit that requires busy
> > > > waiting...
> > > 
> > > Busy wait is not something that is introduced in this patch:
> > > 
> > > 4.1.4.3.2 Driver Requirements: Common configuration structure layout
> > > 
> > > After writing 0 to device_status, the driver MUST wait for a read of
> > > device_status to return 0 before reinitializing the device.
> > > 
> > > Since it was required for at least one transport. We need do something
> > > similar to when introducing basic facility.
> > Adding the STOP but as a Device Status bit is a small and clean VIRTIO
> > spec change. I like that.
> > 
> > On the other hand, devices need time to stop and that time can be
> > unbounded. For example, software virtio-blk/scsi implementations since
> > cannot immediately cancel in-flight I/O requests on Linux hosts.
> > 
> > The natural interface for long-running operations is virtqueue requests.
> > That's why I mentioned the alternative of using an admin virtqueue
> > instead of a Device Status bit.
> 
> 
> So I'm not against the admin virtqueue. As said before, admin virtqueue
> could be used for carrying the device status bit.
> 
> Send a command to set STOP status bit to admin virtqueue. Device will make
> the command buffer used after it has successfully stopped the device.
> 
> AFAIK, they are not mutually exclusive, since they are trying to solve
> different problems.
> 
> Device status - basic device facility
> 
> Admin virtqueue - transport/device specific way to implement (part of) the
> device facility
> 
> > 
> > Although you mentioned that the stopped state needs to be reflected in
> > the Device Status field somehow, I'm not sure about that since the
> > driver typically doesn't need to know whether the device is being
> > migrated.
> 
> 
> The guest won't see the real device status bit. VMM will shadow the device
> status bit in this case.
> 
> E.g with the current vhost-vDPA, vDPA behave like a vhost device, guest is
> unaware of the migration.
> 
> STOP status bit is set by Qemu to real virtio hardware. But guest will only
> see the DRIVER_OK without STOP.
> 
> It's not hard to implement the nested on top, see the discussion initiated
> by Eugenio about how expose VIRTIO_F_STOP to guest for nested live
> migration.
> 
> 
> >   In fact, the VMM would need to hide this bit and it's safer to
> > keep it out-of-band instead of risking exposing it by accident.
> 
> 
> See above, VMM may choose to hide or expose the capability. It's useful for
> migrating a nested guest.
> 
> If we design an interface that can be used in the nested environment, it's
> not an ideal interface.
> 
> 
> > 
> > In addition, stateful devices need to load/save non-trivial amounts of
> > data. They need DMA to do this efficiently, so an admin virtqueue is a
> > good fit again.
> 
> 
> I don't get the point here. You still need to address the exact the similar
> issues for admin virtqueue: the unbound time in freezing the device, the
> interaction with the virtio device status state machine.

Device state state can be large so a register interface would be a
bottleneck. DMA is needed. I think a virtqueue is a good fit for
saving/loading device state.

If we're going to need it for saving/loading device state anyway, then
that's another reason to consider using a virtqueue for stopping the
device, saving/loading virtqueue state, etc.

> And with admin virtqueue, it's actually far more complicated e.g you need to
> define how to synchronize the concurrent access to the basic facilites.

I'm not sure I understand? Driver complexity? Device implementation
complexity?

> >   This isn't addressed in this patch series, but it's the
> > next step and I think it's worth planning for it.
> 
> 
> I agree, but for admin virtqueue, it's better to use it as a full transport
> instead of just use it for carrying part of the device basic facilities.
> Actually, as I said, I had patches to do that. But the motivation is not for
> live migration but for device slicing. I will post RFC before the KVM Forum
> this year (since I'm going to talk device slicing at virtio level). It does
> not conflict with Max's proposal, since migration part is not there.

Great, I'm looking forward to your device slicing idea.

> > If all devices could stop very quickly and were stateless then I would
> > agree that the STOP bit is an ideal solution.
> 
> 
> Note that in Max's proposal it also have something similar the "quiescence"
> and "freezed" state. It doesn't differ from STOP bit fundamentally. As Max
> suggested, we could introduce more status bit if necessary or even consider
> to unify Max's proposal with mine.
> 
> 
> > I think it will be
> > necessary to support devices that don't behave like that, so the admin
> > virtqueue approach seems worth exploring.
> 
> 
> Yes and as mentioned in another thread. I think the best way is to define
> the device specific state first and then consider how to implement the
> interface.
> 
> Admin virtqueue is worth to explore but should not be the only method.
> Device/transport are freed to implement it in many ways based on the actual
> hardware.

What's the advantage of this proposal compared to an admin virtqueue? I
see the admin virtqueue as a more general interface than this proposal
and it can cover this use case.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2021-07-14  9:53 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-06  4:33 [PATCH V2 0/2] Vitqueue State Synchronization Jason Wang
2021-07-06  4:33 ` [PATCH V2 1/2] virtio: introduce virtqueue state as basic facility Jason Wang
2021-07-06  9:32   ` Michael S. Tsirkin
2021-07-06 17:09     ` Eugenio Perez Martin
2021-07-06 19:08       ` Michael S. Tsirkin
2021-07-06 23:49         ` Max Gurtovoy
2021-07-07  2:50           ` Jason Wang
2021-07-07 12:03             ` Max Gurtovoy
2021-07-07 12:11               ` [virtio-comment] " Jason Wang
2021-07-07  2:42         ` Jason Wang
2021-07-07  4:36           ` Jason Wang
2021-07-07  2:41       ` Jason Wang
2021-07-06 12:27   ` [virtio-comment] " Cornelia Huck
2021-07-07  3:29     ` [virtio-dev] " Jason Wang
2021-07-06  4:33 ` [PATCH V2 2/2] virtio: introduce STOP status bit Jason Wang
2021-07-06  9:24   ` [virtio-comment] " Dr. David Alan Gilbert
2021-07-07  3:20     ` Jason Wang
2021-07-09 17:23       ` Eugenio Perez Martin
2021-07-10 20:36         ` Michael S. Tsirkin
2021-07-12  4:00           ` Jason Wang
2021-07-12  9:57             ` Stefan Hajnoczi
2021-07-13  3:27               ` Jason Wang
2021-07-13  8:19                 ` Cornelia Huck
2021-07-13  9:13                   ` Jason Wang
2021-07-13 11:31                     ` Cornelia Huck
2021-07-13 12:23                       ` Jason Wang
2021-07-13 12:28                         ` Cornelia Huck
2021-07-14  2:47                           ` Jason Wang
2021-07-14  6:20                             ` Cornelia Huck
2021-07-14  8:53                               ` Jason Wang
2021-07-14  9:24                                 ` [virtio-dev] " Cornelia Huck
2021-07-15  2:01                                   ` Jason Wang
2021-07-13 10:00                 ` Stefan Hajnoczi
2021-07-13 12:16                   ` Jason Wang
2021-07-14  9:53                     ` Stefan Hajnoczi [this message]
2021-07-14 10:29                       ` Jason Wang
2021-07-14 15:07                         ` Stefan Hajnoczi
2021-07-14 16:22                           ` Max Gurtovoy
2021-07-15  1:38                             ` Jason Wang
2021-07-15  9:26                               ` Stefan Hajnoczi
2021-07-16  1:48                                 ` Jason Wang
2021-07-19 12:08                                   ` Stefan Hajnoczi
2021-07-20  2:46                                     ` Jason Wang
2021-07-15 21:18                               ` Michael S. Tsirkin
2021-07-16  2:19                                 ` Jason Wang
2021-07-15  1:35                           ` Jason Wang
2021-07-15  9:16                             ` [virtio-dev] " Stefan Hajnoczi
2021-07-16  1:44                               ` Jason Wang
2021-07-19 12:18                                 ` [virtio-dev] " Stefan Hajnoczi
2021-07-20  2:50                                   ` Jason Wang
2021-07-20 10:31                                 ` Cornelia Huck
2021-07-21  2:59                                   ` Jason Wang
2021-07-15 10:01                             ` Stefan Hajnoczi
2021-07-16  2:03                               ` Jason Wang
2021-07-16  3:53                                 ` Jason Wang
2021-07-19 12:45                                   ` Stefan Hajnoczi
2021-07-20  3:04                                     ` Jason Wang
2021-07-20  8:50                                       ` Stefan Hajnoczi
2021-07-20 10:48                                         ` Cornelia Huck
2021-07-20 12:47                                           ` Stefan Hajnoczi
2021-07-21  2:29                                         ` Jason Wang
2021-07-21 10:20                                           ` Stefan Hajnoczi
2021-07-22  7:33                                             ` Jason Wang
2021-07-22 10:24                                               ` Stefan Hajnoczi
2021-07-22 13:08                                                 ` Jason Wang
2021-07-26 15:07                                                   ` Stefan Hajnoczi
2021-07-27  7:43                                                     ` Max Reitz
2021-08-03  6:33                                                     ` Jason Wang
2021-08-03 10:37                                                       ` Stefan Hajnoczi
2021-08-03 11:42                                                         ` Jason Wang
2021-08-03 12:22                                                           ` Dr. David Alan Gilbert
2021-08-04  1:42                                                             ` Jason Wang
2021-08-04  9:07                                                               ` Dr. David Alan Gilbert
2021-08-05  6:38                                                                 ` Jason Wang
2021-08-05  8:19                                                                   ` Dr. David Alan Gilbert
2021-08-06  6:15                                                                     ` Jason Wang
2021-08-08  9:31                                                                       ` Max Gurtovoy
2021-08-04  9:20                                                               ` Stefan Hajnoczi
2021-08-05  6:45                                                                 ` Jason Wang
2021-08-04  8:38                                                             ` Stefan Hajnoczi
2021-08-04  8:36                                                           ` Stefan Hajnoczi
2021-08-05  6:35                                                             ` Jason Wang
2021-07-19 12:43                                 ` Stefan Hajnoczi
2021-07-20  3:02                                   ` Jason Wang
2021-07-20 10:19                                     ` Stefan Hajnoczi
2021-07-21  2:52                                       ` Jason Wang
2021-07-21 10:42                                         ` Stefan Hajnoczi
2021-07-22  2:08                                           ` Jason Wang
2021-07-22 10:30                                             ` Stefan Hajnoczi
2021-07-20 12:27                                     ` Max Gurtovoy
2021-07-20 12:57                                       ` Stefan Hajnoczi
2021-07-20 13:09                                         ` Max Gurtovoy
2021-07-21  3:06                                           ` Jason Wang
2021-07-21 10:48                                           ` Stefan Hajnoczi
2021-07-21 11:37                                             ` Max Gurtovoy
2021-07-21  3:09                                       ` Jason Wang
2021-07-21 11:43                                         ` Max Gurtovoy
2021-07-22  2:01                                           ` Jason Wang
2021-07-12  3:53         ` Jason Wang
2021-07-06 12:50   ` [virtio-comment] " Cornelia Huck
2021-07-06 13:18     ` Jason Wang
2021-07-06 14:27       ` [virtio-dev] " Cornelia Huck
2021-07-07  0:05         ` Max Gurtovoy
2021-07-07  3:14           ` Jason Wang
2021-07-07  2:56         ` Jason Wang
2021-07-07 16:45           ` [virtio-comment] " Cornelia Huck
2021-07-08  4:06             ` Jason Wang
2021-07-09 17:35   ` Eugenio Perez Martin
2021-07-12  4:06     ` Jason Wang
2021-07-10 20:40   ` Michael S. Tsirkin
2021-07-12  4:04     ` Jason Wang
2021-07-12 10:12 ` [PATCH V2 0/2] Vitqueue State Synchronization Stefan Hajnoczi
2021-07-13  3:08   ` Jason Wang
2021-07-13 10:30     ` Stefan Hajnoczi
2021-07-13 11:56       ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YO60HpxfZnmfMCMj@stefanha-x1.localdomain \
    --to=stefanha@redhat.com \
    --cc=amikheev@nvidia.com \
    --cc=bodong@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=mst@redhat.com \
    --cc=oren@nvidia.com \
    --cc=parav@nvidia.com \
    --cc=pasic@linux.ibm.com \
    --cc=shahafs@nvidia.com \
    --cc=virtio-comment@lists.oasis-open.org \
    --cc=virtio-dev@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.