On Mon, May 24, 2021 at 07:29:06AM -0400, Michael S. Tsirkin wrote:
> On Mon, May 24, 2021 at 12:37:48PM +0200, Eugenio Perez Martin wrote:
> > On Mon, May 24, 2021 at 11:38 AM Michael S. Tsirkin wrote:
> > >
> > > On Wed, May 19, 2021 at 06:28:34PM +0200, Eugenio Pérez wrote:
> > > > Commit 17 introduces the buffer forwarding. Previous one are for
> > > > preparations again, and laters are for enabling some obvious
> > > > optimizations. However, it needs the vdpa device to be able to map
> > > > every IOVA space, and some vDPA devices are not able to do so. Checking
> > > > of this is added in previous commits.
> > >
> > > That might become a significant limitation. And it worries me that
> > > this is such a big patchset which might yet take a while to get
> > > finalized.
> > >
> >
> > Sorry, maybe I've been unclear here: Latter commits in this series
> > address this limitation. Still not perfect: for example, it does not
> > support adding or removing guest's memory at the moment, but this
> > should be easy to implement on top.
> >
> > The main issue I'm observing is from the kernel if I'm not wrong: If I
> > unmap every address, I cannot re-map them again. But code in this
> > patchset is mostly final, except for the comments it may arise in the
> > mail list of course.
> >
> > > I have an idea: how about as a first step we implement a transparent
> > > switch from vdpa to a software virtio in QEMU or a software vhost in
> > > kernel?
> > >
> > > This will give us live migration quickly with performance comparable
> > > to failover but without dependance on guest cooperation.
> > >
> >
> > I think it should be doable. I'm not sure about the effort that needs
> > to be done in qemu to hide these "hypervisor-failover devices" from
> > guest's view but it should be comparable to failover, as you say.
> >
> > Networking should be ok by its nature, although it could require care
> > on the host hardware setup.
> > But I'm not sure how other types of
> > vhost/vdpa devices may work that way. How would a disk/scsi device
> > switch modes? Can the kernel take control of the vdpa device through
> > vhost, and just start reporting with a dirty bitmap?
> >
> > Thanks!
>
> It depends of course, e.g. blk is mostly reads/writes so
> not a lot of state. just don't reorder or drop requests.

QEMU's virtio-blk does not attempt to change state (e.g. quiesce the
device or switch between vhost kernel and QEMU) while there are
in-flight requests. Instead, all currently active requests must
complete first (in some cases they can be cancelled to stop them
early). Note that failed requests can be kept in a list across the
switch and then resubmitted later.

The underlying storage never has requests in flight while the device is
switched.

QEMU does this because there is no way to hand over an in-flight
preadv(2), Linux AIO, or other host kernel block layer request to
another process.

Stefan
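
For illustration only, here is a toy Python model of the drain-then-switch
ordering described above. It is not QEMU code: the class, its methods, the
backend names and the rule for which requests fail are all hypothetical, and
real completion happens asynchronously in the host kernel.

```python
from collections import deque


class ToyBlockDevice:
    """Toy model of a block device frontend (hypothetical, not QEMU code)."""

    def __init__(self, backend):
        self.backend = backend   # name of the current backend
        self.in_flight = {}      # request id -> payload, submitted but not done
        self.failed = deque()    # failed payloads kept across a switch
        self.next_id = 0

    def submit(self, payload):
        """Hand a request to the current backend and track it as in flight."""
        req_id = self.next_id
        self.next_id += 1
        self.in_flight[req_id] = payload
        return req_id

    def complete(self, req_id, ok=True):
        """Retire a request; a failed one is parked for later resubmission."""
        payload = self.in_flight.pop(req_id)
        if not ok:
            self.failed.append(payload)

    def switch_backend(self, new_backend, wait_for_completion):
        """Drain every in-flight request, then change backend."""
        while self.in_flight:
            wait_for_completion(self)
        # Nothing is in flight at this point, so nothing has to be
        # handed over to the new backend.
        self.backend = new_backend

    def resubmit_failed(self):
        """Resubmit parked requests, in their original order."""
        ids = []
        while self.failed:
            ids.append(self.submit(self.failed.popleft()))
        return ids


def finish_one(dev):
    # Stand-in for waiting on the host: completes the oldest request and
    # (arbitrarily, for the demo) reports odd-numbered requests as failed.
    req_id = min(dev.in_flight)
    dev.complete(req_id, ok=(req_id % 2 == 0))


dev = ToyBlockDevice("vdpa")
for op in ("read A", "write B", "read C"):
    dev.submit(op)

dev.switch_backend("qemu", finish_one)
resubmitted = dev.resubmit_failed()
```

The only point of the sketch is the ordering: the backend changes only once
the in-flight set is empty, and the failed list is plain data that survives
the switch and is replayed in order afterwards.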