On Wed, Dec 09, 2020 at 06:08:14PM +0100, Eugenio Perez Martin wrote:
> On Mon, Dec 7, 2020 at 6:42 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Fri, Nov 20, 2020 at 07:50:45PM +0100, Eugenio Pérez wrote:
> > > +{
> > > +    struct vhost_vring_file file = {
> > > +        .index = idx
> > > +    };
> > > +    VirtQueue *vq = virtio_get_queue(dev->vdev, idx);
> > > +    VhostShadowVirtqueue *svq;
> > > +    int r;
> > > +
> > > +    svq = g_new0(VhostShadowVirtqueue, 1);
> > > +    svq->vq = vq;
> > > +
> > > +    r = event_notifier_init(&svq->hdev_notifier, 0);
> > > +    assert(r == 0);
> > > +
> > > +    file.fd = event_notifier_get_fd(&svq->hdev_notifier);
> > > +    r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
> > > +    assert(r == 0);
> > > +
> > > +    return svq;
> > > +}
> >
> > I guess there are assumptions about the status of the device? Does the
> > virtqueue need to be disabled when this function is called?
> >
> 
> Yes. Maybe an assertion checking the notification state?

Sounds good.

> > > +
> > > +static int vhost_sw_live_migration_stop(struct vhost_dev *dev)
> > > +{
> > > +    int idx;
> > > +
> > > +    vhost_dev_enable_notifiers(dev, dev->vdev);
> > > +    for (idx = 0; idx < dev->nvqs; ++idx) {
> > > +        vhost_sw_lm_shadow_vq_free(dev->sw_lm_shadow_vq[idx]);
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int vhost_sw_live_migration_start(struct vhost_dev *dev)
> > > +{
> > > +    int idx;
> > > +
> > > +    for (idx = 0; idx < dev->nvqs; ++idx) {
> > > +        dev->sw_lm_shadow_vq[idx] = vhost_sw_lm_shadow_vq(dev, idx);
> > > +    }
> > > +
> > > +    vhost_dev_disable_notifiers(dev, dev->vdev);
> >
> > There is a race condition if the guest kicks the vq while this is
> > happening. The shadow vq hdev_notifier needs to be set so the vhost
> > device checks the virtqueue for requests that slipped in during the
> > race window.
> >
> 
> I'm not sure if I follow you. If I understand correctly,
> vhost_dev_disable_notifiers calls virtio_bus_cleanup_host_notifier,
> and the latter calls virtio_queue_host_notifier_read. That's why the
> documentation says "This might actually run the qemu handlers right
> away, so virtio in qemu must be completely setup when this is
> called.". Am I missing something?

There are at least two cases:

1. Virtqueue kicks that come before vhost_dev_disable_notifiers().
   vhost_dev_disable_notifiers() notices that and calls
   virtio_queue_notify_vq(). Will handle_sw_lm_vq() be invoked or is the
   device's vq handler function invoked?

2. Virtqueue kicks that come in after vhost_dev_disable_notifiers()
   returns. We hold the QEMU global mutex so the vCPU thread cannot
   perform MMIO/PIO dispatch immediately. The vCPU thread's
   ioctl(KVM_RUN) has already returned and will dispatch dispatch the
   MMIO/PIO access inside QEMU as soon as the global mutex is released.
   In other words, we're not taking the kvm.ko ioeventfd path but
   memory_region_dispatch_write_eventfds() should signal the ioeventfd
   that is registered at the time the dispatch occurs. Is that eventfd
   handled by handle_sw_lm_vq()?

Neither of these cases are obvious from the code. At least comments
would help but I suspect restructuring the code so the critical
ioeventfd state changes happen in a sequence would make it even clearer.