On Mon, Oct 11, 2021 at 01:31:16AM -0400, Jagannathan Raman wrote:
> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    /**
> +     * TODO: move to VFU_MIGR_STATE_RESUME handler. Presently, the
> +     * VMSD data from source is not available at RESUME state.
> +     * Working on a fix for this.
> +     */
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
> +    }
> +
> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> +    if (ret) {
> +        error_setg(&error_abort, "vfu: failed to restore device state");
> +        return;
> +    }
> +
> +    qemu_file_shutdown(o->vfu_mig_file);
> +    o->vfu_mig_file = NULL;
> +
> +    /* VFU_MIGR_STATE_RUNNING begins here */
> +    if (++migrated_devs == k->nr_devs) {

See below about migrated_devs.

> +        bdrv_invalidate_cache_all(&local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +
> +        vm_start();
> +    }
> +}
> +
> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +
> +    /**
> +     * note: calling bdrv_inactivate_all() is not the best approach.
> +     *
> +     * Ideally, we would identify the block devices (if any) indirectly
> +     * linked (such as via a scs-hd device) to each of the migrated devices,

s/scs/scsi/

> +     * and inactivate them individually. This is essential while operating
> +     * the server in a storage daemon mode, with devices from different VMs.
> +     *
> +     * However, we currently don't have this capability. As such, we need to
> +     * inactivate all devices at the same time when migration is completed.
> +     */
> +    if (++migrated_devs == k->nr_devs) {
> +        bdrv_inactivate_all();
> +        vm_stop(RUN_STATE_PAUSED);

The order of these two calls is reversed compared to migration/migration.c:
first we pause the VM, then we inactivate the disks.

I also think we need to zero migrated_devs in case migration fails and we
try to migrate again later:

  migrated_devs = 0;

Even that is not quite right, because migration may fail after only some
of the VfuObjects have been stopped. A different approach to counting
devices is necessary, such as zeroing migrated_devs in
vfu_mig_state_stop_and_copy(). There is a rough sketch of both points at
the end of this mail.

> @@ -422,6 +722,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>          return;
>      }
>  
> +    /*
> +     * TODO: The 0x20000 number used below is a temporary. We are working on
> +     * a cleaner fix for this.
> +     *
> +     * The libvfio-user library assumes that the remote knows the size of
> +     * the data to be migrated at boot time, but that is not the case with
> +     * VMSDs, as it can contain a variable-size buffer. 0x20000 is used
> +     * as a sufficiently large buffer to demonstrate migration, but that
> +     * cannot be used as a solution.
> +     *
> +     */

My question from the previous revision was not answered:

libvfio-user has the vfu_migration_callbacks_t interface, which lets the
device save/load more data regardless of the size of the migration region.
I don't see the issue here, since the region doesn't need to be sized to
fit the savevm data?
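To make that concrete, here is roughly what I have in mind. This is an
untested sketch: the callback signatures are from my reading of
libvfio-user.h, I am glossing over prepare_data() and the offset
bookkeeping, and the vfu_mig_buf, vfu_mig_buf_size, and vfu_mig_buf_read
fields are hypothetical stand-ins for wherever the server accumulates the
savevm data:

  static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
  {
      VfuObject *o = vfu_get_private(vfu_ctx);

      /* The client's copy loop is driven by this value, not by the
       * size of the migration region, so the region never needs to be
       * sized to fit the whole VMSD. */
      return o->vfu_mig_buf_size - o->vfu_mig_buf_read;
  }

  static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
                                   uint64_t count, uint64_t offset)
  {
      VfuObject *o = vfu_get_private(vfu_ctx);

      /* The client pulls the data in region-sized (or smaller) chunks;
       * each call hands out the next piece, so the total transferred
       * can exceed the region size. */
      if (offset + count > o->vfu_mig_buf_size) {
          count = o->vfu_mig_buf_size - offset;
      }
      memcpy(buf, o->vfu_mig_buf + offset, count);
      o->vfu_mig_buf_read = offset + count;

      return count;
  }

In other words, the region is just a window onto the stream, so the
0x20000 workaround shouldn't be needed.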
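And going back to the migrated_devs point in vfu_mig_state_stop(), this is
the shape I meant, only rearranging the code already in this patch (with
the caveat above that resetting the counter here still does not cover a
migration that fails partway through):

  if (++migrated_devs == k->nr_devs) {
      migrated_devs = 0; /* let a later migration attempt start fresh */

      /* same order as migration/migration.c: pause the VM first, then
       * inactivate the disks */
      vm_stop(RUN_STATE_PAUSED);
      bdrv_inactivate_all();
  }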