From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jagannathan Raman <jag.raman@oracle.com>
Cc: elena.ufimtseva@oracle.com, john.g.johnson@oracle.com,
	thuth@redhat.com, swapnil.ingle@nutanix.com,
	john.levon@nutanix.com, philmd@redhat.com, qemu-devel@nongnu.org,
	alex.williamson@redhat.com, marcandre.lureau@gmail.com,
	thanos.makatos@nutanix.com, alex.bennee@linaro.org
Subject: Re: [PATCH RFC server v2 10/11] vfio-user: register handlers to facilitate migration
Date: Thu, 9 Sep 2021 09:14:18 +0100
Message-ID: <YTnCWv5pKEksApTz@stefanha-x1.localdomain>
In-Reply-To: <1550222ea65ae3dc425dff236f4f36b723ab8597.1630084211.git.jag.raman@oracle.com>

On Fri, Aug 27, 2021 at 01:53:29PM -0400, Jagannathan Raman wrote:
> +static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
> +                                size_t size, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    if (pos > o->vfu_mig_buf_size) {
> +        size = 0;
> +    } else if ((pos + size) > o->vfu_mig_buf_size) {
> +        size = o->vfu_mig_buf_size;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + pos), size);
> +
> +    o->vfu_mig_buf_size -= size;

This looks strange: pos already increases on each call, yet this line
also shrinks vfu_mig_buf_size on every read, so we effectively truncate
the buffer each time. Should this line be dropped? Did you test live
migration end-to-end (maybe this code needs more debugging)?
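
Something like this is what I would have expected (untested sketch; note
that the earlier else branch probably also wants to subtract pos when
clamping):

static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
                                size_t size, Error **errp)
{
    VfuObject *o = opaque;

    if (pos > o->vfu_mig_buf_size) {
        size = 0;
    } else if ((pos + size) > o->vfu_mig_buf_size) {
        size = o->vfu_mig_buf_size - pos; /* clamp to the bytes remaining */
    }

    memcpy(buf, o->vfu_mig_buf + pos, size);

    /* no size update here: pos already advances on each call */

    return size;
}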

> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
> +                                 int64_t pos, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +    uint64_t end = pos + iov_size(iov, iovcnt);
> +    int i;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +    }
> +
> +    for (i = 0; i < iovcnt; i++) {
> +        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
> +               iov[i].iov_len);
> +        o->vfu_mig_buf_size += iov[i].iov_len;
> +        o->vfu_mig_buf_pending += iov[i].iov_len;
> +    }
> +
> +    return iov_size(iov, iovcnt);
> +}
> +
> +static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    o->vfu_mig_buf_size = 0;
> +
> +    g_free(o->vfu_mig_buf);
> +
> +    return 0;
> +}
> +
> +static const QEMUFileOps vfu_mig_fops_save = {
> +    .writev_buffer  = vfu_mig_buf_write,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +static const QEMUFileOps vfu_mig_fops_load = {
> +    .get_buffer     = vfu_mig_buf_read,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +/**
> + * handlers for vfu_migration_callbacks_t
> + *
> + * The libvfio-user library accesses these handlers to drive the migration
> + * at the remote end, and also to transport the data stored in vfu_mig_buf
> + *
> + */
> +static void vfu_mig_state_precopy(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
> +    }
> +
> +    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
> +    if (ret) {
> +        qemu_file_shutdown(o->vfu_mig_file);
> +        return;
> +    }
> +
> +    qemu_fflush(o->vfu_mig_file);
> +}

Are you sure pre-copy is the state where you want to serialize the
savevm data? IIUC pre-copy is the iterative state while the device is
still running (e.g. when copying RAM but before devices are stopped). I
expected savevm to happen when we reach stop-and-copy.

The reason why this matters is that we're saving the state of the device
while the guest is still running and possibly interacting with it. The
destination won't receive the final state of the device; it will get an
earlier snapshot taken back when RAM migration started!

Maybe I'm wrong, please double-check, but this looks like a bug.
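
If so, the fix may be as simple as serializing when we reach
stop-and-copy instead (untested sketch; vfu_mig_state_stop_and_copy
would be today's vfu_mig_state_precopy under a more accurate name):

    case VFU_MIGR_STATE_STOP_AND_COPY:
        /* vCPUs are stopped now, so this captures the final device state */
        vfu_mig_state_stop_and_copy(vfu_ctx); /* runs qemu_remote_savevm() */
        break;
    case VFU_MIGR_STATE_PRE_COPY:
        /* iterative phase: nothing to serialize for non-iterative VMSDs */
        break;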

> +
> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> +    if (ret) {
> +        error_setg(&error_abort, "vfu: failed to restore device state");
> +        return;
> +    }
> +
> +    if (++migrated_devs == k->nr_devs) {
> +        bdrv_invalidate_cache_all(&local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +
> +        vm_start();
> +    }
> +}

This looks like it's intended for the destination side. Does this code
also work on the source side if the device is transitioned back into the
running state (e.g. when migration fails or is cancelled and the source
resumes)?
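
One way to make the intent explicit (sketch; vfu_mig_data_received is a
hypothetical flag that vfu_mig_write_data() would set):

static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
{
    VfuObject *o = vfu_get_private(vfu_ctx);

    /* source side resuming after a cancelled/failed migration: no-op */
    if (!o->vfu_mig_data_received) {
        return;
    }

    /* ... existing qemu_remote_loadvm() path ... */
}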

> +
> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +
> +    /**
> +     * note: calling bdrv_inactivate_all() is not the best approach.
> +     *
> +     *  Ideally, we would identify the block devices (if any) indirectly
> +     *  linked (such as via a scsi-hd device) to each of the migrated devices,
> +     *  and inactivate them individually. This is essential while operating
> +     *  the server in a storage daemon mode, with devices from different VMs.
> +     *
> +     *  However, we currently don't have this capability. As such, we need to
> +     *  inactivate all devices at the same time when migration is completed.
> +     */
> +    if (++migrated_devs == k->nr_devs) {
> +        bdrv_inactivate_all();
> +    }
> +}
> +
> +static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
> +{
> +    switch (state) {
> +    case VFU_MIGR_STATE_RESUME:
> +    case VFU_MIGR_STATE_STOP_AND_COPY:
> +        break;
> +    case VFU_MIGR_STATE_STOP:
> +        vfu_mig_state_stop(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_PRE_COPY:
> +        vfu_mig_state_precopy(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_RUNNING:
> +        if (!runstate_is_running()) {
> +            vfu_mig_state_running(vfu_ctx);
> +        }
> +        break;
> +    default:
> +        warn_report("vfu: Unknown migration state %d", state);
> +    }
> +
> +    return 0;
> +}
> +
> +static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    return o->vfu_mig_buf_pending;
> +}
> +
> +static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
> +                                uint64_t *size)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (offset) {
> +        *offset = 0;
> +    }
> +
> +    if (size) {
> +        *size = o->vfu_mig_buf_size;
> +    }
> +
> +    return 0;
> +}
> +
> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
> +                                 uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (offset > o->vfu_mig_buf_size) {
> +        return -1;
> +    }
> +
> +    if ((offset + size) > o->vfu_mig_buf_size) {
> +        warn_report("vfu: buffer overflow - check pending_bytes");
> +        size = o->vfu_mig_buf_size - offset;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + offset), size);
> +
> +    o->vfu_mig_buf_pending -= size;
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
> +                                  uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t end = offset + size;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +        o->vfu_mig_buf_size = end;
> +    }
> +
> +    memcpy((o->vfu_mig_buf + offset), data, size);
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
> +    }

Why open the file here where it's not accessed? I expected this to
happen at the point where data has been fully written and we call
qemu_remote_loadvm().
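
In other words (sketch, untested):

static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
                                  uint64_t size, uint64_t offset)
{
    VfuObject *o = vfu_get_private(vfu_ctx);
    uint64_t end = offset + size;

    if (end > o->vfu_mig_buf_size) {
        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
        o->vfu_mig_buf_size = end;
    }

    memcpy(o->vfu_mig_buf + offset, data, size);

    /* no qemu_fopen_ops() here... */

    return size;
}

...and instead open the QEMUFile right before consuming the buffer:

    if (!o->vfu_mig_file) {
        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
    }
    ret = qemu_remote_loadvm(o->vfu_mig_file);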

> +
> +    return size;
> +}
> +
> +static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
> +{
> +    return 0;
> +}
> +
> +static const vfu_migration_callbacks_t vfu_mig_cbs = {
> +    .version = VFU_MIGR_CALLBACKS_VERS,
> +    .transition = &vfu_mig_transition,
> +    .get_pending_bytes = &vfu_mig_get_pending_bytes,
> +    .prepare_data = &vfu_mig_prepare_data,
> +    .read_data = &vfu_mig_read_data,
> +    .data_written = &vfu_mig_data_written,
> +    .write_data = &vfu_mig_write_data,
> +};
> +
>  static void vfu_object_ctx_run(void *opaque)
>  {
>      VfuObject *o = opaque;
> @@ -340,6 +615,7 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>  {
>      VfuObject *o = container_of(notifier, VfuObject, machine_done);
>      DeviceState *dev = NULL;
> +    size_t migr_area_size;
>      QemuThread thread;
>      int ret;
>  
> @@ -401,6 +677,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>          return;
>      }
>  
> +    /*
> +     * TODO: The 0x20000 number used below is temporary. We are working on
> +     *     a cleaner fix for this.
> +     *
> +     *     The libvfio-user library assumes that the remote knows the size of
> +     *     the data to be migrated at boot time, but that is not the case with
> +     *     VMSDs, as they can contain a variable-size buffer. 0x20000 is used
> +     *     as a sufficiently large buffer to demonstrate migration, but that
> +     *     cannot be used as a solution.
> +     *
> +     */

libvfio-user has the vfu_migration_callbacks_t interface that allows the
device to save/load more data regardless of the size of the migration
region. I don't see the issue here since the region doesn't need to be
sized to fit the savevm data?
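
The client drives the transfer in region-sized chunks, roughly like this
(pseudocode of the client-side loop, in terms of the callbacks above):

    uint64_t pending, offset, size;

    pending = vfu_mig_get_pending_bytes(vfu_ctx);
    while (pending > 0) {
        vfu_mig_prepare_data(vfu_ctx, &offset, &size);
        vfu_mig_read_data(vfu_ctx, buf, size, offset); /* buf: chunk buffer */
        pending = vfu_mig_get_pending_bytes(vfu_ctx);
    }

So a fixed, modest region size that is independent of the total savevm
data should be enough.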
