From: Thanos Makatos <thanos.makatos@nutanix.com>
To: Stefan Hajnoczi <stefanha@redhat.com>, Jag Raman <jag.raman@oracle.com>
Cc: "eduardo@habkost.net" <eduardo@habkost.net>,
	"Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
	"John Johnson" <john.g.johnson@oracle.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Beraldo Leal" <bleal@redhat.com>,
	"John Levon" <john.levon@nutanix.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Juan Quintela" <quintela@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@gmail.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: RE: [PATCH v5 17/18] vfio-user: register handlers to facilitate migration
Date: Fri, 28 Jan 2022 14:49:15 +0000
Message-ID: <DM8PR02MB80054AB856989FA5434962738B229@DM8PR02MB8005.namprd02.prod.outlook.com>
In-Reply-To: <YfOpZmI4GM6oGhGH@stefanha-x1.localdomain>



> -----Original Message-----
> From: Stefan Hajnoczi <stefanha@redhat.com>
> Sent: 28 January 2022 08:29
> To: Jag Raman <jag.raman@oracle.com>
> Cc: John Levon <john.levon@nutanix.com>; Thanos Makatos
> <thanos.makatos@nutanix.com>; qemu-devel <qemu-devel@nongnu.org>;
> Marc-André Lureau <marcandre.lureau@gmail.com>; Philippe Mathieu-Daudé
> <f4bug@amsat.org>; Paolo Bonzini <pbonzini@redhat.com>; Beraldo Leal
> <bleal@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>;
> eduardo@habkost.net; Michael S. Tsirkin <mst@redhat.com>; Marcel
> Apfelbaum <marcel.apfelbaum@gmail.com>; Eric Blake <eblake@redhat.com>;
> Markus Armbruster <armbru@redhat.com>; Juan Quintela
> <quintela@redhat.com>; Dr . David Alan Gilbert <dgilbert@redhat.com>; Elena
> Ufimtseva <elena.ufimtseva@oracle.com>; John Johnson
> <john.g.johnson@oracle.com>
> Subject: Re: [PATCH v5 17/18] vfio-user: register handlers to facilitate migration
> 
> On Thu, Jan 27, 2022 at 05:04:26PM +0000, Jag Raman wrote:
> >
> >
> > > On Jan 25, 2022, at 10:48 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Wed, Jan 19, 2022 at 04:42:06PM -0500, Jagannathan Raman wrote:
> > >> +     * The client subsequetly asks the remote server for any data that
> > >
> > > subsequently
> > >
> > >> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> > >> +{
> > >> +    VfuObject *o = vfu_get_private(vfu_ctx);
> > >> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> > >> +    static int migrated_devs;
> > >> +    Error *local_err = NULL;
> > >> +    int ret;
> > >> +
> > >> +    /**
> > >> +     * TODO: move to VFU_MIGR_STATE_RESUME handler. Presently, the
> > >> +     * VMSD data from source is not available at RESUME state.
> > >> +     * Working on a fix for this.
> > >> +     */
> > >> +    if (!o->vfu_mig_file) {
> > >> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
> > >> +    }
> > >> +
> > >> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> > >> +    if (ret) {
> > >> +        VFU_OBJECT_ERROR(o, "vfu: failed to restore device state");
> > >> +        return;
> > >> +    }
> > >> +
> > >> +    qemu_file_shutdown(o->vfu_mig_file);
> > >> +    o->vfu_mig_file = NULL;
> > >> +
> > >> +    /* VFU_MIGR_STATE_RUNNING begins here */
> > >> +    if (++migrated_devs == k->nr_devs) {
> > >
> > > When is this counter reset so migration can be tried again if it
> > > fails/cancels?
> >
> > Detecting cancellation is a pending item. We will address it in the
> > next rev. Will check with you if we get stuck during the process
> > of implementing it.
> >
> > >
> > >> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
> > >> +                                 uint64_t size, uint64_t offset)
> > >> +{
> > >> +    VfuObject *o = vfu_get_private(vfu_ctx);
> > >> +
> > >> +    if (offset > o->vfu_mig_buf_size) {
> > >> +        return -1;
> > >> +    }
> > >> +
> > >> +    if ((offset + size) > o->vfu_mig_buf_size) {
> > >> +        warn_report("vfu: buffer overflow - check pending_bytes");
> > >> +        size = o->vfu_mig_buf_size - offset;
> > >> +    }
> > >> +
> > >> +    memcpy(buf, (o->vfu_mig_buf + offset), size);
> > >> +
> > >> +    o->vfu_mig_buf_pending -= size;
> > >
> > > This assumes that the caller increments offset by size each time. If
> > > that assumption is okay, then we can just trust offset and don't need to
> > > do arithmetic on vfu_mig_buf_pending. If that assumption is not correct,
> > > then the code needs to be extended to safely update vfu_mig_buf_pending
> > > when offset jumps around arbitrarily between calls.
> >
> > Going by the definition of vfu_migration_callbacks_t in the library, I assumed
> > that read_data advances the offset by size bytes.
> >
> > Will add a comment to explain that.

libvfio-user does not automatically increment offset by size each time, since
the vfio-user client can re-read the migration data multiple times. In the
libvfio-user API we state:

    Function that is called to read migration data. offset and size can be
    any subrange on the offset and size previously returned by prepare_data.

Reading the pending_bytes register is what marks the end of the iteration, and
this is where you need to decrement vfu_mig_buf_pending.

I'll add more unit tests to libvfio-user to validate this behavior.
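A minimal sketch of these semantics, assuming a simplified callback model. MigState, mig_get_pending_bytes(), mig_prepare_data() and mig_read_data() are hypothetical stand-ins, not the libvfio-user API, and offsets are assumed to index the serialized buffer as vfu_mig_read_data() in the patch does. read_data only validates and copies a subrange of the chunk exposed by prepare_data, so the client can re-read it freely; pending is reduced only when pending_bytes is read again.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the vfu_mig_buf* fields of VfuObject. */
typedef struct {
    const uint8_t *buf;    /* serialized device state */
    uint64_t buf_size;     /* total bytes in buf */
    uint64_t pending;      /* bytes not yet retired by a finished iteration */
    uint64_t chunk_offset; /* offset returned by the last prepare_data */
    uint64_t chunk_size;   /* size returned by the last prepare_data, 0 if none */
} MigState;

/* Client read pending_bytes: this marks the end of the previous iteration,
 * so this is the only place where pending is decremented. */
static uint64_t mig_get_pending_bytes(MigState *s)
{
    assert(s->chunk_size <= s->pending);
    s->pending -= s->chunk_size;
    s->chunk_size = 0;
    return s->pending;
}

/* Client read data_offset: expose the next chunk of at most max_chunk bytes.
 * Calling it twice in one iteration returns the same chunk. */
static void mig_prepare_data(MigState *s, uint64_t max_chunk,
                             uint64_t *offset, uint64_t *size)
{
    s->chunk_offset = s->buf_size - s->pending;
    s->chunk_size = s->pending < max_chunk ? s->pending : max_chunk;
    *offset = s->chunk_offset;
    *size = s->chunk_size;
}

/* Copy any subrange of the current chunk; may be called repeatedly with
 * overlapping or out-of-order ranges, and never touches pending. */
static ssize_t mig_read_data(const MigState *s, void *dst, uint64_t size,
                             uint64_t offset)
{
    /* Overflow-safe check that [offset, offset + size) lies in the chunk. */
    if (offset < s->chunk_offset || size > s->chunk_size ||
        offset - s->chunk_offset > s->chunk_size - size) {
        return -1;
    }
    memcpy(dst, s->buf + offset, size);
    return (ssize_t)size;
}
```

Because mig_read_data() has no side effects, re-reading a range returns the same bytes; only the next pending_bytes read advances the stream.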

> >
> > >
> > >> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev)
> > >> +{
> > >> +    DeviceClass *dc = DEVICE_GET_CLASS(DEVICE(pci_dev));
> > >> +    const VMStateField *field = NULL;
> > >> +    uint64_t size = 0;
> > >> +
> > >> +    if (!dc->vmsd) {
> > >> +        return 0;
> > >> +    }
> > >> +
> > >> +    field = dc->vmsd->fields;
> > >> +    while (field && field->name) {
> > >> +        size += vmstate_size(pci_dev, field);
> > >> +        field++;
> > >> +    }
> > >> +
> > >> +    return size;
> > >> +}
> > >
> > > This function looks incorrect because it ignores subsections as well as
> > > runtime behavior during save(). Although VMStateDescription is partially
> > > declarative, there is still a bunch of imperative code that can write to
> > > the QEMUFile at save() time so there's no way of knowing the size ahead
> > > of time.
> >
> > I see your point, it would be a problem for any field which has the
> > (VMS_BUFFER | VMS_ALLOC) flags set.
> >
> > >
> > > I asked this in a previous revision of this series but I'm not sure if
> > > it was answered: is it really necessary to know the size of the vmstate?
> > > I thought the VFIO migration interface is designed to support
> > > streaming reads/writes. We could choose a fixed size like 64KB and
> > > stream the vmstate in 64KB chunks.
> >
> > The library exposes the migration data to the client as a device BAR with
> > fixed size - the size of which is fixed at boot time, even when using
> > vfu_migration_callbacks_t callbacks.
> >
> > I don’t believe the library supports streaming vmstate/migration-data - see
> > the following comment in migration_region_access() defined in the library:
> >
> > * Does this mean that partial reads are not allowed?
> >
> > Thanos or John,
> >
> >     Could you please clarify this?

libvfio-user does support streaming of migration data; that comment is based on
the VFIO documentation:

    d. Read data_size bytes of data from (region + data_offset) from the
        migration region.

It's not clear to me whether streaming should be allowed; I'd be surprised if
it weren't.
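To illustrate what allowing partial reads means for step d, here is a sketch of a client splitting it into several smaller reads. It assumes a hypothetical in-memory Region and region_read() in place of real vfio-user region accesses; neither is part of libvfio-user or QEMU. read_data_section() fetches the data_size bytes at data_offset in reads of at most max_read bytes each.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory stand-in for the migration region. */
typedef struct {
    const uint8_t *bytes;
    uint64_t size;
} Region;

/* Copy len bytes at offset; returns bytes copied or -1 if out of range. */
static ssize_t region_read(const Region *r, void *dst, uint64_t len,
                           uint64_t offset)
{
    if (offset > r->size || len > r->size - offset) {
        return -1;
    }
    memcpy(dst, r->bytes + offset, len);
    return (ssize_t)len;
}

/* Step d performed as several partial reads of at most max_read bytes
 * each, instead of a single read of data_size bytes. */
static int read_data_section(const Region *r, uint64_t data_offset,
                             uint64_t data_size, uint8_t *dst,
                             uint64_t max_read)
{
    uint64_t done = 0;

    assert(max_read > 0);
    while (done < data_size) {
        uint64_t len = data_size - done;
        if (len > max_read) {
            len = max_read;
        }
        ssize_t n = region_read(r, dst + done, len, data_offset + done);
        if (n <= 0 || (uint64_t)n > len) {
            return -1;
        }
        done += (uint64_t)n;
    }
    return 0;
}
```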

> >
> > Stefan,
> >     We attempted to answer the migration cancellation and vmstate size
> >     questions previously also, in the following email:
> >
> > https://lore.kernel.org/all/F48606B1-15A4-4DD2-9D71-2FCAFC0E671F@oracle.com/
> 
> >  libvfio-user has the vfu_migration_callbacks_t interface that allows the
> >  device to save/load more data regardless of the size of the migration
> >  region. I don't see the issue here since the region doesn't need to be
> >  sized to fit the savevm data?
> 
> The answer didn't make sense to me:
> 
> "In both scenarios at the server end - whether using the migration BAR or
> using callbacks, the migration data is transported to the other end using
> the BAR. As such we need to specify the BAR’s size during initialization.
> 
> In the case of the callbacks, the library translates the BAR access to callbacks."
> 
> The BAR and the migration region within it need a size but my
> understanding is that VFIO migration is designed to stream the device
> state, allowing it to be broken up into multiple reads/writes without
> knowing the device state's size upfront. Here is the description from
> <linux/vfio.h>:
> 
>   * The sequence to be followed while in pre-copy state and stop-and-copy state
>   * is as follows:
>   * a. Read pending_bytes, indicating the start of a new iteration to get device
>   *    data. Repeated read on pending_bytes at this stage should have no side
>   *    effects.
>   *    If pending_bytes == 0, the user application should not iterate to get data
>   *    for that device.
>   *    If pending_bytes > 0, perform the following steps.
>   * b. Read data_offset, indicating that the vendor driver should make data
>   *    available through the data section. The vendor driver should return this
>   *    read operation only after data is available from (region + data_offset)
>   *    to (region + data_offset + data_size).
>   * c. Read data_size, which is the amount of data in bytes available through
>   *    the migration region.
>   *    Read on data_offset and data_size should return the offset and size of
>   *    the current buffer if the user application reads data_offset and
>   *    data_size more than once here.
>   * d. Read data_size bytes of data from (region + data_offset) from the
>   *    migration region.
>   * e. Process the data.
>   * f. Read pending_bytes, which indicates that the data from the previous
>   *    iteration has been read. If pending_bytes > 0, go to step b.
>   *
>   * The user application can transition from the _SAVING|_RUNNING
>   * (pre-copy state) to the _SAVING (stop-and-copy) state regardless of the
>   * number of pending bytes. The user application should iterate in _SAVING
>   * (stop-and-copy) until pending_bytes is 0.
> 
> This means you can report pending_bytes > 0 until the entire vmstate has
> been read and can pick a fixed chunk size like 64KB for the migration
> region. There's no need to size the migration region to fit the entire
> vmstate.
> 
> Stefan
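A sketch of the scheme described above, assuming a hypothetical in-process Dev object in place of the real migration region registers. The vmstate has an arbitrary length that is only known after serialization, the data section is a fixed 64 KiB window, and the client follows steps a to f until pending_bytes reaches 0. Dev, dev_pending_bytes(), dev_data_offset(), dev_data_size() and client_save() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed data-section size, independent of the vmstate size. */
#define MIG_WINDOW (64 * 1024)

/* Hypothetical device side: an already-serialized vmstate of arbitrary
 * length, exposed through a fixed-size window one chunk at a time. */
typedef struct {
    const uint8_t *state;        /* serialized vmstate */
    uint64_t state_size;         /* known only after serialization */
    uint64_t sent;               /* bytes retired by finished iterations */
    uint64_t chunk;              /* bytes in the current window, 0 if none */
    uint8_t window[MIG_WINDOW];  /* the data section of the region */
} Dev;

/* Steps a and f: reading pending_bytes retires the previous chunk. A repeated
 * read with no chunk outstanding has no side effects. */
static uint64_t dev_pending_bytes(Dev *d)
{
    d->sent += d->chunk;
    d->chunk = 0;
    return d->state_size - d->sent;
}

/* Step b: reading data_offset makes the next chunk available in the window. */
static uint64_t dev_data_offset(Dev *d)
{
    uint64_t left = d->state_size - d->sent;

    d->chunk = left < MIG_WINDOW ? left : MIG_WINDOW;
    memcpy(d->window, d->state + d->sent, d->chunk);
    return 0; /* the chunk always starts at the beginning of the window */
}

/* Step c: reading data_size returns the size of the current chunk. */
static uint64_t dev_data_size(const Dev *d)
{
    return d->chunk;
}

/* Client side: steps a to f, appending each chunk to out. Returns the number
 * of iterations needed to drain the device state. */
static unsigned client_save(Dev *d, uint8_t *out)
{
    unsigned iters = 0;
    uint64_t pos = 0;
    uint64_t pending = dev_pending_bytes(d);          /* a */

    while (pending > 0) {
        uint64_t off = dev_data_offset(d);            /* b */
        uint64_t size = dev_data_size(d);             /* c */

        assert(off + size <= MIG_WINDOW);
        memcpy(out + pos, d->window + off, size);     /* d, e */
        pos += size;
        iters++;
        pending = dev_pending_bytes(d);               /* f */
    }
    return iters;
}
```

With a 150000-byte vmstate the loop takes three iterations (65536, 65536 and 18928 bytes); the window size stays the same whatever the vmstate size.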

Thread overview: 99+ messages
2022-01-19 21:41 [PATCH v5 00/18] vfio-user server in QEMU Jagannathan Raman
2022-01-19 21:41 ` [PATCH v5 01/18] configure, meson: override C compiler for cmake Jagannathan Raman
2022-01-20 13:27   ` Paolo Bonzini
2022-01-20 15:21     ` Jag Raman
2022-02-17  6:10     ` Jag Raman
2022-01-19 21:41 ` [PATCH v5 02/18] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
2022-01-25  9:40   ` Stefan Hajnoczi
2022-01-19 21:41 ` [PATCH v5 03/18] pci: isolated address space for PCI bus Jagannathan Raman
2022-01-20  0:12   ` Michael S. Tsirkin
2022-01-20 15:20     ` Jag Raman
2022-01-25 18:38       ` Dr. David Alan Gilbert
2022-01-26  5:27         ` Jag Raman
2022-01-26  9:45           ` Stefan Hajnoczi
2022-01-26 20:07             ` Dr. David Alan Gilbert
2022-01-26 21:13               ` Michael S. Tsirkin
2022-01-27  8:30                 ` Stefan Hajnoczi
2022-01-27 12:50                   ` Michael S. Tsirkin
2022-01-27 21:22                   ` Alex Williamson
2022-01-28  8:19                     ` Stefan Hajnoczi
2022-01-28  9:18                     ` Stefan Hajnoczi
2022-01-31 16:16                       ` Alex Williamson
2022-02-01  9:30                         ` Stefan Hajnoczi
2022-02-01 15:24                           ` Alex Williamson
2022-02-01 21:24                             ` Jag Raman
2022-02-01 22:47                               ` Alex Williamson
2022-02-02  1:13                                 ` Jag Raman
2022-02-02  5:34                                   ` Alex Williamson
2022-02-02  9:22                                     ` Stefan Hajnoczi
2022-02-10  0:08                                     ` Jag Raman
2022-02-10  8:02                                       ` Michael S. Tsirkin
2022-02-10 22:23                                         ` Jag Raman
2022-02-10 22:53                                           ` Michael S. Tsirkin
2022-02-10 23:46                                             ` Jag Raman
2022-02-10 23:17                                           ` Alex Williamson
2022-02-10 23:28                                             ` Michael S. Tsirkin
2022-02-10 23:49                                               ` Alex Williamson
2022-02-11  0:26                                                 ` Michael S. Tsirkin
2022-02-11  0:54                                                   ` Jag Raman
2022-02-11  0:10                                             ` Jag Raman
2022-02-02  9:30                                 ` Peter Maydell
2022-02-02 10:06                                   ` Michael S. Tsirkin
2022-02-02 15:49                                     ` Alex Williamson
2022-02-02 16:53                                       ` Michael S. Tsirkin
2022-02-02 17:12                                   ` Alex Williamson
2022-02-01 10:42                     ` Dr. David Alan Gilbert
2022-01-26 18:13           ` Dr. David Alan Gilbert
2022-01-27 17:43             ` Jag Raman
2022-01-25  9:56   ` Stefan Hajnoczi
2022-01-25 13:49     ` Jag Raman
2022-01-25 14:19       ` Stefan Hajnoczi
2022-01-19 21:41 ` [PATCH v5 04/18] pci: create and free isolated PCI buses Jagannathan Raman
2022-01-25 10:25   ` Stefan Hajnoczi
2022-01-25 14:10     ` Jag Raman
2022-01-19 21:41 ` [PATCH v5 05/18] qdev: unplug blocker for devices Jagannathan Raman
2022-01-25 10:27   ` Stefan Hajnoczi
2022-01-25 14:43     ` Jag Raman
2022-01-26  9:32       ` Stefan Hajnoczi
2022-01-26 15:13         ` Jag Raman
2022-01-19 21:41 ` [PATCH v5 06/18] vfio-user: add HotplugHandler for remote machine Jagannathan Raman
2022-01-25 10:32   ` Stefan Hajnoczi
2022-01-25 18:12     ` Jag Raman
2022-01-26  9:35       ` Stefan Hajnoczi
2022-01-26 15:20         ` Jag Raman
2022-01-26 15:43           ` Stefan Hajnoczi
2022-01-19 21:41 ` [PATCH v5 07/18] vfio-user: set qdev bus callbacks " Jagannathan Raman
2022-01-25 10:44   ` Stefan Hajnoczi
2022-01-25 21:12     ` Jag Raman
2022-01-26  9:37       ` Stefan Hajnoczi
2022-01-26 15:51         ` Jag Raman
2022-01-19 21:41 ` [PATCH v5 08/18] vfio-user: build library Jagannathan Raman
2022-01-19 21:41 ` [PATCH v5 09/18] vfio-user: define vfio-user-server object Jagannathan Raman
2022-01-25 14:40   ` Stefan Hajnoczi
2022-01-19 21:41 ` [PATCH v5 10/18] vfio-user: instantiate vfio-user context Jagannathan Raman
2022-01-25 14:44   ` Stefan Hajnoczi
2022-01-19 21:42 ` [PATCH v5 11/18] vfio-user: find and init PCI device Jagannathan Raman
2022-01-25 14:48   ` Stefan Hajnoczi
2022-01-26  3:14     ` Jag Raman
2022-01-19 21:42 ` [PATCH v5 12/18] vfio-user: run vfio-user context Jagannathan Raman
2022-01-25 15:10   ` Stefan Hajnoczi
2022-01-26  3:26     ` Jag Raman
2022-01-19 21:42 ` [PATCH v5 13/18] vfio-user: handle PCI config space accesses Jagannathan Raman
2022-01-25 15:13   ` Stefan Hajnoczi
2022-01-19 21:42 ` [PATCH v5 14/18] vfio-user: handle DMA mappings Jagannathan Raman
2022-01-19 21:42 ` [PATCH v5 15/18] vfio-user: handle PCI BAR accesses Jagannathan Raman
2022-01-19 21:42 ` [PATCH v5 16/18] vfio-user: handle device interrupts Jagannathan Raman
2022-01-25 15:25   ` Stefan Hajnoczi
2022-01-19 21:42 ` [PATCH v5 17/18] vfio-user: register handlers to facilitate migration Jagannathan Raman
2022-01-25 15:48   ` Stefan Hajnoczi
2022-01-27 17:04     ` Jag Raman
2022-01-28  8:29       ` Stefan Hajnoczi
2022-01-28 14:49         ` Thanos Makatos [this message]
2022-02-01  3:49         ` Jag Raman
2022-02-01  9:37           ` Stefan Hajnoczi
2022-01-19 21:42 ` [PATCH v5 18/18] vfio-user: avocado tests for vfio-user Jagannathan Raman
2022-01-26  4:25   ` Philippe Mathieu-Daudé via
2022-01-26 15:12     ` Jag Raman
2022-01-25 16:00 ` [PATCH v5 00/18] vfio-user server in QEMU Stefan Hajnoczi
2022-01-26  5:04   ` Jag Raman
2022-01-26  9:56     ` Stefan Hajnoczi
