QEMU-Devel Archive on lore.kernel.org
From: Alex Williamson <alex.williamson@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Zhengxiao.zx@alibaba-inc.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, cjia@nvidia.com, eskultet@redhat.com,
	ziye.yang@intel.com, cohuck@redhat.com,
	shuangtai.tst@alibaba-inc.com, qemu-devel@nongnu.org,
	zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com,
	aik@ozlabs.ru, Kirti Wankhede <kwankhede@nvidia.com>,
	eauger@redhat.com, felipe@nutanix.com,
	jonathan.davies@nutanix.com, yan.y.zhao@intel.com,
	changpeng.liu@intel.com, Ken.Xue@amd.com
Subject: Re: [PATCH v16 QEMU 04/16] vfio: Add save and load functions for VFIO PCI devices
Date: Thu, 26 Mar 2020 11:38:48 -0600
Message-ID: <20200326113848.39b6e2f4@x1.home>
In-Reply-To: <20200326172926.GF2713@work-vm>

On Thu, 26 Mar 2020 17:29:26 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Alex Williamson (alex.williamson@redhat.com) wrote:
> > On Wed, 25 Mar 2020 02:39:02 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> > > These functions save and restore PCI device specific data - config
> > > space of PCI device.
> > > Tested save and restore with MSI and MSIX type.
> > > 
> > > Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> > > Reviewed-by: Neo Jia <cjia@nvidia.com>
> > > ---
> > >  hw/vfio/pci.c                 | 163 ++++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/vfio/vfio-common.h |   2 +
> > >  2 files changed, 165 insertions(+)
> > > 
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index 6c77c12e44b9..8deb11e87ef7 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -41,6 +41,7 @@
> > >  #include "trace.h"
> > >  #include "qapi/error.h"
> > >  #include "migration/blocker.h"
> > > +#include "migration/qemu-file.h"
> > >  
> > >  #define TYPE_VFIO_PCI "vfio-pci"
> > >  #define PCI_VFIO(obj)    OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
> > > @@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
> > >      }
> > >  }
> > >  
> > > +static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)
> > > +{
> > > +    PCIDevice *pdev = &vdev->pdev;
> > > +    VFIOBAR *bar = &vdev->bars[nr];
> > > +    uint64_t addr;
> > > +    uint32_t addr_lo, addr_hi = 0;
> > > +
> > > +    /* Skip unimplemented BARs and the upper half of 64bit BARS. */
> > > +    if (!bar->size) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 4);
> > > +
> > > +    addr_lo = addr_lo & (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
> > > +                                       PCI_BASE_ADDRESS_MEM_MASK);  
> > 
> > Nit, &= or combine with previous set.
> >   
> > > +    if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > > +        addr_hi = pci_default_read_config(pdev,
> > > +                                         PCI_BASE_ADDRESS_0 + (nr + 1) * 4, 4);
> > > +    }
> > > +
> > > +    addr = ((uint64_t)addr_hi << 32) | addr_lo;  
> > 
> > Could we use a union?
> >   
> > > +
> > > +    if (!QEMU_IS_ALIGNED(addr, bar->size)) {
> > > +        return -EINVAL;
> > > +    }  
> > 
> > What specifically are we validating here?  This should be true no
> > matter what we wrote to the BAR or else BAR emulation is broken.  The
> > bits that could make this unaligned are not implemented in the BAR.  
> 
> That I think is based on a comment I asked a few versions back.
> Remember the value being checked here is a value loaded from the
> migration stream; it could be garbage, so it's good to do whatever
> checks you can.

It's not the migration stream though; we're reading it back from config
space emulation.  The migration stream could have written absolutely
anything to the device BAR and this test would still pass.  PCI BARs
are naturally aligned by definition: the address bits that could make
the value unaligned are not implemented.  This is why we can determine
the size of the BAR by writing -1 to it.  Thanks,

Alex
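
To illustrate the point about unimplemented address bits, here is a
minimal sketch of a toy 32-bit memory BAR.  ToyBar and the toy_bar_*
helpers are hypothetical names made up for this example, not QEMU or
VFIO code, and the low type/prefetch flag bits of a real BAR are
ignored for simplicity.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit memory BAR; not a QEMU structure. */
typedef struct {
    uint32_t size;   /* BAR size in bytes, assumed to be a power of two */
    uint32_t value;  /* currently programmed base address */
} ToyBar;

/*
 * Only address bits at or above log2(size) are implemented.  Writes to
 * the lower bits are dropped, so the stored address is always aligned.
 */
static void toy_bar_write(ToyBar *bar, uint32_t val)
{
    bar->value = val & ~(bar->size - 1);
}

static uint32_t toy_bar_read(const ToyBar *bar)
{
    return bar->value;
}

/* Sizing probe: write all ones, read back, invert, add one. */
static uint32_t toy_bar_probe_size(ToyBar *bar)
{
    uint32_t saved = toy_bar_read(bar);
    uint32_t mask;

    toy_bar_write(bar, 0xffffffffu);
    mask = toy_bar_read(bar);
    toy_bar_write(bar, saved);      /* restore the original address */
    return ~mask + 1;
}

/* The alignment check under discussion, applied to the value read back. */
static int toy_bar_is_aligned(const ToyBar *bar)
{
    return (toy_bar_read(bar) & (bar->size - 1)) == 0;
}
```

In this model, whatever value is written, toy_bar_is_aligned() holds
on the value read back, which is why an alignment check on the
read-back value cannot catch a garbage write.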

> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int vfio_bars_validate(VFIOPCIDevice *vdev)
> > > +{
> > > +    int i, ret;
> > > +
> > > +    for (i = 0; i < PCI_ROM_SLOT; i++) {
> > > +        ret = vfio_bar_validate(vdev, i);
> > > +        if (ret) {
> > > +            error_report("vfio: BAR address %d validation failed", i);
> > > +            return ret;
> > > +        }
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > >  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
> > >  {
> > >      VFIOBAR *bar = &vdev->bars[nr];
> > > @@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
> > >      return OBJECT(vdev);
> > >  }
> > >  
> > > +static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
> > > +{
> > > +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> > > +    PCIDevice *pdev = &vdev->pdev;
> > > +    uint16_t pci_cmd;
> > > +    int i;
> > > +
> > > +    for (i = 0; i < PCI_ROM_SLOT; i++) {
> > > +        uint32_t bar;
> > > +
> > > +        bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
> > > +        qemu_put_be32(f, bar);
> > > +    }
> > > +
> > > +    qemu_put_be32(f, vdev->interrupt);
> > > +    if (vdev->interrupt == VFIO_INT_MSI) {
> > > +        uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
> > > +        bool msi_64bit;
> > > +
> > > +        msi_flags = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> > > +                                            2);
> > > +        msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
> > > +
> > > +        msi_addr_lo = pci_default_read_config(pdev,
> > > +                                         pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
> > > +        qemu_put_be32(f, msi_addr_lo);
> > > +
> > > +        if (msi_64bit) {
> > > +            msi_addr_hi = pci_default_read_config(pdev,
> > > +                                             pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> > > +                                             4);
> > > +        }
> > > +        qemu_put_be32(f, msi_addr_hi);
> > > +
> > > +        msi_data = pci_default_read_config(pdev,
> > > +                pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
> > > +                2);
> > > +        qemu_put_be32(f, msi_data);  
> > 
> > Isn't the data field only a u16?
> >   
> > > +    } else if (vdev->interrupt == VFIO_INT_MSIX) {
> > > +        uint16_t offset;
> > > +
> > > +        /* save enable bit and maskall bit */
> > > +        offset = pci_default_read_config(pdev,
> > > +                                       pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
> > > +        qemu_put_be16(f, offset);
> > > +        msix_save(pdev, f);
> > > +    }
> > > +    pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
> > > +    qemu_put_be16(f, pci_cmd);
> > > +}
> > > +
> > > +static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
> > > +{
> > > +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> > > +    PCIDevice *pdev = &vdev->pdev;
> > > +    uint32_t interrupt_type;
> > > +    uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
> > > +    uint16_t pci_cmd;
> > > +    bool msi_64bit;
> > > +    int i, ret;
> > > +
> > > +    /* retore pci bar configuration */
> > > +    pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
> > > +    vfio_pci_write_config(pdev, PCI_COMMAND,
> > > +                        pci_cmd & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
> > > +    for (i = 0; i < PCI_ROM_SLOT; i++) {
> > > +        uint32_t bar = qemu_get_be32(f);
> > > +
> > > +        vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
> > > +    }
> > > +
> > > +    ret = vfio_bars_validate(vdev);
> > > +    if (ret) {
> > > +        return ret;
> > > +    }
> > > +
> > > +    interrupt_type = qemu_get_be32(f);
> > > +
> > > +    if (interrupt_type == VFIO_INT_MSI) {
> > > +        /* restore msi configuration */
> > > +        msi_flags = pci_default_read_config(pdev,
> > > +                                            pdev->msi_cap + PCI_MSI_FLAGS, 2);
> > > +        msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
> > > +
> > > +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> > > +                              msi_flags & (!PCI_MSI_FLAGS_ENABLE), 2);
> > > +
> > > +        msi_addr_lo = qemu_get_be32(f);
> > > +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO,
> > > +                              msi_addr_lo, 4);
> > > +
> > > +        msi_addr_hi = qemu_get_be32(f);
> > > +        if (msi_64bit) {
> > > +            vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> > > +                                  msi_addr_hi, 4);
> > > +        }
> > > +        msi_data = qemu_get_be32(f);
> > > +        vfio_pci_write_config(pdev,
> > > +                pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
> > > +                msi_data, 2);
> > > +
> > > +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> > > +                              msi_flags | PCI_MSI_FLAGS_ENABLE, 2);
> > > +    } else if (interrupt_type == VFIO_INT_MSIX) {
> > > +        uint16_t offset = qemu_get_be16(f);
> > > +
> > > +        /* load enable bit and maskall bit */
> > > +        vfio_pci_write_config(pdev, pdev->msix_cap + PCI_MSIX_FLAGS + 1,
> > > +                              offset, 2);
> > > +        msix_load(pdev, f);
> > > +    }
> > > +    pci_cmd = qemu_get_be16(f);
> > > +    vfio_pci_write_config(pdev, PCI_COMMAND, pci_cmd, 2);
> > > +    return 0;
> > > +}  
> > 
> > It always seems like there should be a lot more state than this, and I
> > probably sound like a broken record because I ask every time, but maybe
> > that's a good indication that we (or at least I) need a comment
> > explaining why we only care about these.  For example, what if we
> > migrate a device in the D3 power state, don't we need to account for
> > the state stored in the PM capability or does the device wake up into
> > D0 auto-magically after migration?  I think we could repeat that
> > question for every capability that can be modified.  Even for the MSI/X
> > cases, the interrupt may not be active, but there could be state in
> > virtual config space that would be different on the target.  For
> > example, if we migrate with a device in INTx mode where the guest had
> > written vector fields on the source, but only writes the enable bit on
> > the target, can we seamlessly figure out the rest?  For other
> > capabilities, that state may represent config space changes written
> > through to the physical device and represent a functional difference on
> > the target.  Thanks,
> > 
> > Alex
> >   
> > > +
> > >  static VFIODeviceOps vfio_pci_ops = {
> > >      .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
> > >      .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
> > >      .vfio_eoi = vfio_intx_eoi,
> > >      .vfio_get_object = vfio_pci_get_object,
> > > +    .vfio_save_config = vfio_pci_save_config,
> > > +    .vfio_load_config = vfio_pci_load_config,
> > >  };
> > >  
> > >  int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
> > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > > index 74261feaeac9..d69a7f3ae31e 100644
> > > --- a/include/hw/vfio/vfio-common.h
> > > +++ b/include/hw/vfio/vfio-common.h
> > > @@ -120,6 +120,8 @@ struct VFIODeviceOps {
> > >      int (*vfio_hot_reset_multi)(VFIODevice *vdev);
> > >      void (*vfio_eoi)(VFIODevice *vdev);
> > >      Object *(*vfio_get_object)(VFIODevice *vdev);
> > > +    void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
> > > +    int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
> > >  };
> > >  
> > >  typedef struct VFIOGroup {  
> >   
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



Thread overview: 74+ messages
2020-03-24 21:08 [PATCH v16 QEMU 00/16] Add migration support for VFIO devices Kirti Wankhede
2020-03-24 21:08 ` [PATCH v16 QEMU 01/16] vfio: KABI for migration interface - Kernel header placeholder Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 02/16] vfio: Add function to unmap VFIO region Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 03/16] vfio: Add vfio_get_object callback to VFIODeviceOps Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 04/16] vfio: Add save and load functions for VFIO PCI devices Kirti Wankhede
2020-03-25 19:56   ` Alex Williamson
2020-03-26 17:29     ` Dr. David Alan Gilbert
2020-03-26 17:38       ` Alex Williamson [this message]
2020-05-04 23:18     ` Kirti Wankhede
2020-05-05  4:37       ` Alex Williamson
2020-05-06  6:11         ` Yan Zhao
2020-05-06 19:48           ` Kirti Wankhede
2020-05-06 20:03             ` Alex Williamson
2020-05-07  5:40               ` Kirti Wankhede
2020-05-07 18:14                 ` Alex Williamson
2020-03-26 17:46   ` Dr. David Alan Gilbert
2020-05-04 23:19     ` Kirti Wankhede
2020-04-07  4:10   ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
2020-05-04 23:21     ` Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 05/16] vfio: Add migration region initialization and finalize function Kirti Wankhede
2020-03-26 17:52   ` Dr. David Alan Gilbert
2020-05-04 23:19     ` Kirti Wankhede
2020-05-19 19:32       ` Dr. David Alan Gilbert
2020-03-24 21:09 ` [PATCH v16 QEMU 06/16] vfio: Add VM state change handler to know state of VM Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 07/16] vfio: Add migration state change notifier Kirti Wankhede
2020-04-01 11:27   ` Dr. David Alan Gilbert
2020-05-04 23:20     ` Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device Kirti Wankhede
2020-03-25 21:02   ` Alex Williamson
2020-05-04 23:19     ` Kirti Wankhede
2020-05-05  4:37       ` Alex Williamson
2020-05-06  6:38         ` Yan Zhao
2020-05-06  9:58           ` Cornelia Huck
2020-05-06 16:53             ` Dr. David Alan Gilbert
2020-05-06 19:30               ` Kirti Wankhede
2020-05-07  6:37                 ` Cornelia Huck
2020-05-07 20:29                 ` Alex Williamson
2020-04-01 17:36   ` Dr. David Alan Gilbert
2020-05-04 23:20     ` Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 09/16] vfio: Add save state functions to SaveVMHandlers Kirti Wankhede
2020-03-25 22:03   ` Alex Williamson
2020-05-04 23:18     ` Kirti Wankhede
2020-05-05  4:37       ` Alex Williamson
2020-05-11  9:53         ` Kirti Wankhede
2020-05-11 15:59           ` Alex Williamson
2020-05-12  2:06           ` Yan Zhao
2020-05-09  5:31   ` Yan Zhao
2020-05-11 10:22     ` Kirti Wankhede
2020-05-12  0:50       ` Yan Zhao
2020-03-24 21:09 ` [PATCH v16 QEMU 10/16] vfio: Add load " Kirti Wankhede
2020-03-25 22:36   ` Alex Williamson
2020-04-01 18:58   ` Dr. David Alan Gilbert
2020-05-04 23:20     ` Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 11/16] iommu: add callback to get address limit IOMMU supports Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 12/16] memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled Kirti Wankhede
2020-04-01 19:00   ` Dr. David Alan Gilbert
2020-04-01 19:42     ` Alex Williamson
2020-03-24 21:09 ` [PATCH v16 QEMU 13/16] vfio: Add function to start and stop dirty pages tracking Kirti Wankhede
2020-03-26 19:10   ` Alex Williamson
2020-05-04 23:20     ` Kirti Wankhede
2020-04-01 19:03   ` Dr. David Alan Gilbert
2020-05-04 23:21     ` Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 14/16] vfio: Add vfio_listener_log_sync to mark dirty pages Kirti Wankhede
2020-03-25  2:19   ` Yan Zhao
2020-03-26 19:46   ` Alex Williamson
2020-04-01 19:08     ` Dr. David Alan Gilbert
2020-04-01  5:50   ` Yan Zhao
2020-04-03 20:11     ` Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 15/16] vfio: Add ioctl to get dirty pages bitmap during dma unmap Kirti Wankhede
2020-03-24 21:09 ` [PATCH v16 QEMU 16/16] vfio: Make vfio-pci device migration capable Kirti Wankhede
2020-03-24 23:36 ` [PATCH v16 QEMU 00/16] Add migration support for VFIO devices no-reply
2020-03-31 18:34 ` Alex Williamson
2020-04-01  6:41   ` Yan Zhao
2020-04-01 18:34     ` Alex Williamson
