From: Kirti Wankhede <kwankhede@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: cohuck@redhat.com, cjia@nvidia.com, aik@ozlabs.ru,
	Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com,
	qemu-devel@nongnu.org, peterx@redhat.com, eauger@redhat.com,
	yi.l.liu@intel.com, quintela@redhat.com, ziye.yang@intel.com,
	armbru@redhat.com, mlevitsk@redhat.com, pasic@linux.ibm.com,
	felipe@nutanix.com, zhi.a.wang@intel.com, kevin.tian@intel.com,
	yan.y.zhao@intel.com, dgilbert@redhat.com,
	changpeng.liu@intel.com, eskultet@redhat.com, Ken.Xue@amd.com,
	jonathan.davies@nutanix.com, pbonzini@redhat.com
Subject: Re: [PATCH QEMU v25 03/17] vfio: Add save and load functions for VFIO PCI devices
Date: Wed, 24 Jun 2020 19:59:39 +0530
Message-ID: <88ace5f6-09ee-4d82-f304-bc2d4cdc17cf@nvidia.com>
In-Reply-To: <20200622142803.109565e3@x1.home>



On 6/23/2020 1:58 AM, Alex Williamson wrote:
> On Sun, 21 Jun 2020 01:51:12 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> These functions save and restore PCI device specific data - config
>> space of PCI device.
>> Tested save and restore with MSI and MSIX type.
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>> Reviewed-by: Neo Jia <cjia@nvidia.com>
>> ---
>>   hw/vfio/pci.c                 | 95 +++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/vfio/vfio-common.h |  2 +
>>   2 files changed, 97 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 27f8872db2b1..5ba340aee1d4 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -41,6 +41,7 @@
>>   #include "trace.h"
>>   #include "qapi/error.h"
>>   #include "migration/blocker.h"
>> +#include "migration/qemu-file.h"
>>   
>>   #define TYPE_VFIO_PCI "vfio-pci"
>>   #define PCI_VFIO(obj)    OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
>> @@ -2407,11 +2408,105 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
>>       return OBJECT(vdev);
>>   }
>>   
>> +static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
>> +{
>> +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +    PCIDevice *pdev = &vdev->pdev;
>> +
>> +    qemu_put_buffer(f, vdev->emulated_config_bits, vdev->config_size);
>> +    qemu_put_buffer(f, vdev->pdev.wmask, vdev->config_size);
>> +    pci_device_save(pdev, f);
>> +
>> +    qemu_put_be32(f, vdev->interrupt);
>> +    if (vdev->interrupt == VFIO_INT_MSIX) {
>> +        msix_save(pdev, f);
> 
> msix_save() checks msix_present() so shouldn't we include this
> unconditionally?  Can't there also be state in the vector table
> regardless of whether we're currently running in MSI-X mode?
> 
>> +    }
>> +}
>> +
>> +static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>> +{
>> +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +    PCIDevice *pdev = &vdev->pdev;
>> +    uint32_t interrupt_type;
>> +    uint16_t pci_cmd;
>> +    int i, ret;
>> +
>> +    qemu_get_buffer(f, vdev->emulated_config_bits, vdev->config_size);
>> +    qemu_get_buffer(f, vdev->pdev.wmask, vdev->config_size);
> 
> This doesn't seem safe, why is it ok to indiscriminately copy these
> arrays that are configured via support or masking of various device
> features from the source to the target?
> 

Ideally, the software state on the source host should be restored at the 
destination; this is an attempt to do that.


> I think this still fails basic feature support negotiation.  For
> instance, Intel IGD assignment modifies emulated_config_bits and wmask
> to allow the VM BIOS to allocate fake stolen memory for the GPU and
> store this value in config space.  This support can be controlled via a
> QEMU build-time option, therefore the feature support on the target can
> be different from the source.  If this sort of feature set doesn't
> match between source and target, I think we'd want to abort the
> migration, but we don't have any provisions for that here (a physical
> IGD device is obviously just an example as it doesn't support migration
> currently).
> 

Then is it ok not to include vdev->pdev.wmask? If so, I'll remove it. 
But vdev->emulated_config_bits still needs to be restored.
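
Roughly, the save side would then shrink to something like this (untested 
sketch of a possible respin, not what v25 does):

    /* possible respin: carry only the emulated-bits bitmap, not wmask */
    qemu_put_buffer(f, vdev->emulated_config_bits, vdev->config_size);
    pci_device_save(pdev, f);

with the matching qemu_get_buffer() on the load side restoring only 
emulated_config_bits.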

>> +
>> +    ret = pci_device_load(pdev, f);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    /* retore pci bar configuration */
>> +    pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
>> +    vfio_pci_write_config(pdev, PCI_COMMAND,
>> +                        pci_cmd & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
> 
> s/!/~/?  Extra parenthesis too
> 
>> +    for (i = 0; i < PCI_ROM_SLOT; i++) {
>> +        uint32_t bar = pci_default_read_config(pdev,
>> +                                               PCI_BASE_ADDRESS_0 + i * 4, 4);
>> +
>> +        vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
>> +    }
>> +
>> +    interrupt_type = qemu_get_be32(f);
>> +
>> +    if (interrupt_type == VFIO_INT_MSI) {
>> +        uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
>> +        bool msi_64bit;
>> +
>> +        /* restore msi configuration */
>> +        msi_flags = pci_default_read_config(pdev,
>> +                                            pdev->msi_cap + PCI_MSI_FLAGS, 2);
>> +        msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
>> +
>> +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
>> +                              msi_flags & (!PCI_MSI_FLAGS_ENABLE), 2);
>> +
> 
> What if I migrate from a device with MSI support to a device without
> MSI support, or to a device with MSI support at a different offset, who
> is responsible for triggering a migration fault?
> 

The migration compatibility check should take care of that. If the 
hardware differs that much, other things would fail as well.
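
Purely as an illustration of where such a check could sit if QEMU were to 
enforce it itself (hypothetical, this patch does not save the capability 
offset), the load side could compare the source's msi_cap offset against 
the destination's and fail the load on mismatch:

    /* hypothetical: assumes the source's msi_cap offset is also in the stream */
    uint8_t src_msi_cap = qemu_get_byte(f);

    if (src_msi_cap != pdev->msi_cap) {
        error_report("vfio: MSI capability offset mismatch (src 0x%x, dst 0x%x)",
                     src_msi_cap, pdev->msi_cap);
        return -EINVAL;
    }

But as above, I expect the vendor driver's compatibility check to reject 
such a pairing before we ever get this far.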

> 
>> +        msi_addr_lo = pci_default_read_config(pdev,
>> +                                        pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
>> +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO,
>> +                              msi_addr_lo, 4);
>> +
>> +        if (msi_64bit) {
>> +            msi_addr_hi = pci_default_read_config(pdev,
>> +                                        pdev->msi_cap + PCI_MSI_ADDRESS_HI, 4);
>> +            vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
>> +                                  msi_addr_hi, 4);
>> +        }
>> +
>> +        msi_data = pci_default_read_config(pdev,
>> +                pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
>> +                2);
>> +
>> +        vfio_pci_write_config(pdev,
>> +                pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
>> +                msi_data, 2);
>> +
>> +        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
>> +                              msi_flags | PCI_MSI_FLAGS_ENABLE, 2);
>> +    } else if (interrupt_type == VFIO_INT_MSIX) {
>> +        uint16_t offset;
>> +
>> +        offset = pci_default_read_config(pdev,
>> +                                       pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
>> +        /* load enable bit and maskall bit */
>> +        vfio_pci_write_config(pdev, pdev->msix_cap + PCI_MSIX_FLAGS + 1,
>> +                              offset, 2);
>> +        msix_load(pdev, f);
> 
> Isn't this ordering backwards, or at least less efficient?  The config
> write will cause us to enable MSI-X; presumably we'd have nothing in
> the vector table though.  Then msix_load() will write the vector
> and pba tables and trigger a use notifier for each vector.  It seems
> like that would trigger a bunch of SET_IRQS ioctls as if the guest
> wrote individual unmasked vectors to the vector table, whereas if we
> setup the vector table and then enable MSI-X, we do it with one ioctl.
> 

Makes sense. Changing the order here.
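
Roughly, I'm thinking of the following for the next version (untested 
sketch, same helpers as the current patch, just reordered):

    } else if (interrupt_type == VFIO_INT_MSIX) {
        uint16_t ctrl;

        /* saved enable and maskall bits (upper byte of message control) */
        ctrl = pci_default_read_config(pdev,
                                       pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);

        /* restore the vector and PBA tables while MSI-X is still disabled */
        msix_load(pdev, f);

        /* then load enable bit and maskall bit with a single config write,
         * so the vectors are set up with one SET_IRQS ioctl */
        vfio_pci_write_config(pdev, pdev->msix_cap + PCI_MSIX_FLAGS + 1,
                              ctrl, 2);
    }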

> Also same question as above, I'm not sure who is responsible for making
> sure both devices support MSI-X and that the capability exists at the
> same place on each.  Repeat for essentially every capability.  Are we
> leaning on the migration regions to fail these migrations before we get
> here?  If so, should we be?
> 
As I mentioned above, it should be the vendor driver's responsibility to 
perform the compatibility check in that case.

> Also, besides BARs, the command register, and MSI & MSI-X, there must
> be other places where the guest can write config data through to the
> device.  pci_device_{save,load}() only sets QEMU's config space.
> 

 From QEMU we can restore QEMU's software state. For a mediated device, 
the state emulated by the vendor driver should be maintained by the 
vendor driver itself, right?

> A couple more theoretical (probably not too distant) examples related
> to that; there's a resizable BAR capability that at some point we'll
> probably need to allow the guest to interact with (ie. manipulation of
> capability changes the reported region size for a BAR).  How would we
> support that with this save/load scheme?

Config space is saved at the start of the stop-and-copy phase, which means 
the vCPUs are already stopped. So QEMU's config space saved at that point 
should include the change. Is there any other software state that would 
need to be saved/loaded?

>  We'll likely also have SR-IOV
> PFs assigned where we'll perhaps have support for emulating the SR-IOV
> capability to call out to a privileged userspace helper to enable VFs,
> how does this get extended to support that type of emulation?
> 
> I'm afraid that making carbon copies of emulated_config_bits, wmask,
> and invoking pci_device_save/load() doesn't address my concerns that
> saving and restoring config space between source and target really
> seems like a much more important task than outlined here.  Thanks,
> 

Are you suggesting loading the config space using vfio_pci_write_config(), 
from PCI_CONFIG_HEADER_SIZE to 
PCI_CONFIG_SPACE_SIZE/PCIE_CONFIG_SPACE_SIZE? I was trying to avoid that.
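
If that is the direction you prefer, the core of it would be something 
like the loop below (illustrative only; a real version would still have 
to special-case BARs and the MSI/MSI-X enable ordering handled above):

    int size = pci_is_express(pdev) ? PCIE_CONFIG_SPACE_SIZE :
                                      PCI_CONFIG_SPACE_SIZE;
    int off;

    /* replay saved config space through vfio_pci_write_config() so that
     * emulated and pass-through bits are re-applied on the destination */
    for (off = PCI_CONFIG_HEADER_SIZE; off < size; off += 4) {
        uint32_t val = pci_default_read_config(pdev, off, 4);

        vfio_pci_write_config(pdev, off, val, 4);
    }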

Thanks,
Kirti


