linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yan Zhao <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cohuck@redhat.com, zhenyuw@linux.intel.com, zhi.a.wang@intel.com,
	kevin.tian@intel.com, shaopeng.he@intel.com, yi.l.liu@intel.com,
	xin.zeng@intel.com, hang.yuan@intel.com
Subject: Re: [RFC PATCH v4 07/10] vfio/pci: introduce a new irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION
Date: Tue, 2 Jun 2020 21:40:58 -0400	[thread overview]
Message-ID: <20200603014058.GA12300@joy-OptiPlex-7040> (raw)
In-Reply-To: <20200602133435.1ab650c5@x1.home>

On Tue, Jun 02, 2020 at 01:34:35PM -0600, Alex Williamson wrote:
> I'm not at all happy with this.  Why do we need to hide the migration
> sparse mmap from the user until migration time?  What if instead we
> introduced a new VFIO_REGION_INFO_CAP_SPARSE_MMAP_SAVING capability
> where the existing capability is the normal runtime sparse setup and
> the user is required to use this new one prior to enabled device_state
> with _SAVING.  The vendor driver could then simply track mmap vmas to
> the region and refuse to change device_state if there are outstanding
> mmaps conflicting with the _SAVING sparse mmap layout.  No new IRQs
> required, no new irqfds, an incremental change to the protocol,
> backwards compatible to the extent that a vendor driver requiring this
> will automatically fail migration.
> 
right. looks we need to use this approach to solve the problem.
thanks for your guide.
so I'll abandon the current remap irq way for dirty tracking during live
migration.
but anyway, it demos how to customize irq_types in vendor drivers.
then, what do you think about patches 1-5?

> > > What happens if the mmap re-evaluation occurs asynchronous to the
> > > device_state write?  The vendor driver can track outstanding mmap vmas
> > > to areas it's trying to revoke, so the vendor driver can know when
> > > userspace has reached an acceptable state (assuming we require
> > > userspace to munmap areas that are no longer valid).  We should also
> > > consider what we can accomplish by invalidating user mmaps, ex. can we
> > > fault them back in on a per-page basis and continue to mark them dirty
> > > in the migration state, re-invalidating on each iteration until they've
> > > finally been closed.   It seems the vendor driver needs to handle
> > > incrementally closing each mmap anyway, there's no requirement to the
> > > user to stop the device (ie. block all access), make these changes,
> > > then restart the device.  So perhaps the vendor driver can "limp" along
> > > until userspace completes the changes.  I think we can assume we are in
> > > a cooperative environment here, userspace wants to perform a migration,
> > > disabling direct access to some regions is for mediating those accesses
> > > during migration, not for preventing the user from accessing something
> > > they shouldn't have access to, userspace is only delaying the migration
> > > or affecting the state of their device by not promptly participating in
> > > the protocol.
> > >   
> > the problem is that the mmap re-evaluation has to be done before
> > device_state is successfully set to SAVING. otherwise, the QEMU may
> > have left save_setup stage and it's too late to start dirty tracking.
> > And the reason for us to trap the BAR regions is not because there're
> > dirty data in this region, it is because we want to know when the device
> > registers mapped in the BARs are written, so we can do dirty page track
> > of system memory in software way.
> 
> I think my proposal above resolves this.
>
yes.

> > > Another problem I see though is what about p2p DMA?  If the vendor
> > > driver invalidates an mmap we're removing it from both direct CPU as
> > > well as DMA access via the IOMMU.  We can't signal to the guest OS that
> > > a DMA channel they've been using is suddenly no longer valid.  Is QEMU
> > > going to need to avoid ever IOMMU mapping device_ram for regions
> > > subject to mmap invalidation?  That would introduce an undesirable need
> > > to choose whether we want to support p2p or migration unless we had an
> > > IOMMU that could provide dirty tracking via p2p, right?  Thanks,  
> > 
> > yes, if there are device memory mapped in the BARs to be remapped, p2p
> > DMA would be affected. Perhaps it is what vendor driver should be aware
> > of and know what it is doing before sending out the remap irq ?
> > in i40e vf's case, the BAR 0 to be remapped is only for device registers,
> > so is it still good?
> 
> No, we can't design the interface based on one vendor driver's
> implementation of the interface or the requirements of a single device.
> If we took the approach above where the user is provided both the
> normal sparse mmap and the _SAVING sparse mmap, perhaps QEMU could
> avoid DMA mapping portions that don't exist in the _SAVING version, at
> least then the p2p DMA mappings would be consistent across the
> transition.  QEMU might be able to combine the sparse mmap maps such
> that it can easily drop ranges not present during _SAVING.  QEMU would
> need to munmap() the dropped ranges rather than simply mark the
> MemoryRegion disabled though for the vendor driver to have visibility
> of the vm_ops.close callback.  Thanks,
>
ok. got it! thanks you!

Yan

  reply	other threads:[~2020-06-03  1:51 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-18  2:42 [RFC PATCH v4 00/10] Introduce vendor ops in vfio-pci Yan Zhao
2020-05-18  2:43 ` [RFC PATCH v4 01/10] vfio/pci: register/unregister vfio_pci_vendor_driver_ops Yan Zhao
2020-05-18  2:45 ` [RFC PATCH v4 02/10] vfio/pci: macros to generate module_init and module_exit for vendor modules Yan Zhao
2020-06-04 15:01   ` Cornelia Huck
2020-06-05  2:05     ` Yan Zhao
2020-05-18  2:49 ` [RFC PATCH v4 03/10] vfio/pci: export vendor_data, irq_type, num_regions, pdev and functions in vfio_pci_ops Yan Zhao
2020-05-18  2:49 ` [RFC PATCH v4 04/10] vfio/pci: let vfio_pci know number of vendor regions and vendor irqs Yan Zhao
2020-06-04 15:25   ` Cornelia Huck
2020-06-05  2:15     ` Yan Zhao
2020-06-11 12:31       ` David Edmondson
2020-06-11 23:09         ` Yan Zhao
2020-05-18  2:50 ` [RFC PATCH v4 05/10] vfio/pci: export vfio_pci_get_barmap Yan Zhao
2020-05-18  2:50 ` [RFC PATCH v4 06/10] vfio: Define device specific irq type capability Yan Zhao
2020-05-18  2:52 ` [RFC PATCH v4 07/10] vfio/pci: introduce a new irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION Yan Zhao
2020-05-18  2:56   ` [QEMU RFC PATCH v4] hw/vfio/pci: remap bar region irq Yan Zhao
2020-05-29 21:45   ` [RFC PATCH v4 07/10] vfio/pci: introduce a new irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION Alex Williamson
2020-06-01  6:57     ` Yan Zhao
2020-06-01 16:43       ` Alex Williamson
2020-06-02  8:28         ` Yan Zhao
2020-06-02 19:34           ` Alex Williamson
2020-06-03  1:40             ` Yan Zhao [this message]
2020-06-03 23:04               ` Alex Williamson
2020-06-04  2:42                 ` Yan Zhao
2020-06-04  4:10                   ` Alex Williamson
2020-06-05  0:26                     ` He, Shaopeng
2020-06-05 17:54                       ` Alex Williamson
2020-06-05  2:02                     ` Yan Zhao
2020-06-05 16:13                       ` Alex Williamson
2020-06-10  5:23                         ` Yan Zhao
2020-06-19 22:55                           ` Alex Williamson
2020-06-22  3:34                             ` Yan Zhao
2020-05-18  2:53 ` [RFC PATCH v4 08/10] i40e/vf_migration: VF live migration - pass-through VF first Yan Zhao
2020-06-10  8:59   ` Xiang Zheng
2020-06-11  0:23     ` Yan Zhao
2020-06-11  2:27       ` Xiang Zheng
2020-06-11 23:10         ` Yan Zhao
2020-05-18  2:54 ` [RFC PATCH v4 09/10] i40e/vf_migration: register a migration vendor region Yan Zhao
2020-05-18  2:54 ` [RFC PATCH v4 10/10] i40e/vf_migration: vendor defined irq_type to support dynamic bar map Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200603014058.GA12300@joy-OptiPlex-7040 \
    --to=yan.y.zhao@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=hang.yuan@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shaopeng.he@intel.com \
    --cc=xin.zeng@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhi.a.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).