kvm.vger.kernel.org archive mirror
From: Yan Zhao <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cohuck@redhat.com, zhenyuw@linux.intel.com, zhi.a.wang@intel.com,
	kevin.tian@intel.com, shaopeng.he@intel.com, yi.l.liu@intel.com,
	xin.zeng@intel.com, hang.yuan@intel.com
Subject: Re: [RFC PATCH v4 07/10] vfio/pci: introduce a new irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION
Date: Wed, 3 Jun 2020 22:42:28 -0400
Message-ID: <20200604024228.GD12300@joy-OptiPlex-7040>
In-Reply-To: <20200603170452.7f172baf@x1.home>

On Wed, Jun 03, 2020 at 05:04:52PM -0600, Alex Williamson wrote:
> On Tue, 2 Jun 2020 21:40:58 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> > On Tue, Jun 02, 2020 at 01:34:35PM -0600, Alex Williamson wrote:
> > > I'm not at all happy with this.  Why do we need to hide the migration
> > > sparse mmap from the user until migration time?  What if instead we
> > > introduced a new VFIO_REGION_INFO_CAP_SPARSE_MMAP_SAVING capability
> > > where the existing capability is the normal runtime sparse setup and
> > > the user is required to use this new one prior to enabled device_state
> > > with _SAVING.  The vendor driver could then simply track mmap vmas to
> > > the region and refuse to change device_state if there are outstanding
> > > mmaps conflicting with the _SAVING sparse mmap layout.  No new IRQs
> > > required, no new irqfds, an incremental change to the protocol,
> > > backwards compatible to the extent that a vendor driver requiring this
> > > will automatically fail migration.
> > >   
> > right. looks we need to use this approach to solve the problem.
> > thanks for your guide.
> > so I'll abandon the current remap irq way for dirty tracking during live
> > migration.
> > but anyway, it demos how to customize irq_types in vendor drivers.
> > then, what do you think about patches 1-5?
> In broad strokes, I don't think we've found the right solution yet.  I
> really question whether it's supportable to parcel out vfio-pci like
> this and I don't know how I'd support unraveling whether we have a bug
> in vfio-pci, the vendor driver, or how the vendor driver is making use
> of vfio-pci.
> Let me also ask, why does any of this need to be in the kernel?  We
> spend 5 patches slicing up vfio-pci so that we can register a vendor
> driver and have that vendor driver call into vfio-pci as it sees fit.
> We have two patches creating device specific interrupts and a BAR
> remapping scheme that we've decided we don't need.  That brings us to
> the actual i40e vendor driver, where the first patch is simply making
> the vendor driver work like vfio-pci already does, the second patch is
> handling the migration region, and the third patch is implementing the
> BAR remapping IRQ that we decided we don't need.  It's difficult to
> actually find the small bit of code that's required to support
> migration outside of just dealing with the protocol we've defined to
> expose this from the kernel.  So why are we trying to do this in the
> kernel?  We have quirk support in QEMU, we can easily flip
> MemoryRegions on and off, etc.  What access to the device outside of
> what vfio-pci provides to the user, and therefore QEMU, is necessary to
> implement this migration support for i40e VFs?  Is this just an
> exercise in making use of the migration interface?  Thanks,
hi Alex

There was a description of the intention of this series in RFC v1.
Sorry, I didn't include it starting from RFC v2.

The reasons why we don't choose to write an mdev parent driver are:
(1) VFs are passed through directly almost all the time. Binding directly
to vfio-pci lets most of the code be shared/reused. If we write a
vendor-specific mdev parent driver, most of the code (like the pass-through
style of rw/mmap) still needs to be copied from the vfio-pci driver, which is
duplicated and tedious work.
(2) For features like dynamically trapping/untrapping PCI BARs, if they are in
vfio-pci, they can be available to most people without repeated code
copying and re-testing.
(3) With a 1:1 mdev driver that passes through VFs most of the time, people
have to decide whether to bind VFs to vfio-pci or to the mdev parent driver
before they run into a real migration need. However, if vfio-pci is bound
initially, they have no chance to do live migration when the need arises.
In particular, there are some devices (like NVMe) that rely purely on
vfio-pci for device pass-through and have no standalone parent driver
to go the mdev way.

I think live migration is a general requirement for most devices, and
interacting with the migration interface requires vendor drivers to do
device-specific tasks like getting/setting device state, starting/stopping
devices, tracking dirty data, reporting migration capabilities... all of
that work needs to be in the kernel.
Do you think it's better to create numerous vendor quirks in vfio-pci?
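
To make the device_state part of that list concrete, here is a minimal
userspace sketch of a set-state handler that also applies the check Alex
suggests in the quoted text: refuse _SAVING while an outstanding mmap
falls outside the _SAVING sparse mmap layout. It is not code from this
series. struct demo_dev, struct range, demo_set_device_state() and the
DEV_STATE_* constants are hypothetical stand-ins (the bit values are an
assumption modelled on the v1 migration device_state bits), and it
simplifies by requiring each mmap to fit inside a single permitted area.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the device_state bits (assumed values). */
#define DEV_STATE_RUNNING  (1u << 0)
#define DEV_STATE_SAVING   (1u << 1)
#define DEV_STATE_RESUMING (1u << 2)

struct range {
	uint64_t offset;
	uint64_t size;
};

struct demo_dev {
	uint32_t state;
	const struct range *saving_areas; /* mmappable areas while _SAVING */
	size_t nr_saving_areas;
	const struct range *mmaps;        /* outstanding user mmaps (tracked vmas) */
	size_t nr_mmaps;
};

/* True if 'in' lies entirely inside 'out' (no overflow handling here). */
static bool range_within(const struct range *in, const struct range *out)
{
	return in->offset >= out->offset &&
	       in->offset + in->size <= out->offset + out->size;
}

/* True if every outstanding mmap fits inside one permitted _SAVING area. */
static bool mmaps_fit_saving_layout(const struct demo_dev *d)
{
	for (size_t i = 0; i < d->nr_mmaps; i++) {
		bool found = false;

		for (size_t j = 0; j < d->nr_saving_areas; j++) {
			if (range_within(&d->mmaps[i], &d->saving_areas[j])) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}

/* Returns 0 on success, a negative errno value if the change is refused. */
int demo_set_device_state(struct demo_dev *d, uint32_t new_state)
{
	assert(d != NULL);

	/* This sketch treats _SAVING together with _RESUMING as invalid. */
	if ((new_state & DEV_STATE_SAVING) && (new_state & DEV_STATE_RESUMING))
		return -EINVAL;

	/* Refuse _SAVING while an mmap conflicts with the _SAVING layout. */
	if ((new_state & DEV_STATE_SAVING) && !mmaps_fit_saving_layout(d))
		return -EBUSY;

	d->state = new_state;
	return 0;
}
```

In this sketch a refused transition leaves the state untouched, so
userspace can drop the conflicting mmap, remap according to the _SAVING
layout, and retry.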

As to this series, though patch 9/10 currently only demos reporting a
migration region, it actually shows the capability of a vendor driver to
customize device regions. E.g. in patch 10/10, it customizes BAR0 to
be read/write. And though we abandoned the REMAP BAR irq_type in patch
10/10 for migration purposes, I have to say this irq_type has its usage
in other use cases, where synchronization is not a hard requirement and
all that is needed is a notification channel from kernel to user. This
series just provides a possibility for vendors to customize device regions and

As for the interfaces exported in patches 3/10-5/10, they need to be
exported anyway for writing mdev parent drivers that pass through devices
in normal operation, to avoid duplication. And yes, your worry about
identifying the source of a bug is reasonable. But if a device is bound
to vfio-pci with a vendor module loaded, and there's a bug, there are at
least two ways to identify whether it's a bug in vfio-pci itself:
(1) prevent vendor modules from loading and see if the problem exists
with pure vfio-pci.
(2) do what's demoed in patch 8/10, i.e. do nothing but simply pass all
operations to vfio-pci.
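
As a rough userspace illustration of (2), and not the actual code in
patch 8/10, a pass-through vendor ops table only forwards each call to
the base handlers. If the bug still reproduces with it, the vendor
logic is ruled out. All names here (struct demo_ops, struct
demo_device, demo_read() and so on) are hypothetical, with
demo_base_ops standing in for vfio-pci's default handlers.

```c
#include <assert.h>
#include <stddef.h>

struct demo_device;

/* Hypothetical ops table; only read is modelled to keep the sketch short. */
struct demo_ops {
	long (*read)(struct demo_device *dev, char *buf, size_t count);
};

struct demo_device {
	const struct demo_ops *base_ops;   /* stand-in for vfio-pci defaults */
	const struct demo_ops *vendor_ops; /* NULL when no vendor module is loaded */
	unsigned long base_calls;          /* how often the base handler ran */
};

/* Base handler: pretends to read 'count' bytes and records the call. */
static long demo_base_read(struct demo_device *dev, char *buf, size_t count)
{
	(void)buf;
	dev->base_calls++;
	return (long)count;
}

/* Vendor handler that adds nothing: it only forwards to the base handler. */
static long demo_passthrough_read(struct demo_device *dev, char *buf,
				  size_t count)
{
	return dev->base_ops->read(dev, buf, count);
}

const struct demo_ops demo_base_ops = { .read = demo_base_read };
const struct demo_ops demo_passthrough_ops = { .read = demo_passthrough_read };

/* Dispatch: use the vendor ops when registered, else the base ops. */
long demo_read(struct demo_device *dev, char *buf, size_t count)
{
	const struct demo_ops *ops;

	assert(dev != NULL && dev->base_ops != NULL);
	ops = dev->vendor_ops ? dev->vendor_ops : dev->base_ops;
	return ops->read(dev, buf, count);
}
```

With the pass-through table registered, every call still ends up in the
base handler exactly once, so behaviour should match the case where no
vendor module is loaded.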

so, do you think this series has its merit and we can continue improving


