From: Jason Wang <jasowang@redhat.com>
To: Yan Zhao <yan.y.zhao@intel.com>, alex.williamson@redhat.com
Cc: kevin.tian@intel.com, kvm@vger.kernel.org,
libvir-list@redhat.com, cohuck@redhat.com,
linux-kernel@vger.kernel.org, zhenyuw@linux.intel.com,
qemu-devel@nongnu.org, shaopeng.he@intel.com,
zhi.a.wang@intel.com
Subject: Re: [RFC PATCH 0/9] Introduce mediate ops in vfio-pci
Date: Thu, 5 Dec 2019 14:33:19 +0800
Message-ID: <8bcf603c-f142-f96d-bb11-834d686f5519@redhat.com>
In-Reply-To: <20191205032419.29606-1-yan.y.zhao@intel.com>
Hi:
On 2019/12/5 11:24 AM, Yan Zhao wrote:
> For SRIOV devices, VFs are passed through into the guest directly without
> host driver mediation. However, when VMs with passed-through VFs are
> migrating, dynamic host mediation is required to (1) get device states and
> (2) get dirty pages. Since device states, as well as other critical
> information required for dirty page tracking of VFs, are usually retrieved
> from PFs, it is handy to provide an extension in the PF driver to centrally
> control VFs' migration.
>
> Therefore, in order to (1) pass VFs through at normal times, (2)
> dynamically trap VFs' BARs for dirty page tracking, and
A silly question: what's the reason for doing this? Is this a must for
dirty page tracking?
> (3) centralize the retrieval of VF critical states and VF control in
> one driver, we propose to introduce mediate ops on top of the current
> vfio-pci device driver.
>
>
>                                   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
>  __________   register mediate ops|  ___________     ___________    |
> |          |<-----------------------|    VF     |   |           |
> | vfio-pci |                      | |  mediate  |   | PF driver |   |
> |__________|----------------------->|  driver   |   |___________|
>      |          open(pdev)        |  -----------          |         |
>      |                                                    |
>      |                            |_ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _|
>     \|/                                                  \|/
> -----------                                         ------------
> |    VF   |                                         |    PF    |
> -----------                                         ------------
>
>
> The VF mediate driver could be a standalone driver that does not bind to
> any devices (as in the demo code in patches 5-6), or it could be a
> built-in extension of the PF driver (as in patches 7-9).
>
> Rather than binding directly to a VF, the VF mediate driver registers a
> mediate ops into vfio-pci during driver init. vfio-pci maintains a list
> of such mediate ops.
> (Note that the VF mediate driver can register mediate ops into vfio-pci
> before vfio-pci binds to any device, and a VF mediate driver can
> support mediating multiple devices.)
>
> When opening a device (e.g. a VF), vfio-pci goes through the mediate ops
> list and calls each vfio_pci_mediate_ops->open() with the pdev of the
> device being opened as a parameter.
> The VF mediate driver should return success or failure depending on
> whether it supports the pdev.
> E.g. the VF mediate driver would compare its supported VF devfn with the
> devfn of the passed-in pdev.
> Once vfio-pci finds a successful vfio_pci_mediate_ops->open(), it stops
> querying other mediate ops and binds the device being opened to this
> mediate ops using the returned mediate handle.
>
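The open-time dispatch described above can be illustrated with a minimal user-space sketch in C. It is not code from the patchset: struct fake_pdev, struct mediate_ops, register_mediate_ops(), bind_on_open(), demo_open() and demo_devfns are hypothetical stand-ins, and the open() signature (return 0 and fill in a handle on success) is an assumption. The real vfio_pci_mediate_ops definition is in patch 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only the devfn matters here. */
struct fake_pdev {
    unsigned int devfn;
};

/* Hypothetical shape of an ops table; NOT the real vfio_pci_mediate_ops. */
struct mediate_ops {
    const char *name;
    /* Assumed contract: return 0 and set *handle if pdev is supported. */
    int (*open)(struct fake_pdev *pdev, unsigned long *handle);
    struct mediate_ops *next;   /* link in the registered-ops list */
};

static struct mediate_ops *ops_list;   /* head of the registered list */

/* Registration may happen before any device is bound. */
static void register_mediate_ops(struct mediate_ops *ops)
{
    assert(ops != NULL && ops->open != NULL);
    ops->next = ops_list;
    ops_list = ops;
}

/* Walk the list; the first open() that succeeds wins and querying stops. */
static struct mediate_ops *bind_on_open(struct fake_pdev *pdev,
                                        unsigned long *handle)
{
    for (struct mediate_ops *ops = ops_list; ops != NULL; ops = ops->next)
        if (ops->open(pdev, handle) == 0)
            return ops;
    return NULL;   /* no mediator claimed it: plain passthrough */
}

/* Hypothetical mediator claiming two VFs by devfn. */
static const unsigned int demo_devfns[] = { 0x10, 0x11 };

static int demo_open(struct fake_pdev *pdev, unsigned long *handle)
{
    for (size_t i = 0; i < sizeof(demo_devfns) / sizeof(demo_devfns[0]); i++) {
        if (demo_devfns[i] == pdev->devfn) {
            *handle = i;   /* distinct handle per VF, even with equal IDs */
            return 0;
        }
    }
    return -1;             /* not ours: let the caller try the next ops */
}

static struct mediate_ops demo_ops = { .name = "demo", .open = demo_open };
```

After register_mediate_ops(&demo_ops), opening devfn 0x10 and 0x11 yields handles 0 and 1. This is how two VFs with identical vendor/device IDs can be told apart. Any other devfn returns NULL and would fall back to ordinary passthrough.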
> Subsequent vfio-pci ops on the VF (the VFIO_DEVICE_GET_REGION_INFO ioctl,
> rw, and mmap) are intercepted by the VF mediate driver as
> vfio_pci_mediate_ops->get_region_info(),
> vfio_pci_mediate_ops->rw(), and
> vfio_pci_mediate_ops->mmap(), where they can be customized.
> vfio_pci_mediate_ops->rw() and vfio_pci_mediate_ops->mmap() additionally
> return 'pt' to indicate whether vfio-pci should still pass the data
> through to the hardware.
>
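The 'pt' contract for rw can likewise be sketched in user space. This sketch assumes that the mediator sets pt to false when it has fully handled an access, and to true when vfio-pci should still perform its default passthrough. The names fake_bar, passthrough_rw(), vfio_pci_rw_sketch() and demo_rw() are illustrative and are not the patchset's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mediate rw hook; reports via *pt whether to pass through. */
typedef long (*mediate_rw_fn)(unsigned long handle, char *buf, size_t count,
                              unsigned long long pos, bool is_write, bool *pt);

/* Stand-in for the real BAR: a small byte array playing "hardware". */
static char fake_bar[16];

/* Default passthrough path (what vfio-pci does without a mediator). */
static long passthrough_rw(char *buf, size_t count, unsigned long long pos,
                           bool is_write)
{
    if (pos >= sizeof(fake_bar) || count > sizeof(fake_bar) - pos)
        return -1;
    if (is_write)
        memcpy(fake_bar + pos, buf, count);
    else
        memcpy(buf, fake_bar + pos, count);
    return (long)count;
}

/* Sketch of the vfio-pci side: ask the mediator first, then honour 'pt'. */
static long vfio_pci_rw_sketch(mediate_rw_fn rw, unsigned long handle,
                               char *buf, size_t count,
                               unsigned long long pos, bool is_write)
{
    bool pt = true;

    assert(buf != NULL);
    if (rw != NULL) {
        long ret = rw(handle, buf, count, pos, is_write, &pt);
        if (!pt)
            return ret;   /* mediator fully handled the access */
    }
    return passthrough_rw(buf, count, pos, is_write);
}

/* Hypothetical mediator: emulates a 1-byte read at offset 0, passes the rest. */
static long demo_rw(unsigned long handle, char *buf, size_t count,
                    unsigned long long pos, bool is_write, bool *pt)
{
    (void)handle;
    if (pos == 0 && !is_write && count == 1) {
        buf[0] = (char)0xAB;
        *pt = false;
        return 1;
    }
    *pt = true;
    return 0;
}
```

With demo_rw installed, a write at offset 4 still reaches fake_bar, while a read at offset 0 is answered by the mediator without touching the "hardware".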
> When vfio-pci closes the VF, it calls vfio_pci_mediate_ops->release()
> with the mediate handle as a parameter.
>
> The mediate handle returned from vfio_pci_mediate_ops->open() lets the VF
> mediate driver differentiate between two opened VFs with the same vendor
> ID and device ID.
>
> When the VF mediate driver exits, it unregisters its mediate ops from
> vfio-pci.
>
>
> In this patchset, we enable vfio-pci to provide 3 things:
> (1) calling mediate ops to allow a vendor driver to customize the default
> region info/rw/mmap of a region;
> (2) providing a migration region to support migration;
What's the benefit of introducing a region? It looks to me like we don't
expect the region to be accessed directly by the guest. Could we simply
extend the device fd ioctl to do such things?
> (3) providing a dynamic-trap-bar-info region to allow a vendor driver to
> control trapping/untrapping of device PCI BARs.
>
> The vfio-pci + mediate ops approach differs from the mdev approach in that
> (1) the mdev approach needs to create a 1:1 mdev device on top of each VF,
> and a device-specific mdev parent driver is bound to the VF directly;
> (2) the vfio-pci + mediate ops approach does not create mdev devices, and
> the VF mediate driver does not bind to VFs. Instead, vfio-pci binds to VFs.
>
> The reason we don't choose to write an mdev parent driver is
> that
> (1) VFs are passed through directly almost all the time. Binding directly
> to vfio-pci allows most of the code to be shared/reused.
Can we split out the common parts from vfio-pci?
> If we write a
> vendor-specific mdev parent driver, most of the code (like the
> passthrough-style rw/mmap) still needs to be copied from the vfio-pci
> driver, which is duplicated and tedious work.
The mediate ops look quite similar to what vfio-mdev does. And it looks
to me like we need to consider live migration for mdev as well. In that
case, do we still expect mediate ops through VFIO directly?
> (2) Features like dynamic trapping/untrapping of PCI BARs, if they live in
> vfio-pci, can be available to most people without repeated code
> copying and re-testing.
> (3) With a 1:1 mdev driver that passes through VFs most of the time,
> people have to decide whether to bind VFs to vfio-pci or to the mdev
> parent driver before a real migration need arises. However, if vfio-pci
> is bound initially, they have no chance to do live migration when the
> need arises later.
We can teach the management layer to do this.
Thanks
>
> In this patchset,
> - patches 1-4 enable vfio-pci to call mediate ops registered by vendor
> driver to mediate/customize region info/rw/mmap.
>
> - patches 5-6 provide a standalone sample driver that registers a mediate
> ops for Intel Graphics Devices. It does not bind to IGDs directly but
> decides which devices it supports via its pciidlist. It also demonstrates
> how to dynamically trap a device's PCI BARs. (By adding more PCI IDs to
> its pciidlist, this sample driver is not necessarily limited to
> supporting IGDs.)
>
> - patches 7-9 provide a sample on the i40e driver, which supports the
> Intel(R) Ethernet Controller XL710 family of devices. It supports VF
> precopy live migration on Intel's 710 SRIOV. (But we commented out the
> real implementation of the dirty page tracking and device state
> retrieval parts to focus on demonstrating the framework. We will send
> them out in future versions.)
>
> patch 7 registers/unregisters VF mediate ops when the PF driver is
> probed/removed. It specifies its supported VFs via
> vfio_pci_mediate_ops->open(pdev).
>
> patch 8 reports the device cap VFIO_PCI_DEVICE_CAP_MIGRATION and
> provides a sample implementation of the migration region.
> The QEMU part of vfio migration is based on v8
> https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05542.html.
> We did not base it on the recent v9 because we think there are still
> open issues in the dirty page tracking part of that series.
>
> patch 9 reports the device cap VFIO_PCI_DEVICE_CAP_DYNAMIC_TRAP_BAR and
> provides an example of how to trap part of bar0 when migration starts
> and pass this part of bar0 through again when migration fails.
>
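At its core, the idea of trapping only part of bar0 during migration can be modeled as a range check. The check is switched on when migration starts and switched off again if migration fails. This is a minimal sketch that assumes a single trapped window per BAR. The names struct bar_trap, migration_start(), migration_fail() and access_is_trapped() are hypothetical, and the offsets in the usage below are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VF trap state for one BAR: one window trapped
 * only while a migration is in progress. */
struct bar_trap {
    uint64_t start;   /* first trapped byte offset within the BAR */
    uint64_t size;    /* length of the trapped window in bytes */
    bool active;      /* true while the window is being trapped */
};

/* Hypothetical hooks: trap on migration start, untrap on failure. */
static void migration_start(struct bar_trap *t) { t->active = true; }
static void migration_fail(struct bar_trap *t)  { t->active = false; }

/* An access [off, off + len) is trapped iff the window is active
 * and the two half-open ranges overlap. */
static bool access_is_trapped(const struct bar_trap *t, uint64_t off,
                              uint64_t len)
{
    assert(len > 0);
    if (!t->active)
        return false;
    return off < t->start + t->size && t->start < off + len;
}
```

An access just below the window (0x0ffc, 4 bytes) stays passed through, while one straddling its start (0x0ffe, 4 bytes) is trapped. Once migration_fail() runs, the whole BAR is passed through again.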
> Yan Zhao (9):
> vfio/pci: introduce mediate ops to intercept vfio-pci ops
> vfio/pci: test existence before calling region->ops
> vfio/pci: register a default migration region
> vfio-pci: register default dynamic-trap-bar-info region
> samples/vfio-pci/igd_dt: sample driver to mediate a passthrough IGD
> sample/vfio-pci/igd_dt: dynamically trap/untrap subregion of IGD bar0
> i40e/vf_migration: register mediate_ops to vfio-pci
> i40e/vf_migration: mediate migration region
> i40e/vf_migration: support dynamic trap of bar0
>
> drivers/net/ethernet/intel/Kconfig | 2 +-
> drivers/net/ethernet/intel/i40e/Makefile | 3 +-
> drivers/net/ethernet/intel/i40e/i40e.h | 2 +
> drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +
> .../ethernet/intel/i40e/i40e_vf_migration.c | 626 ++++++++++++++++++
> .../ethernet/intel/i40e/i40e_vf_migration.h | 78 +++
> drivers/vfio/pci/vfio_pci.c | 189 +++++-
> drivers/vfio/pci/vfio_pci_private.h | 2 +
> include/linux/vfio.h | 18 +
> include/uapi/linux/vfio.h | 160 +++++
> samples/Kconfig | 6 +
> samples/Makefile | 1 +
> samples/vfio-pci/Makefile | 2 +
> samples/vfio-pci/igd_dt.c | 367 ++++++++++
> 14 files changed, 1455 insertions(+), 4 deletions(-)
> create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.c
> create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.h
> create mode 100644 samples/vfio-pci/Makefile
> create mode 100644 samples/vfio-pci/igd_dt.c
>