linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci
@ 2020-02-11  9:57 Yan Zhao
  2020-02-11 10:10 ` [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private Yan Zhao
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11  9:57 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

When using module vfio-pci to pass through devices, though for the most
of time, it is desired to use default implementations in vfio-pci, vendors
sometimes may want to do certain kind of customization.
For example, vendors may want to add a vendor device region or may want to
intercept writes to a BAR region.
So, in this patch set, we introduce a way to allow vendors to focus on
handling of regions of their interest and call default vfio-pci ops to
handle the reset ones.

It goes like this:
(1) macros are provided to let vendor drivers register/unregister
vfio_pci_vendor_driver_ops to vfio_pci in their module_init() and
module_exit().
vfio_pci_vendor_driver_ops contains callbacks probe() and remove() and a
pointer to vfio_device_ops.

(2) vendor drivers define their module aliases as
"vfio-pci:$vendor_id-$device_id".
E.g. A vendor module for VF devices of Intel(R) Ethernet Controller XL710
family can define its module alias as MODULE_ALIAS("vfio-pci:8086-154c").

(3) when module vfio_pci is bound to a device, it would call modprobe in
user space for modules of alias "vfio-pci:$vendor_id-$device_id", which
would trigger unloaded vendor drivers to register their
vfio_pci_vendor_driver_ops to vfio_pci.
Then it searches registered ops list and calls probe() to test whether this
vendor driver supports this physical device.
A success probe() would make vfio_pci to use vfio_device_ops provided
vendor driver as the ops of the vfio device. So vfio_pci_ops are not to be
called for this device any more. Instead, they are exported to be called
from vendor drivers as a default implementation.


                                        _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                  
 __________   (un)register vendor ops  |  ___________    ___________   |
|          |<----------------------------|    VF    |   |           |   
| vfio-pci |                           | |  vendor  |   | PF driver |  |
|__________|---------------------------->|  driver  |   |___________|   
     |           probe/remove()        |  -----------          |       |
     |                                                         |         
     |                                 |_ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _|
    \|/                                                       \|/
-----------                                              ------------
|    VF   |                                              |    PF    |
-----------                                              ------------
                   a typical usage in SRIOV



Ref counts:
(1) vendor drivers must be a module and compiled to depend on module
vfio_pci.
(2) In vfio_pci, a successful register would add refs of itself, and a
successful unregister would derefs of itself.
(3) In vfio_pci, a successful probe() of a vendor driver would add ref of
the vendor module. It derefs of the vendor module after calling remove().
(4) macro provided to make sure vendor module always unregister itself in
its module_exit

Those are to prevent below conditions:
a. vfio_pci is unloaded after a successful register from vendor driver.
   Though vfio_pci would later call modprobe to ask the vendor module to
   register again, it cannot help if vendor driver remain as loaded
   across unloading-loading of vfio_pci.
b. vendor driver unregisters itself after successfully probed by vfio_pci.
c. circular dependency between vfio_pci and the vendor driver.
   if vfio_pci adds refs to both vfio_pci and vendor driver on a successful
   register and if vendor driver only do the unregister in its module_exit,
   then it would have no chance to do the unregister.


Patch Overview
patches 1-2 making struct vfio_pci_device public and functions
            in struct vfio_pci_ops exported
patches 3-4 provide register/unregister interfaces for vendor drivers
patches 5-6 some more enhancements
patch 7 provides an sample to pass through IGD devices
patches 8-9 implement VF live migration on Intel's 710 SRIOV devices.
            Some dirty page tracking functions are intentionally
            commented out and would send out later in future.

Changelog:
RFC v2- RFC v3:
- embedding struct vfio_pci_device into struct vfio_pci_device_private.
(Alex)

RFC v1- RFC v2:
- renamed mediate ops to vendor ops
- use of request_module and module alias to manage vendor driver load
  (Alex)
- changed from vfio_pci_ops calling vendor ops
  to vendor ops calling default vfio_pci_ops  (Alex)
- dropped patches for dynamic traps of BARs. will submit them later.

Links:
Previous versions:
RFC v2:
https://lkml.org/lkml/2020/1/30/956

RFC v1:
kernel part: https://www.spinics.net/lists/kernel/msg3337337.html.
qemu part: https://www.spinics.net/lists/kernel/msg3337337.html.

VFIO live migration v8:
https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05542.html.


Yan Zhao (9):
  vfio/pci: export vfio_pci_device public and add
    vfio_pci_device_private
  vfio/pci: export functions in vfio_pci_ops
  vfio/pci: register/unregister vfio_pci_vendor_driver_ops
  vfio/pci: macros to generate module_init and module_exit for vendor
    modules
  vfio/pci: let vfio_pci know how many vendor regions are registered
  vfio/pci: export vfio_pci_setup_barmap
  samples/vfio-pci: add a sample vendor module of vfio-pci for IGD
    devices
  vfio: header for vfio live migration region.
  i40e/vf_migration: vfio-pci vendor driver for VF live migration

 drivers/net/ethernet/intel/Kconfig            |  10 +
 drivers/net/ethernet/intel/i40e/Makefile      |   2 +
 drivers/net/ethernet/intel/i40e/i40e.h        |   2 +
 .../ethernet/intel/i40e/i40e_vf_migration.c   | 635 ++++++++++++++++++
 .../ethernet/intel/i40e/i40e_vf_migration.h   |  92 +++
 drivers/vfio/pci/vfio_pci.c                   | 385 +++++++----
 drivers/vfio/pci/vfio_pci_config.c            | 186 ++---
 drivers/vfio/pci/vfio_pci_igd.c               |  19 +-
 drivers/vfio/pci/vfio_pci_intrs.c             | 186 ++---
 drivers/vfio/pci/vfio_pci_nvlink2.c           |  22 +-
 drivers/vfio/pci/vfio_pci_private.h           |  13 +-
 drivers/vfio/pci/vfio_pci_rdwr.c              |  64 +-
 include/linux/vfio.h                          |  57 ++
 include/uapi/linux/vfio.h                     | 149 ++++
 samples/Kconfig                               |   6 +
 samples/Makefile                              |   1 +
 samples/vfio-pci/Makefile                     |   2 +
 samples/vfio-pci/igd_pt.c                     | 148 ++++
 18 files changed, 1645 insertions(+), 334 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.c
 create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.h
 create mode 100644 samples/vfio-pci/Makefile
 create mode 100644 samples/vfio-pci/igd_pt.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
@ 2020-02-11 10:10 ` Yan Zhao
  2020-02-20 21:00   ` Alex Williamson
  2020-02-11 10:11 ` [RFC PATCH v3 2/9] vfio/pci: export functions in vfio_pci_ops Yan Zhao
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:10 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

(1) make vfio_pci_device public, so it is accessible from external code.
(2) add a private struct vfio_pci_device_private, which is only accessible
from internal code. It extends struct vfio_pci_device.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 256 +++++++++++++++-------------
 drivers/vfio/pci/vfio_pci_config.c  | 186 ++++++++++++--------
 drivers/vfio/pci/vfio_pci_igd.c     |  19 ++-
 drivers/vfio/pci/vfio_pci_intrs.c   | 186 +++++++++++---------
 drivers/vfio/pci/vfio_pci_nvlink2.c |  22 +--
 drivers/vfio/pci/vfio_pci_private.h |   7 +-
 drivers/vfio/pci/vfio_pci_rdwr.c    |  40 +++--
 include/linux/vfio.h                |   5 +
 8 files changed, 408 insertions(+), 313 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 02206162eaa9..f19e2dc498a4 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -112,8 +112,9 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 	struct resource *res;
 	int bar;
 	struct vfio_pci_dummy_resource *dummy_res;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
-	INIT_LIST_HEAD(&vdev->dummy_resources_list);
+	INIT_LIST_HEAD(&priv->dummy_resources_list);
 
 	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
 		res = vdev->pdev->resource + bar;
@@ -133,7 +134,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 			goto no_mmap;
 
 		if (resource_size(res) >= PAGE_SIZE) {
-			vdev->bar_mmap_supported[bar] = true;
+			priv->bar_mmap_supported[bar] = true;
 			continue;
 		}
 
@@ -158,8 +159,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 			}
 			dummy_res->index = bar;
 			list_add(&dummy_res->res_next,
-					&vdev->dummy_resources_list);
-			vdev->bar_mmap_supported[bar] = true;
+					&priv->dummy_resources_list);
+			priv->bar_mmap_supported[bar] = true;
 			continue;
 		}
 		/*
@@ -171,7 +172,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 		 * the BAR's location in a page.
 		 */
 no_mmap:
-		vdev->bar_mmap_supported[bar] = false;
+		priv->bar_mmap_supported[bar] = false;
 	}
 }
 
@@ -210,6 +211,7 @@ static bool vfio_pci_nointx(struct pci_dev *pdev)
 static void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	u16 pmcsr;
 
 	if (!pdev->pm_cap)
@@ -217,7 +219,7 @@ static void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
 
 	pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr);
 
-	vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET);
+	priv->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET);
 }
 
 /*
@@ -231,9 +233,10 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	bool needs_restore = false, needs_save = false;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int ret;
 
-	if (vdev->needs_pm_restore) {
+	if (priv->needs_pm_restore) {
 		if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) {
 			pci_save_state(pdev);
 			needs_save = true;
@@ -248,9 +251,9 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
 	if (!ret) {
 		/* D3 might be unsupported via quirk, skip unless in D3 */
 		if (needs_save && pdev->current_state >= PCI_D3hot) {
-			vdev->pm_save = pci_store_saved_state(pdev);
+			priv->pm_save = pci_store_saved_state(pdev);
 		} else if (needs_restore) {
-			pci_load_and_free_saved_state(pdev, &vdev->pm_save);
+			pci_load_and_free_saved_state(pdev, &priv->pm_save);
 			pci_restore_state(pdev);
 		}
 	}
@@ -261,6 +264,7 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int ret;
 	u16 cmd;
 	u8 msix_pos;
@@ -281,31 +285,31 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 		return ret;
 	}
 
-	vdev->reset_works = !ret;
+	priv->reset_works = !ret;
 	pci_save_state(pdev);
-	vdev->pci_saved_state = pci_store_saved_state(pdev);
-	if (!vdev->pci_saved_state)
+	priv->pci_saved_state = pci_store_saved_state(pdev);
+	if (!priv->pci_saved_state)
 		pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
 
 	if (likely(!nointxmask)) {
 		if (vfio_pci_nointx(pdev)) {
 			pci_info(pdev, "Masking broken INTx support\n");
-			vdev->nointx = true;
+			priv->nointx = true;
 			pci_intx(pdev, 0);
 		} else
-			vdev->pci_2_3 = pci_intx_mask_supported(pdev);
+			priv->pci_2_3 = pci_intx_mask_supported(pdev);
 	}
 
 	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
-	if (vdev->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) {
+	if (priv->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) {
 		cmd &= ~PCI_COMMAND_INTX_DISABLE;
 		pci_write_config_word(pdev, PCI_COMMAND, cmd);
 	}
 
 	ret = vfio_config_init(vdev);
 	if (ret) {
-		kfree(vdev->pci_saved_state);
-		vdev->pci_saved_state = NULL;
+		kfree(priv->pci_saved_state);
+		priv->pci_saved_state = NULL;
 		pci_disable_device(pdev);
 		return ret;
 	}
@@ -318,14 +322,14 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 		pci_read_config_word(pdev, msix_pos + PCI_MSIX_FLAGS, &flags);
 		pci_read_config_dword(pdev, msix_pos + PCI_MSIX_TABLE, &table);
 
-		vdev->msix_bar = table & PCI_MSIX_TABLE_BIR;
-		vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET;
-		vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16;
+		priv->msix_bar = table & PCI_MSIX_TABLE_BIR;
+		priv->msix_offset = table & PCI_MSIX_TABLE_OFFSET;
+		priv->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16;
 	} else
-		vdev->msix_bar = 0xFF;
+		priv->msix_bar = 0xFF;
 
 	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
-		vdev->has_vga = true;
+		priv->has_vga = true;
 
 
 	if (vfio_pci_is_vga(pdev) &&
@@ -369,6 +373,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_dummy_resource *dummy_res, *tmp;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
 	int i, bar;
 
@@ -381,40 +386,40 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 
 	/* Device closed, don't need mutex here */
 	list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
-				 &vdev->ioeventfds_list, next) {
+				 &priv->ioeventfds_list, next) {
 		vfio_virqfd_disable(&ioeventfd->virqfd);
 		list_del(&ioeventfd->next);
 		kfree(ioeventfd);
 	}
-	vdev->ioeventfds_nr = 0;
+	priv->ioeventfds_nr = 0;
 
-	vdev->virq_disabled = false;
+	priv->virq_disabled = false;
 
 	for (i = 0; i < vdev->num_regions; i++)
-		vdev->region[i].ops->release(vdev, &vdev->region[i]);
+		priv->region[i].ops->release(vdev, &priv->region[i]);
 
 	vdev->num_regions = 0;
-	kfree(vdev->region);
-	vdev->region = NULL; /* don't krealloc a freed pointer */
+	kfree(priv->region);
+	priv->region = NULL; /* don't krealloc a freed pointer */
 
 	vfio_config_free(vdev);
 
 	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
-		if (!vdev->barmap[bar])
+		if (!priv->barmap[bar])
 			continue;
-		pci_iounmap(pdev, vdev->barmap[bar]);
+		pci_iounmap(pdev, priv->barmap[bar]);
 		pci_release_selected_regions(pdev, 1 << bar);
-		vdev->barmap[bar] = NULL;
+		priv->barmap[bar] = NULL;
 	}
 
 	list_for_each_entry_safe(dummy_res, tmp,
-				 &vdev->dummy_resources_list, res_next) {
+				 &priv->dummy_resources_list, res_next) {
 		list_del(&dummy_res->res_next);
 		release_resource(&dummy_res->resource);
 		kfree(dummy_res);
 	}
 
-	vdev->needs_reset = true;
+	priv->needs_reset = true;
 
 	/*
 	 * If we have saved state, restore it.  If we can reset the device,
@@ -422,10 +427,10 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 	 * nothing, but saving and restoring current state without reset
 	 * is just busy work.
 	 */
-	if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) {
+	if (pci_load_and_free_saved_state(pdev, &priv->pci_saved_state)) {
 		pci_info(pdev, "%s: Couldn't reload saved state\n", __func__);
 
-		if (!vdev->reset_works)
+		if (!priv->reset_works)
 			goto out;
 
 		pci_save_state(pdev);
@@ -444,10 +449,10 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 	 * We can not use the "try" reset interface here, which will
 	 * overwrite the previously restored configuration information.
 	 */
-	if (vdev->reset_works && pci_cfg_access_trylock(pdev)) {
+	if (priv->reset_works && pci_cfg_access_trylock(pdev)) {
 		if (device_trylock(&pdev->dev)) {
 			if (!__pci_reset_function_locked(pdev))
-				vdev->needs_reset = false;
+				priv->needs_reset = false;
 			device_unlock(&pdev->dev);
 		}
 		pci_cfg_access_unlock(pdev);
@@ -466,15 +471,16 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 static void vfio_pci_release(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
-	mutex_lock(&vdev->reflck->lock);
+	mutex_lock(&priv->reflck->lock);
 
-	if (!(--vdev->refcnt)) {
+	if (!(--priv->refcnt)) {
 		vfio_spapr_pci_eeh_release(vdev->pdev);
 		vfio_pci_disable(vdev);
 	}
 
-	mutex_unlock(&vdev->reflck->lock);
+	mutex_unlock(&priv->reflck->lock);
 
 	module_put(THIS_MODULE);
 }
@@ -482,23 +488,24 @@ static void vfio_pci_release(void *device_data)
 static int vfio_pci_open(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int ret = 0;
 
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
 
-	mutex_lock(&vdev->reflck->lock);
+	mutex_lock(&priv->reflck->lock);
 
-	if (!vdev->refcnt) {
+	if (!priv->refcnt) {
 		ret = vfio_pci_enable(vdev);
 		if (ret)
 			goto error;
 
 		vfio_spapr_pci_eeh_open(vdev->pdev);
 	}
-	vdev->refcnt++;
+	priv->refcnt++;
 error:
-	mutex_unlock(&vdev->reflck->lock);
+	mutex_unlock(&priv->reflck->lock);
 	if (ret)
 		module_put(THIS_MODULE);
 	return ret;
@@ -506,11 +513,13 @@ static int vfio_pci_open(void *device_data)
 
 static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
 		u8 pin;
 
 		if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) ||
-		    vdev->nointx || vdev->pdev->is_virtfn)
+		    priv->nointx || vdev->pdev->is_virtfn)
 			return 0;
 
 		pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
@@ -668,20 +677,21 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 				 size_t size, u32 flags, void *data)
 {
 	struct vfio_pci_region *region;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
-	region = krealloc(vdev->region,
+	region = krealloc(priv->region,
 			  (vdev->num_regions + 1) * sizeof(*region),
 			  GFP_KERNEL);
 	if (!region)
 		return -ENOMEM;
 
-	vdev->region = region;
-	vdev->region[vdev->num_regions].type = type;
-	vdev->region[vdev->num_regions].subtype = subtype;
-	vdev->region[vdev->num_regions].ops = ops;
-	vdev->region[vdev->num_regions].size = size;
-	vdev->region[vdev->num_regions].flags = flags;
-	vdev->region[vdev->num_regions].data = data;
+	priv->region = region;
+	priv->region[vdev->num_regions].type = type;
+	priv->region[vdev->num_regions].subtype = subtype;
+	priv->region[vdev->num_regions].ops = ops;
+	priv->region[vdev->num_regions].size = size;
+	priv->region[vdev->num_regions].flags = flags;
+	priv->region[vdev->num_regions].data = data;
 
 	vdev->num_regions++;
 
@@ -693,6 +703,7 @@ static long vfio_pci_ioctl(void *device_data,
 {
 	struct vfio_pci_device *vdev = device_data;
 	unsigned long minsz;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	if (cmd == VFIO_DEVICE_GET_INFO) {
 		struct vfio_device_info info;
@@ -707,7 +718,7 @@ static long vfio_pci_ioctl(void *device_data,
 
 		info.flags = VFIO_DEVICE_FLAGS_PCI;
 
-		if (vdev->reset_works)
+		if (priv->reset_works)
 			info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
 		info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
@@ -747,9 +758,9 @@ static long vfio_pci_ioctl(void *device_data,
 
 			info.flags = VFIO_REGION_INFO_FLAG_READ |
 				     VFIO_REGION_INFO_FLAG_WRITE;
-			if (vdev->bar_mmap_supported[info.index]) {
+			if (priv->bar_mmap_supported[info.index]) {
 				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
-				if (info.index == vdev->msix_bar) {
+				if (info.index == priv->msix_bar) {
 					ret = msix_mmappable_cap(vdev, &caps);
 					if (ret)
 						return ret;
@@ -797,7 +808,7 @@ static long vfio_pci_ioctl(void *device_data,
 			break;
 		}
 		case VFIO_PCI_VGA_REGION_INDEX:
-			if (!vdev->has_vga)
+			if (!priv->has_vga)
 				return -EINVAL;
 
 			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
@@ -822,20 +833,20 @@ static long vfio_pci_ioctl(void *device_data,
 			i = info.index - VFIO_PCI_NUM_REGIONS;
 
 			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.size = vdev->region[i].size;
-			info.flags = vdev->region[i].flags;
+			info.size = priv->region[i].size;
+			info.flags = priv->region[i].flags;
 
-			cap_type.type = vdev->region[i].type;
-			cap_type.subtype = vdev->region[i].subtype;
+			cap_type.type = priv->region[i].type;
+			cap_type.subtype = priv->region[i].subtype;
 
 			ret = vfio_info_add_capability(&caps, &cap_type.header,
 						       sizeof(cap_type));
 			if (ret)
 				return ret;
 
-			if (vdev->region[i].ops->add_capability) {
-				ret = vdev->region[i].ops->add_capability(vdev,
-						&vdev->region[i], &caps);
+			if (priv->region[i].ops->add_capability) {
+				ret = priv->region[i].ops->add_capability(vdev,
+						&priv->region[i], &caps);
 				if (ret)
 					return ret;
 			}
@@ -925,18 +936,18 @@ static long vfio_pci_ioctl(void *device_data,
 				return PTR_ERR(data);
 		}
 
-		mutex_lock(&vdev->igate);
+		mutex_lock(&priv->igate);
 
 		ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index,
 					      hdr.start, hdr.count, data);
 
-		mutex_unlock(&vdev->igate);
+		mutex_unlock(&priv->igate);
 		kfree(data);
 
 		return ret;
 
 	} else if (cmd == VFIO_DEVICE_RESET) {
-		return vdev->reset_works ?
+		return priv->reset_works ?
 			pci_try_reset_function(vdev->pdev) : -EINVAL;
 
 	} else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) {
@@ -1147,6 +1158,7 @@ static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
 {
 	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
 	struct vfio_pci_device *vdev = device_data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
 		return -EINVAL;
@@ -1167,7 +1179,7 @@ static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
 		return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite);
 	default:
 		index -= VFIO_PCI_NUM_REGIONS;
-		return vdev->region[index].ops->rw(vdev, buf,
+		return priv->region[index].ops->rw(vdev, buf,
 						   count, ppos, iswrite);
 	}
 
@@ -1195,6 +1207,7 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = device_data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned int index;
 	u64 phys_len, req_len, pgoff, req_start;
@@ -1208,7 +1221,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 		return -EINVAL;
 	if (index >= VFIO_PCI_NUM_REGIONS) {
 		int regnum = index - VFIO_PCI_NUM_REGIONS;
-		struct vfio_pci_region *region = vdev->region + regnum;
+		struct vfio_pci_region *region = priv->region + regnum;
 
 		if (region && region->ops && region->ops->mmap &&
 		    (region->flags & VFIO_REGION_INFO_FLAG_MMAP))
@@ -1217,7 +1230,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	}
 	if (index >= VFIO_PCI_ROM_REGION_INDEX)
 		return -EINVAL;
-	if (!vdev->bar_mmap_supported[index])
+	if (!priv->bar_mmap_supported[index])
 		return -EINVAL;
 
 	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
@@ -1233,14 +1246,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	 * Even though we don't make use of the barmap for the mmap,
 	 * we need to request the region and the barmap tracks that.
 	 */
-	if (!vdev->barmap[index]) {
+	if (!priv->barmap[index]) {
 		ret = pci_request_selected_regions(pdev,
 						   1 << index, "vfio-pci");
 		if (ret)
 			return ret;
 
-		vdev->barmap[index] = pci_iomap(pdev, index, 0);
-		if (!vdev->barmap[index]) {
+		priv->barmap[index] = pci_iomap(pdev, index, 0);
+		if (!priv->barmap[index]) {
 			pci_release_selected_regions(pdev, 1 << index);
 			return -ENOMEM;
 		}
@@ -1258,21 +1271,22 @@ static void vfio_pci_request(void *device_data, unsigned int count)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
-	mutex_lock(&vdev->igate);
+	mutex_lock(&priv->igate);
 
-	if (vdev->req_trigger) {
+	if (priv->req_trigger) {
 		if (!(count % 10))
 			pci_notice_ratelimited(pdev,
 				"Relaying device request to user (#%u)\n",
 				count);
-		eventfd_signal(vdev->req_trigger, 1);
+		eventfd_signal(priv->req_trigger, 1);
 	} else if (count == 0) {
 		pci_warn(pdev,
 			"No device request channel registered, blocked until released by user\n");
 	}
 
-	mutex_unlock(&vdev->igate);
+	mutex_unlock(&priv->igate);
 }
 
 static const struct vfio_device_ops vfio_pci_ops = {
@@ -1291,7 +1305,7 @@ static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
-	struct vfio_pci_device *vdev;
+	struct vfio_pci_device_private *priv;
 	struct iommu_group *group;
 	int ret;
 
@@ -1315,41 +1329,42 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (!group)
 		return -EINVAL;
 
-	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
-	if (!vdev) {
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv) {
 		vfio_iommu_group_put(group, &pdev->dev);
 		return -ENOMEM;
 	}
 
-	vdev->pdev = pdev;
-	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	mutex_init(&vdev->igate);
-	spin_lock_init(&vdev->irqlock);
-	mutex_init(&vdev->ioeventfds_lock);
-	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	priv->vdev.pdev = pdev;
+	priv->vdev.irq_type = VFIO_PCI_NUM_IRQS;
+	mutex_init(&priv->igate);
+	spin_lock_init(&priv->irqlock);
+	mutex_init(&priv->ioeventfds_lock);
+	INIT_LIST_HEAD(&priv->ioeventfds_list);
 
-	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
+	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, &priv->vdev);
 	if (ret) {
 		vfio_iommu_group_put(group, &pdev->dev);
-		kfree(vdev);
+		kfree(priv);
 		return ret;
 	}
 
-	ret = vfio_pci_reflck_attach(vdev);
+	ret = vfio_pci_reflck_attach(&priv->vdev);
 	if (ret) {
 		vfio_del_group_dev(&pdev->dev);
 		vfio_iommu_group_put(group, &pdev->dev);
-		kfree(vdev);
+		kfree(priv);
 		return ret;
 	}
 
 	if (vfio_pci_is_vga(pdev)) {
-		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
+		vga_client_register(pdev, &priv->vdev, NULL,
+				    vfio_pci_set_vga_decode);
 		vga_set_legacy_decoding(pdev,
-					vfio_pci_set_vga_decode(vdev, false));
+				vfio_pci_set_vga_decode(&priv->vdev, false));
 	}
 
-	vfio_pci_probe_power_state(vdev);
+	vfio_pci_probe_power_state(&priv->vdev);
 
 	if (!disable_idle_d3) {
 		/*
@@ -1361,8 +1376,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		 * be able to get to D3.  Therefore first do a D0 transition
 		 * before going to D3.
 		 */
-		vfio_pci_set_power_state(vdev, PCI_D0);
-		vfio_pci_set_power_state(vdev, PCI_D3hot);
+		vfio_pci_set_power_state(&priv->vdev, PCI_D0);
+		vfio_pci_set_power_state(&priv->vdev, PCI_D3hot);
 	}
 
 	return ret;
@@ -1371,22 +1386,25 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void vfio_pci_remove(struct pci_dev *pdev)
 {
 	struct vfio_pci_device *vdev;
+	struct vfio_pci_device_private *priv;
 
 	vdev = vfio_del_group_dev(&pdev->dev);
 	if (!vdev)
 		return;
 
-	vfio_pci_reflck_put(vdev->reflck);
+	priv = VDEV_TO_PRIV(vdev);
+
+	vfio_pci_reflck_put(priv->reflck);
 
 	vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
-	kfree(vdev->region);
-	mutex_destroy(&vdev->ioeventfds_lock);
+	kfree(priv->region);
+	mutex_destroy(&priv->ioeventfds_lock);
 
 	if (!disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D0);
 
-	kfree(vdev->pm_save);
-	kfree(vdev);
+	kfree(priv->pm_save);
+	kfree(priv);
 
 	if (vfio_pci_is_vga(pdev)) {
 		vga_client_register(pdev, NULL, NULL, NULL);
@@ -1401,6 +1419,7 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 {
 	struct vfio_pci_device *vdev;
 	struct vfio_device *device;
+	struct vfio_pci_device_private *priv;
 
 	device = vfio_device_get_from_dev(&pdev->dev);
 	if (device == NULL)
@@ -1411,13 +1430,14 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 		vfio_device_put(device);
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
+	priv = VDEV_TO_PRIV(vdev);
 
-	mutex_lock(&vdev->igate);
+	mutex_lock(&priv->igate);
 
-	if (vdev->err_trigger)
-		eventfd_signal(vdev->err_trigger, 1);
+	if (priv->err_trigger)
+		eventfd_signal(priv->err_trigger, 1);
 
-	mutex_unlock(&vdev->igate);
+	mutex_unlock(&priv->igate);
 
 	vfio_device_put(device);
 
@@ -1462,6 +1482,7 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
 	struct vfio_pci_reflck **preflck = data;
 	struct vfio_device *device;
 	struct vfio_pci_device *vdev;
+	struct vfio_pci_device_private *priv;
 
 	device = vfio_device_get_from_dev(&pdev->dev);
 	if (!device)
@@ -1473,10 +1494,11 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
 	}
 
 	vdev = vfio_device_data(device);
+	priv = VDEV_TO_PRIV(vdev);
 
-	if (vdev->reflck) {
-		vfio_pci_reflck_get(vdev->reflck);
-		*preflck = vdev->reflck;
+	if (priv->reflck) {
+		vfio_pci_reflck_get(priv->reflck);
+		*preflck = priv->reflck;
 		vfio_device_put(device);
 		return 1;
 	}
@@ -1488,17 +1510,18 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
 {
 	bool slot = !pci_probe_reset_slot(vdev->pdev->slot);
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	mutex_lock(&reflck_lock);
 
 	if (pci_is_root_bus(vdev->pdev->bus) ||
 	    vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find,
-					  &vdev->reflck, slot) <= 0)
-		vdev->reflck = vfio_pci_reflck_alloc();
+					  &priv->reflck, slot) <= 0)
+		priv->reflck = vfio_pci_reflck_alloc();
 
 	mutex_unlock(&reflck_lock);
 
-	return PTR_ERR_OR_ZERO(vdev->reflck);
+	return PTR_ERR_OR_ZERO(priv->reflck);
 }
 
 static void vfio_pci_reflck_release(struct kref *kref)
@@ -1527,6 +1550,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 	struct vfio_devices *devs = data;
 	struct vfio_device *device;
 	struct vfio_pci_device *vdev;
+	struct vfio_pci_device_private *priv;
 
 	if (devs->cur_index == devs->max_index)
 		return -ENOSPC;
@@ -1541,9 +1565,10 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 	}
 
 	vdev = vfio_device_data(device);
+	priv = VDEV_TO_PRIV(vdev);
 
 	/* Fault if the device is not unused */
-	if (vdev->refcnt) {
+	if (priv->refcnt) {
 		vfio_device_put(device);
 		return -EBUSY;
 	}
@@ -1559,7 +1584,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
  *  - At least one of the affected devices is marked dirty via
  *    needs_reset (such as by lack of FLR support)
  * Then attempt to perform that bus or slot reset.  Callers are required
- * to hold vdev->reflck->lock, protecting the bus/slot reset group from
+ * to hold priv->reflck->lock, protecting the bus/slot reset group from
  * concurrent opens.  A vfio_device reference is acquired for each device
  * to prevent unbinds during the reset operation.
  *
@@ -1574,6 +1599,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 	int i = 0, ret = -EINVAL;
 	bool slot = false;
 	struct vfio_pci_device *tmp;
+	struct vfio_pci_device_private *priv;
 
 	if (!pci_probe_reset_slot(vdev->pdev->slot))
 		slot = true;
@@ -1597,7 +1623,8 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 	/* Does at least one need a reset? */
 	for (i = 0; i < devs.cur_index; i++) {
 		tmp = vfio_device_data(devs.devices[i]);
-		if (tmp->needs_reset) {
+		priv = VDEV_TO_PRIV(tmp);
+		if (priv->needs_reset) {
 			ret = pci_reset_bus(vdev->pdev);
 			break;
 		}
@@ -1606,6 +1633,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 put_devs:
 	for (i = 0; i < devs.cur_index; i++) {
 		tmp = vfio_device_data(devs.devices[i]);
+		priv = VDEV_TO_PRIV(tmp);
 
 		/*
 		 * If reset was successful, affected devices no longer need
@@ -1615,7 +1643,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 		 * the power state.
 		 */
 		if (!ret) {
-			tmp->needs_reset = false;
+			priv->needs_reset = false;
 
 			if (tmp != vdev && !disable_idle_d3)
 				vfio_pci_set_power_state(tmp, PCI_D3hot);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index f0891bd8444c..f67d38e24faa 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -176,8 +176,9 @@ static int vfio_default_config_read(struct vfio_pci_device *vdev, int pos,
 				    int offset, __le32 *val)
 {
 	__le32 virt = 0;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
-	memcpy(val, vdev->vconfig + pos, count);
+	memcpy(val, priv->vconfig + pos, count);
 
 	memcpy(&virt, perm->virt + offset, count);
 
@@ -202,6 +203,7 @@ static int vfio_default_config_write(struct vfio_pci_device *vdev, int pos,
 				     int offset, __le32 val)
 {
 	__le32 virt = 0, write = 0;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	memcpy(&write, perm->write + offset, count);
 
@@ -214,12 +216,12 @@ static int vfio_default_config_write(struct vfio_pci_device *vdev, int pos,
 	if (write & virt) {
 		__le32 virt_val = 0;
 
-		memcpy(&virt_val, vdev->vconfig + pos, count);
+		memcpy(&virt_val, priv->vconfig + pos, count);
 
 		virt_val &= ~(write & virt);
 		virt_val |= (val & (write & virt));
 
-		memcpy(vdev->vconfig + pos, &virt_val, count);
+		memcpy(priv->vconfig + pos, &virt_val, count);
 	}
 
 	/* Non-virtualzed and writable bits go to hardware */
@@ -249,6 +251,7 @@ static int vfio_direct_config_read(struct vfio_pci_device *vdev, int pos,
 				   int offset, __le32 *val)
 {
 	int ret;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	ret = vfio_user_config_read(vdev->pdev, pos, val, count);
 	if (ret)
@@ -256,13 +259,13 @@ static int vfio_direct_config_read(struct vfio_pci_device *vdev, int pos,
 
 	if (pos >= PCI_CFG_SPACE_SIZE) { /* Extended cap header mangling */
 		if (offset < 4)
-			memcpy(val, vdev->vconfig + pos, count);
+			memcpy(val, priv->vconfig + pos, count);
 	} else if (pos >= PCI_STD_HEADER_SIZEOF) { /* Std cap mangling */
 		if (offset == PCI_CAP_LIST_ID && count > 1)
-			memcpy(val, vdev->vconfig + pos,
+			memcpy(val, priv->vconfig + pos,
 			       min(PCI_CAP_FLAGS, count));
 		else if (offset == PCI_CAP_LIST_NEXT)
-			memcpy(val, vdev->vconfig + pos, 1);
+			memcpy(val, priv->vconfig + pos, 1);
 	}
 
 	return count;
@@ -300,7 +303,9 @@ static int vfio_virt_config_write(struct vfio_pci_device *vdev, int pos,
 				  int count, struct perm_bits *perm,
 				  int offset, __le32 val)
 {
-	memcpy(vdev->vconfig + pos, &val, count);
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
+	memcpy(priv->vconfig + pos, &val, count);
 	return count;
 }
 
@@ -308,7 +313,9 @@ static int vfio_virt_config_read(struct vfio_pci_device *vdev, int pos,
 				 int count, struct perm_bits *perm,
 				 int offset, __le32 *val)
 {
-	memcpy(val, vdev->vconfig + pos, count);
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
+	memcpy(val, priv->vconfig + pos, count);
 	return count;
 }
 
@@ -402,7 +409,9 @@ static inline void p_setd(struct perm_bits *p, int off, u32 virt, u32 write)
 static void vfio_bar_restore(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	u32 *rbar = vdev->rbar;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
+	u32 *rbar = priv->rbar;
 	u16 cmd;
 	int i;
 
@@ -416,7 +425,7 @@ static void vfio_bar_restore(struct vfio_pci_device *vdev)
 
 	pci_user_write_config_dword(pdev, PCI_ROM_ADDRESS, *rbar);
 
-	if (vdev->nointx) {
+	if (priv->nointx) {
 		pci_user_read_config_word(pdev, PCI_COMMAND, &cmd);
 		cmd |= PCI_COMMAND_INTX_DISABLE;
 		pci_user_write_config_word(pdev, PCI_COMMAND, cmd);
@@ -449,11 +458,12 @@ static __le32 vfio_generate_bar_flags(struct pci_dev *pdev, int bar)
 static void vfio_bar_fixup(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int i;
 	__le32 *bar;
 	u64 mask;
 
-	bar = (__le32 *)&vdev->vconfig[PCI_BASE_ADDRESS_0];
+	bar = (__le32 *)&priv->vconfig[PCI_BASE_ADDRESS_0];
 
 	for (i = PCI_STD_RESOURCES; i <= PCI_STD_RESOURCE_END; i++, bar++) {
 		if (!pci_resource_start(pdev, i)) {
@@ -473,7 +483,7 @@ static void vfio_bar_fixup(struct vfio_pci_device *vdev)
 		}
 	}
 
-	bar = (__le32 *)&vdev->vconfig[PCI_ROM_ADDRESS];
+	bar = (__le32 *)&priv->vconfig[PCI_ROM_ADDRESS];
 
 	/*
 	 * NB. REGION_INFO will have reported zero size if we weren't able
@@ -492,13 +502,15 @@ static void vfio_bar_fixup(struct vfio_pci_device *vdev)
 	} else
 		*bar = 0;
 
-	vdev->bardirty = false;
+	priv->bardirty = false;
 }
 
 static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos,
 				  int count, struct perm_bits *perm,
 				  int offset, __le32 *val)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	if (is_bar(offset)) /* pos == offset for basic config */
 		vfio_bar_fixup(vdev);
 
@@ -506,7 +518,8 @@ static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos,
 
 	/* Mask in virtual memory enable for SR-IOV devices */
 	if (offset == PCI_COMMAND && vdev->pdev->is_virtfn) {
-		u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
+		u16 cmd = le16_to_cpu(*(__le16 *)
+				&priv->vconfig[PCI_COMMAND]);
 		u32 tmp_val = le32_to_cpu(*val);
 
 		tmp_val |= cmd & PCI_COMMAND_MEMORY;
@@ -521,11 +534,12 @@ static bool vfio_need_bar_restore(struct vfio_pci_device *vdev)
 {
 	int i = 0, pos = PCI_BASE_ADDRESS_0, ret;
 	u32 bar;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	for (; pos <= PCI_BASE_ADDRESS_5; i++, pos += 4) {
-		if (vdev->rbar[i]) {
+		if (priv->rbar[i]) {
 			ret = pci_user_read_config_dword(vdev->pdev, pos, &bar);
-			if (ret || vdev->rbar[i] != bar)
+			if (ret || priv->rbar[i] != bar)
 				return true;
 		}
 	}
@@ -538,11 +552,12 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 				   int offset, __le32 val)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	__le16 *virt_cmd;
 	u16 new_cmd = 0;
 	int ret;
 
-	virt_cmd = (__le16 *)&vdev->vconfig[PCI_COMMAND];
+	virt_cmd = (__le16 *)&priv->vconfig[PCI_COMMAND];
 
 	if (offset == PCI_COMMAND) {
 		bool phys_mem, virt_mem, new_mem, phys_io, virt_io, new_io;
@@ -598,17 +613,17 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 		virt_intx_disable = !!(le16_to_cpu(*virt_cmd) &
 				       PCI_COMMAND_INTX_DISABLE);
 
-		if (virt_intx_disable && !vdev->virq_disabled) {
-			vdev->virq_disabled = true;
+		if (virt_intx_disable && !priv->virq_disabled) {
+			priv->virq_disabled = true;
 			vfio_pci_intx_mask(vdev);
-		} else if (!virt_intx_disable && vdev->virq_disabled) {
-			vdev->virq_disabled = false;
+		} else if (!virt_intx_disable && priv->virq_disabled) {
+			priv->virq_disabled = false;
 			vfio_pci_intx_unmask(vdev);
 		}
 	}
 
 	if (is_bar(offset))
-		vdev->bardirty = true;
+		priv->bardirty = true;
 
 	return count;
 }
@@ -721,8 +736,11 @@ static int vfio_vpd_config_write(struct vfio_pci_device *vdev, int pos,
 				 int offset, __le32 val)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	__le16 *paddr = (__le16 *)(vdev->vconfig + pos - offset + PCI_VPD_ADDR);
-	__le32 *pdata = (__le32 *)(vdev->vconfig + pos - offset + PCI_VPD_DATA);
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	__le16 *paddr = (__le16 *)(priv->vconfig + pos - offset +
+			PCI_VPD_ADDR);
+	__le32 *pdata = (__le32 *)(priv->vconfig + pos - offset +
+			PCI_VPD_DATA);
 	u16 addr;
 	u32 data;
 
@@ -802,7 +820,8 @@ static int vfio_exp_config_write(struct vfio_pci_device *vdev, int pos,
 				 int count, struct perm_bits *perm,
 				 int offset, __le32 val)
 {
-	__le16 *ctrl = (__le16 *)(vdev->vconfig + pos -
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	__le16 *ctrl = (__le16 *)(priv->vconfig + pos -
 				  offset + PCI_EXP_DEVCTL);
 	int readrq = le16_to_cpu(*ctrl) & PCI_EXP_DEVCTL_READRQ;
 
@@ -883,7 +902,8 @@ static int vfio_af_config_write(struct vfio_pci_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 val)
 {
-	u8 *ctrl = vdev->vconfig + pos - offset + PCI_AF_CTRL;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	u8 *ctrl = priv->vconfig + pos - offset + PCI_AF_CTRL;
 
 	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
 	if (count < 0)
@@ -1040,13 +1060,15 @@ static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
 	u8 cap;
 	int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
 						 PCI_STD_HEADER_SIZEOF;
-	cap = vdev->pci_config_map[pos];
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
+	cap = priv->pci_config_map[pos];
 
 	if (cap == PCI_CAP_ID_BASIC)
 		return 0;
 
 	/* XXX Can we have to abutting capabilities of the same type? */
-	while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
+	while (pos - 1 >= base && priv->pci_config_map[pos - 1] == cap)
 		pos--;
 
 	return pos;
@@ -1056,6 +1078,8 @@ static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 *val)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	/* Update max available queue size from msi_qmax */
 	if (offset <= PCI_MSI_FLAGS && offset + count >= PCI_MSI_FLAGS) {
 		__le16 *flags;
@@ -1063,10 +1087,10 @@ static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
 
 		start = vfio_find_cap_start(vdev, pos);
 
-		flags = (__le16 *)&vdev->vconfig[start];
+		flags = (__le16 *)&priv->vconfig[start];
 
 		*flags &= cpu_to_le16(~PCI_MSI_FLAGS_QMASK);
-		*flags |= cpu_to_le16(vdev->msi_qmax << 1);
+		*flags |= cpu_to_le16(priv->msi_qmax << 1);
 	}
 
 	return vfio_default_config_read(vdev, pos, count, perm, offset, val);
@@ -1076,6 +1100,8 @@ static int vfio_msi_config_write(struct vfio_pci_device *vdev, int pos,
 				 int count, struct perm_bits *perm,
 				 int offset, __le32 val)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
 	if (count < 0)
 		return count;
@@ -1088,7 +1114,7 @@ static int vfio_msi_config_write(struct vfio_pci_device *vdev, int pos,
 
 		start = vfio_find_cap_start(vdev, pos);
 
-		pflags = (__le16 *)&vdev->vconfig[start + PCI_MSI_FLAGS];
+		pflags = (__le16 *)&priv->vconfig[start + PCI_MSI_FLAGS];
 
 		flags = le16_to_cpu(*pflags);
 
@@ -1097,9 +1123,9 @@ static int vfio_msi_config_write(struct vfio_pci_device *vdev, int pos,
 			flags &= ~PCI_MSI_FLAGS_ENABLE;
 
 		/* Check queue size */
-		if ((flags & PCI_MSI_FLAGS_QSIZE) >> 4 > vdev->msi_qmax) {
+		if ((flags & PCI_MSI_FLAGS_QSIZE) >> 4 > priv->msi_qmax) {
 			flags &= ~PCI_MSI_FLAGS_QSIZE;
-			flags |= vdev->msi_qmax << 4;
+			flags |= priv->msi_qmax << 4;
 		}
 
 		/* Write back to virt and to hardware */
@@ -1155,6 +1181,7 @@ static int init_pci_cap_msi_perm(struct perm_bits *perm, int len, u16 flags)
 static int vfio_msi_cap_len(struct vfio_pci_device *vdev, u8 pos)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int len, ret;
 	u16 flags;
 
@@ -1168,16 +1195,16 @@ static int vfio_msi_cap_len(struct vfio_pci_device *vdev, u8 pos)
 	if (flags & PCI_MSI_FLAGS_MASKBIT)
 		len += 10;
 
-	if (vdev->msi_perm)
+	if (priv->msi_perm)
 		return len;
 
-	vdev->msi_perm = kmalloc(sizeof(struct perm_bits), GFP_KERNEL);
-	if (!vdev->msi_perm)
+	priv->msi_perm = kmalloc(sizeof(struct perm_bits), GFP_KERNEL);
+	if (!priv->msi_perm)
 		return -ENOMEM;
 
-	ret = init_pci_cap_msi_perm(vdev->msi_perm, len, flags);
+	ret = init_pci_cap_msi_perm(priv->msi_perm, len, flags);
 	if (ret) {
-		kfree(vdev->msi_perm);
+		kfree(priv->msi_perm);
 		return ret;
 	}
 
@@ -1229,6 +1256,7 @@ static int vfio_vc_cap_len(struct vfio_pci_device *vdev, u16 pos)
 static int vfio_cap_len(struct vfio_pci_device *vdev, u8 cap, u8 pos)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	u32 dword;
 	u16 word;
 	u8 byte;
@@ -1247,7 +1275,7 @@ static int vfio_cap_len(struct vfio_pci_device *vdev, u8 cap, u8 pos)
 				/* Test for extended capabilities */
 				pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE,
 						      &dword);
-				vdev->extended_caps = (dword != 0);
+				priv->extended_caps = (dword != 0);
 			}
 			return PCI_CAP_PCIX_SIZEOF_V2;
 		} else
@@ -1263,7 +1291,7 @@ static int vfio_cap_len(struct vfio_pci_device *vdev, u8 cap, u8 pos)
 		if (pdev->cfg_size > PCI_CFG_SPACE_SIZE) {
 			/* Test for extended capabilities */
 			pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, &dword);
-			vdev->extended_caps = (dword != 0);
+			priv->extended_caps = (dword != 0);
 		}
 
 		/* length based on version and type */
@@ -1380,6 +1408,7 @@ static int vfio_fill_vconfig_bytes(struct vfio_pci_device *vdev,
 {
 	struct pci_dev *pdev = vdev->pdev;
 	int ret = 0;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	/*
 	 * We try to read physical config space in the largest chunks
@@ -1390,7 +1419,7 @@ static int vfio_fill_vconfig_bytes(struct vfio_pci_device *vdev,
 		int filled;
 
 		if (size >= 4 && !(offset % 4)) {
-			__le32 *dwordp = (__le32 *)&vdev->vconfig[offset];
+			__le32 *dwordp = (__le32 *)&priv->vconfig[offset];
 			u32 dword;
 
 			ret = pci_read_config_dword(pdev, offset, &dword);
@@ -1399,7 +1428,7 @@ static int vfio_fill_vconfig_bytes(struct vfio_pci_device *vdev,
 			*dwordp = cpu_to_le32(dword);
 			filled = 4;
 		} else if (size >= 2 && !(offset % 2)) {
-			__le16 *wordp = (__le16 *)&vdev->vconfig[offset];
+			__le16 *wordp = (__le16 *)&priv->vconfig[offset];
 			u16 word;
 
 			ret = pci_read_config_word(pdev, offset, &word);
@@ -1408,7 +1437,7 @@ static int vfio_fill_vconfig_bytes(struct vfio_pci_device *vdev,
 			*wordp = cpu_to_le16(word);
 			filled = 2;
 		} else {
-			u8 *byte = &vdev->vconfig[offset];
+			u8 *byte = &priv->vconfig[offset];
 			ret = pci_read_config_byte(pdev, offset, byte);
 			if (ret)
 				return ret;
@@ -1425,7 +1454,8 @@ static int vfio_fill_vconfig_bytes(struct vfio_pci_device *vdev,
 static int vfio_cap_init(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	u8 *map = vdev->pci_config_map;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	u8 *map = priv->pci_config_map;
 	u16 status;
 	u8 pos, *prev, cap;
 	int loops, ret, caps = 0;
@@ -1443,7 +1473,7 @@ static int vfio_cap_init(struct vfio_pci_device *vdev)
 		return ret;
 
 	/* Mark the previous position in case we want to skip a capability */
-	prev = &vdev->vconfig[PCI_CAPABILITY_LIST];
+	prev = &priv->vconfig[PCI_CAPABILITY_LIST];
 
 	/* We can bound our loop, capabilities are dword aligned */
 	loops = (PCI_CFG_SPACE_SIZE - PCI_STD_HEADER_SIZEOF) / PCI_CAP_SIZEOF;
@@ -1493,14 +1523,14 @@ static int vfio_cap_init(struct vfio_pci_device *vdev)
 		if (ret)
 			return ret;
 
-		prev = &vdev->vconfig[pos + PCI_CAP_LIST_NEXT];
+		prev = &priv->vconfig[pos + PCI_CAP_LIST_NEXT];
 		pos = next;
 		caps++;
 	}
 
 	/* If we didn't fill any capabilities, clear the status flag */
 	if (!caps) {
-		__le16 *vstatus = (__le16 *)&vdev->vconfig[PCI_STATUS];
+		__le16 *vstatus = (__le16 *)&priv->vconfig[PCI_STATUS];
 		*vstatus &= ~cpu_to_le16(PCI_STATUS_CAP_LIST);
 	}
 
@@ -1510,12 +1540,13 @@ static int vfio_cap_init(struct vfio_pci_device *vdev)
 static int vfio_ecap_init(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	u8 *map = vdev->pci_config_map;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	u8 *map = priv->pci_config_map;
 	u16 epos;
 	__le32 *prev = NULL;
 	int loops, ret, ecaps = 0;
 
-	if (!vdev->extended_caps)
+	if (!priv->extended_caps)
 		return 0;
 
 	epos = PCI_CFG_SPACE_SIZE;
@@ -1590,17 +1621,17 @@ static int vfio_ecap_init(struct vfio_pci_device *vdev)
 		 * ecaps are absent, hope users check all the way to next.
 		 */
 		if (hidden)
-			*(__le32 *)&vdev->vconfig[epos] &=
+			*(__le32 *)&priv->vconfig[epos] &=
 				cpu_to_le32((0xffcU << 20));
 		else
 			ecaps++;
 
-		prev = (__le32 *)&vdev->vconfig[epos];
+		prev = (__le32 *)&priv->vconfig[epos];
 		epos = PCI_EXT_CAP_NEXT(header);
 	}
 
 	if (!ecaps)
-		*(u32 *)&vdev->vconfig[PCI_CFG_SPACE_SIZE] = 0;
+		*(u32 *)&priv->vconfig[PCI_CFG_SPACE_SIZE] = 0;
 
 	return 0;
 }
@@ -1630,6 +1661,7 @@ static const struct pci_device_id known_bogus_vf_intx_pin[] = {
 int vfio_config_init(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	u8 *map, *vconfig;
 	int ret;
 
@@ -1649,8 +1681,8 @@ int vfio_config_init(struct vfio_pci_device *vdev)
 		return -ENOMEM;
 	}
 
-	vdev->pci_config_map = map;
-	vdev->vconfig = vconfig;
+	priv->pci_config_map = map;
+	priv->vconfig = vconfig;
 
 	memset(map, PCI_CAP_ID_BASIC, PCI_STD_HEADER_SIZEOF);
 	memset(map + PCI_STD_HEADER_SIZEOF, PCI_CAP_ID_INVALID,
@@ -1660,7 +1692,7 @@ int vfio_config_init(struct vfio_pci_device *vdev)
 	if (ret)
 		goto out;
 
-	vdev->bardirty = true;
+	priv->bardirty = true;
 
 	/*
 	 * XXX can we just pci_load_saved_state/pci_restore_state?
@@ -1668,13 +1700,13 @@ int vfio_config_init(struct vfio_pci_device *vdev)
 	 */
 
 	/* For restore after reset */
-	vdev->rbar[0] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_0]);
-	vdev->rbar[1] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_1]);
-	vdev->rbar[2] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_2]);
-	vdev->rbar[3] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_3]);
-	vdev->rbar[4] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_4]);
-	vdev->rbar[5] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_5]);
-	vdev->rbar[6] = le32_to_cpu(*(__le32 *)&vconfig[PCI_ROM_ADDRESS]);
+	priv->rbar[0] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_0]);
+	priv->rbar[1] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_1]);
+	priv->rbar[2] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_2]);
+	priv->rbar[3] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_3]);
+	priv->rbar[4] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_4]);
+	priv->rbar[5] = le32_to_cpu(*(__le32 *)&vconfig[PCI_BASE_ADDRESS_5]);
+	priv->rbar[6] = le32_to_cpu(*(__le32 *)&vconfig[PCI_ROM_ADDRESS]);
 
 	if (pdev->is_virtfn) {
 		*(__le16 *)&vconfig[PCI_VENDOR_ID] = cpu_to_le16(pdev->vendor);
@@ -1699,7 +1731,7 @@ int vfio_config_init(struct vfio_pci_device *vdev)
 		vconfig[PCI_INTERRUPT_PIN] = 0; /* Gratuitous for good VFs */
 	}
 
-	if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
+	if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || priv->nointx)
 		vconfig[PCI_INTERRUPT_PIN] = 0;
 
 	ret = vfio_cap_init(vdev);
@@ -1714,20 +1746,22 @@ int vfio_config_init(struct vfio_pci_device *vdev)
 
 out:
 	kfree(map);
-	vdev->pci_config_map = NULL;
+	priv->pci_config_map = NULL;
 	kfree(vconfig);
-	vdev->vconfig = NULL;
+	priv->vconfig = NULL;
 	return pcibios_err_to_errno(ret);
 }
 
 void vfio_config_free(struct vfio_pci_device *vdev)
 {
-	kfree(vdev->vconfig);
-	vdev->vconfig = NULL;
-	kfree(vdev->pci_config_map);
-	vdev->pci_config_map = NULL;
-	kfree(vdev->msi_perm);
-	vdev->msi_perm = NULL;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
+	kfree(priv->vconfig);
+	priv->vconfig = NULL;
+	kfree(priv->pci_config_map);
+	priv->pci_config_map = NULL;
+	kfree(priv->msi_perm);
+	priv->msi_perm = NULL;
 }
 
 /*
@@ -1737,10 +1771,11 @@ void vfio_config_free(struct vfio_pci_device *vdev)
 static size_t vfio_pci_cap_remaining_dword(struct vfio_pci_device *vdev,
 					   loff_t pos)
 {
-	u8 cap = vdev->pci_config_map[pos];
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	u8 cap = priv->pci_config_map[pos];
 	size_t i;
 
-	for (i = 1; (pos + i) % 4 && vdev->pci_config_map[pos + i] == cap; i++)
+	for (i = 1; (pos + i) % 4 && priv->pci_config_map[pos + i] == cap; i++)
 		/* nop */;
 
 	return i;
@@ -1750,6 +1785,7 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_device *vdev, char __user *buf,
 				 size_t count, loff_t *ppos, bool iswrite)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	struct perm_bits *perm;
 	__le32 val = 0;
 	int cap_start = 0, offset;
@@ -1774,7 +1810,7 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_device *vdev, char __user *buf,
 
 	ret = count;
 
-	cap_id = vdev->pci_config_map[*ppos];
+	cap_id = priv->pci_config_map[*ppos];
 
 	if (cap_id == PCI_CAP_ID_INVALID) {
 		perm = &unassigned_perms;
@@ -1794,7 +1830,7 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_device *vdev, char __user *buf,
 			perm = &cap_perms[cap_id];
 
 			if (cap_id == PCI_CAP_ID_MSI)
-				perm = vdev->msi_perm;
+				perm = priv->msi_perm;
 
 			if (cap_id > PCI_CAP_ID_BASIC)
 				cap_start = vfio_find_cap_start(vdev, *ppos);
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 53d97f459252..def84168b035 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -25,13 +25,14 @@ static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
 			      size_t count, loff_t *ppos, bool iswrite)
 {
 	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
-	void *base = vdev->region[i].data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	void *base = priv->region[i].data;
 	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
 
-	if (pos >= vdev->region[i].size || iswrite)
+	if (pos >= priv->region[i].size || iswrite)
 		return -EINVAL;
 
-	count = min(count, (size_t)(vdev->region[i].size - pos));
+	count = min(count, (size_t)(priv->region[i].size - pos));
 
 	if (copy_to_user(buf, base + pos, count))
 		return -EFAULT;
@@ -54,7 +55,8 @@ static const struct vfio_pci_regops vfio_pci_igd_regops = {
 
 static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 {
-	__le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	__le32 *dwordp = (__le32 *)(priv->vconfig + OPREGION_PCI_ADDR);
 	u32 addr, size;
 	void *base;
 	int ret;
@@ -101,7 +103,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 
 	/* Fill vconfig with the hw value and virtualize register */
 	*dwordp = cpu_to_le32(addr);
-	memset(vdev->pci_config_map + OPREGION_PCI_ADDR,
+	memset(priv->pci_config_map + OPREGION_PCI_ADDR,
 	       PCI_CAP_ID_INVALID_VIRT, 4);
 
 	return ret;
@@ -112,15 +114,16 @@ static size_t vfio_pci_igd_cfg_rw(struct vfio_pci_device *vdev,
 				  bool iswrite)
 {
 	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
-	struct pci_dev *pdev = vdev->region[i].data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	struct pci_dev *pdev = priv->region[i].data;
 	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
 	size_t size;
 	int ret;
 
-	if (pos >= vdev->region[i].size || iswrite)
+	if (pos >= priv->region[i].size || iswrite)
 		return -EINVAL;
 
-	size = count = min(count, (size_t)(vdev->region[i].size - pos));
+	size = count = min(count, (size_t)(priv->region[i].size - pos));
 
 	if ((pos & 1) && size) {
 		u8 val;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3fa3f728fb39..f6fd6ab575ed 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -28,17 +28,19 @@
 static void vfio_send_intx_eventfd(void *opaque, void *unused)
 {
 	struct vfio_pci_device *vdev = opaque;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
-	if (likely(is_intx(vdev) && !vdev->virq_disabled))
-		eventfd_signal(vdev->ctx[0].trigger, 1);
+	if (likely(is_intx(vdev) && !priv->virq_disabled))
+		eventfd_signal(priv->ctx[0].trigger, 1);
 }
 
 void vfio_pci_intx_mask(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	unsigned long flags;
 
-	spin_lock_irqsave(&vdev->irqlock, flags);
+	spin_lock_irqsave(&priv->irqlock, flags);
 
 	/*
 	 * Masking can come from interrupt, ioctl, or config space
@@ -47,22 +49,22 @@ void vfio_pci_intx_mask(struct vfio_pci_device *vdev)
 	 * try to have the physical bit follow the virtual bit.
 	 */
 	if (unlikely(!is_intx(vdev))) {
-		if (vdev->pci_2_3)
+		if (priv->pci_2_3)
 			pci_intx(pdev, 0);
-	} else if (!vdev->ctx[0].masked) {
+	} else if (!priv->ctx[0].masked) {
 		/*
 		 * Can't use check_and_mask here because we always want to
 		 * mask, not just when something is pending.
 		 */
-		if (vdev->pci_2_3)
+		if (priv->pci_2_3)
 			pci_intx(pdev, 0);
 		else
 			disable_irq_nosync(pdev->irq);
 
-		vdev->ctx[0].masked = true;
+		priv->ctx[0].masked = true;
 	}
 
-	spin_unlock_irqrestore(&vdev->irqlock, flags);
+	spin_unlock_irqrestore(&priv->irqlock, flags);
 }
 
 /*
@@ -75,34 +77,35 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 {
 	struct vfio_pci_device *vdev = opaque;
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	unsigned long flags;
 	int ret = 0;
 
-	spin_lock_irqsave(&vdev->irqlock, flags);
+	spin_lock_irqsave(&priv->irqlock, flags);
 
 	/*
 	 * Unmasking comes from ioctl or config, so again, have the
 	 * physical bit follow the virtual even when not using INTx.
 	 */
 	if (unlikely(!is_intx(vdev))) {
-		if (vdev->pci_2_3)
+		if (priv->pci_2_3)
 			pci_intx(pdev, 1);
-	} else if (vdev->ctx[0].masked && !vdev->virq_disabled) {
+	} else if (priv->ctx[0].masked && !priv->virq_disabled) {
 		/*
 		 * A pending interrupt here would immediately trigger,
 		 * but we can avoid that overhead by just re-sending
 		 * the interrupt to the user.
 		 */
-		if (vdev->pci_2_3) {
+		if (priv->pci_2_3) {
 			if (!pci_check_and_unmask_intx(pdev))
 				ret = 1;
 		} else
 			enable_irq(pdev->irq);
 
-		vdev->ctx[0].masked = (ret > 0);
+		priv->ctx[0].masked = (ret > 0);
 	}
 
-	spin_unlock_irqrestore(&vdev->irqlock, flags);
+	spin_unlock_irqrestore(&priv->irqlock, flags);
 
 	return ret;
 }
@@ -116,22 +119,23 @@ void vfio_pci_intx_unmask(struct vfio_pci_device *vdev)
 static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 {
 	struct vfio_pci_device *vdev = dev_id;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	unsigned long flags;
 	int ret = IRQ_NONE;
 
-	spin_lock_irqsave(&vdev->irqlock, flags);
+	spin_lock_irqsave(&priv->irqlock, flags);
 
-	if (!vdev->pci_2_3) {
+	if (!priv->pci_2_3) {
 		disable_irq_nosync(vdev->pdev->irq);
-		vdev->ctx[0].masked = true;
+		priv->ctx[0].masked = true;
 		ret = IRQ_HANDLED;
-	} else if (!vdev->ctx[0].masked &&  /* may be shared */
+	} else if (!priv->ctx[0].masked &&  /* may be shared */
 		   pci_check_and_mask_intx(vdev->pdev)) {
-		vdev->ctx[0].masked = true;
+		priv->ctx[0].masked = true;
 		ret = IRQ_HANDLED;
 	}
 
-	spin_unlock_irqrestore(&vdev->irqlock, flags);
+	spin_unlock_irqrestore(&priv->irqlock, flags);
 
 	if (ret == IRQ_HANDLED)
 		vfio_send_intx_eventfd(vdev, NULL);
@@ -141,17 +145,19 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 
 static int vfio_intx_enable(struct vfio_pci_device *vdev)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
 	if (!vdev->pdev->irq)
 		return -ENODEV;
 
-	vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
-	if (!vdev->ctx)
+	priv->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
+	if (!priv->ctx)
 		return -ENOMEM;
 
-	vdev->num_ctx = 1;
+	priv->num_ctx = 1;
 
 	/*
 	 * If the virtual interrupt is masked, restore it.  Devices
@@ -159,9 +165,9 @@ static int vfio_intx_enable(struct vfio_pci_device *vdev)
 	 * here, non-PCI-2.3 devices will have to wait until the
 	 * interrupt is enabled.
 	 */
-	vdev->ctx[0].masked = vdev->virq_disabled;
-	if (vdev->pci_2_3)
-		pci_intx(vdev->pdev, !vdev->ctx[0].masked);
+	priv->ctx[0].masked = priv->virq_disabled;
+	if (priv->pci_2_3)
+		pci_intx(vdev->pdev, !priv->ctx[0].masked);
 
 	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
@@ -171,42 +177,43 @@ static int vfio_intx_enable(struct vfio_pci_device *vdev)
 static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	unsigned long irqflags = IRQF_SHARED;
 	struct eventfd_ctx *trigger;
 	unsigned long flags;
 	int ret;
 
-	if (vdev->ctx[0].trigger) {
+	if (priv->ctx[0].trigger) {
 		free_irq(pdev->irq, vdev);
-		kfree(vdev->ctx[0].name);
-		eventfd_ctx_put(vdev->ctx[0].trigger);
-		vdev->ctx[0].trigger = NULL;
+		kfree(priv->ctx[0].name);
+		eventfd_ctx_put(priv->ctx[0].trigger);
+		priv->ctx[0].trigger = NULL;
 	}
 
 	if (fd < 0) /* Disable only */
 		return 0;
 
-	vdev->ctx[0].name = kasprintf(GFP_KERNEL, "vfio-intx(%s)",
+	priv->ctx[0].name = kasprintf(GFP_KERNEL, "vfio-intx(%s)",
 				      pci_name(pdev));
-	if (!vdev->ctx[0].name)
+	if (!priv->ctx[0].name)
 		return -ENOMEM;
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
-		kfree(vdev->ctx[0].name);
+		kfree(priv->ctx[0].name);
 		return PTR_ERR(trigger);
 	}
 
-	vdev->ctx[0].trigger = trigger;
+	priv->ctx[0].trigger = trigger;
 
-	if (!vdev->pci_2_3)
+	if (!priv->pci_2_3)
 		irqflags = 0;
 
 	ret = request_irq(pdev->irq, vfio_intx_handler,
-			  irqflags, vdev->ctx[0].name, vdev);
+			  irqflags, priv->ctx[0].name, vdev);
 	if (ret) {
-		vdev->ctx[0].trigger = NULL;
-		kfree(vdev->ctx[0].name);
+		priv->ctx[0].trigger = NULL;
+		kfree(priv->ctx[0].name);
 		eventfd_ctx_put(trigger);
 		return ret;
 	}
@@ -215,22 +222,24 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
 	 * INTx disable will stick across the new irq setup,
 	 * disable_irq won't.
 	 */
-	spin_lock_irqsave(&vdev->irqlock, flags);
-	if (!vdev->pci_2_3 && vdev->ctx[0].masked)
+	spin_lock_irqsave(&priv->irqlock, flags);
+	if (!priv->pci_2_3 && priv->ctx[0].masked)
 		disable_irq_nosync(pdev->irq);
-	spin_unlock_irqrestore(&vdev->irqlock, flags);
+	spin_unlock_irqrestore(&priv->irqlock, flags);
 
 	return 0;
 }
 
 static void vfio_intx_disable(struct vfio_pci_device *vdev)
 {
-	vfio_virqfd_disable(&vdev->ctx[0].unmask);
-	vfio_virqfd_disable(&vdev->ctx[0].mask);
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
+	vfio_virqfd_disable(&priv->ctx[0].unmask);
+	vfio_virqfd_disable(&priv->ctx[0].mask);
 	vfio_intx_set_signal(vdev, -1);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	vdev->num_ctx = 0;
-	kfree(vdev->ctx);
+	priv->num_ctx = 0;
+	kfree(priv->ctx);
 }
 
 /*
@@ -247,14 +256,15 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
 static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
 	int ret;
 
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
-	vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
-	if (!vdev->ctx)
+	priv->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);
+	if (!priv->ctx)
 		return -ENOMEM;
 
 	/* return the number of supported vectors if we can't get all: */
@@ -262,11 +272,11 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 	if (ret < nvec) {
 		if (ret > 0)
 			pci_free_irq_vectors(pdev);
-		kfree(vdev->ctx);
+		kfree(priv->ctx);
 		return ret;
 	}
 
-	vdev->num_ctx = nvec;
+	priv->num_ctx = nvec;
 	vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
 				VFIO_PCI_MSI_IRQ_INDEX;
 
@@ -275,7 +285,7 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 		 * Compute the virtual hardware field for max msi vectors -
 		 * it is the log base 2 of the number of vectors.
 		 */
-		vdev->msi_qmax = fls(nvec * 2 - 1) - 1;
+		priv->msi_qmax = fls(nvec * 2 - 1) - 1;
 	}
 
 	return 0;
@@ -285,34 +295,35 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 				      int vector, int fd, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	struct eventfd_ctx *trigger;
 	int irq, ret;
 
-	if (vector < 0 || vector >= vdev->num_ctx)
+	if (vector < 0 || vector >= priv->num_ctx)
 		return -EINVAL;
 
 	irq = pci_irq_vector(pdev, vector);
 
-	if (vdev->ctx[vector].trigger) {
-		free_irq(irq, vdev->ctx[vector].trigger);
-		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
-		kfree(vdev->ctx[vector].name);
-		eventfd_ctx_put(vdev->ctx[vector].trigger);
-		vdev->ctx[vector].trigger = NULL;
+	if (priv->ctx[vector].trigger) {
+		free_irq(irq, priv->ctx[vector].trigger);
+		irq_bypass_unregister_producer(&priv->ctx[vector].producer);
+		kfree(priv->ctx[vector].name);
+		eventfd_ctx_put(priv->ctx[vector].trigger);
+		priv->ctx[vector].trigger = NULL;
 	}
 
 	if (fd < 0)
 		return 0;
 
-	vdev->ctx[vector].name = kasprintf(GFP_KERNEL, "vfio-msi%s[%d](%s)",
+	priv->ctx[vector].name = kasprintf(GFP_KERNEL, "vfio-msi%s[%d](%s)",
 					   msix ? "x" : "", vector,
 					   pci_name(pdev));
-	if (!vdev->ctx[vector].name)
+	if (!priv->ctx[vector].name)
 		return -ENOMEM;
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
-		kfree(vdev->ctx[vector].name);
+		kfree(priv->ctx[vector].name);
 		return PTR_ERR(trigger);
 	}
 
@@ -331,22 +342,22 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	}
 
 	ret = request_irq(irq, vfio_msihandler, 0,
-			  vdev->ctx[vector].name, trigger);
+			  priv->ctx[vector].name, trigger);
 	if (ret) {
-		kfree(vdev->ctx[vector].name);
+		kfree(priv->ctx[vector].name);
 		eventfd_ctx_put(trigger);
 		return ret;
 	}
 
-	vdev->ctx[vector].producer.token = trigger;
-	vdev->ctx[vector].producer.irq = irq;
-	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+	priv->ctx[vector].producer.token = trigger;
+	priv->ctx[vector].producer.irq = irq;
+	ret = irq_bypass_register_producer(&priv->ctx[vector].producer);
 	if (unlikely(ret))
 		dev_info(&pdev->dev,
 		"irq bypass producer (token %p) registration fails: %d\n",
-		vdev->ctx[vector].producer.token, ret);
+		priv->ctx[vector].producer.token, ret);
 
-	vdev->ctx[vector].trigger = trigger;
+	priv->ctx[vector].trigger = trigger;
 
 	return 0;
 }
@@ -354,9 +365,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 static int vfio_msi_set_block(struct vfio_pci_device *vdev, unsigned start,
 			      unsigned count, int32_t *fds, bool msix)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int i, j, ret = 0;
 
-	if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
+	if (start >= priv->num_ctx || start + count > priv->num_ctx)
 		return -EINVAL;
 
 	for (i = 0, j = start; i < count && !ret; i++, j++) {
@@ -375,14 +387,15 @@ static int vfio_msi_set_block(struct vfio_pci_device *vdev, unsigned start,
 static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int i;
 
-	for (i = 0; i < vdev->num_ctx; i++) {
-		vfio_virqfd_disable(&vdev->ctx[i].unmask);
-		vfio_virqfd_disable(&vdev->ctx[i].mask);
+	for (i = 0; i < priv->num_ctx; i++) {
+		vfio_virqfd_disable(&priv->ctx[i].unmask);
+		vfio_virqfd_disable(&priv->ctx[i].mask);
 	}
 
-	vfio_msi_set_block(vdev, 0, vdev->num_ctx, NULL, msix);
+	vfio_msi_set_block(vdev, 0, priv->num_ctx, NULL, msix);
 
 	pci_free_irq_vectors(pdev);
 
@@ -390,12 +403,12 @@ static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
 	 * Both disable paths above use pci_intx_for_msi() to clear DisINTx
 	 * via their shutdown paths.  Restore for NoINTx devices.
 	 */
-	if (vdev->nointx)
+	if (priv->nointx)
 		pci_intx(pdev, 0);
 
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
-	vdev->num_ctx = 0;
-	kfree(vdev->ctx);
+	priv->num_ctx = 0;
+	kfree(priv->ctx);
 }
 
 /*
@@ -405,6 +418,8 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	if (!is_intx(vdev) || start != 0 || count != 1)
 		return -EINVAL;
 
@@ -420,9 +435,9 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev,
 			return vfio_virqfd_enable((void *) vdev,
 						  vfio_pci_intx_unmask_handler,
 						  vfio_send_intx_eventfd, NULL,
-						  &vdev->ctx[0].unmask, fd);
+						  &priv->ctx[0].unmask, fd);
 
-		vfio_virqfd_disable(&vdev->ctx[0].unmask);
+		vfio_virqfd_disable(&priv->ctx[0].unmask);
 	}
 
 	return 0;
@@ -497,6 +512,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev,
 {
 	int i;
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
 		vfio_msi_disable(vdev, msix);
@@ -525,18 +541,18 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev,
 		return ret;
 	}
 
-	if (!irq_is(vdev, index) || start + count > vdev->num_ctx)
+	if (!irq_is(vdev, index) || start + count > priv->num_ctx)
 		return -EINVAL;
 
 	for (i = start; i < start + count; i++) {
-		if (!vdev->ctx[i].trigger)
+		if (!priv->ctx[i].trigger)
 			continue;
 		if (flags & VFIO_IRQ_SET_DATA_NONE) {
-			eventfd_signal(vdev->ctx[i].trigger, 1);
+			eventfd_signal(priv->ctx[i].trigger, 1);
 		} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
 			uint8_t *bools = data;
 			if (bools[i - start])
-				eventfd_signal(vdev->ctx[i].trigger, 1);
+				eventfd_signal(priv->ctx[i].trigger, 1);
 		}
 	}
 	return 0;
@@ -601,10 +617,12 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	if (index != VFIO_PCI_ERR_IRQ_INDEX || start != 0 || count > 1)
 		return -EINVAL;
 
-	return vfio_pci_set_ctx_trigger_single(&vdev->err_trigger,
+	return vfio_pci_set_ctx_trigger_single(&priv->err_trigger,
 					       count, flags, data);
 }
 
@@ -612,10 +630,12 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+
 	if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1)
 		return -EINVAL;
 
-	return vfio_pci_set_ctx_trigger_single(&vdev->req_trigger,
+	return vfio_pci_set_ctx_trigger_single(&priv->req_trigger,
 					       count, flags, data);
 }
 
diff --git a/drivers/vfio/pci/vfio_pci_nvlink2.c b/drivers/vfio/pci/vfio_pci_nvlink2.c
index f2983f0f84be..bd2bb99cbf0b 100644
--- a/drivers/vfio/pci/vfio_pci_nvlink2.c
+++ b/drivers/vfio/pci/vfio_pci_nvlink2.c
@@ -43,16 +43,17 @@ static size_t vfio_pci_nvgpu_rw(struct vfio_pci_device *vdev,
 		char __user *buf, size_t count, loff_t *ppos, bool iswrite)
 {
 	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
-	struct vfio_pci_nvgpu_data *data = vdev->region[i].data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	struct vfio_pci_nvgpu_data *data = priv->region[i].data;
 	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
 	loff_t posaligned = pos & PAGE_MASK, posoff = pos & ~PAGE_MASK;
 	size_t sizealigned;
 	void __iomem *ptr;
 
-	if (pos >= vdev->region[i].size)
+	if (pos >= priv->region[i].size)
 		return -EINVAL;
 
-	count = min(count, (size_t)(vdev->region[i].size - pos));
+	count = min(count, (size_t)(priv->region[i].size - pos));
 
 	/*
 	 * We map only a bit of GPU RAM for a short time instead of mapping it
@@ -115,7 +116,7 @@ static vm_fault_t vfio_pci_nvgpu_mmap_fault(struct vm_fault *vmf)
 {
 	vm_fault_t ret;
 	struct vm_area_struct *vma = vmf->vma;
-	struct vfio_pci_region *region = vma->vm_private_data;
+	struct vfio_pci_region *region = vma->vm_priv_data;
 	struct vfio_pci_nvgpu_data *data = region->data;
 	unsigned long vmf_off = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
 	unsigned long nv2pg = data->gpu_hpa >> PAGE_SHIFT;
@@ -146,7 +147,7 @@ static int vfio_pci_nvgpu_mmap(struct vfio_pci_device *vdev,
 	if (vma->vm_end - vma->vm_start > data->size)
 		return -EINVAL;
 
-	vma->vm_private_data = region;
+	vma->vm_priv_data = region;
 	vma->vm_flags |= VM_PFNMAP;
 	vma->vm_ops = &vfio_pci_nvgpu_mmap_vmops;
 
@@ -193,9 +194,7 @@ static int vfio_pci_nvgpu_group_notifier(struct notifier_block *nb,
 		unsigned long action, void *opaque)
 {
 	struct kvm *kvm = opaque;
-	struct vfio_pci_nvgpu_data *data = container_of(nb,
-			struct vfio_pci_nvgpu_data,
-			group_notifier);
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 
 	if (action == VFIO_GROUP_NOTIFY_SET_KVM && kvm &&
 			pnv_npu2_map_lpar_dev(data->gpdev,
@@ -306,13 +305,14 @@ static size_t vfio_pci_npu2_rw(struct vfio_pci_device *vdev,
 		char __user *buf, size_t count, loff_t *ppos, bool iswrite)
 {
 	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
-	struct vfio_pci_npu2_data *data = vdev->region[i].data;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
+	struct vfio_pci_npu2_data *data = priv->region[i].data;
 	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
 
-	if (pos >= vdev->region[i].size)
+	if (pos >= priv->region[i].size)
 		return -EINVAL;
 
-	count = min(count, (size_t)(vdev->region[i].size - pos));
+	count = min(count, (size_t)(priv->region[i].size - pos));
 
 	if (iswrite) {
 		if (copy_from_user(data->base + pos, buf, count))
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ee6ee91718a4..df665fa8be52 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -84,8 +84,8 @@ struct vfio_pci_reflck {
 	struct mutex		lock;
 };
 
-struct vfio_pci_device {
-	struct pci_dev		*pdev;
+struct vfio_pci_device_private {
+	struct vfio_pci_device	vdev;
 	void __iomem		*barmap[PCI_STD_RESOURCE_END + 1];
 	bool			bar_mmap_supported[PCI_STD_RESOURCE_END + 1];
 	u8			*pci_config_map;
@@ -95,8 +95,6 @@ struct vfio_pci_device {
 	struct mutex		igate;
 	struct vfio_pci_irq_ctx	*ctx;
 	int			num_ctx;
-	int			irq_type;
-	int			num_regions;
 	struct vfio_pci_region	*region;
 	u8			msi_qmax;
 	u8			msix_bar;
@@ -123,6 +121,7 @@ struct vfio_pci_device {
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;
 };
+#define VDEV_TO_PRIV(p) container_of((p), struct vfio_pci_device_private, vdev)
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
 #define is_msi(vdev) (vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 0120d8324a40..a0ef1de4f74a 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -132,10 +132,11 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int ret;
 	void __iomem *io;
 
-	if (vdev->barmap[bar])
+	if (priv->barmap[bar])
 		return 0;
 
 	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
@@ -148,7 +149,7 @@ static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
 		return -ENOMEM;
 	}
 
-	vdev->barmap[bar] = io;
+	priv->barmap[bar] = io;
 
 	return 0;
 }
@@ -157,6 +158,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
 	int bar = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
 	size_t x_start = 0, x_end = 0;
@@ -192,12 +194,12 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 		if (ret)
 			return ret;
 
-		io = vdev->barmap[bar];
+		io = priv->barmap[bar];
 	}
 
-	if (bar == vdev->msix_bar) {
-		x_start = vdev->msix_offset;
-		x_end = vdev->msix_offset + vdev->msix_size;
+	if (bar == priv->msix_bar) {
+		x_start = priv->msix_offset;
+		x_end = priv->msix_offset + priv->msix_size;
 	}
 
 	done = do_io_rw(io, buf, pos, count, x_start, x_end, iswrite);
@@ -214,6 +216,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
 			       size_t count, loff_t *ppos, bool iswrite)
 {
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	int ret;
 	loff_t off, pos = *ppos & VFIO_PCI_OFFSET_MASK;
 	void __iomem *iomem = NULL;
@@ -221,7 +224,7 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
 	bool is_ioport;
 	ssize_t done;
 
-	if (!vdev->has_vga)
+	if (!priv->has_vga)
 		return -EINVAL;
 
 	if (pos > 0xbfffful)
@@ -302,6 +305,7 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 			uint64_t data, int count, int fd)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
 	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
 	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
 	struct vfio_pci_ioeventfd *ioeventfd;
@@ -314,9 +318,9 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 		return -EINVAL;
 
 	/* Disallow ioeventfds working around MSI-X table writes */
-	if (bar == vdev->msix_bar &&
-	    !(pos + count <= vdev->msix_offset ||
-	      pos >= vdev->msix_offset + vdev->msix_size))
+	if (bar == priv->msix_bar &&
+	    !(pos + count <= priv->msix_offset ||
+	      pos >= priv->msix_offset + priv->msix_size))
 		return -EINVAL;
 
 #ifndef iowrite64
@@ -328,15 +332,15 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 	if (ret)
 		return ret;
 
-	mutex_lock(&vdev->ioeventfds_lock);
+	mutex_lock(&priv->ioeventfds_lock);
 
-	list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
+	list_for_each_entry(ioeventfd, &priv->ioeventfds_list, next) {
 		if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
 		    ioeventfd->data == data && ioeventfd->count == count) {
 			if (fd == -1) {
 				vfio_virqfd_disable(&ioeventfd->virqfd);
 				list_del(&ioeventfd->next);
-				vdev->ioeventfds_nr--;
+				priv->ioeventfds_nr--;
 				kfree(ioeventfd);
 				ret = 0;
 			} else
@@ -351,7 +355,7 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 		goto out_unlock;
 	}
 
-	if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
+	if (priv->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
 		ret = -ENOSPC;
 		goto out_unlock;
 	}
@@ -362,7 +366,7 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 		goto out_unlock;
 	}
 
-	ioeventfd->addr = vdev->barmap[bar] + pos;
+	ioeventfd->addr = priv->barmap[bar] + pos;
 	ioeventfd->data = data;
 	ioeventfd->pos = pos;
 	ioeventfd->bar = bar;
@@ -375,11 +379,11 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 		goto out_unlock;
 	}
 
-	list_add(&ioeventfd->next, &vdev->ioeventfds_list);
-	vdev->ioeventfds_nr++;
+	list_add(&ioeventfd->next, &priv->ioeventfds_list);
+	priv->ioeventfds_nr++;
 
 out_unlock:
-	mutex_unlock(&vdev->ioeventfds_lock);
+	mutex_unlock(&priv->ioeventfds_lock);
 
 	return ret;
 }
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..70a2b8fb6179 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -195,4 +195,9 @@ extern int vfio_virqfd_enable(void *opaque,
 			      void *data, struct virqfd **pvirqfd, int fd);
 extern void vfio_virqfd_disable(struct virqfd **pvirqfd);
 
+struct vfio_pci_device {
+	struct pci_dev			*pdev;
+	int				num_regions;
+	int				irq_type;
+};
 #endif /* VFIO_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 2/9] vfio/pci: export functions in vfio_pci_ops
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
  2020-02-11 10:10 ` [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private Yan Zhao
@ 2020-02-11 10:11 ` Yan Zhao
  2020-02-11 10:11 ` [RFC PATCH v3 3/9] vfio/pci: register/unregister vfio_pci_vendor_driver_ops Yan Zhao
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:11 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

export functions in vfio_pci_ops, so they are able to be called from
outside modules and make them a kind of inherited by vfio_device_ops
provided by other modules
including
    vfio_pci_open,
    vfio_pci_release,
    vfio_pci_ioctl,
    vfio_pci_read,
    vfio_pci_write,
    vfio_pci_mmap,
    vfio_pci_request,

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/pci/vfio_pci.c | 21 ++++++++++++++-------
 include/linux/vfio.h        | 11 +++++++++++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index f19e2dc498a4..02297bd3f6c2 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -468,7 +468,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 		vfio_pci_set_power_state(vdev, PCI_D3hot);
 }
 
-static void vfio_pci_release(void *device_data)
+void vfio_pci_release(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
@@ -484,8 +484,9 @@ static void vfio_pci_release(void *device_data)
 
 	module_put(THIS_MODULE);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_release);
 
-static int vfio_pci_open(void *device_data)
+int vfio_pci_open(void *device_data)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
@@ -510,6 +511,7 @@ static int vfio_pci_open(void *device_data)
 		module_put(THIS_MODULE);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_open);
 
 static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 {
@@ -698,7 +700,7 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 	return 0;
 }
 
-static long vfio_pci_ioctl(void *device_data,
+long vfio_pci_ioctl(void *device_data,
 			   unsigned int cmd, unsigned long arg)
 {
 	struct vfio_pci_device *vdev = device_data;
@@ -1152,6 +1154,7 @@ static long vfio_pci_ioctl(void *device_data,
 
 	return -ENOTTY;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_ioctl);
 
 static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
 			   size_t count, loff_t *ppos, bool iswrite)
@@ -1186,7 +1189,7 @@ static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
 	return -EINVAL;
 }
 
-static ssize_t vfio_pci_read(void *device_data, char __user *buf,
+ssize_t vfio_pci_read(void *device_data, char __user *buf,
 			     size_t count, loff_t *ppos)
 {
 	if (!count)
@@ -1194,8 +1197,9 @@ static ssize_t vfio_pci_read(void *device_data, char __user *buf,
 
 	return vfio_pci_rw(device_data, buf, count, ppos, false);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_read);
 
-static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 			      size_t count, loff_t *ppos)
 {
 	if (!count)
@@ -1203,8 +1207,9 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_write);
 
-static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
+int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
@@ -1266,8 +1271,9 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       req_len, vma->vm_page_prot);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_mmap);
 
-static void vfio_pci_request(void *device_data, unsigned int count)
+void vfio_pci_request(void *device_data, unsigned int count)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct pci_dev *pdev = vdev->pdev;
@@ -1288,6 +1294,7 @@ static void vfio_pci_request(void *device_data, unsigned int count)
 
 	mutex_unlock(&priv->igate);
 }
+EXPORT_SYMBOL_GPL(vfio_pci_request);
 
 static const struct vfio_device_ops vfio_pci_ops = {
 	.name		= "vfio-pci",
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 70a2b8fb6179..64714cbd02f9 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -200,4 +200,15 @@ struct vfio_pci_device {
 	int				num_regions;
 	int				irq_type;
 };
+
+extern long vfio_pci_ioctl(void *device_data,
+			   unsigned int cmd, unsigned long arg);
+extern ssize_t vfio_pci_read(void *device_data, char __user *buf,
+			     size_t count, loff_t *ppos);
+extern ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+			      size_t count, loff_t *ppos);
+extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma);
+extern void vfio_pci_request(void *device_data, unsigned int count);
+extern int vfio_pci_open(void *device_data);
+extern void vfio_pci_release(void *device_data);
 #endif /* VFIO_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 3/9] vfio/pci: register/unregister vfio_pci_vendor_driver_ops
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
  2020-02-11 10:10 ` [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private Yan Zhao
  2020-02-11 10:11 ` [RFC PATCH v3 2/9] vfio/pci: export functions in vfio_pci_ops Yan Zhao
@ 2020-02-11 10:11 ` Yan Zhao
  2020-02-11 10:12 ` [RFC PATCH v3 4/9] vfio/pci: macros to generate module_init and module_exit for vendor modules Yan Zhao
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:11 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

vfio_pci_vendor_driver_ops includes two parts:
(1) .probe() and .remove() interface to be called by vfio_pci_probe()
and vfio_pci_remove().
(2) pointer to struct vfio_device_ops. It will be registered as ops of vfio
device if .probe() succeeds.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 105 +++++++++++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_private.h |   6 ++
 include/linux/vfio.h                |  11 +++
 3 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 02297bd3f6c2..e5bfb4948667 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -62,6 +62,10 @@ static inline bool vfio_vga_disabled(void)
 	return true;
 #endif
 }
+static struct vfio_pci {
+	struct  mutex		vendor_drivers_lock;
+	struct  list_head	vendor_drivers_list;
+} vfio_pci;
 
 /*
  * Our VGA arbiter participation is limited since we don't know anything
@@ -1310,6 +1314,37 @@ static const struct vfio_device_ops vfio_pci_ops = {
 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
 static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
 
+static int probe_vendor_drivers(struct vfio_pci_device *vdev)
+{
+	struct vfio_pci_device_private *priv = container_of(vdev,
+				struct vfio_pci_device_private, vdev);
+	struct vfio_pci_vendor_driver *driver;
+	int ret = -ENODEV;
+
+	request_module("vfio-pci:%x-%x", vdev->pdev->vendor,
+					 vdev->pdev->device);
+
+	mutex_lock(&vfio_pci.vendor_drivers_lock);
+	list_for_each_entry(driver, &vfio_pci.vendor_drivers_list, next) {
+		void *data;
+
+		if (!try_module_get(driver->ops->owner))
+			continue;
+
+		data = driver->ops->probe(vdev->pdev);
+		if (IS_ERR(data)) {
+			module_put(driver->ops->owner);
+			continue;
+		}
+		priv->vendor_driver = driver;
+		vdev->vendor_data = data;
+		ret = 0;
+		break;
+	}
+	mutex_unlock(&vfio_pci.vendor_drivers_lock);
+	return ret;
+}
+
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct vfio_pci_device_private *priv;
@@ -1349,7 +1384,13 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&priv->ioeventfds_lock);
 	INIT_LIST_HEAD(&priv->ioeventfds_list);
 
-	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, &priv->vdev);
+	if (probe_vendor_drivers(&priv->vdev))
+		ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops,
+					 &priv->vdev);
+	else
+		ret = vfio_add_group_dev(&pdev->dev,
+					 priv->vendor_driver->ops->device_ops,
+					 &priv->vdev);
 	if (ret) {
 		vfio_iommu_group_put(group, &pdev->dev);
 		kfree(priv);
@@ -1387,6 +1428,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		vfio_pci_set_power_state(&priv->vdev, PCI_D3hot);
 	}
 
+
 	return ret;
 }
 
@@ -1410,6 +1452,11 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 	if (!disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D0);
 
+	if (priv->vendor_driver) {
+		priv->vendor_driver->ops->remove(vdev->vendor_data);
+		module_put(priv->vendor_driver->ops->owner);
+	}
+
 	kfree(priv->pm_save);
 	kfree(priv);
 
@@ -1725,6 +1772,8 @@ static int __init vfio_pci_init(void)
 
 	vfio_pci_fill_ids();
 
+	mutex_init(&vfio_pci.vendor_drivers_lock);
+	INIT_LIST_HEAD(&vfio_pci.vendor_drivers_list);
 	return 0;
 
 out_driver:
@@ -1732,6 +1781,60 @@ static int __init vfio_pci_init(void)
 	return ret;
 }
 
+int __vfio_pci_register_vendor_driver(struct vfio_pci_vendor_driver_ops *ops)
+{
+	struct vfio_pci_vendor_driver *driver, *tmp;
+
+	if (!ops || !ops->device_ops)
+		return -EINVAL;
+
+	driver = kzalloc(sizeof(*driver), GFP_KERNEL);
+	if (!driver)
+		return -ENOMEM;
+
+	driver->ops = ops;
+
+	mutex_lock(&vfio_pci.vendor_drivers_lock);
+
+	/* Check for duplicates */
+	list_for_each_entry(tmp, &vfio_pci.vendor_drivers_list, next) {
+		if (tmp->ops->device_ops == ops->device_ops) {
+			mutex_unlock(&vfio_pci.vendor_drivers_lock);
+			kfree(driver);
+			return -EINVAL;
+		}
+	}
+
+	list_add(&driver->next, &vfio_pci.vendor_drivers_list);
+
+	mutex_unlock(&vfio_pci.vendor_drivers_lock);
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__vfio_pci_register_vendor_driver);
+
+void vfio_pci_unregister_vendor_driver(struct vfio_device_ops *device_ops)
+{
+	struct vfio_pci_vendor_driver *driver, *tmp;
+
+	mutex_lock(&vfio_pci.vendor_drivers_lock);
+	list_for_each_entry_safe(driver, tmp,
+				 &vfio_pci.vendor_drivers_list, next) {
+		if (driver->ops->device_ops == device_ops) {
+			list_del(&driver->next);
+			mutex_unlock(&vfio_pci.vendor_drivers_lock);
+			kfree(driver);
+			module_put(THIS_MODULE);
+			return;
+		}
+	}
+	mutex_unlock(&vfio_pci.vendor_drivers_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_pci_unregister_vendor_driver);
+
 module_init(vfio_pci_init);
 module_exit(vfio_pci_cleanup);
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index df665fa8be52..a5bc53e2d849 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -84,6 +84,11 @@ struct vfio_pci_reflck {
 	struct mutex		lock;
 };
 
+struct vfio_pci_vendor_driver {
+	const struct vfio_pci_vendor_driver_ops *ops;
+	struct list_head			next;
+};
+
 struct vfio_pci_device_private {
 	struct vfio_pci_device	vdev;
 	void __iomem		*barmap[PCI_STD_RESOURCE_END + 1];
@@ -120,6 +125,7 @@ struct vfio_pci_device_private {
 	struct list_head	dummy_resources_list;
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;
+	struct vfio_pci_vendor_driver	*vendor_driver;
 };
 #define VDEV_TO_PRIV(p) container_of((p), struct vfio_pci_device_private, vdev)
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 64714cbd02f9..43b2222da2bf 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -195,10 +195,21 @@ extern int vfio_virqfd_enable(void *opaque,
 			      void *data, struct virqfd **pvirqfd, int fd);
 extern void vfio_virqfd_disable(struct virqfd **pvirqfd);
 
+struct vfio_pci_vendor_driver_ops {
+	char			*name;
+	struct module		*owner;
+	void			*(*probe)(struct pci_dev *pdev);
+	void			(*remove)(void *vendor_data);
+	struct vfio_device_ops *device_ops;
+};
+int __vfio_pci_register_vendor_driver(struct vfio_pci_vendor_driver_ops *ops);
+void vfio_pci_unregister_vendor_driver(struct vfio_device_ops *device_ops);
+
 struct vfio_pci_device {
 	struct pci_dev			*pdev;
 	int				num_regions;
 	int				irq_type;
+	void				*vendor_data;
 };
 
 extern long vfio_pci_ioctl(void *device_data,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 4/9] vfio/pci: macros to generate module_init and module_exit for vendor modules
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
                   ` (2 preceding siblings ...)
  2020-02-11 10:11 ` [RFC PATCH v3 3/9] vfio/pci: register/unregister vfio_pci_vendor_driver_ops Yan Zhao
@ 2020-02-11 10:12 ` Yan Zhao
  2020-02-11 10:13 ` [RFC PATCH v3 5/9] vfio/pci: let vfio_pci know how many vendor regions are registered Yan Zhao
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:12 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

vendor modules call macro module_vfio_pci_register_vendor_handler to
generate module_init and module_exit.
It is necessary to ensure that vendor modules always call
vfio_pci_register_vendor_driver() on driver loading and
vfio_pci_unregister_vendor_driver on driver unloading,
because
(1) at compiling time, there's only a dependency of vendor modules on
vfio_pci.
(2) at runtime,
- vendor modules add refs of vfio_pci on a successful calling of
  vfio_pci_register_vendor_driver() and deref of vfio_pci on a
  successful calling of vfio_pci_unregister_vendor_driver().
- vfio_pci only adds refs of vendor module on a successful probe of vendor
  driver.
  vfio_pci derefs vendor module when unbinding from a device.

So, after vfio_pci is unbound from a device, the vendor module to that
device is free to get unloaded. However, if that vendor module does not
call vfio_pci_unregister_vendor_driver() in its module_exit, vfio_pci may
hold a stale pointer to vendor module.

That's how module_vfio_pci_register_vendor_handler helps.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/vfio.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 43b2222da2bf..71a03471b208 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -222,4 +222,31 @@ extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma);
 extern void vfio_pci_request(void *device_data, unsigned int count);
 extern int vfio_pci_open(void *device_data);
 extern void vfio_pci_release(void *device_data);
+
+#define vfio_pci_register_vendor_driver(__name, __probe, __remove,	\
+					__device_ops)			\
+static struct vfio_pci_vendor_driver_ops  __ops ## _node = {		\
+	.owner		= THIS_MODULE,					\
+	.name		= __name,					\
+	.probe		= __probe,					\
+	.remove		= __remove,					\
+	.device_ops	= __device_ops,					\
+};									\
+__vfio_pci_register_vendor_driver(&__ops ## _node)
+
+#define module_vfio_pci_register_vendor_handler(name, probe, remove,	\
+						device_ops)		\
+static int __init device_ops ## _module_init(void)			\
+{									\
+	vfio_pci_register_vendor_driver(name, probe, remove,		\
+					device_ops);			\
+	return 0;							\
+};									\
+static void __exit device_ops ## _module_exit(void)			\
+{									\
+	vfio_pci_unregister_vendor_driver(device_ops);			\
+};									\
+module_init(device_ops ## _module_init);				\
+module_exit(device_ops ## _module_exit)
+
 #endif /* VFIO_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 5/9] vfio/pci: let vfio_pci know how many vendor regions are registered
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
                   ` (3 preceding siblings ...)
  2020-02-11 10:12 ` [RFC PATCH v3 4/9] vfio/pci: macros to generate module_init and module_exit for vendor modules Yan Zhao
@ 2020-02-11 10:13 ` Yan Zhao
  2020-02-11 10:14 ` [RFC PATCH v3 6/9] vfio/pci: export vfio_pci_setup_barmap Yan Zhao
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:13 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

This allows a simpler VFIO_DEVICE_GET_INFO ioctl in vendor driver

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/pci/vfio_pci.c | 3 ++-
 include/linux/vfio.h        | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index e5bfb4948667..9e5d878f5f32 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -727,7 +727,8 @@ long vfio_pci_ioctl(void *device_data,
 		if (priv->reset_works)
 			info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
-		info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
+		info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions +
+						vdev->num_vendor_regions;
 		info.num_irqs = VFIO_PCI_NUM_IRQS;
 
 		return copy_to_user((void __user *)arg, &info, minsz) ?
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 71a03471b208..4d7e80b2ed1b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -208,6 +208,7 @@ void vfio_pci_unregister_vendor_driver(struct vfio_device_ops *device_ops);
 struct vfio_pci_device {
 	struct pci_dev			*pdev;
 	int				num_regions;
+	int				num_vendor_regions;
 	int				irq_type;
 	void				*vendor_data;
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 6/9] vfio/pci: export vfio_pci_setup_barmap
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
                   ` (4 preceding siblings ...)
  2020-02-11 10:13 ` [RFC PATCH v3 5/9] vfio/pci: let vfio_pci know how many vendor regions are registered Yan Zhao
@ 2020-02-11 10:14 ` Yan Zhao
  2020-02-20 21:00   ` Alex Williamson
  2020-02-11 10:14 ` [RFC PATCH v3 7/9] samples/vfio-pci: add a sample vendor module of vfio-pci for IGD devices Yan Zhao
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:14 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

This allows vendor driver to read/write to bars directly which is useful
in security checking condition.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++++++++-------------
 include/linux/vfio.h             |  2 ++
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index a0ef1de4f74a..3ba85fb2af5b 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -129,7 +129,7 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 	return done;
 }
 
-static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
+void __iomem *vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
@@ -137,22 +137,23 @@ static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
 	void __iomem *io;
 
 	if (priv->barmap[bar])
-		return 0;
+		return priv->barmap[bar];
 
 	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
 	if (ret)
-		return ret;
+		return NULL;
 
 	io = pci_iomap(pdev, bar, 0);
 	if (!io) {
 		pci_release_selected_regions(pdev, 1 << bar);
-		return -ENOMEM;
+		return NULL;
 	}
 
 	priv->barmap[bar] = io;
 
-	return 0;
+	return io;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_setup_barmap);
 
 ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
@@ -190,11 +191,9 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 			return -ENOMEM;
 		x_end = end;
 	} else {
-		int ret = vfio_pci_setup_barmap(vdev, bar);
-		if (ret)
-			return ret;
-
-		io = priv->barmap[bar];
+		io = vfio_pci_setup_barmap(vdev, bar);
+		if (!io)
+			return -EFAULT;
 	}
 
 	if (bar == priv->msix_bar) {
@@ -309,6 +308,7 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
 	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
 	struct vfio_pci_ioeventfd *ioeventfd;
+	void __iomem *io;
 
 	/* Only support ioeventfds into BARs */
 	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
@@ -328,9 +328,9 @@ long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
 		return -EINVAL;
 #endif
 
-	ret = vfio_pci_setup_barmap(vdev, bar);
-	if (ret)
-		return ret;
+	io = vfio_pci_setup_barmap(vdev, bar);
+	if (!io)
+		return -EFAULT;
 
 	mutex_lock(&priv->ioeventfds_lock);
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 4d7e80b2ed1b..00112ffd9ad0 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -223,6 +223,8 @@ extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma);
 extern void vfio_pci_request(void *device_data, unsigned int count);
 extern int vfio_pci_open(void *device_data);
 extern void vfio_pci_release(void *device_data);
+extern void __iomem *vfio_pci_setup_barmap(struct vfio_pci_device *vdev,
+					   int bar);
 
 #define vfio_pci_register_vendor_driver(__name, __probe, __remove,	\
 					__device_ops)			\
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 7/9] samples/vfio-pci: add a sample vendor module of vfio-pci for IGD devices
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
                   ` (5 preceding siblings ...)
  2020-02-11 10:14 ` [RFC PATCH v3 6/9] vfio/pci: export vfio_pci_setup_barmap Yan Zhao
@ 2020-02-11 10:14 ` Yan Zhao
  2020-02-11 10:15 ` [RFC PATCH v3 8/9] vfio: header for vfio live migration region Yan Zhao
  2020-02-11 10:15 ` [RFC PATCH v3 9/9] i40e/vf_migration: vfio-pci vendor driver for VF live migration Yan Zhao
  8 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:14 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

This sample driver does nothing but calls default device ops of vfio-pci
to pass through IGD devices.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 samples/Kconfig           |   6 ++
 samples/Makefile          |   1 +
 samples/vfio-pci/Makefile |   2 +
 samples/vfio-pci/igd_pt.c | 148 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 157 insertions(+)
 create mode 100644 samples/vfio-pci/Makefile
 create mode 100644 samples/vfio-pci/igd_pt.c

diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4dda80..84d6a91567af 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -169,4 +169,10 @@ config SAMPLE_VFS
 	  as mount API and statx().  Note that this is restricted to the x86
 	  arch whilst it accesses system calls that aren't yet in all arches.
 
+config SAMPLE_VFIO_PCI_IGD_PT
+	tristate "Build VFIO sample vendor driver to pass through IGD devices -- loadable modules only"
+	depends on VFIO_PCI && m
+	help
+	  Build a sample driver to pass through IGD devices as a vendor driver
+	  of VFIO PCI.
 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 7d6e4ca28d69..db56daf3a11c 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -19,4 +19,5 @@ obj-$(CONFIG_SAMPLE_TRACE_EVENTS)	+= trace_events/
 obj-$(CONFIG_SAMPLE_TRACE_PRINTK)	+= trace_printk/
 obj-$(CONFIG_VIDEO_PCI_SKELETON)	+= v4l/
 obj-y					+= vfio-mdev/
+obj-y					+= vfio-pci/
 subdir-$(CONFIG_SAMPLE_VFS)		+= vfs
diff --git a/samples/vfio-pci/Makefile b/samples/vfio-pci/Makefile
new file mode 100644
index 000000000000..7125f0a325c2
--- /dev/null
+++ b/samples/vfio-pci/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_SAMPLE_VFIO_PCI_IGD_PT) += igd_pt.o
diff --git a/samples/vfio-pci/igd_pt.c b/samples/vfio-pci/igd_pt.c
new file mode 100644
index 000000000000..45acbd1ad2fb
--- /dev/null
+++ b/samples/vfio-pci/igd_pt.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Pass through IGD devices as a vendor driver of vfio-pci device driver
+ * Copyright(c) 2019 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/vfio.h>
+#include <linux/sysfs.h>
+#include <linux/file.h>
+#include <linux/pci.h>
+#include <linux/eventfd.h>
+
+#define VERSION_STRING  "0.1"
+#define DRIVER_AUTHOR   "Intel Corporation"
+
+/*
+ * below are pciids of IGD devices supported in this driver
+ */
+static const struct pci_device_id pciidlist[] = {
+	{0x8086, 0x5927, ~0, ~0, 0x30000, 0xff0000, 0},
+	{0x8086, 0x591d, ~0, ~0, 0x30000, 0xff0000, 0},
+	{0x8086, 0x193b, ~0, ~0, 0x30000, 0xff0000, 0},
+};
+
+struct igd_pt_device {
+	__u32 vendor;
+	__u32 device;
+
+};
+
+void *igd_pt_probe(struct pci_dev *pdev)
+{
+	int supported_dev_cnt =
+		sizeof(pciidlist)/sizeof(struct pci_device_id);
+	int i;
+	struct igd_pt_device *igd_device;
+
+	for (i = 0; i < supported_dev_cnt; i++) {
+		if (pciidlist[i].vendor == pdev->vendor &&
+				pciidlist[i].device == pdev->device)
+			goto support;
+	}
+
+	return ERR_PTR(-ENODEV);
+
+support:
+
+	igd_device = kzalloc(sizeof(*igd_device), GFP_KERNEL);
+
+	if (!igd_device)
+		return ERR_PTR(-ENOMEM);
+
+	igd_device->vendor = pdev->vendor;
+	igd_device->device = pdev->device;
+
+	return igd_device;
+}
+
+static void igd_pt_remove(void *vendor_data)
+{
+	struct igd_pt_device *igd_device =
+		(struct igd_pt_device *)vendor_data;
+
+	kfree(igd_device);
+}
+
+static int igd_pt_open(void *device_data)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	return vfio_pci_open(vdev);
+}
+
+void igd_pt_release(void *device_data)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	vfio_pci_release(vdev);
+}
+
+static long igd_pt_ioctl(void *device_data,
+			 unsigned int cmd, unsigned long arg)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	return vfio_pci_ioctl(vdev, cmd, arg);
+}
+
+static ssize_t igd_pt_read(void *device_data, char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	return vfio_pci_read(vdev, buf, count, ppos);
+}
+
+static ssize_t igd_pt_write(void *device_data, const char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	return vfio_pci_write(vdev, buf, count, ppos);
+}
+
+static int igd_pt_mmap(void *device_data, struct vm_area_struct *vma)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	return vfio_pci_mmap(vdev, vma);
+}
+
+static void igd_pt_request(void *device_data, unsigned int count)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	vfio_pci_request(vdev, count);
+}
+
+static struct vfio_device_ops igd_pt_device_ops_node = {
+	.name		= "IGD dt",
+	.open		= igd_pt_open,
+	.release	= igd_pt_release,
+	.ioctl		= igd_pt_ioctl,
+	.read		= igd_pt_read,
+	.write		= igd_pt_write,
+	.mmap		= igd_pt_mmap,
+	.request	= igd_pt_request,
+};
+
+#define igd_pt_device_ops (&igd_pt_device_ops_node)
+
+module_vfio_pci_register_vendor_handler("IGD dt", igd_pt_probe,
+					igd_pt_remove, igd_pt_device_ops);
+
+MODULE_ALIAS("vfio-pci:8086-591d");
+MODULE_ALIAS("vfio-pci:8086-5927");
+MODULE_LICENSE("GPL v2");
+MODULE_INFO(supported, "Sample driver as vendor driver of vfio-pci to pass through IGD");
+MODULE_VERSION(VERSION_STRING);
+MODULE_AUTHOR(DRIVER_AUTHOR);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 8/9] vfio: header for vfio live migration region.
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
                   ` (6 preceding siblings ...)
  2020-02-11 10:14 ` [RFC PATCH v3 7/9] samples/vfio-pci: add a sample vendor module of vfio-pci for IGD devices Yan Zhao
@ 2020-02-11 10:15 ` Yan Zhao
  2020-02-11 10:15 ` [RFC PATCH v3 9/9] i40e/vf_migration: vfio-pci vendor driver for VF live migration Yan Zhao
  8 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:15 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao, Kirti Wankhede

Header file copied from vfio live migration patch series v8. [1].

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05542.html

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/uapi/linux/vfio.h | 149 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..135a1d3fa111 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -306,6 +306,155 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_GFX                    (1)
 #define VFIO_REGION_TYPE_CCW			(2)
 
+/* Migration region type and sub-type */
+#define VFIO_REGION_TYPE_MIGRATION          (3)
+#define VFIO_REGION_SUBTYPE_MIGRATION           (1)
+
+/**
+ * Structure vfio_device_migration_info is placed at 0th offset of
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
+ * information. Field accesses from this structure are only supported at their
+ * native width and alignment, otherwise the result is undefined and vendor
+ * drivers should return an error.
+ *
+ * device_state: (read/write)
+ *      To indicate vendor driver the state VFIO device should be transitioned
+ *      to. If device state transition fails, write on this field return error.
+ *      It consists of 3 bits:
+ *      - If bit 0 set, indicates _RUNNING state. When its reset, that indicates
+ *        _STOPPED state. When device is changed to _STOPPED, driver should stop
+ *        device before write() returns.
+ *      - If bit 1 set, indicates _SAVING state.
+ *      - If bit 2 set, indicates _RESUMING state.
+ *      Bits 3 - 31 are reserved for future use. User should perform
+ *      read-modify-write operation on this field.
+ *      _SAVING and _RESUMING bits set at the same time is invalid state.
+ *
+ * pending bytes: (read only)
+ *      Number of pending bytes yet to be migrated from vendor driver
+ *
+ * data_offset: (read only)
+ *      User application should read data_offset in migration region from where
+ *      user application should read device data during _SAVING state or write
+ *      device data during _RESUMING state or read dirty pages bitmap. See below
+ *      for detail of sequence to be followed.
+ *
+ * data_size: (read/write)
+ *      User application should read data_size to get size of data copied in
+ *      migration region during _SAVING state and write size of data copied in
+ *      migration region during _RESUMING state.
+ *
+ * start_pfn: (write only)
+ *      Start address pfn to get bitmap of dirty pages from vendor driver duing
+ *      _SAVING state.
+ *
+ * page_size: (write only)
+ *      User application should write the page_size of pfn.
+ *
+ * total_pfns: (write only)
+ *      Total pfn count from start_pfn for which dirty bitmap is requested.
+ *
+ * copied_pfns: (read only)
+ *      pfn count for which dirty bitmap is copied to migration region.
+ *      Vendor driver should copy the bitmap with bits set only for pages to be
+ *      marked dirty in migration region.
+ *      - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_NONE if none of the
+ *        pages are dirty in requested range or rest of the range.
+ *      - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_ALL to mark all
+ *        pages dirty in the given range or rest of the range.
+ *      - Vendor driver should return pfn count for which bitmap is written in
+ *        the region.
+ *
+ * Migration region looks like:
+ *  ------------------------------------------------------------------
+ * |vfio_device_migration_info|    data section                      |
+ * |                          |     ///////////////////////////////  |
+ * ------------------------------------------------------------------
+ *   ^                              ^                             ^
+ *  offset 0-trapped part        data_offset                 data_size
+ *
+ * Data section is always followed by vfio_device_migration_info structure
+ * in the region, so data_offset will always be non-0. Offset from where data
+ * is copied is decided by kernel driver, data section can be trapped or
+ * mapped or partitioned, depending on how kernel driver defines data section.
+ * Data section partition can be defined as mapped by sparse mmap capability.
+ * If mmapped, then data_offset should be page aligned, where as initial section
+ * which contain vfio_device_migration_info structure might not end at offset
+ * which is page aligned.
+ * Data_offset can be same or different for device data and dirty pages bitmap.
+ * Vendor driver should decide whether to partition data section and how to
+ * partition the data section. Vendor driver should return data_offset
+ * accordingly.
+ *
+ * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
+ * and for _SAVING device state or stop-and-copy phase:
+ * a. read pending_bytes. If pending_bytes > 0, go through below steps.
+ * b. read data_offset, indicates kernel driver to write data to staging buffer.
+ * c. read data_size, amount of data in bytes written by vendor driver in
+ *    migration region.
+ * d. read data_size bytes of data from data_offset in the migration region.
+ * e. process data.
+ * f. Loop through a to e.
+ *
+ * To copy system memory content during migration, vendor driver should be able
+ * to report system memory pages which are dirtied by that driver. For such
+ * dirty page reporting, user application should query for a range of GFNs
+ * relative to device address space (IOVA), then vendor driver should provide
+ * the bitmap of pages from this range which are dirtied by him through
+ * migration region where each bit represents a page and bit set to 1 represents
+ * that the page is dirty.
+ * User space application should take care of copying content of system memory
+ * for those pages.
+ *
+ * Steps to get dirty page bitmap:
+ * a. write start_pfn, page_size and total_pfns.
+ * b. read copied_pfns. Vendor driver should take one of the below action:
+ *     - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_NONE if driver
+ *       doesn't have any page to report dirty in given range or rest of the
+ *       range. Exit the loop.
+ *     - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_ALL to mark all
+ *       pages dirty for given range or rest of the range. User space
+ *       application mark all pages in the range as dirty and exit the loop.
+ *     - Vendor driver should return copied_pfns and provide bitmap for
+ *       copied_pfn in migration region.
+ * c. read data_offset, where vendor driver has written bitmap.
+ * d. read bitmap from the migration region from data_offset.
+ * e. Iterate through steps a to d while (total copied_pfns < total_pfns)
+ *
+ * Sequence to be followed while _RESUMING device state:
+ * While data for this device is available, repeat below steps:
+ * a. read data_offset from where user application should write data.
+ * b. write data of data_size to migration region from data_offset.
+ * c. write data_size which indicates vendor driver that data is written in
+ *    staging buffer.
+ *
+ * For user application, data is opaque. User should write data in the same
+ * order as received.
+ */
+
+struct vfio_device_migration_info {
+	__u32 device_state;         /* VFIO device state */
+#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
+#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
+				     VFIO_DEVICE_STATE_SAVING | \
+				     VFIO_DEVICE_STATE_RESUMING)
+#define VFIO_DEVICE_STATE_INVALID   (VFIO_DEVICE_STATE_SAVING | \
+				     VFIO_DEVICE_STATE_RESUMING)
+	__u32 reserved;
+	__u64 pending_bytes;
+	__u64 data_offset;
+	__u64 data_size;
+	__u64 start_pfn;
+	__u64 page_size;
+	__u64 total_pfns;
+	__u64 copied_pfns;
+#define VFIO_DEVICE_DIRTY_PFNS_NONE     (0)
+#define VFIO_DEVICE_DIRTY_PFNS_ALL      (~0ULL)
+} __attribute__((packed));
+
+
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
 /* 8086 vendor PCI sub-types */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 9/9] i40e/vf_migration: vfio-pci vendor driver for VF live migration
  2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
                   ` (7 preceding siblings ...)
  2020-02-11 10:15 ` [RFC PATCH v3 8/9] vfio: header for vfio live migration region Yan Zhao
@ 2020-02-11 10:15 ` Yan Zhao
  8 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-11 10:15 UTC (permalink / raw)
  To: alex.williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu, Yan Zhao

vfio pci operates on regions
(1) 0 ~ VFIO_PCI_NUM_REGIONS - 1
(2) VFIO_PCI_NUM_REGIONS ~ VFIO_PCI_NUM_REGIONS + vdev->num_regions -1

vf_migration operates on regions
VFIO_PCI_NUM_REGIONS + vdev->num_regions ~
VFIO_PCI_NUM_REGIONS + vdev->num_regions + vdev->num_vendor_regions

vf_migration also intercept BAR0 write operation.

Cc: Shaopeng He <shaopeng.he@intel.com>

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  10 +
 drivers/net/ethernet/intel/i40e/Makefile      |   2 +
 drivers/net/ethernet/intel/i40e/i40e.h        |   2 +
 .../ethernet/intel/i40e/i40e_vf_migration.c   | 635 ++++++++++++++++++
 .../ethernet/intel/i40e/i40e_vf_migration.h   |  92 +++
 5 files changed, 741 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.c
 create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 154e2e818ec6..fee0e70e6164 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -264,6 +264,16 @@ config I40E_DCB
 
 	  If unsure, say N.
 
+config I40E_VF_MIGRATION
+	tristate "XL710 Family VF live migration support -- loadable modules only"
+	depends on I40E && VFIO_PCI && m
+	help
+	  Say m if you want to enable live migration of
+	  Virtual Functions of Intel(R) Ethernet Controller XL710
+	  Family of devices. It must be a module.
+	  This module serves as vendor module of module vfio_pci.
+	  VFs bind to module vfio_pci directly.
+
 # this is here to allow seamless migration from I40EVF --> IAVF name
 # so that CONFIG_IAVF symbol will always mirror the state of CONFIG_I40EVF
 config IAVF
diff --git a/drivers/net/ethernet/intel/i40e/Makefile b/drivers/net/ethernet/intel/i40e/Makefile
index 2f21b3e89fd0..b80c224c2602 100644
--- a/drivers/net/ethernet/intel/i40e/Makefile
+++ b/drivers/net/ethernet/intel/i40e/Makefile
@@ -27,3 +27,5 @@ i40e-objs := i40e_main.o \
 	i40e_xsk.o
 
 i40e-$(CONFIG_I40E_DCB) += i40e_dcb.o i40e_dcb_nl.o
+
+obj-$(CONFIG_I40E_VF_MIGRATION) += i40e_vf_migration.o
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2af9f6308f84..0141c94b835f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -1162,4 +1162,6 @@ int i40e_add_del_cloud_filter(struct i40e_vsi *vsi,
 int i40e_add_del_cloud_filter_big_buf(struct i40e_vsi *vsi,
 				      struct i40e_cloud_filter *filter,
 				      bool add);
+int i40e_vf_migration_register(void);
+void i40e_vf_migration_unregister(void);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_vf_migration.c b/drivers/net/ethernet/intel/i40e/i40e_vf_migration.c
new file mode 100644
index 000000000000..64517d54e81d
--- /dev/null
+++ b/drivers/net/ethernet/intel/i40e/i40e_vf_migration.c
@@ -0,0 +1,635 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2013 - 2019 Intel Corporation. */
+
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/vfio.h>
+#include <linux/pci.h>
+#include <linux/eventfd.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/sysfs.h>
+#include <linux/file.h>
+#include <linux/pci.h>
+
+#include "i40e.h"
+#include "i40e_vf_migration.h"
+
+#define VERSION_STRING  "0.1"
+#define DRIVER_AUTHOR   "Intel Corporation"
+
+static size_t set_device_state(struct i40e_vf_migration *i40e_vf_dev, u32 state)
+{
+	int ret = 0;
+	struct vfio_device_migration_info *mig_ctl = i40e_vf_dev->mig_ctl;
+
+	if (state == mig_ctl->device_state)
+		return ret;
+
+	switch (state) {
+	case VFIO_DEVICE_STATE_RUNNING:
+		break;
+	case VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING:
+		// alloc dirty page tracking resources and
+		// do the first round dirty page scanning
+		break;
+	case VFIO_DEVICE_STATE_SAVING:
+		// do the last round of dirty page scanning
+		break;
+	case ~VFIO_DEVICE_STATE_MASK & VFIO_DEVICE_STATE_MASK:
+		// release dirty page tracking resources
+		//if (mig_ctl->device_state == VFIO_DEVICE_STATE_SAVING)
+		//      i40e_release_scan_resources(i40e_vf_dev);
+		break;
+	case VFIO_DEVICE_STATE_RESUMING:
+		break;
+	default:
+		ret = -EFAULT;
+	}
+
+	mig_ctl->device_state = state;
+
+	return ret;
+}
+
+static
+ssize_t i40e_vf_region_migration_rw(struct i40e_vf_migration *i40e_vf_dev,
+				    char __user *buf, size_t count,
+				    loff_t *ppos, bool iswrite)
+{
+#define VDM_OFFSET(x) offsetof(struct vfio_device_migration_info, x)
+	struct vfio_device_migration_info *mig_ctl = i40e_vf_dev->mig_ctl;
+	u64 pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	int ret = 0;
+
+	switch (pos) {
+	case VDM_OFFSET(device_state):
+		if (count != sizeof(mig_ctl->device_state)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite) {
+			u32 device_state;
+
+			if (copy_from_user(&device_state, buf, count)) {
+				ret = -EFAULT;
+				break;
+			}
+
+			set_device_state(i40e_vf_dev, device_state);
+			ret = count;
+		} else {
+			ret = -EFAULT;
+		}
+		break;
+
+	case VDM_OFFSET(reserved):
+		ret = -EFAULT;
+		break;
+
+	case VDM_OFFSET(pending_bytes):
+		{
+		u64 p_bytes = 0;
+
+		if (count != sizeof(mig_ctl->pending_bytes)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite)
+			ret = -EFAULT;
+		else
+			ret = copy_to_user(buf, &p_bytes, count) ?
+					   -EFAULT : count;
+
+		break;
+		}
+
+	case VDM_OFFSET(data_offset):
+		{
+		u64 d_off = DIRTY_BITMAP_OFFSET;
+
+		if (count != sizeof(mig_ctl->data_offset)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite) {
+			ret = -EFAULT;
+			break;
+		}
+
+		/* always return dirty bitmap offset here as we don't support
+		 * device internal dirty data and our pending_bytes always
+		 * return 0
+		 */
+		ret = copy_to_user(buf, &d_off, count) ? -EFAULT : count;
+		break;
+		}
+	case VDM_OFFSET(data_size):
+		if (count != sizeof(mig_ctl->data_size)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite)
+			ret = copy_from_user(&mig_ctl->data_size, buf, count) ?
+					     -EFAULT : count;
+		else
+			ret = copy_to_user(buf, &mig_ctl->data_size, count) ?
+					   -EFAULT : count;
+		break;
+
+	case VDM_OFFSET(start_pfn):
+		if (count != sizeof(mig_ctl->start_pfn)) {
+			ret = -EINVAL;
+			break;
+		}
+		if (iswrite)
+			ret = copy_from_user(&mig_ctl->start_pfn, buf, count) ?
+					     -EFAULT : count;
+		else
+			ret = -EFAULT;
+		break;
+
+	case VDM_OFFSET(page_size):
+		if (count != sizeof(mig_ctl->page_size)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite)
+			ret = copy_from_user(&mig_ctl->page_size, buf,	count) ?
+					     -EFAULT : count;
+		else
+			ret = -EFAULT;
+		break;
+
+	case VDM_OFFSET(total_pfns):
+		if (count != sizeof(mig_ctl->total_pfns)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite) {
+			if (copy_from_user(&mig_ctl->total_pfns, buf, count)) {
+				ret = -EFAULT;
+				break;
+			}
+			//calc dirty page bitmap
+			ret = count;
+		} else {
+			ret = -EFAULT;
+		}
+		break;
+
+	case VDM_OFFSET(copied_pfns):
+		if (count != sizeof(mig_ctl->copied_pfns)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite)
+			ret = -EFAULT;
+		else
+			ret = copy_to_user(buf, &mig_ctl->copied_pfns, count) ?
+					   -EFAULT : count;
+		break;
+
+	case DIRTY_BITMAP_OFFSET:
+		if (count > MIGRATION_DIRTY_BITMAP_SIZE || count < 0) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (iswrite)
+			ret = -EFAULT;
+		else
+			ret = copy_to_user(buf, i40e_vf_dev->dirty_bitmap,
+					   count) ? -EFAULT : count;
+		break;
+
+	default:
+		ret = -EFAULT;
+		break;
+	}
+	return ret;
+}
+
+static
+int i40e_vf_region_migration_add_cap(struct i40e_vf_migration *i40e_vf_dev,
+				     struct i40e_vf_region *region,
+				     struct vfio_info_cap *caps)
+{
+	struct vfio_region_info_cap_sparse_mmap *sparse;
+	size_t size;
+	int nr_areas = 1;
+
+	size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
+
+	sparse = kzalloc(size, GFP_KERNEL);
+	if (!sparse)
+		return -ENOMEM;
+
+	sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+	sparse->header.version = 1;
+	sparse->nr_areas = nr_areas;
+
+	sparse->areas[0].offset = DIRTY_BITMAP_OFFSET;
+	sparse->areas[0].size = MIGRATION_DIRTY_BITMAP_SIZE;
+
+	vfio_info_add_capability(caps, &sparse->header, size);
+	kfree(sparse);
+	return 0;
+}
+
+static
+int i40e_vf_region_migration_mmap(struct i40e_vf_migration *i40e_vf_dev,
+				  struct i40e_vf_region *region,
+				  struct vm_area_struct *vma)
+{
+	unsigned long pgoff = 0;
+	void *base;
+
+	base = i40e_vf_dev->dirty_bitmap;
+
+	if (vma->vm_end < vma->vm_start)
+		return -EINVAL;
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+
+	if (pgoff != DIRTY_BITMAP_OFFSET / PAGE_SIZE)
+		return -EINVAL;
+
+	return remap_vmalloc_range(vma, base, 0);
+}
+
+static
+void i40e_vf_region_migration_release(struct i40e_vf_migration *i40e_vf_dev,
+				      struct i40e_vf_region *region)
+{
+	if (i40e_vf_dev->dirty_bitmap) {
+		vfree(i40e_vf_dev->dirty_bitmap);
+		i40e_vf_dev->dirty_bitmap = NULL;
+	}
+	kfree(i40e_vf_dev->mig_ctl);
+	i40e_vf_dev->mig_ctl = NULL;
+}
+
+static const struct i40e_vf_region_ops i40e_vf_region_ops_migration = {
+	.rw		= i40e_vf_region_migration_rw,
+	.release	= i40e_vf_region_migration_release,
+	.mmap		= i40e_vf_region_migration_mmap,
+	.add_cap	= i40e_vf_region_migration_add_cap
+};
+
+static int i40e_vf_register_region(struct i40e_vf_migration *i40e_vf_dev,
+				   unsigned int type, unsigned int subtype,
+				   const struct i40e_vf_region_ops *ops,
+				   size_t size, u32 flags, void *data)
+{
+	struct i40e_vf_region *regions;
+
+	regions = krealloc(i40e_vf_dev->regions,
+			   (i40e_vf_dev->num_regions + 1) * sizeof(*regions),
+			   GFP_KERNEL);
+	if (!regions)
+		return -ENOMEM;
+
+	i40e_vf_dev->regions = regions;
+	regions[i40e_vf_dev->num_regions].type = type;
+	regions[i40e_vf_dev->num_regions].subtype = subtype;
+	regions[i40e_vf_dev->num_regions].ops = ops;
+	regions[i40e_vf_dev->num_regions].size = size;
+	regions[i40e_vf_dev->num_regions].flags = flags;
+	regions[i40e_vf_dev->num_regions].data = data;
+	i40e_vf_dev->num_regions++;
+	return 0;
+}
+
+void *i40e_vf_probe(struct pci_dev *pdev)
+{
+	struct i40e_vf_migration *i40e_vf_dev = NULL;
+	struct pci_dev *pf_dev, *vf_dev;
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+	unsigned int vf_devfn, devfn;
+	int vf_id = -1;
+	int i;
+
+	pf_dev = pdev->physfn;
+	pf = pci_get_drvdata(pf_dev);
+	vf_dev = pdev;
+	vf_devfn = vf_dev->devfn;
+
+	for (i = 0; i < pci_num_vf(pf_dev); i++) {
+		devfn = (pf_dev->devfn + pf_dev->sriov->offset +
+			 pf_dev->sriov->stride * i) & 0xff;
+		if (devfn == vf_devfn) {
+			vf_id = i;
+			break;
+		}
+	}
+
+	if (vf_id == -1)
+		return ERR_PTR(-EINVAL);
+
+	i40e_vf_dev = kzalloc(sizeof(*i40e_vf_dev), GFP_KERNEL);
+
+	if (!i40e_vf_dev)
+		return ERR_PTR(-ENOMEM);
+
+	i40e_vf_dev->vf_id = vf_id;
+	i40e_vf_dev->vf_vendor = pdev->vendor;
+	i40e_vf_dev->vf_device = pdev->device;
+	i40e_vf_dev->pf_dev = pf_dev;
+	i40e_vf_dev->vf_dev = vf_dev;
+	mutex_init(&i40e_vf_dev->reflock);
+
+	vf = &pf->vf[vf_id];
+
+	return i40e_vf_dev;
+}
+
+static void i40e_vf_remove(void *vendor_data)
+{
+	struct i40e_vf_migration *i40e_vf_dev =
+				(struct i40e_vf_migration *)vendor_data;
+	kfree(i40e_vf_dev);
+}
+
+static int i40e_vf_open(void *device_data)
+{
+	struct vfio_pci_device *vdev = device_data;
+	struct i40e_vf_migration *i40e_vf_dev =
+				(struct i40e_vf_migration *)vdev->vendor_data;
+	int ret;
+	struct vfio_device_migration_info *mig_ctl = NULL;
+	void *dirty_bitmap_base = NULL;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	mutex_lock(&i40e_vf_dev->reflock);
+	if (!i40e_vf_dev->refcnt) {
+		mig_ctl = kzalloc(sizeof(*mig_ctl), GFP_KERNEL);
+		if (!mig_ctl) {
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		dirty_bitmap_base = vmalloc_user(MIGRATION_DIRTY_BITMAP_SIZE);
+		if (!dirty_bitmap_base) {
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		ret = i40e_vf_register_region(i40e_vf_dev,
+					      VFIO_REGION_TYPE_MIGRATION,
+					      VFIO_REGION_SUBTYPE_MIGRATION,
+					      &i40e_vf_region_ops_migration,
+					      DIRTY_BITMAP_OFFSET +
+					      MIGRATION_DIRTY_BITMAP_SIZE,
+					      VFIO_REGION_INFO_FLAG_CAPS |
+					      VFIO_REGION_INFO_FLAG_READ |
+					      VFIO_REGION_INFO_FLAG_WRITE |
+					      VFIO_REGION_INFO_FLAG_MMAP,
+					      NULL);
+		if (ret)
+			goto error;
+
+		i40e_vf_dev->dirty_bitmap = dirty_bitmap_base;
+		i40e_vf_dev->mig_ctl = mig_ctl;
+		vdev->num_vendor_regions = i40e_vf_dev->num_regions;
+	}
+
+	ret = vfio_pci_open(vdev);
+	if (ret)
+		goto error;
+
+	i40e_vf_dev->refcnt++;
+	mutex_unlock(&i40e_vf_dev->reflock);
+	return 0;
+error:
+	if (!i40e_vf_dev->refcnt) {
+		kfree(mig_ctl);
+		vfree(dirty_bitmap_base);
+		kfree(i40e_vf_dev->regions);
+		i40e_vf_dev->num_regions = 0;
+		i40e_vf_dev->regions = NULL;
+		vdev->num_vendor_regions = 0;
+	}
+	module_put(THIS_MODULE);
+	mutex_unlock(&i40e_vf_dev->reflock);
+	return ret;
+ }
+
+void i40e_vf_release(void *device_data)
+{
+	struct vfio_pci_device *vdev = device_data;
+	struct i40e_vf_migration *i40e_vf_dev =
+				(struct i40e_vf_migration *)vdev->vendor_data;
+
+	mutex_lock(&i40e_vf_dev->reflock);
+	if (!--i40e_vf_dev->refcnt) {
+		int i;
+
+		for (i = 0; i < i40e_vf_dev->num_regions; i++)
+			i40e_vf_dev->regions[i].ops->release(i40e_vf_dev,
+						&i40e_vf_dev->regions[i]);
+		i40e_vf_dev->num_regions = 0;
+		kfree(i40e_vf_dev->regions);
+		i40e_vf_dev->regions = NULL;
+		vdev->num_vendor_regions = 0;
+	}
+	mutex_unlock(&i40e_vf_dev->reflock);
+	vfio_pci_release(vdev);
+	module_put(THIS_MODULE);
+}
+
+static long i40e_vf_ioctl(void *device_data,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct vfio_pci_device *vdev = device_data;
+	struct i40e_vf_migration *i40e_vf_dev =
+				(struct i40e_vf_migration *)vdev->vendor_data;
+	unsigned long minsz;
+
+	if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+		struct vfio_region_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		int index, ret;
+		struct vfio_region_info_cap_type cap_type = {
+			.header.id = VFIO_REGION_INFO_CAP_TYPE,
+			.header.version = 1 };
+		struct i40e_vf_region *regions;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+		if (info.argsz < minsz)
+			return -EINVAL;
+		if (info.index < VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+			goto default_handle;
+
+		index = info.index - VFIO_PCI_NUM_REGIONS - vdev->num_regions;
+		if (index > i40e_vf_dev->num_regions)
+			return -EINVAL;
+
+		info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+		regions = i40e_vf_dev->regions;
+		info.size = regions[index].size;
+		info.flags = regions[index].flags;
+		cap_type.type = regions[index].type;
+		cap_type.subtype = regions[index].subtype;
+
+		ret = vfio_info_add_capability(&caps, &cap_type.header,
+					       sizeof(cap_type));
+		if (ret)
+			return ret;
+
+		if (regions[index].ops->add_cap) {
+			ret = regions[index].ops->add_cap(i40e_vf_dev,
+						&regions[index], &caps);
+				if (ret)
+					return ret;
+		}
+
+		if (caps.size) {
+			info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg +
+						  sizeof(info), caps.buf,
+						  caps.size)) {
+					kfree(caps.buf);
+					return -EFAULT;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+
+		return copy_to_user((void __user *)arg, &info, minsz) ?
+			-EFAULT : 0;
+	}
+
+default_handle:
+	return vfio_pci_ioctl(vdev, cmd, arg);
+}
+
+static ssize_t i40e_vf_read(void *device_data, char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	struct vfio_pci_device *vdev = device_data;
+	struct i40e_vf_migration *i40e_vf_dev =
+				(struct i40e_vf_migration *)vdev->vendor_data;
+	struct i40e_vf_region *region;
+
+	if (index < VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+		return vfio_pci_read(vdev, buf, count, ppos);
+	else if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions +
+			vdev->num_vendor_regions)
+		return -EINVAL;
+
+	index -= VFIO_PCI_NUM_REGIONS + vdev->num_regions;
+
+	region = &i40e_vf_dev->regions[index];
+	if (!region->ops->rw)
+		return -EINVAL;
+
+	return region->ops->rw(i40e_vf_dev, buf, count, ppos, false);
+}
+
+static ssize_t i40e_vf_write(void *device_data, const char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	struct vfio_pci_device *vdev = device_data;
+	struct i40e_vf_migration *i40e_vf_dev =
+				(struct i40e_vf_migration *)vdev->vendor_data;
+	struct i40e_vf_region *region;
+
+	if (index == VFIO_PCI_BAR0_REGION_INDEX) {
+		;// scan dirty pages
+	}
+
+	if (index < VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+		return vfio_pci_write(vdev, buf, count, ppos);
+	else if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions +
+			vdev->num_vendor_regions)
+		return -EINVAL;
+
+	index -= VFIO_PCI_NUM_REGIONS + vdev->num_regions;
+
+	region = &i40e_vf_dev->regions[index];
+
+	if (!region->ops->rw)
+		return -EINVAL;
+
+	return region->ops->rw(i40e_vf_dev, (char __user *)buf,
+			       count, ppos, true);
+}
+
+static int i40e_vf_mmap(void *device_data, struct vm_area_struct *vma)
+{
+	struct vfio_pci_device *vdev = device_data;
+	struct i40e_vf_migration *i40e_vf_dev =
+				(struct i40e_vf_migration *)vdev->vendor_data;
+	unsigned int index;
+	struct i40e_vf_region *region;
+
+	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+	if (index < VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+		return vfio_pci_mmap(vdev, vma);
+	else if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions +
+			vdev->num_vendor_regions)
+		return -EINVAL;
+
+	index -= VFIO_PCI_NUM_REGIONS + vdev->num_regions;
+
+	region = &i40e_vf_dev->regions[index];
+	if (!region->ops->mmap)
+		return -EINVAL;
+
+	return region->ops->mmap(i40e_vf_dev, region, vma);
+}
+
+static void i40e_vf_request(void *device_data, unsigned int count)
+{
+	struct vfio_pci_device *vdev = device_data;
+
+	vfio_pci_request(vdev, count);
+}
+
+static struct vfio_device_ops i40e_vf_device_ops_node = {
+	.name		= "i40e_vf",
+	.open		= i40e_vf_open,
+	.release	= i40e_vf_release,
+	.ioctl		= i40e_vf_ioctl,
+	.read		= i40e_vf_read,
+	.write		= i40e_vf_write,
+	.mmap		= i40e_vf_mmap,
+	.request	= i40e_vf_request,
+};
+
+#define i40e_vf_device_ops (&i40e_vf_device_ops_node)
+module_vfio_pci_register_vendor_handler("I40E VF", i40e_vf_probe,
+					i40e_vf_remove, i40e_vf_device_ops);
+
+MODULE_ALIAS("vfio-pci:8086-154c");
+MODULE_LICENSE("GPL v2");
+MODULE_INFO(supported, "Vendor driver of vfio pci to support VF live migration");
+MODULE_VERSION(VERSION_STRING);
+MODULE_AUTHOR(DRIVER_AUTHOR);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_vf_migration.h b/drivers/net/ethernet/intel/i40e/i40e_vf_migration.h
new file mode 100644
index 000000000000..4d804f8cb032
--- /dev/null
+++ b/drivers/net/ethernet/intel/i40e/i40e_vf_migration.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2013 - 2019 Intel Corporation. */
+
+#ifndef I40E_MIG_H
+#define I40E_MIG_H
+
+#include <linux/pci.h>
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+
+#include "i40e.h"
+#include "i40e_txrx.h"
+
+/* helper macros copied from vfio-pci */
+#define VFIO_PCI_OFFSET_SHIFT   40
+#define VFIO_PCI_OFFSET_TO_INDEX(off)   ((off) >> VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_OFFSET_MASK    (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
+#define DIRTY_BITMAP_OFFSET \
+			PAGE_ALIGN(sizeof(struct vfio_device_migration_info))
+#define MIGRATION_DIRTY_BITMAP_SIZE (64 * 1024UL)
+
+/* Single Root I/O Virtualization */
+struct pci_sriov {
+	int		pos;		/* Capability position */
+	int		nres;		/* Number of resources */
+	u32		cap;		/* SR-IOV Capabilities */
+	u16		ctrl;		/* SR-IOV Control */
+	u16		total_VFs;	/* Total VFs associated with the PF */
+	u16		initial_VFs;	/* Initial VFs associated with the PF */
+	u16		num_VFs;	/* Number of VFs available */
+	u16		offset;		/* First VF Routing ID offset */
+	u16		stride;		/* Following VF stride */
+	u16		vf_device;	/* VF device ID */
+	u32		pgsz;		/* Page size for BAR alignment */
+	u8		link;		/* Function Dependency Link */
+	u8		max_VF_buses;	/* Max buses consumed by VFs */
+	u16		driver_max_VFs;	/* Max num VFs driver supports */
+	struct pci_dev	*dev;		/* Lowest numbered PF */
+	struct pci_dev	*self;		/* This PF */
+	u32		cfg_size;	/* VF config space size */
+	u32		class;		/* VF device */
+	u8		hdr_type;	/* VF header type */
+	u16		subsystem_vendor; /* VF subsystem vendor */
+	u16		subsystem_device; /* VF subsystem device */
+	resource_size_t	barsz[PCI_SRIOV_NUM_BARS];	/* VF BAR size */
+	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
+};
+
+struct i40e_vf_migration {
+	__u32				vf_vendor;
+	__u32				vf_device;
+	__u32				handle;
+	struct pci_dev			*pf_dev;
+	struct pci_dev			*vf_dev;
+	int				vf_id;
+	int				refcnt;
+	struct				mutex reflock; /*mutex protect refcnt */
+
+	struct vfio_device_migration_info *mig_ctl;
+	void				*dirty_bitmap;
+
+	struct i40e_vf_region		*regions;
+	int				num_regions;
+};
+
+struct i40e_vf_region;
+struct i40e_vf_region_ops {
+	ssize_t	(*rw)(struct i40e_vf_migration *i40e_vf_dev,
+		      char __user *buf, size_t count,
+		      loff_t *ppos, bool iswrite);
+	void	(*release)(struct i40e_vf_migration *i40e_vf_dev,
+			   struct i40e_vf_region *region);
+	int	(*mmap)(struct i40e_vf_migration *i40e_vf_dev,
+			struct i40e_vf_region *region,
+			struct vm_area_struct *vma);
+	int	(*add_cap)(struct i40e_vf_migration *i40e_vf_dev,
+			   struct i40e_vf_region *region,
+			   struct vfio_info_cap *caps);
+};
+
+struct i40e_vf_region {
+	u32				type;
+	u32				subtype;
+	size_t				size;
+	u32				flags;
+	const struct i40e_vf_region_ops	*ops;
+	void				*data;
+};
+
+#endif /* I40E_MIG_H */
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private
  2020-02-11 10:10 ` [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private Yan Zhao
@ 2020-02-20 21:00   ` Alex Williamson
  2020-02-21  6:19     ` Yan Zhao
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2020-02-20 21:00 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu

On Tue, 11 Feb 2020 05:10:38 -0500
Yan Zhao <yan.y.zhao@intel.com> wrote:

> (1) make vfio_pci_device public, so it is accessible from external code.
> (2) add a private struct vfio_pci_device_private, which is only accessible
> from internal code. It extends struct vfio_pci_device.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci.c         | 256 +++++++++++++++-------------
>  drivers/vfio/pci/vfio_pci_config.c  | 186 ++++++++++++--------
>  drivers/vfio/pci/vfio_pci_igd.c     |  19 ++-
>  drivers/vfio/pci/vfio_pci_intrs.c   | 186 +++++++++++---------
>  drivers/vfio/pci/vfio_pci_nvlink2.c |  22 +--
>  drivers/vfio/pci/vfio_pci_private.h |   7 +-
>  drivers/vfio/pci/vfio_pci_rdwr.c    |  40 +++--
>  include/linux/vfio.h                |   5 +
>  8 files changed, 408 insertions(+), 313 deletions(-)

[SNIP!]

> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index e42a711a2800..70a2b8fb6179 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -195,4 +195,9 @@ extern int vfio_virqfd_enable(void *opaque,
>  			      void *data, struct virqfd **pvirqfd, int fd);
>  extern void vfio_virqfd_disable(struct virqfd **pvirqfd);
>  
> +struct vfio_pci_device {
> +	struct pci_dev			*pdev;
> +	int				num_regions;
> +	int				irq_type;
> +};
>  #endif /* VFIO_H */

Hi Yan,

Sorry for the delay.  I'm still not very happy with this result, I was
hoping the changes could be done less intrusively.  Maybe here's
another suggestion, why can't the vendor driver use a struct
vfio_pci_device* as an opaque pointer?  If you only want these three
things initially, I think this whole massive patch can be reduced to:

struct pci_dev *vfio_pci_pdev(struct vfio_pci_device *vdev)
{
	return vdev->pdev;
}
EXPORT_SYMBOL_GPL(vfio_pci_dev);

int vfio_pci_num_regions(struct vfio_pci_device *vdev)
{
	return vdev->num_regions;
}
EXPORT_SYMBOL_GPL(vfio_pci_num_region);

int vfio_pci_irq_type(struct vfio_pci_device *vdev)
{
	return vdev->irq_type;
}
EXPORT_SYMBOL_GPL(vfio_pci_irq_type);

This is how vfio-pci works with vfio, we don't know a struct
vfio_device as anything other than an opaque pointer and we have access
function where we need to see some property of that object.

Patch 5/9 would become a vfio_pci_set_vendor_regions() interface.

Thanks,
Alex


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 6/9] vfio/pci: export vfio_pci_setup_barmap
  2020-02-11 10:14 ` [RFC PATCH v3 6/9] vfio/pci: export vfio_pci_setup_barmap Yan Zhao
@ 2020-02-20 21:00   ` Alex Williamson
  2020-02-21  6:20     ` Yan Zhao
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2020-02-20 21:00 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, cohuck, zhenyuw, zhi.a.wang, kevin.tian,
	shaopeng.he, yi.l.liu

On Tue, 11 Feb 2020 05:14:19 -0500
Yan Zhao <yan.y.zhao@intel.com> wrote:

> This allows vendor driver to read/write to bars directly which is useful
> in security checking condition.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++++++++-------------
>  include/linux/vfio.h             |  2 ++
>  2 files changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index a0ef1de4f74a..3ba85fb2af5b 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -129,7 +129,7 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>  	return done;
>  }
>  
> -static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
> +void __iomem *vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
>  {
>  	struct pci_dev *pdev = vdev->pdev;
>  	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
> @@ -137,22 +137,23 @@ static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
>  	void __iomem *io;
>  
>  	if (priv->barmap[bar])
> -		return 0;
> +		return priv->barmap[bar];
>  
>  	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
>  	if (ret)
> -		return ret;
> +		return NULL;
>  
>  	io = pci_iomap(pdev, bar, 0);
>  	if (!io) {
>  		pci_release_selected_regions(pdev, 1 << bar);
> -		return -ENOMEM;
> +		return NULL;
>  	}
>  
>  	priv->barmap[bar] = io;
>  
> -	return 0;
> +	return io;
>  }
> +EXPORT_SYMBOL_GPL(vfio_pci_setup_barmap);

This should instead become a vfio_pci_get_barmap() function that tests
for an optionally calls vfio_pci_setup_barmap before returning the
pointer.  I'm now willing to lose the better error returns in the
original.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private
  2020-02-20 21:00   ` Alex Williamson
@ 2020-02-21  6:19     ` Yan Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-21  6:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, Wang, Zhi A, Tian, Kevin, He,
	Shaopeng, Liu, Yi L

On Fri, Feb 21, 2020 at 05:00:11AM +0800, Alex Williamson wrote:
> On Tue, 11 Feb 2020 05:10:38 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > (1) make vfio_pci_device public, so it is accessible from external code.
> > (2) add a private struct vfio_pci_device_private, which is only accessible
> > from internal code. It extends struct vfio_pci_device.
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci.c         | 256 +++++++++++++++-------------
> >  drivers/vfio/pci/vfio_pci_config.c  | 186 ++++++++++++--------
> >  drivers/vfio/pci/vfio_pci_igd.c     |  19 ++-
> >  drivers/vfio/pci/vfio_pci_intrs.c   | 186 +++++++++++---------
> >  drivers/vfio/pci/vfio_pci_nvlink2.c |  22 +--
> >  drivers/vfio/pci/vfio_pci_private.h |   7 +-
> >  drivers/vfio/pci/vfio_pci_rdwr.c    |  40 +++--
> >  include/linux/vfio.h                |   5 +
> >  8 files changed, 408 insertions(+), 313 deletions(-)
> 
> [SNIP!]
> 
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index e42a711a2800..70a2b8fb6179 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -195,4 +195,9 @@ extern int vfio_virqfd_enable(void *opaque,
> >  			      void *data, struct virqfd **pvirqfd, int fd);
> >  extern void vfio_virqfd_disable(struct virqfd **pvirqfd);
> >  
> > +struct vfio_pci_device {
> > +	struct pci_dev			*pdev;
> > +	int				num_regions;
> > +	int				irq_type;
> > +};
> >  #endif /* VFIO_H */
> 
> Hi Yan,
> 
> Sorry for the delay.  I'm still not very happy with this result, I was
> hoping the changes could be done less intrusively.  Maybe here's
> another suggestion, why can't the vendor driver use a struct
> vfio_pci_device* as an opaque pointer?  If you only want these three
> things initially, I think this whole massive patch can be reduced to:
> 
> struct pci_dev *vfio_pci_pdev(struct vfio_pci_device *vdev)
> {
> 	return vdev->pdev;
> }
> EXPORT_SYMBOL_GPL(vfio_pci_dev);
> 
> int vfio_pci_num_regions(struct vfio_pci_device *vdev)
> {
> 	return vdev->num_regions;
> }
> EXPORT_SYMBOL_GPL(vfio_pci_num_region);
> 
> int vfio_pci_irq_type(struct vfio_pci_device *vdev)
> {
> 	return vdev->irq_type;
> }
> EXPORT_SYMBOL_GPL(vfio_pci_irq_type);
> 
> This is how vfio-pci works with vfio, we don't know a struct
> vfio_device as anything other than an opaque pointer and we have access
> function where we need to see some property of that object.
> 
> Patch 5/9 would become a vfio_pci_set_vendor_regions() interface.
>
you are right!
I'll change it to this way. Thanks.

Yan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 6/9] vfio/pci: export vfio_pci_setup_barmap
  2020-02-20 21:00   ` Alex Williamson
@ 2020-02-21  6:20     ` Yan Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-02-21  6:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-kernel, cohuck, zhenyuw, Wang, Zhi A, Tian, Kevin, He,
	Shaopeng, Liu, Yi L

On Fri, Feb 21, 2020 at 05:00:13AM +0800, Alex Williamson wrote:
> On Tue, 11 Feb 2020 05:14:19 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > This allows vendor driver to read/write to bars directly which is useful
> > in security checking condition.
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++++++++-------------
> >  include/linux/vfio.h             |  2 ++
> >  2 files changed, 15 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> > index a0ef1de4f74a..3ba85fb2af5b 100644
> > --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> > @@ -129,7 +129,7 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
> >  	return done;
> >  }
> >  
> > -static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
> > +void __iomem *vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
> >  {
> >  	struct pci_dev *pdev = vdev->pdev;
> >  	struct vfio_pci_device_private *priv = VDEV_TO_PRIV(vdev);
> > @@ -137,22 +137,23 @@ static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
> >  	void __iomem *io;
> >  
> >  	if (priv->barmap[bar])
> > -		return 0;
> > +		return priv->barmap[bar];
> >  
> >  	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> >  	if (ret)
> > -		return ret;
> > +		return NULL;
> >  
> >  	io = pci_iomap(pdev, bar, 0);
> >  	if (!io) {
> >  		pci_release_selected_regions(pdev, 1 << bar);
> > -		return -ENOMEM;
> > +		return NULL;
> >  	}
> >  
> >  	priv->barmap[bar] = io;
> >  
> > -	return 0;
> > +	return io;
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_pci_setup_barmap);
> 
> This should instead become a vfio_pci_get_barmap() function that tests
> for an optionally calls vfio_pci_setup_barmap before returning the
> pointer.  I'm now willing to lose the better error returns in the
> original.  Thanks,
>
Got it. will change it.
Thanks!

Yan

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-02-21  6:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-11  9:57 [RFC PATCH v3 0/9] Introduce vendor ops in vfio-pci Yan Zhao
2020-02-11 10:10 ` [RFC PATCH v3 1/9] vfio/pci: export vfio_pci_device public and add vfio_pci_device_private Yan Zhao
2020-02-20 21:00   ` Alex Williamson
2020-02-21  6:19     ` Yan Zhao
2020-02-11 10:11 ` [RFC PATCH v3 2/9] vfio/pci: export functions in vfio_pci_ops Yan Zhao
2020-02-11 10:11 ` [RFC PATCH v3 3/9] vfio/pci: register/unregister vfio_pci_vendor_driver_ops Yan Zhao
2020-02-11 10:12 ` [RFC PATCH v3 4/9] vfio/pci: macros to generate module_init and module_exit for vendor modules Yan Zhao
2020-02-11 10:13 ` [RFC PATCH v3 5/9] vfio/pci: let vfio_pci know how many vendor regions are registered Yan Zhao
2020-02-11 10:14 ` [RFC PATCH v3 6/9] vfio/pci: export vfio_pci_setup_barmap Yan Zhao
2020-02-20 21:00   ` Alex Williamson
2020-02-21  6:20     ` Yan Zhao
2020-02-11 10:14 ` [RFC PATCH v3 7/9] samples/vfio-pci: add a sample vendor module of vfio-pci for IGD devices Yan Zhao
2020-02-11 10:15 ` [RFC PATCH v3 8/9] vfio: header for vfio live migration region Yan Zhao
2020-02-11 10:15 ` [RFC PATCH v3 9/9] i40e/vf_migration: vfio-pci vendor driver for VF live migration Yan Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).